Survey: Choreography vs Orchestration in Microservice Architecture
(The content below is mostly summarized and quoted from the materials listed under Reference, except for the “My thoughts” parts.)
In a microservices architecture, it is common to encounter business workflows that are long-running and stretch across the boundaries of individual microservices. There are two main approaches to implementing such workflows: orchestration and choreography.
Orchestration
Main idea: Have a single god service control the workflow by calling the other services’ APIs.
Other characteristics:
The business workflow lives in one place, which makes it easy to maintain, manage, and optimize.
It is easy to monitor and debug the business workflow through the orchestrator’s logs.
Error handling is straightforward: each service simply propagates its errors back to the orchestrator.
It is possible to apply distributed transactions such as two-phase commit (2PC).
Even though the orchestrator can call the microservices’ APIs asynchronously, the orchestrating function still has to wait for all of the calls to finish, which can tie up system resources such as memory and threads for a long time.
The orchestrator centrally manages the resiliency of the workflow, so it can become a single point of failure.
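As a rough illustration of the characteristics above, here is a minimal Python sketch of an orchestrator. The service functions (`charge_payment`, `reserve_stock`, `ship_order`), the `StepFailed` exception, and the `place_order` workflow are hypothetical names invented for this example; in a real system each step would be a call to another microservice's API rather than a local coroutine.

```python
import asyncio


class StepFailed(Exception):
    """Raised by the orchestrator when a downstream call fails."""


# Hypothetical stand-ins for calls to other services' APIs.
async def charge_payment(order_id: str) -> str:
    await asyncio.sleep(0)  # placeholder for network I/O
    return f"payment-ok:{order_id}"


async def reserve_stock(order_id: str) -> str:
    await asyncio.sleep(0)
    return f"stock-ok:{order_id}"


async def ship_order(order_id: str) -> str:
    await asyncio.sleep(0)
    return f"shipped:{order_id}"


async def place_order(order_id, steps=(charge_payment, reserve_stock, ship_order)):
    """Orchestrator: the whole workflow is visible in one place."""
    results = []
    for step in steps:
        try:
            # The orchestrator stays suspended here until the call returns.
            results.append(await step(order_id))
        except Exception as exc:
            # Errors from any service surface in this single location.
            raise StepFailed(f"{step.__name__} failed for {order_id}") from exc
    return results


if __name__ == "__main__":
    print(asyncio.run(place_order("42")))
```

Note that `place_order` has to stay alive across every `await`, which is the resource cost described above when individual steps take a long time, and that if this one function is unavailable the whole workflow stops.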
Choreography
Main idea: There is no single god service, only a central event bus. Each service listens for events on the bus, handles the ones it cares about, and publishes new events to the bus, which other services may handle in turn.
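The idea can be sketched in a few lines of Python. This is only a toy: `EventBus` is an in-process stand-in for a real message broker, it delivers events synchronously, and the service functions and event names (`OrderPlaced`, `PaymentCompleted`, `OrderShipped`) are assumptions made up for the example.

```python
from collections import defaultdict


class EventBus:
    """Toy in-process event bus; a real one would be a message broker."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Deliver to every subscriber; the publisher does not know who they are.
        for handler in list(self._subscribers.get(event_type, [])):
            handler(payload)


def build_order_flow(bus, log):
    """Wire up two hypothetical services that only know about events."""

    def payment_service(order):
        log.append(f"payment:{order['order_id']}")
        bus.publish("PaymentCompleted", order)

    def shipping_service(order):
        log.append(f"shipping:{order['order_id']}")
        bus.publish("OrderShipped", order)

    bus.subscribe("OrderPlaced", payment_service)
    bus.subscribe("PaymentCompleted", shipping_service)


bus, log = EventBus(), []
build_order_flow(bus, log)
bus.publish("OrderPlaced", {"order_id": "42"})
print(log)  # ['payment:42', 'shipping:42']
```

Nothing in this sketch describes the workflow as a whole: the order of steps emerges only from which service subscribes to which event.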
Other characteristics:
Many companies adopt an event-driven architecture (EDA) as part of their evolution from a monolith to microservices, because they need the scalability that EDA provides.
Having well-defined boundaries in your system is very important for EDA, because events are what you use to communicate across your domain boundary. Badly drawn boundaries can force each microservice to support too many events. And since it is hard to control or make assumptions about event order, the complexity can grow exponentially with the number of events.
Idempotent handling of events is critical. In production, you don't want a redelivered event to create duplicate data or transactions. Idempotency also facilitates testing, as you can replay events into a staging environment.
This pattern can decrease coupling. Each microservice only needs to handle and publish events, and there is no god service tying them together.
Avoiding god services and central controllers is a question of taking the responsibilities and autonomy of the teams seriously.
Ownership of the process and of the required flow logic can be distributed. How far it is distributed depends primarily on your organizational structure, which should also be reflected in your service landscape (see Conway’s Law).
(My thoughts: Even with choreography, we still need a high-level design of each team’s boundaries and of the event-handling process for each business case. So for both orchestration and choreography, some high-level design across multiple teams’ domains is required. This design can be created and owned by experienced engineers who have knowledge across multiple domains and teams. These engineers can come from all of the related sub-teams and form a special committee or team for the high-level design. This is similar to a government structure in which a president has a council of advisers to coordinate among different departments.)
If a service fails to complete a business operation, it can be difficult to recover from that failure.
It is hard to control event order.
The choreography pattern becomes a challenge if the number of services grows rapidly. Given the high number of independent moving parts, the workflow between services tends to get complex. Also, distributed tracing becomes difficult.
The role of managing workflow resiliency is distributed across all services, so resiliency becomes less robust. Each service is responsible not only for the resiliency of its own operation but also for that of the workflow. This responsibility can be burdensome for the service and hard to implement. Each service must handle transient, nontransient, and time-out failures (retrying where that makes sense) so that the request terminates gracefully if needed. The service must also be diligent about communicating the success or failure of the operation so that other services can act accordingly.
Since each microservice handles and publishes events asynchronously, it is hard to monitor and debug the overall business workflow.
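Two of the points above, idempotency and per-service failure handling, can be sketched together in Python. Everything here is hypothetical and simplified: `PaymentHandler`, the `TransientError` and `PermanentError` classes, the event field names (`event_id`, `order_id`), and the `PaymentCompleted` / `PaymentFailed` event names are assumptions for illustration. The in-memory `_processed` set stands in for what would have to be a durable store in a real service.

```python
import time


class TransientError(Exception):
    """A failure that may succeed on retry (e.g. a timeout)."""


class PermanentError(Exception):
    """A failure that retrying will not fix (e.g. invalid data)."""


class PaymentHandler:
    """Hypothetical event handler for a payment service."""

    def __init__(self, charge, max_attempts=3, base_delay=0.0):
        self._charge = charge          # function performing the side effect
        self._max_attempts = max_attempts
        self._base_delay = base_delay  # seconds; doubled after each failure
        self._processed = set()        # IDs of events already handled
        self.outbox = []               # events this service would publish

    def handle(self, event):
        event_id = event["event_id"]
        if event_id in self._processed:
            return  # duplicate delivery or replay: do nothing
        for attempt in range(1, self._max_attempts + 1):
            try:
                self._charge(event["order_id"])
                self.outbox.append(("PaymentCompleted", event["order_id"]))
                break
            except TransientError:
                if attempt == self._max_attempts:
                    # Out of retries: tell other services the operation failed.
                    self.outbox.append(("PaymentFailed", event["order_id"]))
                else:
                    time.sleep(self._base_delay * 2 ** (attempt - 1))
            except PermanentError:
                # Retrying cannot help, so report failure immediately.
                self.outbox.append(("PaymentFailed", event["order_id"]))
                break
        self._processed.add(event_id)
```

Delivering the same event twice results in a single charge and a single outgoing event, and every outcome (success or failure) is announced so that downstream services can react. Each service in the choreography would need its own version of this logic, which is the burden described above.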
When to use what:
Use the choreography pattern if you expect to update, remove, or add services frequently. The entire app can be modified with less effort and minimal disruption to existing services.
Consider the choreography pattern if you experience performance bottlenecks in the central orchestrator.
The choreography pattern is a natural model for serverless architectures, where all services can be short-lived or event-driven. Services can spin up because of an event, do their task, and be removed when the task is finished.
A good rule of thumb is to use orchestration when you're coordinating events within your bounded context, and to use event-driven choreography for interactions across domains.
It is better to start with orchestration and introduce choreography only when really necessary.
(My thoughts: Choreography suits systems that are long-running, have few participants, and span varied contexts. It doesn’t make sense to have an orchestration API wait on other services for a long time, and having few participants limits the event complexity.
Some processes, such as order fulfillment (from payment to order shipment), are long-running because they can take weeks to finish. Such a process can be divided into microservices such as payment, order, checkout, and order delivery.
From an organizational perspective, choreography suits coordination among multiple distant teams whose domains and contexts are loosely coupled and very different. Orchestration, on the other hand, suits teams that work closely together.)
Reference:
https://www.infoq.com/news/2008/09/Orchestration/
https://stackoverflow.com/questions/4127241/orchestration-vs-choreography
https://docs.microsoft.com/en-us/azure/architecture/patterns/choreography
https://www.infoq.com/podcasts/event-driven-architectures-scale/
https://www.infoq.com/articles/microservice-event-choreographies/
https://theburningmonk.com/2020/08/choreography-vs-orchestration-in-the-land-of-serverless/