The Saga Pattern
When you book a flight, reserve a hotel, and rent a car in a single transaction, what happens if the car rental fails but the flight and hotel are already reserved? In a monolithic system you would roll back the entire transaction, but in a microservices architecture using separate databases for flights, hotels, and cars, traditional ACID transactions don't work.
The Saga pattern solves this by breaking long-running transactions into a sequence of local transactions, each with a compensating action to undo its effects if the overall transaction fails. This enables distributed transactions without distributed locks, maintaining eventual consistency while handling failures gracefully.
The pattern is a response to the fact that distributed transactions using two-phase commit don't scale. Sagas trade immediate consistency for availability and fault tolerance, but as we'll see, bring constraints of their own.
The Saga Pattern
A Saga is a sequence of local transactions, each of which updates a single service and publishes an event or message to trigger the next transaction. If a transaction fails, the Saga executes compensating transactions to undo the changes made by preceding transactions.
Unlike two-phase commit, which uses prepare/commit phases and locks, Sagas commit each step immediately and use compensation to handle failures. The pattern can be implemented in one of two ways.
-
Orchestration: a central coordinator tells each service what to do. This is easier to understand and monitor, but the coordinator is a coordination point.
-
Choreography: each service listens for events and decides what to do next. Its decentralization makes it more scalable, but also makes it harder to understand and monitor.
Whichever approach is used, each forward step must be define a backward compensation that is retryable and idempotent. The first requirement is self-explanatory; the second means that the compensation has the same effect on the system no matter how many times it is tried. (A simple example is multiplying a value by 1: it can be done any number of times, but always has the same result.)
Compensations must be idempotent because it's possible for a workflow to go forward and backward several times. Compensations can be implemented as negative transitions, but it is often easier to implement them by saving the state before the transition, such as the account balance, and then restoring that state. These approaches can be mixed together in a single workflow depending on which is easiest to use with a particular external service.
Core Data Structures
Let's build a travel booking saga with flights, hotels, and car rentals. We start by defining the states that the saga as a whole and the individual transactions can be in:
class SagaStatus(Enum):
"""Status of a saga execution."""
PENDING = "pending"
IN_PROGRESS = "in_progress"
COMPLETED = "completed"
COMPENSATING = "compensating"
FAILED = "failed"
class TransactionStatus(Enum):
"""Status of individual transaction."""
PENDING = "pending"
COMPLETED = "completed"
FAILED = "failed"
COMPENSATED = "compensated"
We then create structures to represent the saga's state machine. Each step has a forward transaction and a backward compensation.
@dataclass
class SagaStep:
"""A step in the saga with transaction and compensation."""
name: str
service_name: str
transaction: Callable[[], bool] # Returns True if successful
compensation: Optional[Callable[[], bool]] # Returns True if successful
def __str__(self) -> str:
return f"Step({self.name})"
@dataclass
class SagaExecution:
"""Tracks execution of a saga instance."""
saga_id: str
steps: List[SagaStep]
status: SagaStatus = SagaStatus.PENDING
current_step: int = 0
completed_steps: List[str] = field(default_factory=list)
failed_step: Optional[str] = None
context: Dict[str, Any] = field(default_factory=dict)
def __str__(self) -> str:
return f"Saga({self.saga_id}, {self.status.value}, step {self.current_step}/{len(self.steps)})"
@dataclass
class BookingRequest:
"""Travel booking request."""
booking_id: str
customer_id: str
flight_id: str
hotel_id: str
car_id: str
def __str__(self) -> str:
return f"Booking({self.booking_id})"
@dataclass
class SagaEvent:
"""Event in choreographed saga."""
event_type: str # "flight_booked", "flight_failed", etc.
saga_id: str
data: Dict[str, Any]
def __str__(self) -> str:
return f"Event({self.event_type})"
Service Implementations
We can now implement the three microservices involved in the saga. Each service manages its own local state and provides both forward (book) and backward (cancel) operations. The flight service manages seat availability; we have given it a 10% random failure rate to simulate real-world unreliability:
class FlightService(Process):
"""Microservice for booking flights."""
def init(self) -> None:
self.bookings: Dict[str, Dict[str, Any]] = {}
self.request_queue: Queue = Queue(self._env)
self.available_seats = 10
print(f"[{self.now:.1f}] FlightService started (seats: {self.available_seats})")
async def run(self) -> None:
"""Handle flight booking requests."""
while True:
await self.timeout(1.0)
def book_flight(self, booking_id: str, flight_id: str) -> bool:
"""Book a flight (forward transaction)."""
print(f"[{self.now:.1f}] FlightService: Booking flight {flight_id}")
# Simulate occasional failures
if random.random() < 0.1:
print(f"[{self.now:.1f}] FlightService: Booking FAILED - system error")
return False
if self.available_seats <= 0:
print(f"[{self.now:.1f}] FlightService: Booking FAILED - no seats")
return False
self.available_seats -= 1
self.bookings[booking_id] = {
"flight_id": flight_id,
"status": "booked",
"seats": 1,
}
print(
f"[{self.now:.1f}] FlightService: ✓ Flight booked "
f"(remaining: {self.available_seats})"
)
return True
def cancel_flight(self, booking_id: str) -> bool:
"""Cancel flight booking (compensation)."""
print(f"[{self.now:.1f}] FlightService: COMPENSATING - canceling {booking_id}")
if booking_id not in self.bookings:
print(f"[{self.now:.1f}] FlightService: No booking to cancel")
return True
seats = self.bookings[booking_id].get("seats", 1)
self.available_seats += seats
self.bookings[booking_id]["status"] = "canceled"
print(
f"[{self.now:.1f}] FlightService: ✓ Flight canceled "
f"(available: {self.available_seats})"
)
return True
The hotel service follows the same pattern with a 15% failure rate and room inventory:
class HotelService(Process):
"""Microservice for booking hotels."""
def init(self) -> None:
self.bookings: Dict[str, Dict[str, Any]] = {}
self.request_queue: Queue = Queue(self._env)
self.available_rooms = 5
print(f"[{self.now:.1f}] HotelService started (rooms: {self.available_rooms})")
async def run(self) -> None:
"""Handle hotel booking requests."""
while True:
await self.timeout(1.0)
def book_hotel(self, booking_id: str, hotel_id: str) -> bool:
"""Book a hotel (forward transaction)."""
print(f"[{self.now:.1f}] HotelService: Booking hotel {hotel_id}")
# Simulate occasional failures
if random.random() < 0.15:
print(f"[{self.now:.1f}] HotelService: Booking FAILED - no rooms")
return False
if self.available_rooms <= 0:
print(f"[{self.now:.1f}] HotelService: Booking FAILED - no rooms")
return False
self.available_rooms -= 1
self.bookings[booking_id] = {
"hotel_id": hotel_id,
"status": "booked",
"rooms": 1,
}
print(
f"[{self.now:.1f}] HotelService: ✓ Hotel booked "
f"(remaining: {self.available_rooms})"
)
return True
def cancel_hotel(self, booking_id: str) -> bool:
"""Cancel hotel booking (compensation)."""
print(f"[{self.now:.1f}] HotelService: COMPENSATING - canceling {booking_id}")
if booking_id not in self.bookings:
print(f"[{self.now:.1f}] HotelService: No booking to cancel")
return True
rooms = self.bookings[booking_id].get("rooms", 1)
self.available_rooms += rooms
self.bookings[booking_id]["status"] = "canceled"
print(
f"[{self.now:.1f}] HotelService: ✓ Hotel canceled "
f"(available: {self.available_rooms})"
)
return True
Finally, the car rental service has a 30% failure rate to demonstrate more frequent compensation:
class CarRentalService(Process):
"""Microservice for renting cars."""
def init(self) -> None:
self.bookings: Dict[str, Dict[str, Any]] = {}
self.request_queue: Queue = Queue(self._env)
self.available_cars = 3
print(
f"[{self.now:.1f}] CarRentalService started (cars: {self.available_cars})"
)
async def run(self) -> None:
"""Handle car rental requests."""
while True:
await self.timeout(1.0)
def book_car(self, booking_id: str, car_id: str) -> bool:
"""Book a car (forward transaction)."""
print(f"[{self.now:.1f}] CarRentalService: Booking car {car_id}")
# Simulate higher failure rate for demonstration
if random.random() < 0.3:
print(f"[{self.now:.1f}] CarRentalService: Booking FAILED - no cars")
return False
if self.available_cars <= 0:
print(f"[{self.now:.1f}] CarRentalService: Booking FAILED - no cars")
return False
self.available_cars -= 1
self.bookings[booking_id] = {"car_id": car_id, "status": "booked", "cars": 1}
print(
f"[{self.now:.1f}] CarRentalService: ✓ Car booked "
f"(remaining: {self.available_cars})"
)
return True
def cancel_car(self, booking_id: str) -> bool:
"""Cancel car rental (compensation)."""
print(
f"[{self.now:.1f}] CarRentalService: COMPENSATING - canceling {booking_id}"
)
if booking_id not in self.bookings:
print(f"[{self.now:.1f}] CarRentalService: No booking to cancel")
return True
cars = self.bookings[booking_id].get("cars", 1)
self.available_cars += cars
self.bookings[booking_id]["status"] = "canceled"
print(
f"[{self.now:.1f}] CarRentalService: ✓ Car canceled "
f"(available: {self.available_cars})"
)
return True
Each service is autonomous: it manages its own database and can succeed or fail independently.
Orchestration-Based Saga
The orchestrator coordinates the sequence of transactions. It stores references to each service and processes requests from a queue:
class SagaOrchestrator(Process):
"""Centralized saga coordinator (orchestration pattern)."""
def init(
self,
flight_service: FlightService,
hotel_service: HotelService,
car_service: CarRentalService,
) -> None:
self.flight_service = flight_service
self.hotel_service = hotel_service
self.car_service = car_service
self.request_queue: Queue = Queue(self._env)
self.active_sagas: Dict[str, SagaExecution] = {}
# Statistics
self.sagas_completed = 0
self.sagas_failed = 0
print(f"[{self.now:.1f}] SagaOrchestrator started\n")
async def run(self) -> None:
"""Process booking requests."""
while True:
request = await self.request_queue.get()
await self.execute_saga(request)
When a booking request arrives,
execute_saga builds the list of steps and drives them forward,
triggering compensation if any step fails:
async def execute_saga(self, booking: BookingRequest) -> None:
"""Execute travel booking saga."""
print(f"[{self.now:.1f}] {'=' * 60}")
print(f"[{self.now:.1f}] Starting saga for {booking}")
print(f"[{self.now:.1f}] {'=' * 60}")
# Define saga steps
steps = [
SagaStep(
name="book_flight",
service_name="FlightService",
transaction=lambda: self.flight_service.book_flight(
booking.booking_id, booking.flight_id
),
compensation=lambda: self.flight_service.cancel_flight(
booking.booking_id
),
),
SagaStep(
name="book_hotel",
service_name="HotelService",
transaction=lambda: self.hotel_service.book_hotel(
booking.booking_id, booking.hotel_id
),
compensation=lambda: self.hotel_service.cancel_hotel(
booking.booking_id
),
),
SagaStep(
name="book_car",
service_name="CarRentalService",
transaction=lambda: self.car_service.book_car(
booking.booking_id, booking.car_id
),
compensation=None, # Last step doesn't need compensation
),
]
saga = SagaExecution(
saga_id=booking.booking_id, steps=steps, status=SagaStatus.IN_PROGRESS
)
self.active_sagas[booking.booking_id] = saga
# Execute forward transactions
success = await self.execute_forward(saga)
if success:
saga.status = SagaStatus.COMPLETED
self.sagas_completed += 1
print(f"\n[{self.now:.1f}] ✓✓✓ Saga {saga.saga_id} COMPLETED ✓✓✓\n")
else:
# Execute compensations
saga.status = SagaStatus.COMPENSATING
await self.execute_compensation(saga)
saga.status = SagaStatus.FAILED
self.sagas_failed += 1
print(
f"\n[{self.now:.1f}] ✗✗✗ Saga {saga.saga_id} FAILED - compensated ✗✗✗\n"
)
The forward pass runs each step in sequence, stopping immediately on the first failure:
async def execute_forward(self, saga: SagaExecution) -> bool:
"""Execute forward transactions in sequence."""
for i, step in enumerate(saga.steps):
saga.current_step = i
print(
f"[{self.now:.1f}] Orchestrator: Executing step {i + 1}/"
f"{len(saga.steps)}: {step.name}"
)
# Simulate network delay
await self.timeout(0.3)
# Execute transaction
success = step.transaction()
if success:
saga.completed_steps.append(step.name)
else:
saga.failed_step = step.name
print(f"[{self.now:.1f}] Orchestrator: Step {step.name} FAILED")
return False
return True
The compensation pass runs in reverse, undoing each completed step:
async def execute_compensation(self, saga: SagaExecution) -> None:
"""Execute compensating transactions in reverse order."""
print(f"\n[{self.now:.1f}] Orchestrator: Starting compensation...")
# Compensate in reverse order
for step_name in reversed(saga.completed_steps):
# Find the step
step = next(s for s in saga.steps if s.name == step_name)
if step.compensation:
print(f"[{self.now:.1f}] Orchestrator: Compensating {step_name}")
# Simulate network delay
await self.timeout(0.2)
success = step.compensation()
if not success:
print(
f"[{self.now:.1f}] Orchestrator: WARNING - "
f"Compensation {step_name} failed! Manual intervention needed."
)
Basic Orchestration Example
Let's see orchestration in action:
def main():
env = Environment()
# Create services
flight_service = FlightService(env)
hotel_service = HotelService(env)
car_service = CarRentalService(env)
# Create orchestrator
orchestrator = SagaOrchestrator(env, flight_service, hotel_service, car_service)
# Submit booking requests
class BookingGenerator(Process):
def init(self, orch: SagaOrchestrator) -> None:
self.orch = orch
async def run(self) -> None:
for i in range(5):
booking = BookingRequest(
booking_id=f"BOOK{i + 1:03d}",
customer_id=f"CUST{i + 1}",
flight_id="FL123",
hotel_id="HTL456",
car_id="CAR789",
)
await self.orch.request_queue.put(booking)
await self.timeout(3.0)
BookingGenerator(env, orchestrator)
# Run simulation
env.run(until=40)
# Print summary
print("\n" + "=" * 60)
print("Final State:")
print("=" * 60)
print(f"Flight seats available: {flight_service.available_seats}/10")
print(f"Hotel rooms available: {hotel_service.available_rooms}/5")
print(f"Cars available: {car_service.available_cars}/3")
print(f"\nCompleted sagas: {orchestrator.sagas_completed}")
print(f"Failed sagas: {orchestrator.sagas_failed}")
[0.0] FlightService started (seats: 10)
[0.0] HotelService started (rooms: 5)
[0.0] CarRentalService started (cars: 3)
[0.0] SagaOrchestrator started
[0.0] ============================================================
[0.0] Starting saga for Booking(BOOK001)
[0.0] ============================================================
[0.0] Orchestrator: Executing step 1/3: book_flight
[0.3] FlightService: Booking flight FL123
[0.3] FlightService: ✓ Flight booked (remaining: 9)
[0.3] Orchestrator: Executing step 2/3: book_hotel
[0.6] HotelService: Booking hotel HTL456
[0.6] HotelService: ✓ Hotel booked (remaining: 4)
[0.6] Orchestrator: Executing step 3/3: book_car
[0.9] CarRentalService: Booking car CAR789
[0.9] CarRentalService: ✓ Car booked (remaining: 2)
[0.9] ✓✓✓ Saga BOOK001 COMPLETED ✓✓✓
[3.0] ============================================================
[3.0] Starting saga for Booking(BOOK002)
[3.0] ============================================================
[3.0] Orchestrator: Executing step 1/3: book_flight
[3.3] FlightService: Booking flight FL123
[3.3] FlightService: Booking FAILED - system error
[3.3] Orchestrator: Step book_flight FAILED
[3.3] Orchestrator: Starting compensation...
[3.3] ✗✗✗ Saga BOOK002 FAILED - compensated ✗✗✗
[6.0] ============================================================
[6.0] Starting saga for Booking(BOOK003)
[6.0] ============================================================
[6.0] Orchestrator: Executing step 1/3: book_flight
[6.3] FlightService: Booking flight FL123
[6.3] FlightService: Booking FAILED - system error
[6.3] Orchestrator: Step book_flight FAILED
[6.3] Orchestrator: Starting compensation...
[6.3] ✗✗✗ Saga BOOK003 FAILED - compensated ✗✗✗
[9.0] ============================================================
[9.0] Starting saga for Booking(BOOK004)
[9.0] ============================================================
[9.0] Orchestrator: Executing step 1/3: book_flight
[9.3] FlightService: Booking flight FL123
[9.3] FlightService: ✓ Flight booked (remaining: 8)
[9.3] Orchestrator: Executing step 2/3: book_hotel
[9.6] HotelService: Booking hotel HTL456
[9.6] HotelService: ✓ Hotel booked (remaining: 3)
[9.6] Orchestrator: Executing step 3/3: book_car
[9.9] CarRentalService: Booking car CAR789
[9.9] CarRentalService: Booking FAILED - no cars
[9.9] Orchestrator: Step book_car FAILED
[9.9] Orchestrator: Starting compensation...
[9.9] Orchestrator: Compensating book_hotel
[10.1] HotelService: COMPENSATING - canceling BOOK004
[10.1] HotelService: ✓ Hotel canceled (available: 4)
[10.1] Orchestrator: Compensating book_flight
[10.3] FlightService: COMPENSATING - canceling BOOK004
[10.3] FlightService: ✓ Flight canceled (available: 9)
[10.3] ✗✗✗ Saga BOOK004 FAILED - compensated ✗✗✗
[12.0] ============================================================
[12.0] Starting saga for Booking(BOOK005)
[12.0] ============================================================
[12.0] Orchestrator: Executing step 1/3: book_flight
[12.3] FlightService: Booking flight FL123
[12.3] FlightService: ✓ Flight booked (remaining: 8)
[12.3] Orchestrator: Executing step 2/3: book_hotel
[12.6] HotelService: Booking hotel HTL456
[12.6] HotelService: ✓ Hotel booked (remaining: 3)
[12.6] Orchestrator: Executing step 3/3: book_car
[12.9] CarRentalService: Booking car CAR789
[12.9] CarRentalService: Booking FAILED - no cars
[12.9] Orchestrator: Step book_car FAILED
[12.9] Orchestrator: Starting compensation...
[12.9] Orchestrator: Compensating book_hotel
[13.1] HotelService: COMPENSATING - canceling BOOK005
[13.1] HotelService: ✓ Hotel canceled (available: 4)
[13.1] Orchestrator: Compensating book_flight
[13.3] FlightService: COMPENSATING - canceling BOOK005
[13.3] FlightService: ✓ Flight canceled (available: 9)
[13.3] ✗✗✗ Saga BOOK005 FAILED - compensated ✗✗✗
============================================================
Final State:
============================================================
Flight seats available: 9/10
Hotel rooms available: 4/5
Cars available: 2/3
Completed sagas: 1
Failed sagas: 4