Saga Orchestration

Name: Saga Orchestration
Author: wshobson

wshobson/agents

8.3k installs
38.3k repo stars
Updated July 22, 2026

saga-orchestration is an agent skill that implements saga patterns for distributed transactions and cross-aggregate workflows. Use this skill when implementing distributed transactions across microservices.

About

Implement saga patterns for distributed transactions and cross-aggregate workflows. Use this skill when implementing distributed transactions across microservices where 2PC is unavailable, designing compensating actions for failed order workflows that span inventory, payment, and shipping services, building event-driven saga coordinators for travel booking systems that must roll back hotel, flight, and car rental reservations atomically, or debugging stuck saga states in production where compensation steps never complete.

Patterns for managing distributed transactions and long-running business processes without two-phase commit.

What you provide:

Service boundaries and ownership (which service owns which step)
Transaction requirements (which steps must be atomic, which can be eventual)
Failure modes for each step (transient vs. permanent, retry policy)
SLA requirements per step (informs timeout configuration)
Existing event/messaging infrastructure (Kafka, RabbitMQ, SQS, etc.)

Saga Orchestration by the numbers

8,318 all-time installs (skills.sh)
+166 installs in the week ending Jul 28, 2026 (Skillselion tracking)
Ranked #190 of 2,184 Testing & QA skills by installs in the Skillselion catalog
Security screen: MEDIUM risk (skills.sh audit)
Data as of Jul 28, 2026 (Skillselion catalog sync)

At a glance

saga-orchestration capabilities & compatibility

Capabilities: service boundaries and ownership (which service owns which step) · transaction requirements (which steps must be atomic, which can be eventual) · failure modes for each step (transient vs. permanent, retry policy) · SLA requirements per step (informs timeout configuration) · existing event/messaging infrastructure (Kafka, RabbitMQ, SQS, etc.)
Use cases: documentation

npx skills add https://github.com/wshobson/agents --skill saga-orchestration

Installs	8.3k
repo stars	★ 38.3k
Security audit	2 / 3 scanners passed
Last updated	July 22, 2026
Repository	wshobson/agents ↗

What problem does saga-orchestration solve for developers using this skill?

Who is it for?

Developers who need saga-orchestration patterns described in the cached skill documentation.

Skip if: the docs are empty or the task is outside the skill's documented scope.

When should I use this skill?

What you get

Actionable workflows and conventions from SKILL.md for saga-orchestration.

Saga coordinator design
Compensating transaction handlers

By the numbers

Covers order workflows spanning inventory, payment, and shipping microservices
Includes travel booking examples with hotel, flight, and car rental compensation

Files

SKILL.md · Markdown · GitHub ↗

Saga Orchestration

Patterns for managing distributed transactions and long-running business processes without two-phase commit.

Inputs and Outputs

What you provide:

Service boundaries and ownership (which service owns which step)
Transaction requirements (which steps must be atomic, which can be eventual)
Failure modes for each step (transient vs. permanent, retry policy)
SLA requirements per step (informs timeout configuration)
Existing event/messaging infrastructure (Kafka, RabbitMQ, SQS, etc.)

What this skill produces:

Saga definition with ordered steps, action commands, and compensation commands
Orchestrator or choreography implementation for your chosen pattern
Compensation logic for each participant service (idempotent, always-succeeds)
Step timeout configuration with per-step deadlines
Monitoring setup: state machine metrics, stuck saga detection, DLQ recovery

---

When to Use This Skill

Coordinating multi-service transactions without distributed locks
Implementing compensating transactions for partial failures
Managing long-running business workflows (minutes to hours)
Handling failures in distributed systems where atomicity is required
Building order fulfillment, approval, or booking processes
Replacing fragile two-phase commit with async compensation

---

Detailed section: Core Concepts

Moved to references/details.md.

Detailed section: Templates

Moved to references/details.md.

Best Practices

Do's

Make every step idempotent — Commands may be replayed on broker reconnect
Design compensations carefully — They are the most critical code path
Use correlation IDs — The saga_id must flow through every event and log
Implement per-step timeouts — Never wait indefinitely for a participant reply
Log state transitions — saga_id, step_name, old_state → new_state on every change
Test compensation paths explicitly — Inject failures at each step index in integration tests
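
The logging rule in the list above can be sketched as a small helper. The names `format_transition` and `log_transition`, and the key=value message layout, are illustrative assumptions rather than part of the skill.

```python
import logging

logger = logging.getLogger("saga")


def format_transition(saga_id: str, step_name: str, old_state: str, new_state: str) -> str:
    """Build one log line that carries the correlation ID and both states."""
    return f"saga_id={saga_id} step_name={step_name} transition={old_state}->{new_state}"


def log_transition(saga_id: str, step_name: str, old_state: str, new_state: str) -> None:
    """Emit the transition at INFO level; call this wherever saga state changes."""
    logger.info(format_transition(saga_id, step_name, old_state, new_state))
```

Keeping the formatting in its own function makes the log shape easy to assert on in tests and easy to swap for structured (JSON) logging later.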

Don'ts

Don't assume instant completion — Sagas are async and may take minutes
Don't skip compensation testing — The rollback path is the hardest to get right
Don't couple services directly — Use async messaging, never synchronous calls inside a saga step
Don't ignore partial failures — A step that partially executed still needs compensation
Don't use a global timeout — Each step has different latency characteristics

---

Troubleshooting

Saga stuck in COMPENSATING state

A saga enters compensation but never reaches FAILED. This usually means a compensation handler is throwing an unhandled exception and never publishing SagaCompensationCompleted. Add dead-letter queue (DLQ) handling to compensation consumers, and make every compensation action publish a result event even when the underlying operation was already rolled back.

async def handle_release_reservation(self, command: Dict):
    try:
        await self.release_reservation(command["original_result"]["reservation_id"])
    except ReservationNotFoundError:
        pass  # Already released: treat as success
    # Publish completion whether the release happened now or earlier. Any other
    # exception propagates so the broker can retry or dead-letter the command.
    await self.event_publisher.publish("SagaCompensationCompleted", {
        "saga_id": command["saga_id"],
        "step_name": "reserve_inventory"
    })

Duplicate saga executions on restart

If your orchestrator service restarts mid-saga, it may replay events and re-execute already-completed steps. Guard every step action with an idempotency key, as in Template 3 (Idempotent Step Guards) in references/details.md.

Choreography saga losing events

In a choreography-based saga, a downstream service may miss an event if it was offline when published. Use a durable message broker (Kafka with replication, RabbitMQ with persistence) and store the current saga state in a dedicated saga_log table so you can replay from the last known good step.
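
A minimal sketch of the saga_log idea, using SQLite from the Python standard library as a stand-in for your real database. The column names and the helpers `open_saga_log`, `record_step`, and `last_completed_step` are assumptions made for illustration; only the table name comes from the text above.

```python
import sqlite3
from typing import Optional


def open_saga_log(path: str = ":memory:") -> sqlite3.Connection:
    """Create the saga_log table if needed and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS saga_log (
            saga_id    TEXT    NOT NULL,
            step_index INTEGER NOT NULL,
            step_name  TEXT    NOT NULL,
            status     TEXT    NOT NULL,
            PRIMARY KEY (saga_id, step_index)
        )
        """
    )
    return conn


def record_step(conn: sqlite3.Connection, saga_id: str, step_index: int,
                step_name: str, status: str) -> None:
    """Insert or overwrite the row for one step of one saga."""
    conn.execute(
        "INSERT OR REPLACE INTO saga_log (saga_id, step_index, step_name, status) "
        "VALUES (?, ?, ?, ?)",
        (saga_id, step_index, step_name, status),
    )
    conn.commit()


def last_completed_step(conn: sqlite3.Connection, saga_id: str) -> Optional[int]:
    """Return the highest completed step index, or None if nothing completed."""
    row = conn.execute(
        "SELECT MAX(step_index) FROM saga_log WHERE saga_id = ? AND status = 'completed'",
        (saga_id,),
    ).fetchone()
    return row[0]
```

A recovery job could call `last_completed_step` and re-publish the event for the following step.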

Timeout firing before a slow-but-valid step completes

A step like create_shipment might take up to 15 minutes during peak load but your global timeout is 5 minutes, causing spurious compensation. Make step timeouts configurable per step type — see references/advanced-patterns.md for the TimeoutSagaOrchestrator implementation and the STEP_TIMEOUTS dict pattern.
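
The per-step lookup can be sketched in a few lines. The helper name `deadline_for` is hypothetical; the timeout values mirror the ones used elsewhere in this document, including the 5-minute fallback for steps without an explicit entry.

```python
from datetime import datetime, timedelta, timezone

# Per-step deadlines; steps not listed fall back to DEFAULT_TIMEOUT.
STEP_TIMEOUTS = {
    "process_payment": timedelta(minutes=1),
    "create_shipment": timedelta(minutes=15),
}
DEFAULT_TIMEOUT = timedelta(minutes=5)


def deadline_for(step_name: str, started_at: datetime) -> datetime:
    """Return the moment after which the step should be treated as timed out."""
    return started_at + STEP_TIMEOUTS.get(step_name, DEFAULT_TIMEOUT)
```

With this shape, a slow create_shipment gets its full 15 minutes while process_payment still fails fast.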

Compensation order not matching execution order

When two steps both complete before a failure is detected, compensation must run in strict reverse order or you leave data in an inconsistent state. Verify that _compensate() iterates from current_step - 1 down to 0, and add an integration test that deliberately fails at each step index to confirm correct rollback order.
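
The failure-injection test described above might look like the following sketch. It uses a tiny in-memory simulation instead of the real orchestrator, so `run_with_failure` and `check_all_failure_points` are hypothetical names and the step list is only an example.

```python
from typing import List

STEPS = ["reserve_inventory", "process_payment", "create_shipment", "send_confirmation"]


def run_with_failure(steps: List[str], fail_at: int) -> List[str]:
    """Run steps in order, fail at index fail_at, return the compensation order."""
    completed: List[str] = []
    for index, name in enumerate(steps):
        if index == fail_at:
            break  # injected failure: this step never completes
        completed.append(name)
    current_step = len(completed)
    # Walk from current_step - 1 down to 0, as the orchestrator should.
    return [steps[i] for i in range(current_step - 1, -1, -1)]


def check_all_failure_points(steps: List[str]) -> None:
    """Fail at every step index and confirm strict reverse-order rollback."""
    for fail_at in range(len(steps)):
        expected = list(reversed(steps[:fail_at]))
        assert run_with_failure(steps, fail_at) == expected, fail_at
```

In a real integration test, replace the simulation with your orchestrator and a recording event publisher, then compare the published compensation commands against the same expected list.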

---

Advanced Patterns

The references/ directory contains production-grade implementations not needed for most sagas:

`references/advanced-patterns.md` — Full SagaOrchestrator abstract base class, TimeoutSagaOrchestrator with per-step deadlines, detailed bank transfer compensating transaction chain, Prometheus instrumentation, stuck saga PromQL alerts, and DLQ recovery worker.

---

Related Skills

cqrs-implementation — Pair sagas with CQRS for read-model updates after each step completes
event-store-design — Store saga events in an event store for full audit trail and replay capability
workflow-orchestration-patterns — Higher-level workflow engines (Temporal, Conductor) that build on saga concepts

Saga Orchestration — Advanced Patterns

Complex implementations extracted from core skill for deeper reference.

---

Full Saga Orchestrator Base Class

The abstract base handles all state transitions, compensation ordering, and event publishing. Subclass this for every saga type in your system.

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Dict, Any, Optional
from datetime import datetime, timedelta
import uuid


class SagaState(Enum):
    STARTED = "started"
    PENDING = "pending"
    COMPENSATING = "compensating"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class SagaStep:
    name: str
    action: str
    compensation: str
    status: str = "pending"
    result: Optional[Dict] = None
    error: Optional[str] = None
    executed_at: Optional[datetime] = None
    compensated_at: Optional[datetime] = None
    timeout_at: Optional[datetime] = None


@dataclass
class Saga:
    saga_id: str
    saga_type: str
    state: SagaState
    data: Dict[str, Any]
    steps: List[SagaStep]
    current_step: int = 0
    created_at: datetime = field(default_factory=datetime.utcnow)
    updated_at: datetime = field(default_factory=datetime.utcnow)


class SagaOrchestrator(ABC):
    """Base class for all saga orchestrators.

    Responsibilities:
    - Execute steps in sequence via async command messages
    - Trigger compensation in reverse order on any failure
    - Persist saga state after every transition
    - Publish domain events on completion and failure
    """

    def __init__(self, saga_store, event_publisher):
        self.saga_store = saga_store
        self.event_publisher = event_publisher

    @abstractmethod
    def define_steps(self, data: Dict) -> List[SagaStep]:
        """Define the ordered saga steps for this workflow."""
        pass

    @property
    @abstractmethod
    def saga_type(self) -> str:
        """Unique identifier for this saga type (e.g., 'OrderFulfillment')."""
        pass

    async def start(self, data: Dict) -> Saga:
        """Start a new saga instance."""
        saga = Saga(
            saga_id=str(uuid.uuid4()),
            saga_type=self.saga_type,
            state=SagaState.STARTED,
            data=data,
            steps=self.define_steps(data)
        )
        await self.saga_store.save(saga)
        await self._execute_next_step(saga)
        return saga

    async def handle_step_completed(self, saga_id: str, step_name: str, result: Dict):
        """Handle a successful step reply from a participant service."""
        saga = await self.saga_store.get(saga_id)

        step = next((s for s in saga.steps if s.name == step_name), None)
        # Ignore duplicate or late replies (a redelivered message, or a success
        # that arrives after the step already failed or timed out) so that
        # current_step is never advanced twice.
        if step is None or step.status != "executing":
            return

        step.status = "completed"
        step.result = result
        step.executed_at = datetime.utcnow()

        saga.current_step += 1
        saga.updated_at = datetime.utcnow()

        if saga.current_step >= len(saga.steps):
            saga.state = SagaState.COMPLETED
            await self.saga_store.save(saga)
            await self._on_saga_completed(saga)
        else:
            saga.state = SagaState.PENDING
            await self.saga_store.save(saga)
            await self._execute_next_step(saga)

    async def handle_step_failed(self, saga_id: str, step_name: str, error: str):
        """Handle a step failure and begin compensation."""
        saga = await self.saga_store.get(saga_id)

        for step in saga.steps:
            if step.name == step_name:
                step.status = "failed"
                step.error = error
                break

        saga.state = SagaState.COMPENSATING
        saga.updated_at = datetime.utcnow()
        await self.saga_store.save(saga)
        await self._compensate(saga)

    async def _execute_next_step(self, saga: Saga):
        """Publish the command for the current step."""
        if saga.current_step >= len(saga.steps):
            return

        step = saga.steps[saga.current_step]
        step.status = "executing"
        await self.saga_store.save(saga)

        await self.event_publisher.publish(
            step.action,
            {
                "saga_id": saga.saga_id,
                "step_name": step.name,
                **saga.data
            }
        )

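    async def _compensate(self, saga: Saga):
        """Publish compensation commands in reverse order of execution."""
        dispatched = False
        for i in range(saga.current_step - 1, -1, -1):
            step = saga.steps[i]
            if step.status == "completed":
                step.status = "compensating"
                dispatched = True
                await self.saga_store.save(saga)

                await self.event_publisher.publish(
                    step.compensation,
                    {
                        "saga_id": saga.saga_id,
                        "step_name": step.name,
                        "original_result": step.result,
                        **saga.data
                    }
                )

        # Nothing to undo (for example, the first step failed): finish now.
        # Otherwise no SagaCompensationCompleted event ever arrives and the
        # saga stays in COMPENSATING.
        if not dispatched:
            saga.state = SagaState.FAILED
            await self.saga_store.save(saga)
            await self._on_saga_failed(saga)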

    async def handle_compensation_completed(self, saga_id: str, step_name: str):
        """Mark a compensation step done and check if all are finished."""
        saga = await self.saga_store.get(saga_id)

        for step in saga.steps:
            if step.name == step_name:
                step.status = "compensated"
                step.compensated_at = datetime.utcnow()
                break

        all_compensated = all(
            s.status in ("compensated", "pending", "failed")
            for s in saga.steps
        )

        if all_compensated:
            saga.state = SagaState.FAILED
            await self._on_saga_failed(saga)

        await self.saga_store.save(saga)

    async def _on_saga_completed(self, saga: Saga):
        await self.event_publisher.publish(
            f"{self.saga_type}Completed",
            {"saga_id": saga.saga_id, **saga.data}
        )

    async def _on_saga_failed(self, saga: Saga):
        await self.event_publisher.publish(
            f"{self.saga_type}Failed",
            {"saga_id": saga.saga_id, "error": "Saga failed after compensation", **saga.data}
        )

---

Saga Orchestrator with Per-Step Timeouts

Each step gets an independent deadline. The scheduler fires a timeout job; if the step is still executing at that point, compensation begins automatically. Use this when participant SLAs vary widely (e.g., payment = 30 s, shipping label = 15 min).

class TimeoutSagaOrchestrator(SagaOrchestrator):
    """Extends the base orchestrator with configurable per-step timeouts."""

    # Override per saga subclass as needed
    STEP_TIMEOUTS: Dict[str, timedelta] = {
        "reserve_inventory": timedelta(minutes=2),
        "process_payment":   timedelta(minutes=1),
        "create_shipment":   timedelta(minutes=15),
        "send_confirmation": timedelta(minutes=2),
    }

    def __init__(self, saga_store, event_publisher, scheduler):
        super().__init__(saga_store, event_publisher)
        self.scheduler = scheduler

    async def _execute_next_step(self, saga: Saga):
        if saga.current_step >= len(saga.steps):
            return

        step = saga.steps[saga.current_step]
        step.status = "executing"
        step.timeout_at = datetime.utcnow() + self.STEP_TIMEOUTS.get(
            step.name, timedelta(minutes=5)
        )
        await self.saga_store.save(saga)

        # Schedule the timeout watchdog
        await self.scheduler.schedule(
            job_id=f"saga_timeout_{saga.saga_id}_{step.name}",
            handler=self._check_timeout,
            payload={"saga_id": saga.saga_id, "step_name": step.name},
            run_at=step.timeout_at
        )

        await self.event_publisher.publish(
            step.action,
            {"saga_id": saga.saga_id, "step_name": step.name, **saga.data}
        )

    async def _check_timeout(self, data: Dict):
        """Called by the scheduler when a step deadline is reached."""
        saga = await self.saga_store.get(data["saga_id"])
        step = next((s for s in saga.steps if s.name == data["step_name"]), None)

        if step and step.status == "executing":
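            # Report the timeout that was actually applied, including the
            # 5-minute default used for steps missing from STEP_TIMEOUTS.
            timeout = self.STEP_TIMEOUTS.get(data["step_name"], timedelta(minutes=5))
            await self.handle_step_failed(
                data["saga_id"],
                data["step_name"],
                f"Step '{data['step_name']}' timed out after {timeout}"
            )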

    async def handle_step_completed(self, saga_id: str, step_name: str, result: Dict):
        """Cancel the timeout job before processing the success reply."""
        await self.scheduler.cancel(f"saga_timeout_{saga_id}_{step_name}")
        await super().handle_step_completed(saga_id, step_name, result)

---

Detailed Compensating Transaction Chains

The pattern below shows a full compensation chain for a bank transfer saga. Each compensation is idempotent and always emits a result event — even when the underlying resource is already in the desired state.

class BankTransferSaga(SagaOrchestrator):
    """Saga for transferring funds between accounts across services."""

    @property
    def saga_type(self) -> str:
        return "BankTransfer"

    def define_steps(self, data: Dict) -> List[SagaStep]:
        return [
            SagaStep(
                name="debit_source",
                action="AccountService.DebitAccount",
                compensation="AccountService.CreditAccount"  # reverse the debit
            ),
            SagaStep(
                name="create_transfer_record",
                action="LedgerService.CreateTransfer",
                compensation="LedgerService.VoidTransfer"
            ),
            SagaStep(
                name="credit_destination",
                action="AccountService.CreditDestinationAccount",
                compensation="AccountService.DebitAccount"  # reverse the credit
            ),
            SagaStep(
                name="notify_parties",
                action="NotificationService.SendTransferConfirmation",
                compensation="NotificationService.SendTransferFailureNotice"
            ),
        ]


class AccountService:
    async def handle_debit_account(self, command: Dict):
        # Key on source_account_id, the same field the debit below uses.
        idempotency_key = f"debit-{command['saga_id']}-{command['source_account_id']}"
        existing = await self.ledger.find_by_key(idempotency_key)
        if existing:
            await self._publish_completed(command, {"transaction_id": existing.id})
            return
        try:
            txn = await self.ledger.debit(
                account_id=command["source_account_id"],
                amount=command["amount"],
                idempotency_key=idempotency_key
            )
            await self._publish_completed(command, {"transaction_id": txn.id})
        except InsufficientFundsError as e:
            await self._publish_failed(command, str(e))

    async def handle_credit_account(self, command: Dict):
        """Compensation: credit back a previously debited account."""
        # Key on source_account_id, the same field the credit below uses.
        idempotency_key = f"credit-comp-{command['saga_id']}-{command['source_account_id']}"
        existing = await self.ledger.find_by_key(idempotency_key)
        if not existing:
            await self.ledger.credit(
                account_id=command["source_account_id"],
                amount=command["amount"],
                idempotency_key=idempotency_key
            )
        # Always publish — even if already credited
        await self.event_publisher.publish("SagaCompensationCompleted", {
            "saga_id": command["saga_id"],
            "step_name": "debit_source"
        })

---

Production Monitoring Setup

Prometheus Metrics

Expose saga health metrics for alerting on stuck sagas and compensation rates.

from prometheus_client import Counter, Histogram, Gauge
import time

saga_started_total = Counter(
    "saga_started_total",
    "Total sagas started",
    ["saga_type"]
)
saga_completed_total = Counter(
    "saga_completed_total",
    "Total sagas completed successfully",
    ["saga_type"]
)
saga_failed_total = Counter(
    "saga_failed_total",
    "Total sagas that failed after compensation",
    ["saga_type"]
)
saga_compensating_total = Counter(
    "saga_compensating_total",
    "Total sagas that entered compensation",
    ["saga_type"]
)
saga_duration_seconds = Histogram(
    "saga_duration_seconds",
    "Saga execution duration",
    ["saga_type", "outcome"],
    buckets=[1, 5, 15, 30, 60, 300, 600]
)
saga_stuck_gauge = Gauge(
    "saga_stuck_count",
    "Sagas stuck in COMPENSATING or PENDING > threshold",
    ["saga_type", "state"]
)


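class InstrumentedSagaOrchestrator(SagaOrchestrator):
    """Wraps base orchestrator with Prometheus instrumentation."""

    async def start(self, data: Dict) -> Saga:
        saga_started_total.labels(saga_type=self.saga_type).inc()
        return await super().start(data)

    async def handle_step_failed(self, saga_id: str, step_name: str, error: str):
        # Count every saga that enters compensation so the stuck-saga alert has data.
        saga_compensating_total.labels(saga_type=self.saga_type).inc()
        await super().handle_step_failed(saga_id, step_name, error)

    @staticmethod
    def _elapsed_seconds(saga: Saga) -> float:
        # Use the persisted created_at: an attribute set on the object returned
        # by start() is lost when handlers reload the saga from the store.
        # Assumes `datetime` is imported as in the base class module.
        return (datetime.utcnow() - saga.created_at).total_seconds()

    async def _on_saga_completed(self, saga: Saga):
        saga_completed_total.labels(saga_type=self.saga_type).inc()
        saga_duration_seconds.labels(
            saga_type=self.saga_type, outcome="completed"
        ).observe(self._elapsed_seconds(saga))
        await super()._on_saga_completed(saga)

    async def _on_saga_failed(self, saga: Saga):
        saga_failed_total.labels(saga_type=self.saga_type).inc()
        saga_duration_seconds.labels(
            saga_type=self.saga_type, outcome="failed"
        ).observe(self._elapsed_seconds(saga))
        await super()._on_saga_failed(saga)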

Stuck Saga Detection Query (Prometheus)

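These counter-based alerts approximate stuck-saga detection. The first fires when more sagas entered compensation than finished it over the last 10 minutes; the second tracks the completion rate. Neither covers sagas stuck in PENDING, which is what the saga_stuck_count gauge is for: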

# Alert: saga stuck in compensation for > 10 min
increase(saga_compensating_total[10m]) - increase(saga_failed_total[10m]) > 0

# Alert: saga completion rate drops below 95%
(
  rate(saga_completed_total[5m]) /
  (rate(saga_completed_total[5m]) + rate(saga_failed_total[5m]))
) < 0.95
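
Counters cannot say which saga is stuck. A periodic scan of the saga store can, and its result could feed the saga_stuck_count gauge. The sketch below is an assumption about how such a scan might work: `find_stuck_sagas` is a hypothetical helper, and each saga is represented as a plain dict with saga_id, state, and updated_at keys.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# States in which a saga is waiting on someone else and can therefore hang.
STUCK_STATES = {"compensating", "pending"}


def find_stuck_sagas(
    sagas: List[Dict],
    now: datetime,
    threshold: timedelta = timedelta(minutes=10),
) -> List[str]:
    """Return IDs of sagas that sat in a waiting state longer than threshold."""
    return [
        saga["saga_id"]
        for saga in sagas
        if saga["state"] in STUCK_STATES and now - saga["updated_at"] > threshold
    ]
```

Running this on a schedule and setting the gauge to the length of the result, per saga type and state, gives the alert a direct signal instead of a counter difference.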

Dead Letter Queue Recovery

When a compensation handler throws an unhandled exception the message lands on a DLQ. Implement a recovery worker that replays DLQ messages with exponential backoff:

import asyncio


class SagaDLQRecovery:
    """Replays failed compensation messages from the dead-letter queue."""

    MAX_RETRIES = 5
    BASE_DELAY_SECONDS = 10

    async def process_dlq_message(self, message: Dict, attempt: int = 0):
        # Give up after MAX_RETRIES: park the message and page a human.
        if attempt >= self.MAX_RETRIES:
            await self._move_to_poison_queue(message)
            await self._alert_on_call(message)
            return

        # Exponential backoff: 10 s, 20 s, 40 s, 80 s, 160 s.
        await asyncio.sleep(self.BASE_DELAY_SECONDS * (2 ** attempt))
        try:
            await self.event_publisher.publish(message["original_topic"], message["payload"])
        except Exception:
            await self.process_dlq_message(message, attempt + 1)
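
To make the backoff arithmetic concrete, the delays implied by a 10-second base and 5 retries can be computed directly. `backoff_schedule` is a hypothetical helper written for this illustration.

```python
from typing import List


def backoff_schedule(base_delay_seconds: int = 10, max_retries: int = 5) -> List[int]:
    """Delay before each replay attempt: base * 2**attempt for attempt 0..max_retries-1."""
    return [base_delay_seconds * (2 ** attempt) for attempt in range(max_retries)]
```

With the defaults, a message that keeps failing waits 310 seconds in total before it is moved to the poison queue, which is worth checking against your alerting thresholds.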

---

saga-orchestration — detailed sections

Templates

Template 1: Order Fulfillment Saga (Orchestration)

Concrete subclass of the base orchestrator. Defines four steps spanning inventory, payment, shipping, and notification. See references/advanced-patterns.md for the full abstract SagaOrchestrator base class.

from saga_orchestrator import SagaOrchestrator, SagaStep
from typing import Dict, List


class OrderFulfillmentSaga(SagaOrchestrator):
    """Orchestrates order fulfillment across four participant services."""

    @property
    def saga_type(self) -> str:
        return "OrderFulfillment"

    def define_steps(self, data: Dict) -> List[SagaStep]:
        return [
            SagaStep(
                name="reserve_inventory",
                action="InventoryService.ReserveItems",
                compensation="InventoryService.ReleaseReservation"
            ),
            SagaStep(
                name="process_payment",
                action="PaymentService.ProcessPayment",
                compensation="PaymentService.RefundPayment"
            ),
            SagaStep(
                name="create_shipment",
                action="ShippingService.CreateShipment",
                compensation="ShippingService.CancelShipment"
            ),
            SagaStep(
                name="send_confirmation",
                action="NotificationService.SendOrderConfirmation",
                compensation="NotificationService.SendCancellationNotice"
            ),
        ]


# Start a saga
async def create_order(order_data: Dict, saga_store, event_publisher):
    saga = OrderFulfillmentSaga(saga_store, event_publisher)
    return await saga.start({
        "order_id": order_data["order_id"],
        "customer_id": order_data["customer_id"],
        "items": order_data["items"],
        "payment_method": order_data["payment_method"],
        "shipping_address": order_data["shipping_address"],
    })


# Participant service — handles command and publishes reply
class InventoryService:
    async def handle_reserve_items(self, command: Dict):
        try:
            reservation = await self.reserve(command["items"], command["order_id"])
            await self.event_publisher.publish("SagaStepCompleted", {
                "saga_id": command["saga_id"],
                "step_name": "reserve_inventory",
                "result": {"reservation_id": reservation.id}
            })
        except InsufficientInventoryError as e:
            await self.event_publisher.publish("SagaStepFailed", {
                "saga_id": command["saga_id"],
                "step_name": "reserve_inventory",
                "error": str(e)
            })

    async def handle_release_reservation(self, command: Dict):
        """Compensation — idempotent, always publishes completion."""
        try:
            await self.release_reservation(
                command["original_result"]["reservation_id"]
            )
        except ReservationNotFoundError:
            pass  # Already released — treat as success
        await self.event_publisher.publish("SagaCompensationCompleted", {
            "saga_id": command["saga_id"],
            "step_name": "reserve_inventory"
        })

Template 2: Choreography-Based Saga

Each service listens for the previous service's event and reacts. No central coordinator. Compensation is triggered by failure events propagating backward.

from dataclasses import dataclass
from typing import Dict, Any


@dataclass
class SagaContext:
    """Carried through all events in a choreographed saga."""
    saga_id: str
    step: int
    data: Dict[str, Any]
    completed_steps: list


class OrderChoreographySaga:
    """Choreography-based saga — services react to each other's events."""

    def __init__(self, event_bus):
        self.event_bus = event_bus
        self._register_handlers()

    def _register_handlers(self):
        # Forward path
        self.event_bus.subscribe("OrderCreated",       self._on_order_created)
        self.event_bus.subscribe("InventoryReserved",  self._on_inventory_reserved)
        self.event_bus.subscribe("PaymentProcessed",   self._on_payment_processed)
        self.event_bus.subscribe("ShipmentCreated",    self._on_shipment_created)
        # Compensation path
        self.event_bus.subscribe("PaymentFailed",      self._on_payment_failed)
        self.event_bus.subscribe("ShipmentFailed",     self._on_shipment_failed)

    async def _on_order_created(self, event: Dict):
        await self.event_bus.publish("ReserveInventory", {
            "saga_id": event["order_id"],
            "order_id": event["order_id"],
            "items": event["items"],
        })

    async def _on_inventory_reserved(self, event: Dict):
        await self.event_bus.publish("ProcessPayment", {
            "saga_id": event["saga_id"],
            "order_id": event["order_id"],
            "amount": event["total_amount"],
            "reservation_id": event["reservation_id"],
        })

    async def _on_payment_processed(self, event: Dict):
        await self.event_bus.publish("CreateShipment", {
            "saga_id": event["saga_id"],
            "order_id": event["order_id"],
            "payment_id": event["payment_id"],
        })

    async def _on_shipment_created(self, event: Dict):
        await self.event_bus.publish("OrderFulfilled", {
            "saga_id": event["saga_id"],
            "order_id": event["order_id"],
            "tracking_number": event["tracking_number"],
        })

    # Compensation handlers
    async def _on_payment_failed(self, event: Dict):
        """Payment failed: release inventory and mark order failed."""
        await self.event_bus.publish("ReleaseInventory", {
            "saga_id": event["saga_id"],
            "reservation_id": event["reservation_id"],
        })
        await self.event_bus.publish("OrderFailed", {
            "saga_id": event["saga_id"],
            "order_id": event["order_id"],
            "reason": "Payment failed",
        })

    async def _on_shipment_failed(self, event: Dict):
        """Shipment failed: refund payment, release inventory, mark order failed.

        Assumes the ShipmentFailed event carries payment_id, reservation_id,
        and order_id from the earlier steps.
        """
        await self.event_bus.publish("RefundPayment", {
            "saga_id": event["saga_id"],
            "payment_id": event["payment_id"],
        })
        await self.event_bus.publish("ReleaseInventory", {
            "saga_id": event["saga_id"],
            "reservation_id": event["reservation_id"],
        })
        await self.event_bus.publish("OrderFailed", {
            "saga_id": event["saga_id"],
            "order_id": event["order_id"],
            "reason": "Shipment failed",
        })

Template 3: Idempotent Step Guards

Every participant must guard against duplicate command delivery. Store an idempotency key before executing and return the cached result on replay.

async def handle_reserve_items(self, command: Dict):
    """Idempotency-guarded reservation step."""
    idempotency_key = f"reserve-{command['order_id']}"
    existing = await self.reservation_store.find_by_key(idempotency_key)
    if existing:
        # Already executed — return the previous result without side effects
        await self.event_publisher.publish("SagaStepCompleted", {
            "saga_id": command["saga_id"],
            "step_name": "reserve_inventory",
            "result": {"reservation_id": existing.id}
        })
        return

    # First execution
    reservation = await self.reserve(
        items=command["items"],
        order_id=command["order_id"],
        idempotency_key=idempotency_key
    )
    await self.event_publisher.publish("SagaStepCompleted", {
        "saga_id": command["saga_id"],
        "step_name": "reserve_inventory",
        "result": {"reservation_id": reservation.id}
    })
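
The replay behaviour can be checked with a small in-memory simulation. This is a sketch under stated assumptions: `InMemoryReservations` and `deliver_twice` are hypothetical stand-ins for the reservation store and a broker that redelivers the same command.

```python
import asyncio
from typing import Dict, List


class InMemoryReservations:
    """Stand-in for the reservation store plus the reserve() side effect."""

    def __init__(self) -> None:
        self.by_key: Dict[str, str] = {}
        self.side_effects = 0  # how many real reservations were made

    async def reserve_once(self, idempotency_key: str) -> str:
        existing = self.by_key.get(idempotency_key)
        if existing is not None:
            return existing  # replay: return the cached result, no new side effect
        self.side_effects += 1
        reservation_id = f"res-{self.side_effects}"
        self.by_key[idempotency_key] = reservation_id
        return reservation_id


async def deliver_twice(store: InMemoryReservations, order_id: str) -> List[str]:
    """Simulate the broker delivering the same ReserveItems command twice."""
    key = f"reserve-{order_id}"
    return [await store.reserve_once(key), await store.reserve_once(key)]
```

Both deliveries return the same reservation ID and only one reservation is created. A real store also needs a unique constraint on the key so that two concurrent deliveries cannot both pass the lookup.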

---

Core Concepts

Saga Pattern Types

Choreography                        Orchestration
┌─────┐  ┌─────┐  ┌─────┐         ┌─────────────┐
│Svc A│─►│Svc B│─►│Svc C│         │ Orchestrator│
└─────┘  └─────┘  └─────┘         └──────┬──────┘
   │        │        │                   │
   ▼        ▼        ▼             ┌─────┼─────┐
 Event    Event    Event           ▼     ▼     ▼
                                ┌────┐┌────┐┌────┐
Each service reacts to the      │Svc1││Svc2││Svc3│
previous service's event.       └────┘└────┘└────┘
No central coordinator.    Central coordinator sends
                           commands and tracks state.

Choose orchestration when: You need explicit step tracking, retries, and centralized visibility. Easier to debug.

Choose choreography when: You want loose coupling and services that can evolve independently. Harder to trace.

Saga Execution States

State	Description
Started	Saga initiated, first step dispatched
Pending	Waiting for a step reply from a participant
Compensating	A step failed; rolling back completed steps
Completed	All forward steps succeeded
Failed	Saga failed and all compensations have finished
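
The states above form a small state machine. The sketch below encodes one plausible set of allowed transitions, inferred from the table rather than taken from the skill, so treat `ALLOWED_TRANSITIONS` and `can_transition` as assumptions to adapt.

```python
from enum import Enum


class SagaState(Enum):
    STARTED = "started"
    PENDING = "pending"
    COMPENSATING = "compensating"
    COMPLETED = "completed"
    FAILED = "failed"


# COMPLETED and FAILED are terminal; COMPENSATING can only end in FAILED.
ALLOWED_TRANSITIONS = {
    SagaState.STARTED: {SagaState.PENDING, SagaState.COMPENSATING, SagaState.COMPLETED},
    SagaState.PENDING: {SagaState.COMPENSATING, SagaState.COMPLETED},
    SagaState.COMPENSATING: {SagaState.FAILED},
    SagaState.COMPLETED: set(),
    SagaState.FAILED: set(),
}


def can_transition(old: SagaState, new: SagaState) -> bool:
    """Return True if moving from old to new is a legal state change."""
    return new in ALLOWED_TRANSITIONS[old]
```

Checking `can_transition` before persisting a state change catches bugs such as a late success reply trying to move a compensating saga back to completed.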

Compensation Rules

Situation	Handling
Step never started	No compensation needed (skip)
Step completed successfully	Run compensation command
Step failed before completion	No compensation needed if the step left no side effects; a partially executed step still needs compensation
Compensation itself fails	Retry with backoff → DLQ → manual intervention alert
Step result no longer exists	Treat compensation as success (idempotency)

---

How it compares

Choose saga-orchestration for cross-service business flows; keep local database transactions when a single service owns all state.
