
Saga Orchestration
Design and implement saga orchestration or choreography with compensating transactions when a solo builder’s microservices cannot use two-phase commit.
Overview
Saga orchestration is an agent skill, most often used in the Build phase (also Operate), that designs distributed transactions across microservices using compensating steps instead of two-phase commit.
Install
npx skills add https://github.com/wshobson/agents --skill saga-orchestration
What is this skill?
- Produces ordered saga steps with action commands and matching compensation commands per participant
- Supports orchestrator vs choreography choice aligned to your Kafka, RabbitMQ, or SQS stack
- Defines idempotent compensation logic and per-step timeouts from failure modes and SLAs
- Includes monitoring: state machine metrics, stuck-saga detection, and DLQ recovery patterns
- Covers debugging stuck production sagas where compensation never completes
Adoption & trust: 6.9k installs on skills.sh; 36.5k GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You need a multi-service business process to commit or roll back reliably, but 2PC is unavailable and a single failed step would otherwise leave inventory, payments, or bookings inconsistent.
Who is it for?
Event-driven or microservice backends where orders or bookings cross several owned services and you already know each step’s failure and retry behavior.
Skip if: Single-database monoliths where local transactions suffice, or greenfield projects with no messaging layer and no cross-service workflows yet.
When should I use this skill?
Implementing distributed transactions across microservices where 2PC is unavailable, designing compensating actions for failed multi-service order workflows, building event-driven saga coordinators for atomic rollback ac
What do I get? / Deliverables
You get an implementable saga with compensations, timeouts, and monitoring so failed steps trigger rollback and stuck states become detectable in production.
- Saga definition with action and compensation commands
- Orchestrator or choreography implementation with timeouts and monitoring hooks
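As a rough illustration of what a saga definition with action and compensation commands could look like, here is a minimal in-process sketch. It is not the skill's actual output: `SagaStep`, `run_saga`, and the step names are hypothetical, and a real orchestrator would dispatch commands over a message broker and persist saga state rather than call functions directly.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SagaStep:
    """Hypothetical step: a forward action paired with its compensation."""
    name: str
    action: Callable[[Dict], None]        # forward command
    compensation: Callable[[Dict], None]  # undo command; should be idempotent


def run_saga(steps: List[SagaStep], ctx: Dict) -> str:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[SagaStep] = []
    for step in steps:
        try:
            step.action(ctx)
            completed.append(step)
        except Exception:
            # The failed step is not compensated, only the earlier successes.
            for done in reversed(completed):
                done.compensation(ctx)
            return "failed"
    return "completed"


log: List[str] = []


def record(event: str) -> Callable[[Dict], None]:
    """Build a stand-in handler that only records that it was called."""
    return lambda ctx: log.append(event)


def decline_payment(ctx: Dict) -> None:
    """Stand-in payment handler that always fails."""
    raise RuntimeError("card declined")


checkout = [
    SagaStep("inventory", record("reserve_inventory"), record("release_inventory")),
    SagaStep("payment", decline_payment, record("refund_payment")),
    SagaStep("shipping", record("create_shipment"), record("cancel_shipment")),
]

result = run_saga(checkout, {"order_id": "order-1"})
print(result, log)
```

In this run the payment step fails, so only the inventory reservation is rolled back and the shipping step never starts.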
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Canonical shelf is Build because the skill’s main deliverable is saga definitions, participant compensation, and coordinator code tied to service boundaries and messaging. Backend subphase fits cross-service workflows (inventory, payment, shipping) and event-driven coordination rather than frontend or pure DevOps wiring alone.
Where it fits
Model inventory reserve, charge, and ship as saga steps with compensations before writing service handlers.
Set per-step timeouts and DLQ recovery before first production traffic on a multi-service checkout.
Trace a saga stuck after payment succeeded but compensation on inventory never ran.
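The last two scenarios can be sketched as a periodic check that compares each in-flight saga against a per-step deadline. This is only a sketch under assumptions: `SagaRecord`, `find_stuck`, the step names, and the timeout values are hypothetical, and the state names follow the Saga Execution States table in SKILL.md. A real setup would read saga records from the coordinator's store and raise an alert rather than return a list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# Hypothetical per-step deadlines; real values would come from each step's SLA.
STEP_TIMEOUTS: Dict[str, timedelta] = {
    "reserve_inventory": timedelta(seconds=30),
    "charge_payment": timedelta(seconds=60),
    "release_inventory": timedelta(seconds=30),
}


@dataclass
class SagaRecord:
    saga_id: str
    state: str                 # e.g. "Pending" or "Compensating"
    current_step: str
    step_started_at: datetime


def find_stuck(sagas: List[SagaRecord], now: datetime) -> List[str]:
    """Return IDs of in-flight sagas whose current step exceeded its deadline."""
    stuck = []
    for saga in sagas:
        if saga.state not in ("Pending", "Compensating"):
            continue  # finished sagas cannot be stuck
        deadline = saga.step_started_at + STEP_TIMEOUTS[saga.current_step]
        if now > deadline:
            stuck.append(saga.saga_id)
    return stuck


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
sagas = [
    SagaRecord("saga-1", "Pending", "charge_payment", now - timedelta(seconds=10)),
    SagaRecord("saga-2", "Compensating", "release_inventory", now - timedelta(minutes=5)),
    SagaRecord("saga-3", "Completed", "charge_payment", now - timedelta(hours=1)),
]
stuck_ids = find_stuck(sagas, now)
print(stuck_ids)
```

Here `saga-2` is flagged because its inventory compensation has been running for five minutes against a 30-second deadline, which is the kind of case described in the last scenario above.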
How it compares
Use for long-running cross-service workflows with explicit rollback—not for in-process database transactions or a generic CI deploy skill.
Common Questions / FAQ
Who is saga orchestration for?
Indie and solo builders (and small teams) implementing or fixing distributed order, payment, or booking flows across multiple services without 2PC.
When should I use saga orchestration?
During Build when defining cross-service workflows; during Ship when hardening failure handling; and during Operate when debugging sagas stuck before compensation finishes.
Is saga orchestration safe to install?
Treat it as architecture and code guidance—review the Security Audits panel on this Prism page and inspect generated coordinator and compensation code before production deploy.
SKILL.md - Saga Orchestration
# Saga Orchestration

Patterns for managing distributed transactions and long-running business processes without two-phase commit.

## Inputs and Outputs

**What you provide:**

- Service boundaries and ownership (which service owns which step)
- Transaction requirements (which steps must be atomic, which can be eventual)
- Failure modes for each step (transient vs. permanent, retry policy)
- SLA requirements per step (informs timeout configuration)
- Existing event/messaging infrastructure (Kafka, RabbitMQ, SQS, etc.)

**What this skill produces:**

- Saga definition with ordered steps, action commands, and compensation commands
- Orchestrator or choreography implementation for your chosen pattern
- Compensation logic for each participant service (idempotent, always-succeeds)
- Step timeout configuration with per-step deadlines
- Monitoring setup: state machine metrics, stuck saga detection, DLQ recovery

---

## When to Use This Skill

- Coordinating multi-service transactions without distributed locks
- Implementing compensating transactions for partial failures
- Managing long-running business workflows (minutes to hours)
- Handling failures in distributed systems where atomicity is required
- Building order fulfillment, approval, or booking processes
- Replacing fragile two-phase commit with async compensation

---

## Core Concepts

### Saga Pattern Types

```text
Choreography                       Orchestration
┌─────┐  ┌─────┐  ┌─────┐         ┌─────────────┐
│Svc A│─►│Svc B│─►│Svc C│         │ Orchestrator│
└─────┘  └─────┘  └─────┘         └──────┬──────┘
   │        │        │                   │
   ▼        ▼        ▼             ┌─────┼─────┐
 Event    Event    Event           ▼     ▼     ▼
                                ┌────┐┌────┐┌────┐
Each service reacts to the      │Svc1││Svc2││Svc3│
previous service's event.       └────┘└────┘└────┘
No central coordinator.         Central coordinator sends
                                commands and tracks state.
```

**Choose orchestration when:** You need explicit step tracking, retries, and centralized visibility. Easier to debug.

**Choose choreography when:** You want loose coupling and services that can evolve independently. Harder to trace.

### Saga Execution States

| State            | Description                                     |
| ---------------- | ----------------------------------------------- |
| **Started**      | Saga initiated, first step dispatched           |
| **Pending**      | Waiting for a step reply from a participant     |
| **Compensating** | A step failed; rolling back completed steps     |
| **Completed**    | All forward steps succeeded                     |
| **Failed**       | Saga failed and all compensations have finished |

### Compensation Rules

| Situation                     | Handling                                             |
| ----------------------------- | ---------------------------------------------------- |
| Step never started            | No compensation needed (skip)                        |
| Step completed successfully   | Run compensation command                             |
| Step failed before completion | No compensation needed; mark failed                  |
| Compensation itself fails     | Retry with backoff → DLQ → manual intervention alert |
| Step result no longer exists  | Treat c