How Saga Orchestration fits into a Paperclip company.

Saga Orchestration drops into any Paperclip agent that handles this kind of work. Assign it to a specialist inside a pre-configured PaperclipOrg company and the skill becomes available on every heartbeat — no prompt engineering, no tool wiring.
Source file: SKILL.md (361 lines, markdown)
---
name: saga-orchestration
description: Implement saga patterns for distributed transactions and cross-aggregate workflows. Use this skill when implementing distributed transactions across microservices where 2PC is unavailable, designing compensating actions for failed order workflows that span inventory, payment, and shipping services, building event-driven saga coordinators for travel booking systems that must roll back hotel, flight, and car rental reservations atomically, or debugging stuck saga states in production where compensation steps never complete.
---

# Saga Orchestration

Patterns for managing distributed transactions and long-running business processes without two-phase commit.

## Inputs and Outputs

**What you provide:**
- Service boundaries and ownership (which service owns which step)
- Transaction requirements (which steps must be atomic, which can be eventual)
- Failure modes for each step (transient vs. permanent, retry policy)
- SLA requirements per step (informs timeout configuration)
- Existing event/messaging infrastructure (Kafka, RabbitMQ, SQS, etc.)

**What this skill produces:**
- Saga definition with ordered steps, action commands, and compensation commands
- Orchestrator or choreography implementation for your chosen pattern
- Compensation logic for each participant service (idempotent, always-succeeds)
- Step timeout configuration with per-step deadlines
- Monitoring setup: state machine metrics, stuck saga detection, DLQ recovery

---

## When to Use This Skill

- Coordinating multi-service transactions without distributed locks
- Implementing compensating transactions for partial failures
- Managing long-running business workflows (minutes to hours)
- Handling failures in distributed systems where atomicity is required
- Building order fulfillment, approval, or booking processes
- Replacing fragile two-phase commit with async compensation

---

## Core Concepts

### Saga Pattern Types

```text
Choreography                        Orchestration
┌─────┐  ┌─────┐  ┌─────┐         ┌─────────────┐
│Svc A│─►│Svc B│─►│Svc C│         │ Orchestrator│
└─────┘  └─────┘  └─────┘         └──────┬──────┘
   │        │        │                   │
   ▼        ▼        ▼             ┌─────┼─────┐
 Event    Event    Event           ▼     ▼     ▼
                                ┌────┐┌────┐┌────┐
Each service reacts to the      │Svc1││Svc2││Svc3│
previous service's event.       └────┘└────┘└────┘
No central coordinator.    Central coordinator sends
                           commands and tracks state.
```

**Choose orchestration when:** You need explicit step tracking, retries, and centralized visibility. Easier to debug.

**Choose choreography when:** You want loose coupling and services that can evolve independently. Harder to trace.

### Saga Execution States

| State            | Description                                       |
| ---------------- | ------------------------------------------------- |
| **Started**      | Saga initiated, first step dispatched             |
| **Pending**      | Waiting for a step reply from a participant       |
| **Compensating** | A step failed; rolling back completed steps       |
| **Completed**    | All forward steps succeeded                       |
| **Failed**       | Saga failed and all compensations have finished   |

### Compensation Rules

| Situation                            | Handling                                              |
| ------------------------------------ | ----------------------------------------------------- |
| Step never started                   | No compensation needed (skip)                         |
| Step completed successfully          | Run compensation command                              |
| Step failed before completion        | No compensation needed; mark failed                   |
| Compensation itself fails            | Retry with backoff → DLQ → manual intervention alert  |
| Step result no longer exists         | Treat compensation as success (idempotency)           |

---

## Templates

### Template 1: Order Fulfillment Saga (Orchestration)

Concrete subclass of the base orchestrator. Defines four steps spanning inventory, payment, shipping, and notification. See `references/advanced-patterns.md` for the full abstract `SagaOrchestrator` base class.

```python
from saga_orchestrator import SagaOrchestrator, SagaStep
from typing import Dict, List


class OrderFulfillmentSaga(SagaOrchestrator):
    """Orchestrates order fulfillment across four participant services."""

    @property
    def saga_type(self) -> str:
        return "OrderFulfillment"

    def define_steps(self, data: Dict) -> List[SagaStep]:
        return [
            SagaStep(
                name="reserve_inventory",
                action="InventoryService.ReserveItems",
                compensation="InventoryService.ReleaseReservation"
            ),
            SagaStep(
                name="process_payment",
                action="PaymentService.ProcessPayment",
                compensation="PaymentService.RefundPayment"
            ),
            SagaStep(
                name="create_shipment",
                action="ShippingService.CreateShipment",
                compensation="ShippingService.CancelShipment"
            ),
            SagaStep(
                name="send_confirmation",
                action="NotificationService.SendOrderConfirmation",
                compensation="NotificationService.SendCancellationNotice"
            ),
        ]


# Start a saga
async def create_order(order_data: Dict, saga_store, event_publisher):
    saga = OrderFulfillmentSaga(saga_store, event_publisher)
    return await saga.start({
        "order_id": order_data["order_id"],
        "customer_id": order_data["customer_id"],
        "items": order_data["items"],
        "payment_method": order_data["payment_method"],
        "shipping_address": order_data["shipping_address"],
    })


# Participant service — handles command and publishes reply
class InventoryService:
    async def handle_reserve_items(self, command: Dict):
        try:
            reservation = await self.reserve(command["items"], command["order_id"])
            await self.event_publisher.publish("SagaStepCompleted", {
                "saga_id": command["saga_id"],
                "step_name": "reserve_inventory",
                "result": {"reservation_id": reservation.id}
            })
        except InsufficientInventoryError as e:
            await self.event_publisher.publish("SagaStepFailed", {
                "saga_id": command["saga_id"],
                "step_name": "reserve_inventory",
                "error": str(e)
            })

    async def handle_release_reservation(self, command: Dict):
        """Compensation — idempotent, always publishes completion."""
        try:
            await self.release_reservation(
                command["original_result"]["reservation_id"]
            )
        except ReservationNotFoundError:
            pass  # Already released — treat as success
        await self.event_publisher.publish("SagaCompensationCompleted", {
            "saga_id": command["saga_id"],
            "step_name": "reserve_inventory"
        })
```

### Template 2: Choreography-Based Saga

Each service listens for the previous service's event and reacts. No central coordinator. Compensation is triggered by failure events propagating backward.

```python
from dataclasses import dataclass
from typing import Dict, Any


@dataclass
class SagaContext:
    """Carried through all events in a choreographed saga."""
    saga_id: str
    step: int
    data: Dict[str, Any]
    completed_steps: list


class OrderChoreographySaga:
    """Choreography-based saga — services react to each other's events."""

    def __init__(self, event_bus):
        self.event_bus = event_bus
        self._register_handlers()

    def _register_handlers(self):
        # Forward path
        self.event_bus.subscribe("OrderCreated",       self._on_order_created)
        self.event_bus.subscribe("InventoryReserved",  self._on_inventory_reserved)
        self.event_bus.subscribe("PaymentProcessed",   self._on_payment_processed)
        self.event_bus.subscribe("ShipmentCreated",    self._on_shipment_created)
        # Compensation path
        self.event_bus.subscribe("PaymentFailed",      self._on_payment_failed)
        self.event_bus.subscribe("ShipmentFailed",     self._on_shipment_failed)

    async def _on_order_created(self, event: Dict):
        await self.event_bus.publish("ReserveInventory", {
            "saga_id": event["order_id"],
            "order_id": event["order_id"],
            "items": event["items"],
        })

    async def _on_inventory_reserved(self, event: Dict):
        await self.event_bus.publish("ProcessPayment", {
            "saga_id": event["saga_id"],
            "order_id": event["order_id"],
            "amount": event["total_amount"],
            "reservation_id": event["reservation_id"],
        })

    async def _on_payment_processed(self, event: Dict):
        await self.event_bus.publish("CreateShipment", {
            "saga_id": event["saga_id"],
            "order_id": event["order_id"],
            "payment_id": event["payment_id"],
        })

    async def _on_shipment_created(self, event: Dict):
        await self.event_bus.publish("OrderFulfilled", {
            "saga_id": event["saga_id"],
            "order_id": event["order_id"],
            "tracking_number": event["tracking_number"],
        })

    # Compensation handlers
    async def _on_payment_failed(self, event: Dict):
        """Payment failed — release inventory and mark order failed."""
        await self.event_bus.publish("ReleaseInventory", {
            "saga_id": event["saga_id"],
            "reservation_id": event["reservation_id"],
        })
        await self.event_bus.publish("OrderFailed", {
            "order_id": event["order_id"],
            "reason": "Payment failed",
        })

    async def _on_shipment_failed(self, event: Dict):
        """Shipment failed — refund payment and release inventory."""
        await self.event_bus.publish("RefundPayment", {
            "saga_id": event["saga_id"],
            "payment_id": event["payment_id"],
        })
        await self.event_bus.publish("ReleaseInventory", {
            "saga_id": event["saga_id"],
            "reservation_id": event["reservation_id"],
        })
```

### Template 3: Idempotent Step Guards

Every participant must guard against duplicate command delivery. Store an idempotency key before executing and return the cached result on replay.

```python
async def handle_reserve_items(self, command: Dict):
    """Idempotency-guarded reservation step."""
    idempotency_key = f"reserve-{command['order_id']}"
    existing = await self.reservation_store.find_by_key(idempotency_key)
    if existing:
        # Already executed — return the previous result without side effects
        await self.event_publisher.publish("SagaStepCompleted", {
            "saga_id": command["saga_id"],
            "step_name": "reserve_inventory",
            "result": {"reservation_id": existing.id}
        })
        return

    # First execution
    reservation = await self.reserve(
        items=command["items"],
        order_id=command["order_id"],
        idempotency_key=idempotency_key
    )
    await self.event_publisher.publish("SagaStepCompleted", {
        "saga_id": command["saga_id"],
        "step_name": "reserve_inventory",
        "result": {"reservation_id": reservation.id}
    })
```

---

## Best Practices

### Do's

- **Make every step idempotent** — Commands may be replayed on broker reconnect
- **Design compensations carefully** — They are the most critical code path
- **Use correlation IDs** — The `saga_id` must flow through every event and log
- **Implement per-step timeouts** — Never wait indefinitely for a participant reply
- **Log state transitions** — `saga_id`, `step_name`, `old_state → new_state` on every change
- **Test compensation paths explicitly** — Inject failures at each step index in integration tests

### Don'ts

- **Don't assume instant completion** — Sagas are async and may take minutes
- **Don't skip compensation testing** — The rollback path is the hardest to get right
- **Don't couple services directly** — Use async messaging, never synchronous calls inside a saga step
- **Don't ignore partial failures** — A step that partially executed still needs compensation
- **Don't use a global timeout** — Each step has different latency characteristics

---

## Troubleshooting

### Saga stuck in COMPENSATING state

A saga enters compensation but never reaches FAILED. This means a compensation handler is throwing an unhandled exception and never publishing `SagaCompensationCompleted`. Add dead-letter queue (DLQ) handling to compensation consumers and ensure every compensation action publishes a result event even when the underlying operation was already rolled back.

```python
async def handle_release_reservation(self, command: Dict):
    try:
        await self.release_reservation(command["original_result"]["reservation_id"])
    except ReservationNotFoundError:
        pass  # Already released — treat as success
    # Always publish completion, regardless of outcome
    await self.event_publisher.publish("SagaCompensationCompleted", {
        "saga_id": command["saga_id"],
        "step_name": "reserve_inventory"
    })
```

### Duplicate saga executions on restart

If your orchestrator service restarts mid-saga, it may replay events and re-execute already-completed steps. Guard every step action with an idempotency key — see **Template 3** above.

### Choreography saga losing events

In a choreography-based saga, a downstream service may miss an event if it was offline when published. Use a durable message broker (Kafka with replication, RabbitMQ with persistence) and store the current saga state in a dedicated `saga_log` table so you can replay from the last known good step.

### Timeout firing before a slow-but-valid step completes

A step like `create_shipment` might take up to 15 minutes during peak load but your global timeout is 5 minutes, causing spurious compensation. Make step timeouts configurable per step type — see `references/advanced-patterns.md` for the `TimeoutSagaOrchestrator` implementation and the `STEP_TIMEOUTS` dict pattern.

### Compensation order not matching execution order

When two steps both complete before a failure is detected, compensation must run in strict reverse order or you leave data in an inconsistent state. Verify that `_compensate()` iterates from `current_step - 1` down to `0`, and add an integration test that deliberately fails at each step index to confirm correct rollback order.

---

## Advanced Patterns

The `references/` directory contains production-grade implementations not needed for most sagas:

- **`references/advanced-patterns.md`** — Full `SagaOrchestrator` abstract base class, `TimeoutSagaOrchestrator` with per-step deadlines, detailed bank transfer compensating transaction chain, Prometheus instrumentation, stuck saga PromQL alerts, and DLQ recovery worker.

---

## Related Skills

- `cqrs-implementation` — Pair sagas with CQRS for read-model updates after each step completes
- `event-store-design` — Store saga events in an event store for full audit trail and replay capability
- `workflow-orchestration-patterns` — Higher-level workflow engines (Temporal, Conductor) that build on saga concepts
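
---

## Worked Example: Reverse-Order Compensation

The troubleshooting entry on compensation order asks that rollback iterate from `current_step - 1` down to `0`. The following is a minimal in-memory sketch of that rule, not the `SagaOrchestrator` from `references/advanced-patterns.md`. The names `Step`, `run_saga`, and `build_demo` are hypothetical and exist only for illustration. It assumes steps run sequentially in one process, and that a step which raises has made no changes, so it needs no compensation of its own.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List


@dataclass
class Step:
    """Hypothetical step record: a name plus async action and compensation."""
    name: str
    action: Callable[[], Awaitable[None]]
    compensation: Callable[[], Awaitable[None]]


async def run_saga(steps: List[Step]) -> str:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    for current_step, step in enumerate(steps):
        try:
            await step.action()
        except Exception:
            # Steps 0..current_step-1 completed. The failed step is skipped,
            # so iteration starts at current_step - 1 and ends at 0.
            for index in range(current_step - 1, -1, -1):
                await steps[index].compensation()
            return "Failed"
    return "Completed"


def build_demo(fail_at: int, log: List[str]) -> List[Step]:
    """Build three demo steps that record calls in `log`; step `fail_at` raises."""
    names = ["reserve_inventory", "process_payment", "create_shipment"]

    def make(index: int, name: str) -> Step:
        async def action() -> None:
            if index == fail_at:
                raise RuntimeError(f"{name} failed")
            log.append(f"do:{name}")

        async def compensation() -> None:
            log.append(f"undo:{name}")

        return Step(name, action, compensation)

    return [make(i, n) for i, n in enumerate(names)]


if __name__ == "__main__":
    # Fail at every step index in turn and print the resulting call order.
    for fail_at in range(3):
        calls: List[str] = []
        state = asyncio.run(run_saga(build_demo(fail_at, calls)))
        print(fail_at, state, calls)
```

Looping over every `fail_at` index, as the `__main__` block does, is the shape of the integration test recommended above: each run should show the `undo:` entries in exactly the reverse order of the `do:` entries that preceded the failure.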