How Python Resilience fits into a Paperclip company.

Python Resilience drops into any Paperclip agent that adds retry logic, timeouts, or fault tolerance to Python services. Assign it to a specialist inside a pre-configured PaperclipOrg company and the skill becomes available on every heartbeat, with no prompt engineering and no tool wiring.


Source file

SKILL.md, 376 lines, markdown


---
name: python-resilience
description: Python resilience patterns including automatic retries, exponential backoff, timeouts, and fault-tolerant decorators. Use when adding retry logic, implementing timeouts, building fault-tolerant services, or handling transient failures.
---

# Python Resilience Patterns

Build fault-tolerant Python applications that gracefully handle transient failures, network issues, and service outages. Resilience patterns keep systems running when dependencies are unreliable.

## When to Use This Skill

- Adding retry logic to external service calls
- Implementing timeouts for network operations
- Building fault-tolerant microservices
- Handling rate limiting and backpressure
- Creating infrastructure decorators
- Designing circuit breakers

## Core Concepts

### 1. Transient vs Permanent Failures

Retry transient errors (network timeouts, temporary service issues). Don't retry permanent errors (invalid credentials, bad requests).

### 2. Exponential Backoff

Increase wait time between retries to avoid overwhelming recovering services.

### 3. Jitter

Add randomness to backoff to prevent thundering herd when many clients retry simultaneously.

### 4. Bounded Retries

Cap both attempt count and total duration to prevent infinite retry loops.

## Quick Start

```python
import httpx
from tenacity import retry, stop_after_attempt, wait_exponential_jitter

# With no `retry=` argument, tenacity retries on any exception.
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential_jitter(initial=1, max=10),
)
def call_external_service(request: dict) -> dict:
    return httpx.post("https://api.example.com", json=request).json()
```

## Fundamental Patterns

### Pattern 1: Basic Retry with Tenacity

Use the `tenacity` library for production-grade retry logic. For simpler cases, consider built-in retry functionality or a lightweight custom implementation.

```python
import httpx
from tenacity import (
    retry,
    stop_after_attempt,
    stop_after_delay,
    wait_exponential_jitter,
    retry_if_exception_type,
)

# httpx raises its own exception hierarchy rather than the built-in
# ConnectionError/TimeoutError, so httpx.TransportError is listed explicitly.
# Note that OSError already covers ConnectionError and TimeoutError.
TRANSIENT_ERRORS = (ConnectionError, TimeoutError, OSError, httpx.TransportError)

@retry(
    retry=retry_if_exception_type(TRANSIENT_ERRORS),
    stop=stop_after_attempt(5) | stop_after_delay(60),
    wait=wait_exponential_jitter(initial=1, max=30),
)
def fetch_data(url: str) -> dict:
    """Fetch data with automatic retry on transient failures."""
    response = httpx.get(url, timeout=30)
    # Raises httpx.HTTPStatusError on 4xx/5xx, which is not retried here.
    response.raise_for_status()
    return response.json()
```

### Pattern 2: Retry Only Appropriate Errors

Whitelist specific transient exceptions. Never retry:

- `ValueError`, `TypeError` - These are bugs, not transient issues
- `AuthenticationError` - Invalid credentials won't become valid
- HTTP 4xx errors (except 429) - Client errors are permanent

```python
from tenacity import (
    retry,
    retry_if_exception_type,
    stop_after_attempt,
    wait_exponential_jitter,
)
import httpx

# Define what's retryable
RETRYABLE_EXCEPTIONS = (
    ConnectionError,
    TimeoutError,
    httpx.ConnectTimeout,
    httpx.ReadTimeout,
)

@retry(
    retry=retry_if_exception_type(RETRYABLE_EXCEPTIONS),
    stop=stop_after_attempt(3),
    wait=wait_exponential_jitter(initial=1, max=10),
)
def resilient_api_call(endpoint: str) -> dict:
    """Make API call with retry on network issues."""
    return httpx.get(endpoint, timeout=10).json()
```

### Pattern 3: HTTP Status Code Retries

Retry specific HTTP status codes that indicate transient issues.

```python
from tenacity import (
    retry,
    retry_if_result,
    stop_after_attempt,
    wait_exponential_jitter,
)
import httpx

RETRY_STATUS_CODES = {429, 502, 503, 504}

def should_retry_response(response: httpx.Response) -> bool:
    """Check if response indicates a retryable error."""
    return response.status_code in RETRY_STATUS_CODES

# If the final attempt still returns a retryable status, tenacity raises
# RetryError; the last response is reachable through its `last_attempt`.
@retry(
    retry=retry_if_result(should_retry_response),
    stop=stop_after_attempt(3),
    wait=wait_exponential_jitter(initial=1, max=10),
)
def http_request(method: str, url: str, **kwargs) -> httpx.Response:
    """Make HTTP request with retry on transient status codes."""
    return httpx.request(method, url, timeout=30, **kwargs)
```

### Pattern 4: Combined Exception and Status Retry

Handle both network exceptions and HTTP status codes.

```python
from tenacity import (
    retry,
    retry_if_exception_type,
    retry_if_result,
    stop_after_attempt,
    wait_exponential_jitter,
    before_sleep_log,
)
import logging
import httpx

logger = logging.getLogger(__name__)

TRANSIENT_EXCEPTIONS = (
    ConnectionError,
    TimeoutError,
    httpx.ConnectError,
    httpx.ReadTimeout,
)
RETRY_STATUS_CODES = {429, 500, 502, 503, 504}

def is_retryable_response(response: httpx.Response) -> bool:
    return response.status_code in RETRY_STATUS_CODES

@retry(
    retry=(
        retry_if_exception_type(TRANSIENT_EXCEPTIONS) |
        retry_if_result(is_retryable_response)
    ),
    stop=stop_after_attempt(5),
    wait=wait_exponential_jitter(initial=1, max=30),
    before_sleep=before_sleep_log(logger, logging.WARNING),
)
def robust_http_call(
    method: str,
    url: str,
    **kwargs,
) -> httpx.Response:
    """HTTP call with comprehensive retry handling."""
    return httpx.request(method, url, timeout=30, **kwargs)
```

## Advanced Patterns

### Pattern 5: Logging Retry Attempts

Track retry behavior for debugging and alerting.

```python
from tenacity import retry, stop_after_attempt, wait_exponential
import structlog

logger = structlog.get_logger()

def log_retry_attempt(retry_state):
    """Log detailed retry information."""
    exception = retry_state.outcome.exception()
    logger.warning(
        "Retrying operation",
        attempt=retry_state.attempt_number,
        exception_type=type(exception).__name__,
        exception_message=str(exception),
        next_wait_seconds=retry_state.next_action.sleep if retry_state.next_action else None,
    )

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, max=10),
    before_sleep=log_retry_attempt,
)
def call_with_logging(request: dict) -> dict:
    """External call with retry logging."""
    ...
```

### Pattern 6: Timeout Decorator

Create reusable timeout decorators for consistent timeout handling.

```python
import asyncio
from functools import wraps
from typing import Awaitable, Callable, TypeVar

import httpx

T = TypeVar("T")

def with_timeout(seconds: float):
    """Decorator to add timeout to async functions."""
    def decorator(func: Callable[..., Awaitable[T]]) -> Callable[..., Awaitable[T]]:
        @wraps(func)
        async def wrapper(*args, **kwargs) -> T:
            # asyncio.wait_for cancels the call and raises TimeoutError on expiry.
            return await asyncio.wait_for(
                func(*args, **kwargs),
                timeout=seconds,
            )
        return wrapper
    return decorator

@with_timeout(30)
async def fetch_with_timeout(url: str) -> dict:
    """Fetch URL with 30 second timeout."""
    async with httpx.AsyncClient() as client:
        response = await client.get(url)
        return response.json()
```

### Pattern 7: Cross-Cutting Concerns via Decorators

Stack decorators to separate infrastructure from business logic.

```python
from functools import wraps
from typing import Awaitable, Callable, TypeVar

import structlog
from tenacity import retry, stop_after_attempt, wait_exponential_jitter

logger = structlog.get_logger()
T = TypeVar("T")

def traced(name: str | None = None):
    """Add tracing to function calls."""
    def decorator(func: Callable[..., Awaitable[T]]) -> Callable[..., Awaitable[T]]:
        span_name = name or func.__name__

        @wraps(func)
        async def wrapper(*args, **kwargs) -> T:
            logger.info("Operation started", operation=span_name)
            try:
                result = await func(*args, **kwargs)
                logger.info("Operation completed", operation=span_name)
                return result
            except Exception as e:
                logger.error("Operation failed", operation=span_name, error=str(e))
                raise
        return wrapper
    return decorator

# Stack multiple concerns. `with_timeout` is the decorator from Pattern 6.
# Decorators apply bottom-up, so the 30 second timeout covers all retry attempts.
@traced("fetch_user_data")
@with_timeout(30)
@retry(stop=stop_after_attempt(3), wait=wait_exponential_jitter())
async def fetch_user_data(user_id: str) -> dict:
    """Fetch user with tracing, timeout, and retry."""
    ...
```

### Pattern 8: Dependency Injection for Testability

Pass infrastructure components through constructors for easy testing.

```python
import time
from dataclasses import dataclass
from typing import Protocol

# `User`, `UserRepository`, and the Fake* classes below are assumed to be
# defined elsewhere in the application and its test suite.

class Logger(Protocol):
    def info(self, msg: str, **kwargs) -> None: ...
    def error(self, msg: str, **kwargs) -> None: ...

class MetricsClient(Protocol):
    def increment(self, metric: str, tags: dict | None = None) -> None: ...
    def timing(self, metric: str, value: float) -> None: ...

@dataclass
class UserService:
    """Service with injected infrastructure."""

    repository: UserRepository
    logger: Logger
    metrics: MetricsClient

    async def get_user(self, user_id: str) -> User:
        self.logger.info("Fetching user", user_id=user_id)
        start = time.perf_counter()

        try:
            user = await self.repository.get(user_id)
            self.metrics.increment("user.fetch.success")
            return user
        except Exception as e:
            self.metrics.increment("user.fetch.error")
            self.logger.error("Failed to fetch user", user_id=user_id, error=str(e))
            raise
        finally:
            elapsed = time.perf_counter() - start
            self.metrics.timing("user.fetch.duration", elapsed)

# Easy to test with fakes
service = UserService(
    repository=FakeRepository(),
    logger=FakeLogger(),
    metrics=FakeMetrics(),
)
```

### Pattern 9: Fail-Safe Defaults

Degrade gracefully when non-critical operations fail.

```python
from collections.abc import Awaitable, Callable
from functools import wraps
from typing import TypeVar

import structlog

logger = structlog.get_logger()
T = TypeVar("T")

def fail_safe(default: T, log_failure: bool = True):
    """Return default value on failure instead of raising."""
    def decorator(func: Callable[..., Awaitable[T]]) -> Callable[..., Awaitable[T]]:
        @wraps(func)
        async def wrapper(*args, **kwargs) -> T:
            try:
                return await func(*args, **kwargs)
            except Exception as e:
                if log_failure:
                    logger.warning(
                        "Operation failed, using default",
                        function=func.__name__,
                        error=str(e),
                    )
                return default
        return wrapper
    return decorator

# The same `default` object is returned on every failure, so callers
# should not mutate a mutable default such as this list.
@fail_safe(default=[])
async def get_recommendations(user_id: str) -> list[str]:
    """Get recommendations, return empty list on failure."""
    ...
```

## Best Practices Summary

1. **Retry only transient errors** - Don't retry bugs or authentication failures
2. **Use exponential backoff** - Give services time to recover
3. **Add jitter** - Prevent thundering herd from synchronized retries
4. **Cap total duration** - `stop_after_attempt(5) | stop_after_delay(60)`
5. **Log every retry** - Silent retries hide systemic problems
6. **Use decorators** - Keep retry logic separate from business logic
7. **Inject dependencies** - Make infrastructure testable
8. **Set timeouts everywhere** - Every network call needs a timeout
9. **Fail gracefully** - Return cached/default values for non-critical paths
10. **Monitor retry rates** - High retry rates indicate underlying issues
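
The core concepts above (exponential backoff, jitter, and bounded retries) can also be sketched without `tenacity`, as one possible shape for the "lightweight custom implementation" mentioned under Pattern 1. This is a minimal illustration, not a replacement for the library: the names `backoff_delay` and `simple_retry` and the injectable `sleep` parameter are hypothetical and do not come from the skill file, and the jitter strategy shown (a uniform wait between zero and the backoff ceiling) is one common choice among several.

```python
import random
import time
from functools import wraps


def backoff_delay(attempt: int, initial: float = 1.0, cap: float = 30.0) -> float:
    """Ceiling for the wait after failed attempt number `attempt` (1-based)."""
    return min(cap, initial * 2 ** (attempt - 1))


def simple_retry(
    exceptions: tuple[type[BaseException], ...],
    max_attempts: int = 5,
    max_seconds: float = 60.0,
    initial: float = 1.0,
    cap: float = 30.0,
    sleep=time.sleep,  # hypothetical hook so tests can avoid real sleeping
):
    """Retry only on `exceptions`, bounded by attempt count and total time."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            deadline = time.monotonic() + max_seconds
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    # Stop once either bound is reached and re-raise the last error.
                    if attempt == max_attempts or time.monotonic() >= deadline:
                        raise
                    # Jitter: wait a random time between 0 and the backoff ceiling.
                    sleep(random.uniform(0, backoff_delay(attempt, initial, cap)))
        return wrapper
    return decorator


# Usage: only the listed transient errors trigger a retry; anything else
# (for example ValueError) propagates on the first failure.
@simple_retry((ConnectionError, TimeoutError), max_attempts=3)
def ping() -> str:
    return "pong"
```

With the defaults, the backoff ceilings for successive failures are 1, 2, 4, 8, 16 seconds and then stay at the 30 second cap. The deadline is checked between attempts only, so a single slow call can still overrun `max_seconds`; per-call timeouts remain necessary.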
