Claude Agent Skill · by Wshobson

Python Resilience

Built around the tenacity library, this skill gives you production-ready retry patterns with exponential backoff, jitter, and proper error classification.

Install
$ npx skills add https://github.com/wshobson/agents --skill python-resilience
Works with Paperclip

How Python Resilience fits into a Paperclip company.

Python Resilience drops into any Paperclip agent that handles this kind of work. Assign it to a specialist inside a pre-configured PaperclipOrg company and the skill becomes available on every heartbeat — no prompt engineering, no tool wiring.

Source file
SKILL.md (376 lines)
---
name: python-resilience
description: Python resilience patterns including automatic retries, exponential backoff, timeouts, and fault-tolerant decorators. Use when adding retry logic, implementing timeouts, building fault-tolerant services, or handling transient failures.
---

# Python Resilience Patterns

Build fault-tolerant Python applications that gracefully handle transient failures, network issues, and service outages. Resilience patterns keep systems running when dependencies are unreliable.

## When to Use This Skill

- Adding retry logic to external service calls
- Implementing timeouts for network operations
- Building fault-tolerant microservices
- Handling rate limiting and backpressure
- Creating infrastructure decorators
- Designing circuit breakers

## Core Concepts

### 1. Transient vs Permanent Failures

Retry transient errors (network timeouts, temporary service issues). Don't retry permanent errors (invalid credentials, bad requests).

### 2. Exponential Backoff

Increase the wait time between retries to avoid overwhelming recovering services.

### 3. Jitter

Add randomness to the backoff to prevent a thundering herd when many clients retry simultaneously.

### 4. Bounded Retries

Cap both the attempt count and the total duration to prevent infinite retry loops.

## Quick Start

```python
import httpx
from tenacity import retry, stop_after_attempt, wait_exponential_jitter

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential_jitter(initial=1, max=10),
)
def call_external_service(request: dict) -> dict:
    return httpx.post("https://api.example.com", json=request).json()
```

## Fundamental Patterns

### Pattern 1: Basic Retry with Tenacity

Use the `tenacity` library for production-grade retry logic. For simpler cases, consider the retry support built into your client library, or a lightweight custom implementation.
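For comparison, a lightweight custom implementation of the core concepts (capped exponential backoff, jitter, and bounded retries) might look like the sketch below. It uses only the standard library. The names `backoff_delay` and `retry_call`, their parameters, and the "full jitter" strategy (a uniform random delay between zero and the capped exponential ceiling) are assumptions made for illustration; this is not how tenacity computes its waits.

```python
import random
import time
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, initial: float = 1.0, maximum: float = 10.0) -> float:
    """Delay after failed attempt `attempt` (1-based): capped exponential, full jitter."""
    ceiling = min(maximum, initial * 2 ** (attempt - 1))
    return random.uniform(0, ceiling)


def retry_call(
    func: Callable[[], T],
    transient: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    max_attempts: int = 3,
    max_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,  # injectable so tests need not wait
) -> T:
    """Call `func`, retrying only on `transient` errors, bounded by count and time."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    deadline = time.monotonic() + max_seconds
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except transient:
            # Give up when either bound is hit; the last error propagates unchanged.
            if attempt == max_attempts or time.monotonic() >= deadline:
                raise
            sleep(backoff_delay(attempt))
    raise AssertionError("unreachable")
```

Anything not listed in `transient` (for example a `ValueError`) propagates on the first attempt without a retry. The tenacity version of a similar policy follows.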
```python
import httpx
from tenacity import (
    retry,
    stop_after_attempt,
    stop_after_delay,
    wait_exponential_jitter,
    retry_if_exception_type,
)

# httpx raises its own exception types, which do not subclass the built-ins,
# so list them alongside the built-in errors.
TRANSIENT_ERRORS = (
    ConnectionError,
    TimeoutError,
    OSError,
    httpx.TimeoutException,
    httpx.NetworkError,
)

@retry(
    retry=retry_if_exception_type(TRANSIENT_ERRORS),
    stop=stop_after_attempt(5) | stop_after_delay(60),
    wait=wait_exponential_jitter(initial=1, max=30),
)
def fetch_data(url: str) -> dict:
    """Fetch data with automatic retry on transient failures."""
    response = httpx.get(url, timeout=30)
    response.raise_for_status()
    return response.json()
```

### Pattern 2: Retry Only Appropriate Errors

Whitelist specific transient exceptions. Never retry:

- `ValueError`, `TypeError` - These are bugs, not transient issues
- `AuthenticationError` - Invalid credentials won't become valid
- HTTP 4xx errors (except 429) - Client errors are permanent

```python
from tenacity import (
    retry,
    retry_if_exception_type,
    stop_after_attempt,
    wait_exponential_jitter,
)
import httpx

# Define what's retryable
RETRYABLE_EXCEPTIONS = (
    ConnectionError,
    TimeoutError,
    httpx.ConnectTimeout,
    httpx.ReadTimeout,
)

@retry(
    retry=retry_if_exception_type(RETRYABLE_EXCEPTIONS),
    stop=stop_after_attempt(3),
    wait=wait_exponential_jitter(initial=1, max=10),
)
def resilient_api_call(endpoint: str) -> dict:
    """Make API call with retry on network issues."""
    return httpx.get(endpoint, timeout=10).json()
```

### Pattern 3: HTTP Status Code Retries

Retry specific HTTP status codes that indicate transient issues.
```python
from tenacity import (
    retry,
    retry_if_result,
    stop_after_attempt,
    wait_exponential_jitter,
)
import httpx

RETRY_STATUS_CODES = {429, 502, 503, 504}

def should_retry_response(response: httpx.Response) -> bool:
    """Check if response indicates a retryable error."""
    return response.status_code in RETRY_STATUS_CODES

# If the final attempt still returns a retryable status,
# tenacity raises RetryError instead of returning the response.
@retry(
    retry=retry_if_result(should_retry_response),
    stop=stop_after_attempt(3),
    wait=wait_exponential_jitter(initial=1, max=10),
)
def http_request(method: str, url: str, **kwargs) -> httpx.Response:
    """Make HTTP request with retry on transient status codes."""
    return httpx.request(method, url, timeout=30, **kwargs)
```

### Pattern 4: Combined Exception and Status Retry

Handle both network exceptions and HTTP status codes.

```python
from tenacity import (
    retry,
    retry_if_exception_type,
    retry_if_result,
    stop_after_attempt,
    wait_exponential_jitter,
    before_sleep_log,
)
import logging
import httpx

logger = logging.getLogger(__name__)

TRANSIENT_EXCEPTIONS = (
    ConnectionError,
    TimeoutError,
    httpx.ConnectError,
    httpx.ReadTimeout,
)
RETRY_STATUS_CODES = {429, 500, 502, 503, 504}

def is_retryable_response(response: httpx.Response) -> bool:
    return response.status_code in RETRY_STATUS_CODES

@retry(
    retry=(
        retry_if_exception_type(TRANSIENT_EXCEPTIONS) |
        retry_if_result(is_retryable_response)
    ),
    stop=stop_after_attempt(5),
    wait=wait_exponential_jitter(initial=1, max=30),
    before_sleep=before_sleep_log(logger, logging.WARNING),
)
def robust_http_call(
    method: str,
    url: str,
    **kwargs,
) -> httpx.Response:
    """HTTP call with comprehensive retry handling."""
    return httpx.request(method, url, timeout=30, **kwargs)
```

## Advanced Patterns

### Pattern 5: Logging Retry Attempts

Track retry behavior for debugging and alerting.
```python
from tenacity import retry, stop_after_attempt, wait_exponential
import structlog

logger = structlog.get_logger()

def log_retry_attempt(retry_state):
    """Log detailed retry information."""
    exception = retry_state.outcome.exception()
    logger.warning(
        "Retrying operation",
        attempt=retry_state.attempt_number,
        exception_type=type(exception).__name__,
        exception_message=str(exception),
        next_wait_seconds=retry_state.next_action.sleep if retry_state.next_action else None,
    )

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, max=10),
    before_sleep=log_retry_attempt,
)
def call_with_logging(request: dict) -> dict:
    """External call with retry logging."""
    ...
```

### Pattern 6: Timeout Decorator

Create reusable timeout decorators for consistent timeout handling.

```python
import asyncio
from collections.abc import Awaitable, Callable
from functools import wraps
from typing import TypeVar

import httpx

T = TypeVar("T")

def with_timeout(seconds: float):
    """Decorator to add timeout to async functions."""
    def decorator(func: Callable[..., Awaitable[T]]) -> Callable[..., Awaitable[T]]:
        @wraps(func)
        async def wrapper(*args, **kwargs) -> T:
            # Cancels the wrapped coroutine if it runs longer than `seconds`
            return await asyncio.wait_for(
                func(*args, **kwargs),
                timeout=seconds,
            )
        return wrapper
    return decorator

@with_timeout(30)
async def fetch_with_timeout(url: str) -> dict:
    """Fetch URL with 30 second timeout."""
    async with httpx.AsyncClient() as client:
        response = await client.get(url)
        return response.json()
```

### Pattern 7: Cross-Cutting Concerns via Decorators

Stack decorators to separate infrastructure from business logic.
```python
from functools import wraps
from typing import TypeVar, Callable
import structlog
from tenacity import retry, stop_after_attempt, wait_exponential_jitter

logger = structlog.get_logger()
T = TypeVar("T")

def traced(name: str | None = None):
    """Add tracing to function calls."""
    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        span_name = name or func.__name__

        @wraps(func)
        async def wrapper(*args, **kwargs) -> T:
            logger.info("Operation started", operation=span_name)
            try:
                result = await func(*args, **kwargs)
                logger.info("Operation completed", operation=span_name)
                return result
            except Exception as e:
                logger.error("Operation failed", operation=span_name, error=str(e))
                raise
        return wrapper
    return decorator

# Stack multiple concerns. `with_timeout` is the decorator from Pattern 6.
# Decorators apply bottom-up, so the 30 second timeout covers all retry attempts together.
@traced("fetch_user_data")
@with_timeout(30)
@retry(stop=stop_after_attempt(3), wait=wait_exponential_jitter())
async def fetch_user_data(user_id: str) -> dict:
    """Fetch user with tracing, timeout, and retry."""
    ...
```

### Pattern 8: Dependency Injection for Testability

Pass infrastructure components through constructors for easy testing.

```python
import time
from dataclasses import dataclass
from typing import Protocol

class Logger(Protocol):
    def info(self, msg: str, **kwargs) -> None: ...
    def error(self, msg: str, **kwargs) -> None: ...

class MetricsClient(Protocol):
    def increment(self, metric: str, tags: dict | None = None) -> None: ...
    def timing(self, metric: str, value: float) -> None: ...

# User, UserRepository, and the Fake* classes below are application-defined
# types that are not shown here.
@dataclass
class UserService:
    """Service with injected infrastructure."""

    repository: UserRepository
    logger: Logger
    metrics: MetricsClient

    async def get_user(self, user_id: str) -> User:
        self.logger.info("Fetching user", user_id=user_id)
        start = time.perf_counter()

        try:
            user = await self.repository.get(user_id)
            self.metrics.increment("user.fetch.success")
            return user
        except Exception as e:
            self.metrics.increment("user.fetch.error")
            self.logger.error("Failed to fetch user", user_id=user_id, error=str(e))
            raise
        finally:
            elapsed = time.perf_counter() - start
            self.metrics.timing("user.fetch.duration", elapsed)

# Easy to test with fakes
service = UserService(
    repository=FakeRepository(),
    logger=FakeLogger(),
    metrics=FakeMetrics(),
)
```

### Pattern 9: Fail-Safe Defaults

Degrade gracefully when non-critical operations fail.

```python
from functools import wraps
from typing import TypeVar
from collections.abc import Callable

import structlog

logger = structlog.get_logger()
T = TypeVar("T")

def fail_safe(default: T, log_failure: bool = True):
    """Return default value on failure instead of raising."""
    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @wraps(func)
        async def wrapper(*args, **kwargs) -> T:
            try:
                return await func(*args, **kwargs)
            except Exception as e:
                if log_failure:
                    logger.warning(
                        "Operation failed, using default",
                        function=func.__name__,
                        error=str(e),
                    )
                # The same default object is returned on every failure; don't mutate it.
                return default
        return wrapper
    return decorator

@fail_safe(default=[])
async def get_recommendations(user_id: str) -> list[str]:
    """Get recommendations, return empty list on failure."""
    ...
```

## Best Practices Summary

1. **Retry only transient errors** - Don't retry bugs or authentication failures
2. **Use exponential backoff** - Give services time to recover
3. **Add jitter** - Prevent thundering herd from synchronized retries
4. **Cap total duration** - `stop_after_attempt(5) | stop_after_delay(60)`
5. **Log every retry** - Silent retries hide systemic problems
6. **Use decorators** - Keep retry logic separate from business logic
7. **Inject dependencies** - Make infrastructure testable
8. **Set timeouts everywhere** - Every network call needs a timeout
9. **Fail gracefully** - Return cached/default values for non-critical paths
10. **Monitor retry rates** - High retry rates indicate underlying issues
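
When retry rates stay high, a circuit breaker (listed under "When to Use This Skill") can stop calls to a failing dependency for a cool-down period instead of retrying into it. The sketch below is one minimal way to do that with only the standard library. `CircuitBreaker`, `CircuitOpenError`, and their parameters are hypothetical names chosen for illustration, and the single-trial "half-open" behavior is one common design among several. It is not thread-safe as written.

```python
import time
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    """Open after N consecutive failures; allow a trial call after a cool-down."""

    def __init__(
        self,
        failure_threshold: int = 5,
        reset_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,  # injectable for tests
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_seconds = reset_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at: float | None = None

    def call(self, func: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_seconds:
                raise CircuitOpenError("circuit open; call skipped")
            # Cool-down elapsed: fall through and allow one trial call (half-open).
        try:
            result = func()
        except Exception:
            self._failures += 1
            # A failed trial call, or reaching the threshold, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A typical use is `breaker.call(lambda: fetch_data(url))`, so that retries handle short blips while the breaker handles sustained outages. Pair it with a fail-safe default (Pattern 9) for non-critical paths.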