Claude Agent Skill · by Wshobson

Python Observability

Sets up proper observability for Python apps when your production systems inevitably break and you need answers fast. Configures structlog for JSON logging with consistent fields, correlation ID propagation, Prometheus metrics, and OpenTelemetry tracing.

Install
Terminal · npx
$ npx skills add https://github.com/wshobson/agents --skill python-observability
Works with Paperclip

How Python Observability fits into a Paperclip company.

Python Observability drops into any Paperclip agent that handles this kind of work. Assign it to a specialist inside a pre-configured PaperclipOrg company and the skill becomes available on every heartbeat — no prompt engineering, no tool wiring.

SaaS Factory · Paired

Pre-configured AI company — 18 agents, 18 skills, one-time purchase.

$27 (was $59)
Source file
SKILL.md · 400 lines
---
name: python-observability
description: Python observability patterns including structured logging, metrics, and distributed tracing. Use when adding logging, implementing metrics collection, setting up tracing, or debugging production systems.
---

# Python Observability

Instrument Python applications with structured logs, metrics, and traces. When something breaks in production, you need to answer "what, where, and why" without deploying new code.

## When to Use This Skill

- Adding structured logging to applications
- Implementing metrics collection with Prometheus
- Setting up distributed tracing across services
- Propagating correlation IDs through request chains
- Debugging production issues
- Building observability dashboards

## Core Concepts

### 1. Structured Logging

Emit logs as JSON with consistent fields for production environments. Machine-readable logs enable powerful queries and alerts. For local development, consider human-readable formats.

### 2. The Four Golden Signals

Track latency, traffic, errors, and saturation for every service boundary.

### 3. Correlation IDs

Thread a unique ID through all logs and spans for a single request, enabling end-to-end tracing.

### 4. Bounded Cardinality

Keep metric label values bounded. Unbounded labels (like user IDs) explode storage costs.

## Quick Start

```python
import structlog

structlog.configure(
    processors=[
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.JSONRenderer(),
    ],
)

logger = structlog.get_logger()
logger.info("Request processed", user_id="123", duration_ms=45)
```

## Fundamental Patterns

### Pattern 1: Structured Logging with Structlog

Configure structlog for JSON output with consistent fields.
```python
import logging

import structlog

def configure_logging(log_level: str = "INFO") -> None:
    """Configure structured logging for the application."""
    structlog.configure(
        processors=[
            structlog.contextvars.merge_contextvars,
            structlog.processors.add_log_level,
            structlog.processors.TimeStamper(fmt="iso"),
            structlog.processors.StackInfoRenderer(),
            structlog.processors.format_exc_info,
            structlog.processors.JSONRenderer(),
        ],
        wrapper_class=structlog.make_filtering_bound_logger(
            getattr(logging, log_level.upper())
        ),
        context_class=dict,
        logger_factory=structlog.PrintLoggerFactory(),
        cache_logger_on_first_use=True,
    )

# Initialize at application startup
configure_logging("INFO")
logger = structlog.get_logger()
```

### Pattern 2: Consistent Log Fields

Every log entry should include standard fields for filtering and correlation.

```python
import time
from contextvars import ContextVar

import structlog

# Store correlation ID in context
correlation_id: ContextVar[str] = ContextVar("correlation_id", default="")

logger = structlog.get_logger()

def process_request(request: Request) -> Response:
    """Process request with structured logging."""
    logger.info(
        "Request received",
        correlation_id=correlation_id.get(),
        method=request.method,
        path=request.path,
        user_id=request.user_id,
    )
    start = time.perf_counter()

    try:
        result = handle_request(request)
        elapsed = round((time.perf_counter() - start) * 1000, 2)
        logger.info(
            "Request completed",
            correlation_id=correlation_id.get(),
            status_code=200,
            duration_ms=elapsed,
        )
        return result
    except Exception as e:
        logger.error(
            "Request failed",
            correlation_id=correlation_id.get(),
            error_type=type(e).__name__,
            error_message=str(e),
        )
        raise
```

### Pattern 3: Semantic Log Levels

Use log levels consistently across the application.
| Level | Purpose | Examples |
|-------|---------|----------|
| `DEBUG` | Development diagnostics | Variable values, internal state |
| `INFO` | Request lifecycle, operations | Request start/end, job completion |
| `WARNING` | Recoverable anomalies | Retry attempts, fallback used |
| `ERROR` | Failures needing attention | Exceptions, service unavailable |

```python
# DEBUG: Detailed internal information
logger.debug("Cache lookup", key=cache_key, hit=cache_hit)

# INFO: Normal operational events
logger.info("Order created", order_id=order.id, total=order.total)

# WARNING: Abnormal but handled situations
logger.warning(
    "Rate limit approaching",
    current_rate=950,
    limit=1000,
    reset_seconds=30,
)

# ERROR: Failures requiring investigation
logger.error(
    "Payment processing failed",
    order_id=order.id,
    error=str(e),
    payment_provider="stripe",
)
```

Never log expected behavior at `ERROR`. A user entering a wrong password is `INFO`, not `ERROR`.

### Pattern 4: Correlation ID Propagation

Generate a unique ID at ingress and thread it through all operations.
```python
import uuid
from contextvars import ContextVar

import structlog

correlation_id: ContextVar[str] = ContextVar("correlation_id", default="")

def set_correlation_id(cid: str | None = None) -> str:
    """Set correlation ID for current context."""
    cid = cid or str(uuid.uuid4())
    correlation_id.set(cid)
    structlog.contextvars.bind_contextvars(correlation_id=cid)
    return cid

# FastAPI middleware example
from fastapi import Request

async def correlation_middleware(request: Request, call_next):
    """Middleware to set and propagate correlation ID."""
    # Use incoming header or generate new
    cid = request.headers.get("X-Correlation-ID") or str(uuid.uuid4())
    set_correlation_id(cid)

    response = await call_next(request)
    response.headers["X-Correlation-ID"] = cid
    return response
```

Propagate to outbound requests:

```python
import httpx

async def call_downstream_service(endpoint: str, data: dict) -> dict:
    """Call downstream service with correlation ID."""
    async with httpx.AsyncClient() as client:
        response = await client.post(
            endpoint,
            json=data,
            headers={"X-Correlation-ID": correlation_id.get()},
        )
        return response.json()
```

## Advanced Patterns

### Pattern 5: The Four Golden Signals with Prometheus

Track these metrics for every service boundary:

```python
from prometheus_client import Counter, Histogram, Gauge

# Latency: How long requests take
REQUEST_LATENCY = Histogram(
    "http_request_duration_seconds",
    "Request latency in seconds",
    ["method", "endpoint", "status"],
    buckets=[0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10],
)

# Traffic: Request rate
REQUEST_COUNT = Counter(
    "http_requests_total",
    "Total HTTP requests",
    ["method", "endpoint", "status"],
)

# Errors: Error rate
ERROR_COUNT = Counter(
    "http_errors_total",
    "Total HTTP errors",
    ["method", "endpoint", "error_type"],
)

# Saturation: Resource utilization
DB_POOL_USAGE = Gauge(
    "db_connection_pool_used",
    "Number of database connections in use",
)
```

Instrument your endpoints:

```python
import time
from functools import wraps

def track_request(func):
    """Decorator to track request metrics."""
    @wraps(func)
    async def wrapper(request: Request, *args, **kwargs):
        method = request.method
        endpoint = request.url.path
        start = time.perf_counter()

        try:
            response = await func(request, *args, **kwargs)
            status = str(response.status_code)
            return response
        except Exception as e:
            status = "500"
            ERROR_COUNT.labels(
                method=method,
                endpoint=endpoint,
                error_type=type(e).__name__,
            ).inc()
            raise
        finally:
            duration = time.perf_counter() - start
            REQUEST_COUNT.labels(method=method, endpoint=endpoint, status=status).inc()
            REQUEST_LATENCY.labels(method=method, endpoint=endpoint, status=status).observe(duration)

    return wrapper
```

### Pattern 6: Bounded Cardinality

Avoid labels with unbounded values to prevent metric explosion.

```python
# BAD: User ID has potentially millions of values
REQUEST_COUNT.labels(method="GET", user_id=user.id)  # Don't do this!

# GOOD: Bounded values only
REQUEST_COUNT.labels(method="GET", endpoint="/users", status="200")

# If you need per-user metrics, use a different approach:
# - Log the user_id and query logs
# - Use a separate analytics system
# - Bucket users by type/tier
REQUEST_COUNT.labels(
    method="GET",
    endpoint="/users",
    user_tier="premium",  # Bounded set of values
)
```

### Pattern 7: Timed Operations with Context Manager

Create a reusable timing context manager for operations.
```python
import time
from contextlib import contextmanager

import structlog

logger = structlog.get_logger()

@contextmanager
def timed_operation(name: str, **extra_fields):
    """Context manager for timing and logging operations."""
    start = time.perf_counter()
    logger.debug("Operation started", operation=name, **extra_fields)

    try:
        yield
    except Exception as e:
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.error(
            "Operation failed",
            operation=name,
            duration_ms=round(elapsed_ms, 2),
            error=str(e),
            **extra_fields,
        )
        raise
    else:
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.info(
            "Operation completed",
            operation=name,
            duration_ms=round(elapsed_ms, 2),
            **extra_fields,
        )

# Usage
with timed_operation("fetch_user_orders", user_id=user.id):
    orders = await order_repository.get_by_user(user.id)
```

### Pattern 8: OpenTelemetry Tracing

Set up distributed tracing with OpenTelemetry.

**Note:** OpenTelemetry is actively evolving. Check the [official Python documentation](https://opentelemetry.io/docs/languages/python/) for the latest API patterns and best practices.
```python
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

def configure_tracing(service_name: str, otlp_endpoint: str) -> None:
    """Configure OpenTelemetry tracing."""
    provider = TracerProvider(
        resource=Resource.create({"service.name": service_name})
    )
    processor = BatchSpanProcessor(OTLPSpanExporter(endpoint=otlp_endpoint))
    provider.add_span_processor(processor)
    trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

async def process_order(order_id: str) -> Order:
    """Process order with tracing."""
    with tracer.start_as_current_span("process_order") as span:
        span.set_attribute("order.id", order_id)

        with tracer.start_as_current_span("validate_order"):
            # Assumes validate_order returns the validated Order
            order = validate_order(order_id)

        with tracer.start_as_current_span("charge_payment"):
            charge_payment(order_id)

        with tracer.start_as_current_span("send_confirmation"):
            send_confirmation(order_id)

        return order
```

## Best Practices Summary

1. **Use structured logging** - JSON logs with consistent fields
2. **Propagate correlation IDs** - Thread through all requests and logs
3. **Track the four golden signals** - Latency, traffic, errors, saturation
4. **Bound label cardinality** - Never use unbounded values as metric labels
5. **Log at appropriate levels** - Don't cry wolf with ERROR
6. **Include context** - User ID, request ID, operation name in logs
7. **Use context managers** - Consistent timing and error handling
8. **Separate concerns** - Observability code shouldn't pollute business logic
9. **Test your observability** - Verify logs and metrics in integration tests
10. **Set up alerts** - Metrics are useless without alerting