```shell
npx skills add https://github.com/wshobson/agents --skill distributed-tracing
```

## How Distributed Tracing fits into a Paperclip company
Distributed Tracing drops into any Paperclip agent that handles this kind of work. Assign it to a specialist inside a pre-configured PaperclipOrg company and the skill becomes available on every heartbeat — no prompt engineering, no tool wiring.
## SKILL.md (449 lines)
---
name: distributed-tracing
description: Implement distributed tracing with Jaeger and Tempo to track requests across microservices and identify performance bottlenecks. Use when debugging microservices, analyzing request flows, or implementing observability for distributed systems.
---

# Distributed Tracing

Implement distributed tracing with Jaeger and Tempo for request flow visibility across microservices.

## Purpose

Track requests across distributed systems to understand latency, dependencies, and failure points.

## When to Use

- Debug latency issues
- Understand service dependencies
- Identify bottlenecks
- Trace error propagation
- Analyze request paths

## Distributed Tracing Concepts

### Trace Structure

```
Trace (Request ID: abc123)
  ↓
Span (frontend) [100ms]
  ↓
Span (api-gateway) [80ms]
  ├→ Span (auth-service) [10ms]
  └→ Span (user-service) [60ms]
       └→ Span (database) [40ms]
```

### Key Components

- **Trace** - End-to-end request journey
- **Span** - Single operation within a trace
- **Context** - Metadata propagated between services
- **Tags** - Key-value pairs for filtering
- **Logs** - Timestamped events within a span

## Jaeger Setup

### Kubernetes Deployment

```bash
# Deploy Jaeger Operator
kubectl create namespace observability
kubectl create -f https://github.com/jaegertracing/jaeger-operator/releases/download/v1.51.0/jaeger-operator.yaml -n observability

# Deploy Jaeger instance
kubectl apply -f - <<EOF
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger
  namespace: observability
spec:
  strategy: production
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: http://elasticsearch:9200
  ingress:
    enabled: true
EOF
```

### Docker Compose

```yaml
version: "3.8"
services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "5775:5775/udp"
      - "6831:6831/udp"
      - "6832:6832/udp"
      - "5778:5778"
      - "16686:16686"  # UI
      - "14268:14268"  # Collector
      - "14250:14250"  # gRPC
      - "9411:9411"    # Zipkin
    environment:
      - COLLECTOR_ZIPKIN_HOST_PORT=:9411
```

**Reference:** See `references/jaeger-setup.md`
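As a toy illustration of the span hierarchy in the Trace Structure diagram above, here is a stdlib-only sketch (hypothetical data, not the Jaeger API) that computes each span's self-time, i.e. the latency not accounted for by its direct children:

```python
# Toy model of the trace above: each span has a duration and child spans.
# Assumes children run sequentially; concurrent children would need a
# union of their time intervals instead of a plain sum.
spans = {
    "frontend":     {"duration_ms": 100, "children": ["api-gateway"]},
    "api-gateway":  {"duration_ms": 80,  "children": ["auth-service", "user-service"]},
    "auth-service": {"duration_ms": 10,  "children": []},
    "user-service": {"duration_ms": 60,  "children": ["database"]},
    "database":     {"duration_ms": 40,  "children": []},
}

def self_time(name):
    # Self-time = own duration minus time spent in direct children
    s = spans[name]
    return s["duration_ms"] - sum(spans[c]["duration_ms"] for c in s["children"])

for name in spans:
    print(f"{name}: {self_time(name)}ms self-time")
```

High self-time (here `database` at 40ms) is where a tracing UI like Jaeger points you first, since that latency is not explained by any downstream call.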
## Application Instrumentation

### OpenTelemetry (Recommended)

#### Python (Flask)

```python
from opentelemetry import trace
from opentelemetry.exporter.jaeger.thrift import JaegerExporter
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from flask import Flask

# Initialize tracer
resource = Resource(attributes={SERVICE_NAME: "my-service"})
provider = TracerProvider(resource=resource)
processor = BatchSpanProcessor(JaegerExporter(
    agent_host_name="jaeger",
    agent_port=6831,
))
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)

# Instrument Flask
app = Flask(__name__)
FlaskInstrumentor().instrument_app(app)

@app.route('/api/users')
def get_users():
    tracer = trace.get_tracer(__name__)
    with tracer.start_as_current_span("get_users") as span:
        span.set_attribute("user.count", 100)
        # Business logic
        users = fetch_users_from_db()
        return {"users": users}

def fetch_users_from_db():
    tracer = trace.get_tracer(__name__)
    with tracer.start_as_current_span("database_query") as span:
        span.set_attribute("db.system", "postgresql")
        span.set_attribute("db.statement", "SELECT * FROM users")
        # Database query
        return query_database()
```

#### Node.js (Express)

```javascript
const { trace } = require("@opentelemetry/api");
const { NodeTracerProvider } = require("@opentelemetry/sdk-trace-node");
const { JaegerExporter } = require("@opentelemetry/exporter-jaeger");
const { BatchSpanProcessor } = require("@opentelemetry/sdk-trace-base");
const { registerInstrumentations } = require("@opentelemetry/instrumentation");
const { HttpInstrumentation } = require("@opentelemetry/instrumentation-http");
const {
  ExpressInstrumentation,
} = require("@opentelemetry/instrumentation-express");

// Initialize tracer
const provider = new NodeTracerProvider({
  resource: { attributes: { "service.name": "my-service" } },
});

const exporter = new JaegerExporter({
  endpoint: "http://jaeger:14268/api/traces",
});

provider.addSpanProcessor(new BatchSpanProcessor(exporter));
provider.register();

// Instrument libraries
registerInstrumentations({
  instrumentations: [new HttpInstrumentation(), new ExpressInstrumentation()],
});

const express = require("express");
const app = express();

app.get("/api/users", async (req, res) => {
  const tracer = trace.getTracer("my-service");
  const span = tracer.startSpan("get_users");
  try {
    // fetchUsers() is the application's own data-access function
    const users = await fetchUsers();
    span.setAttributes({ "user.count": users.length });
    res.json({ users });
  } finally {
    span.end();
  }
});
```
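Both snippets above follow the same pipeline: a provider owns a processor, the processor hands finished spans to an exporter, and application code only opens and closes spans. A dependency-free sketch of that flow (hypothetical classes, not the OpenTelemetry API) may help fix the shape in mind:

```python
from contextlib import contextmanager
import time

class ListExporter:
    """Stand-in for JaegerExporter: just collects finished spans."""
    def __init__(self):
        self.spans = []

    def export(self, span):
        self.spans.append(span)

class Tracer:
    """Toy tracer: starts spans, records duration, hands them to an exporter."""
    def __init__(self, service, exporter):
        self.service = service
        self.exporter = exporter

    @contextmanager
    def start_span(self, name, **attributes):
        span = {"service": self.service, "name": name, "attributes": attributes}
        start = time.monotonic()
        try:
            yield span
        finally:
            # Exported exactly once, even if the body raised
            span["duration_s"] = time.monotonic() - start
            self.exporter.export(span)

exporter = ListExporter()
tracer = Tracer("my-service", exporter)
with tracer.start_span("get_users") as span:
    span["attributes"]["user.count"] = 100

print(exporter.spans[0]["name"])  # get_users
```

The real SDKs add batching, sampling, and context propagation on top, but the provider/processor/exporter shape is the same.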
#### Go

```go
package main

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/exporters/jaeger"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.4.0"
)

func initTracer() (*sdktrace.TracerProvider, error) {
	exporter, err := jaeger.New(jaeger.WithCollectorEndpoint(
		jaeger.WithEndpoint("http://jaeger:14268/api/traces"),
	))
	if err != nil {
		return nil, err
	}
	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exporter),
		sdktrace.WithResource(resource.NewWithAttributes(
			semconv.SchemaURL,
			semconv.ServiceNameKey.String("my-service"),
		)),
	)
	otel.SetTracerProvider(tp)
	return tp, nil
}

func getUsers(ctx context.Context) ([]User, error) {
	tracer := otel.Tracer("my-service")
	ctx, span := tracer.Start(ctx, "get_users")
	defer span.End()

	span.SetAttributes(attribute.String("user.filter", "active"))

	users, err := fetchUsersFromDB(ctx)
	if err != nil {
		span.RecordError(err)
		return nil, err
	}

	span.SetAttributes(attribute.Int("user.count", len(users)))
	return users, nil
}
```

**Reference:** See `references/instrumentation.md`

## Context Propagation

### HTTP Headers

```
traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01
tracestate: congo=t61rcWkgMzE
```

### Propagation in HTTP Requests

#### Python

```python
import requests
from opentelemetry.propagate import inject

headers = {}
inject(headers)  # Injects trace context

response = requests.get('http://downstream-service/api', headers=headers)
```
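The `traceparent` value shown above is four dash-separated hex fields: version, trace-id, parent-id, and flags, per the W3C Trace Context layout. A quick stdlib-only parser (a hypothetical helper, not part of any SDK) makes the structure explicit:

```python
def parse_traceparent(header: str) -> dict:
    """Split a W3C traceparent header into its four hex fields."""
    version, trace_id, parent_id, flags = header.split("-")
    # 128-bit trace ID, 64-bit parent span ID
    assert len(trace_id) == 32 and len(parent_id) == 16
    return {
        "version": version,
        "trace_id": trace_id,
        "parent_id": parent_id,
        # Bit 0 of the flags byte marks the trace as sampled
        "sampled": bool(int(flags, 16) & 0x01),
    }

ctx = parse_traceparent("00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01")
print(ctx["trace_id"])  # 0af7651916cd43dd8448eb211c80319c
print(ctx["sampled"])   # True
```

In practice you let `inject`/`propagation.inject` write this header for you; the point is that the downstream service reads the same trace ID and so its spans join the same trace.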
#### Node.js

```javascript
const { propagation, context } = require("@opentelemetry/api");
const axios = require("axios");

const headers = {};
propagation.inject(context.active(), headers);

axios.get("http://downstream-service/api", { headers });
```

## Tempo Setup (Grafana)

### Kubernetes Deployment

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: tempo-config
data:
  tempo.yaml: |
    server:
      http_listen_port: 3200
    distributor:
      receivers:
        jaeger:
          protocols:
            thrift_http:
            grpc:
        otlp:
          protocols:
            http:
            grpc:
    storage:
      trace:
        backend: s3
        s3:
          bucket: tempo-traces
          endpoint: s3.amazonaws.com
    querier:
      frontend_worker:
        frontend_address: tempo-query-frontend:9095
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tempo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tempo
  template:
    metadata:
      labels:
        app: tempo
    spec:
      containers:
        - name: tempo
          image: grafana/tempo:latest
          args:
            - -config.file=/etc/tempo/tempo.yaml
          volumeMounts:
            - name: config
              mountPath: /etc/tempo
      volumes:
        - name: config
          configMap:
            name: tempo-config
```

**Reference:** See `assets/jaeger-config.yaml.template`

## Sampling Strategies

### Probabilistic Sampling

```yaml
# Sample 1% of traces
sampler:
  type: probabilistic
  param: 0.01
```

### Rate Limiting Sampling

```yaml
# Sample max 100 traces per second
sampler:
  type: ratelimiting
  param: 100
```

### Adaptive Sampling

```python
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

# Sample based on trace ID (deterministic)
sampler = ParentBased(root=TraceIdRatioBased(0.01))
```

## Trace Analysis

### Finding Slow Requests

**Jaeger Query:**

```
service=my-service
duration > 1s
```

### Finding Errors

**Jaeger Query:**

```
service=my-service
error=true
tags.http.status_code >= 500
```

### Service Dependency Graph

Jaeger automatically generates service dependency graphs showing:

- Service relationships
- Request rates
- Error rates
- Average latencies

## Best Practices

1. **Sample appropriately** (1-10% in production)
2. **Add meaningful tags** (user_id, request_id)
3. **Propagate context** across all service boundaries
4. **Log exceptions** in spans
5. **Use consistent naming** for operations
6. **Monitor tracing overhead** (<1% CPU impact)
7. **Set up alerts** for trace errors
8. **Implement distributed context** (baggage)
9. **Use span events** for important milestones
10. **Document instrumentation** standards

## Integration with Logging

### Correlated Logs

```python
import logging
from opentelemetry import trace

logger = logging.getLogger(__name__)

def process_request():
    span = trace.get_current_span()
    trace_id = span.get_span_context().trace_id
    logger.info(
        "Processing request",
        extra={"trace_id": format(trace_id, '032x')}
    )
```

## Troubleshooting

**No traces appearing:**

- Check collector endpoint
- Verify network connectivity
- Check sampling configuration
- Review application logs

**High latency overhead:**

- Reduce sampling rate
- Use batch span processor
- Check exporter configuration

## Related Skills

- `prometheus-configuration` - For metrics
- `grafana-dashboards` - For visualization
- `slo-implementation` - For latency SLOs
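One detail worth making concrete from the sampling strategies above: a `TraceIdRatioBased`-style sampler derives a deterministic keep/drop decision from the trace ID itself, so every service in a trace reaches the same decision without coordination and traces are kept or dropped whole. A stdlib-only sketch of that idea (an illustration, not the OTel SDK's exact arithmetic):

```python
def sample(trace_id_hex: str, ratio: float) -> bool:
    """Keep the trace iff its 128-bit ID falls in the lowest `ratio` slice.

    Deterministic: any service computing this on the same trace ID
    reaches the same answer, so a trace is never half-sampled.
    """
    trace_id = int(trace_id_hex, 16)
    return trace_id < int(ratio * 2**128)

tid = "0af7651916cd43dd8448eb211c80319c"
print(sample(tid, 1.0))  # True: ratio 1.0 keeps everything
print(sample(tid, 0.0))  # False: ratio 0.0 drops everything
```

This is why the ID-based sampler is labeled "deterministic" above, in contrast to a sampler that flips a coin per service.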