How Distributed Tracing fits into a Paperclip company.

Distributed Tracing drops into any Paperclip agent that handles this kind of work. Assign it to a specialist inside a pre-configured PaperclipOrg company and the skill becomes available on every heartbeat — no prompt engineering, no tool wiring.

Source file

SKILL.md (449 lines, markdown)

---
name: distributed-tracing
description: Implement distributed tracing with Jaeger and Tempo to track requests across microservices and identify performance bottlenecks. Use when debugging microservices, analyzing request flows, or implementing observability for distributed systems.
---

# Distributed Tracing

Implement distributed tracing with Jaeger and Tempo for request flow visibility across microservices.

## Purpose

Track requests across distributed systems to understand latency, dependencies, and failure points.

## When to Use

- Debug latency issues
- Understand service dependencies
- Identify bottlenecks
- Trace error propagation
- Analyze request paths

## Distributed Tracing Concepts

### Trace Structure

```
Trace (Request ID: abc123)
  ↓
Span (frontend) [100ms]
  ↓
Span (api-gateway) [80ms]
  ├→ Span (auth-service) [10ms]
  └→ Span (user-service) [60ms]
      └→ Span (database) [40ms]
```

### Key Components

- **Trace** - End-to-end request journey
- **Span** - Single operation within a trace
- **Context** - Metadata propagated between services
- **Tags** - Key-value pairs for filtering
- **Logs** - Timestamped events within a span

## Jaeger Setup

### Kubernetes Deployment

```bash
# Deploy Jaeger Operator
kubectl create namespace observability
kubectl create -f https://github.com/jaegertracing/jaeger-operator/releases/download/v1.51.0/jaeger-operator.yaml -n observability

# Deploy Jaeger instance
kubectl apply -f - <<EOF
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger
  namespace: observability
spec:
  strategy: production
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: http://elasticsearch:9200
  ingress:
    enabled: true
EOF
```

### Docker Compose

```yaml
version: "3.8"
services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "5775:5775/udp"
      - "6831:6831/udp"
      - "6832:6832/udp"
      - "5778:5778"
      - "16686:16686" # UI
      - "14268:14268" # Collector
      - "14250:14250" # gRPC
      - "9411:9411" # Zipkin
    environment:
      - COLLECTOR_ZIPKIN_HOST_PORT=:9411
```

**Reference:** See `references/jaeger-setup.md`

## Application Instrumentation

### OpenTelemetry (Recommended)

#### Python (Flask)

```python
from opentelemetry import trace
from opentelemetry.exporter.jaeger.thrift import JaegerExporter
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from flask import Flask

# Initialize tracer
resource = Resource(attributes={SERVICE_NAME: "my-service"})
provider = TracerProvider(resource=resource)
processor = BatchSpanProcessor(JaegerExporter(
    agent_host_name="jaeger",
    agent_port=6831,
))
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)

# Instrument Flask
app = Flask(__name__)
FlaskInstrumentor().instrument_app(app)

@app.route('/api/users')
def get_users():
    tracer = trace.get_tracer(__name__)

    with tracer.start_as_current_span("get_users") as span:
        span.set_attribute("user.count", 100)
        # Business logic
        users = fetch_users_from_db()
        return {"users": users}

def fetch_users_from_db():
    tracer = trace.get_tracer(__name__)

    with tracer.start_as_current_span("database_query") as span:
        span.set_attribute("db.system", "postgresql")
        span.set_attribute("db.statement", "SELECT * FROM users")
        # Database query (query_database is a placeholder for your data access call)
        return query_database()
```

#### Node.js (Express)

```javascript
const { trace } = require("@opentelemetry/api");
const { Resource } = require("@opentelemetry/resources");
const { NodeTracerProvider } = require("@opentelemetry/sdk-trace-node");
const { JaegerExporter } = require("@opentelemetry/exporter-jaeger");
const { BatchSpanProcessor } = require("@opentelemetry/sdk-trace-base");
const { registerInstrumentations } = require("@opentelemetry/instrumentation");
const { HttpInstrumentation } = require("@opentelemetry/instrumentation-http");
const {
  ExpressInstrumentation,
} = require("@opentelemetry/instrumentation-express");

// Initialize tracer (Resource class as exposed by the 1.x SDK)
const provider = new NodeTracerProvider({
  resource: new Resource({ "service.name": "my-service" }),
});

const exporter = new JaegerExporter({
  endpoint: "http://jaeger:14268/api/traces",
});

provider.addSpanProcessor(new BatchSpanProcessor(exporter));
provider.register();

// Instrument libraries
registerInstrumentations({
  instrumentations: [new HttpInstrumentation(), new ExpressInstrumentation()],
});

const express = require("express");
const app = express();

app.get("/api/users", async (req, res) => {
  const tracer = trace.getTracer("my-service");
  const span = tracer.startSpan("get_users");

  try {
    // fetchUsers is a placeholder for your data access call
    const users = await fetchUsers();
    span.setAttributes({ "user.count": users.length });
    res.json({ users });
  } finally {
    span.end();
  }
});
```

#### Go

```go
package main

import (
    "context"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/exporters/jaeger"
    "go.opentelemetry.io/otel/sdk/resource"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
    semconv "go.opentelemetry.io/otel/semconv/v1.4.0"
)

func initTracer() (*sdktrace.TracerProvider, error) {
    exporter, err := jaeger.New(jaeger.WithCollectorEndpoint(
        jaeger.WithEndpoint("http://jaeger:14268/api/traces"),
    ))
    if err != nil {
        return nil, err
    }

    tp := sdktrace.NewTracerProvider(
        sdktrace.WithBatcher(exporter),
        sdktrace.WithResource(resource.NewWithAttributes(
            semconv.SchemaURL,
            semconv.ServiceNameKey.String("my-service"),
        )),
    )

    otel.SetTracerProvider(tp)
    return tp, nil
}

// User and fetchUsersFromDB are placeholders defined elsewhere in the service.
func getUsers(ctx context.Context) ([]User, error) {
    tracer := otel.Tracer("my-service")
    ctx, span := tracer.Start(ctx, "get_users")
    defer span.End()

    span.SetAttributes(attribute.String("user.filter", "active"))

    users, err := fetchUsersFromDB(ctx)
    if err != nil {
        span.RecordError(err)
        return nil, err
    }

    span.SetAttributes(attribute.Int("user.count", len(users)))
    return users, nil
}
```

**Reference:** See `references/instrumentation.md`

## Context Propagation

### HTTP Headers

```
traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01
tracestate: congo=t61rcWkgMzE
```

### Propagation in HTTP Requests

#### Python

```python
import requests
from opentelemetry.propagate import inject

headers = {}
inject(headers)  # Injects trace context

response = requests.get('http://downstream-service/api', headers=headers)
```

#### Node.js

```javascript
const axios = require("axios");
const { context, propagation } = require("@opentelemetry/api");

const headers = {};
propagation.inject(context.active(), headers);

axios.get("http://downstream-service/api", { headers });
```

## Tempo Setup (Grafana)

### Kubernetes Deployment

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: tempo-config
data:
  tempo.yaml: |
    server:
      http_listen_port: 3200

    distributor:
      receivers:
        jaeger:
          protocols:
            thrift_http:
            grpc:
        otlp:
          protocols:
            http:
            grpc:

    storage:
      trace:
        backend: s3
        s3:
          bucket: tempo-traces
          endpoint: s3.amazonaws.com

    querier:
      frontend_worker:
        frontend_address: tempo-query-frontend:9095
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tempo
spec:
  replicas: 1
  # apps/v1 Deployments require a selector that matches the pod template labels
  selector:
    matchLabels:
      app: tempo
  template:
    metadata:
      labels:
        app: tempo
    spec:
      containers:
        - name: tempo
          image: grafana/tempo:latest
          args:
            - -config.file=/etc/tempo/tempo.yaml
          volumeMounts:
            - name: config
              mountPath: /etc/tempo
      volumes:
        - name: config
          configMap:
            name: tempo-config
```

**Reference:** See `assets/jaeger-config.yaml.template`

## Sampling Strategies

### Probabilistic Sampling

```yaml
# Sample 1% of traces
sampler:
  type: probabilistic
  param: 0.01
```

### Rate Limiting Sampling

```yaml
# Sample max 100 traces per second
sampler:
  type: ratelimiting
  param: 100
```

### Adaptive Sampling

```python
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

# Sample based on trace ID (deterministic).
# Note: this is a fixed 1% ratio that honors the parent's sampling decision;
# the rate does not adjust itself to traffic.
sampler = ParentBased(root=TraceIdRatioBased(0.01))
```

## Trace Analysis

### Finding Slow Requests

**Jaeger Query:**

```
service=my-service
duration > 1s
```

### Finding Errors

**Jaeger Query:**

```
service=my-service
error=true
tags.http.status_code >= 500
```

### Service Dependency Graph

Jaeger automatically generates service dependency graphs showing:

- Service relationships
- Request rates
- Error rates
- Average latencies

## Best Practices

1. **Sample appropriately** (1-10% in production)
2. **Add meaningful tags** (user_id, request_id)
3. **Propagate context** across all service boundaries
4. **Log exceptions** in spans
5. **Use consistent naming** for operations
6. **Monitor tracing overhead** (<1% CPU impact)
7. **Set up alerts** for trace errors
8. **Implement distributed context** (baggage)
9. **Use span events** for important milestones
10. **Document instrumentation** standards

## Integration with Logging

### Correlated Logs

```python
import logging
from opentelemetry import trace

logger = logging.getLogger(__name__)

def process_request():
    span = trace.get_current_span()
    trace_id = span.get_span_context().trace_id

    # Format the integer trace ID as the 32-character hex string shown in tracing UIs
    logger.info(
        "Processing request",
        extra={"trace_id": format(trace_id, '032x')}
    )
```

## Troubleshooting

**No traces appearing:**

- Check collector endpoint
- Verify network connectivity
- Check sampling configuration
- Review application logs

**High latency overhead:**

- Reduce sampling rate
- Use batch span processor
- Check exporter configuration

## Related Skills

- `prometheus-configuration` - For metrics
- `grafana-dashboards` - For visualization
- `slo-implementation` - For latency SLOs
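
## Worked Example: Reading a traceparent Header

When context propagation seems broken, it can help to inspect the `traceparent` header shown under Context Propagation by hand. The sketch below is a minimal, standard-library-only parser, assuming the version `00` layout of the W3C Trace Context format (`version-traceid-parentid-flags`). The names `TraceParent` and `parse_traceparent` are hypothetical helpers for illustration and are not part of OpenTelemetry; in application code the OpenTelemetry propagators do this for you.

```python
import re
from typing import NamedTuple, Optional

# Version 00 layout: 2 hex chars, 32 hex chars, 16 hex chars, 2 hex chars.
# Simplification: headers from future versions with extra fields are rejected.
_TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-"
    r"(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<parent_id>[0-9a-f]{16})-"
    r"(?P<flags>[0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    """Hypothetical container for the parsed header fields."""
    version: str
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Return the parsed fields, or None if the header is malformed."""
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None

    version = match.group("version")
    trace_id = match.group("trace_id")
    parent_id = match.group("parent_id")

    # The spec forbids version "ff" and all-zero trace or parent IDs.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None

    # The lowest bit of the flags byte is the "sampled" flag.
    sampled = bool(int(match.group("flags"), 16) & 0x01)
    return TraceParent(version, trace_id, parent_id, sampled)


if __name__ == "__main__":
    example = "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"
    print(parse_traceparent(example))
```

If every service logs the same `trace_id` for a request but downstream spans never show up in Jaeger or Tempo, checking the `sampled` flag on the incoming header is a quick way to tell a sampling decision apart from a propagation gap.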

Related skills

Accessibility Compliance

This walks you through implementing proper WCAG 2.2 compliance with real code patterns for screen readers, keyboard navigation, and mobile accessibility. It cov

Airflow Dag Patterns

If you're building data pipelines with Airflow, this skill gives you production-ready DAG patterns that actually work in the real world. It covers TaskFlow API

Angular Migration

Migrating from AngularJS to Angular is notoriously painful, and this skill tackles the practical stuff that makes or breaks these projects. It covers hybrid app