Distributed Tracing

Name: Distributed Tracing
Author: wshobson

wshobson/agents

8.3k installs
38.3k repo stars
Updated July 22, 2026
distributed-tracing is an agent skill that implements distributed tracing with Jaeger and Tempo to track requests across microservices and identify performance bottlenecks. Use it when debugging microservices, analyzing request flows, or implementing observability for distributed systems.

About

Implement distributed tracing with Jaeger and Tempo to track requests across microservices and identify performance bottlenecks. Use when debugging microservices, analyzing request flows, or implementing observability for distributed systems.

Distributed Tracing by the numbers

8,331 all-time installs (skills.sh)
+161 installs in the week ending Jul 28, 2026 (Skillselion tracking)
Ranked #112 of 4,386 Backend & APIs skills by installs in the Skillselion catalog
Security screen: MEDIUM risk (skills.sh audit)
Data as of Jul 28, 2026 (Skillselion catalog sync)

At a glance

distributed-tracing capabilities & compatibility

Capabilities: understand service dependencies · trace error propagation · analyze request paths · sample appropriately (1-10% in production) · add meaningful tags (user_id, request_id)
Use cases: documentation

npx skills add https://github.com/wshobson/agents --skill distributed-tracing

Installs	8.3k
repo stars	★ 38.3k
Security audit	2 / 3 scanners passed
Last updated	July 22, 2026
Repository	wshobson/agents ↗

What problem does distributed-tracing solve for developers using this skill?

Who is it for?

Developers who need distributed-tracing patterns described in the cached skill documentation.

Skip if: the docs are empty or the task is outside the skill's documented scope.

When should I use this skill?

What you get

Actionable workflows and conventions from SKILL.md for distributed-tracing.

Jaeger deployment configuration
Span instrumentation patterns

By the numbers

Example trace hierarchy spans frontend, API gateway, auth-service, user-service, and database layers
Includes Jaeger Operator Kubernetes deployment workflow with observability namespace setup

Files

SKILL.md · Markdown · GitHub ↗

Distributed Tracing

Implement distributed tracing with Jaeger and Tempo for request flow visibility across microservices.

Purpose

Track requests across distributed systems to understand latency, dependencies, and failure points.

When to Use

Debug latency issues
Understand service dependencies
Identify bottlenecks
Trace error propagation
Analyze request paths

Detailed patterns and worked examples

Detailed pattern documentation lives in references/details.md. Read that file when the navigation tier above is insufficient.

Best Practices

1. Sample appropriately (1-10% in production)
2. Add meaningful tags (user_id, request_id)
3. Propagate context across all service boundaries
4. Log exceptions in spans
5. Use consistent naming for operations
6. Monitor tracing overhead (<1% CPU impact)
7. Set up alerts for trace errors
8. Implement distributed context (baggage)
9. Use span events for important milestones
10. Document instrumentation standards
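
Baggage (practice 8) travels between services as a W3C `baggage` HTTP header. The sketch below shows roughly what that header looks like on the wire, using only the standard library. The function names `encode_baggage` and `parse_baggage` are hypothetical, and the sketch ignores optional per-entry properties (the `;key=value` suffixes). In a real service the OpenTelemetry baggage API would normally do this for you.

```python
from urllib.parse import quote, unquote


def encode_baggage(entries: dict[str, str]) -> str:
    """Serialize key/value pairs as a comma-separated baggage header value."""
    # Values are percent-encoded so spaces, commas, and '=' survive transport.
    return ",".join(f"{key}={quote(value, safe='')}" for key, value in entries.items())


def parse_baggage(header: str) -> dict[str, str]:
    """Parse a baggage header value, dropping any ';property' suffixes."""
    entries: dict[str, str] = {}
    for member in header.split(","):
        member = member.strip()
        if not member or "=" not in member:
            continue  # skip empty or malformed list members
        key, _, value = member.partition("=")
        value = value.split(";", 1)[0]  # discard optional properties
        entries[key.strip()] = unquote(value.strip())
    return entries


# Illustrative values only; avoid putting sensitive data in baggage.
header = encode_baggage({"user_id": "42", "request_id": "abc 123"})
print(header)
print(parse_baggage(header))
```

Because baggage is attached to every outgoing call, it is usually worth keeping the entries few and small.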

Integration with Logging

Correlated Logs

import logging
from opentelemetry import trace

logger = logging.getLogger(__name__)

def process_request():
    span = trace.get_current_span()
    trace_id = span.get_span_context().trace_id

    logger.info(
        "Processing request",
        extra={"trace_id": format(trace_id, '032x')}
    )
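
Passing `extra=` on every call is easy to forget. One alternative, sketched here with the standard library only, is a `logging.Filter` that stamps the trace ID on every record. `TraceIdFilter` and `fake_trace_id` are hypothetical names; in a real service the callable would return `trace.get_current_span().get_span_context().trace_id` as in the snippet above.

```python
import logging


class TraceIdFilter(logging.Filter):
    """Attach a 32-character hex trace ID to every record on the logger."""

    def __init__(self, get_trace_id):
        super().__init__()
        self._get_trace_id = get_trace_id  # callable returning a 128-bit int

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = format(self._get_trace_id(), "032x")
        return True  # enrich the record, never drop it


def fake_trace_id() -> int:
    # Stand-in for the OpenTelemetry lookup (assumption for this sketch).
    return 0x0AF7651916CD43DD8448EB211C80319C


handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s")
)

logger = logging.getLogger("correlated-demo")
logger.addFilter(TraceIdFilter(fake_trace_id))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("Processing request")
```

The `032x` format pads the ID to 32 hex characters, which matches the trace ID field of the `traceparent` header and makes log lines searchable by the ID shown in Jaeger or Tempo.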

Troubleshooting

No traces appearing:

Check collector endpoint
Verify network connectivity
Check sampling configuration
Review application logs
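
For the first two checks, a quick TCP probe from inside the application's network can rule out basic reachability problems. This is a minimal sketch using only the standard library; `can_connect` is a hypothetical helper, and it only applies to TCP endpoints such as the collector's HTTP port, not to the UDP agent ports.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, `can_connect("jaeger", 14268)` would test the collector endpoint used elsewhere in this document, assuming the `jaeger` hostname resolves in your environment.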

High latency overhead:

Reduce sampling rate
Use batch span processor
Check exporter configuration

Related Skills

prometheus-configuration - For metrics
grafana-dashboards - For visualization
slo-implementation - For latency SLOs

distributed-tracing — detailed patterns and worked examples

Distributed Tracing Concepts

Trace Structure

Trace (Request ID: abc123)
  ↓
Span (frontend) [100ms]
  ↓
Span (api-gateway) [80ms]
  ├→ Span (auth-service) [10ms]
  └→ Span (user-service) [60ms]
      └→ Span (database) [40ms]
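
To see where the time in this example actually goes, it can help to compute each span's self time (its duration minus the time spent in its direct children). The sketch below models the tree above with a hypothetical `Span` dataclass. It assumes children run one after another rather than concurrently and that span names are unique.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    duration_ms: int
    children: list["Span"] = field(default_factory=list)


def self_times(span: Span) -> dict[str, int]:
    """Map each span name to its duration minus time spent in direct children."""
    result = {span.name: span.duration_ms - sum(c.duration_ms for c in span.children)}
    for child in span.children:
        result.update(self_times(child))
    return result


# The trace from the diagram above.
trace_root = Span("frontend", 100, [
    Span("api-gateway", 80, [
        Span("auth-service", 10),
        Span("user-service", 60, [Span("database", 40)]),
    ]),
])

times = self_times(trace_root)
slowest = max(times, key=times.get)
print(slowest, times)
```

Under these assumptions the database span accounts for 40 ms of the 100 ms request, so it is the first place to look for a bottleneck.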

Key Components

Trace - End-to-end request journey
Span - Single operation within a trace
Context - Metadata propagated between services
Tags - Key-value pairs for filtering
Logs - Timestamped events within a span

Jaeger Setup

Kubernetes Deployment

# Deploy Jaeger Operator
kubectl create namespace observability
kubectl create -f https://github.com/jaegertracing/jaeger-operator/releases/download/v1.51.0/jaeger-operator.yaml -n observability

# Deploy Jaeger instance
kubectl apply -f - <<EOF
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger
  namespace: observability
spec:
  strategy: production
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: http://elasticsearch:9200
  ingress:
    enabled: true
EOF

Docker Compose

version: "3.8"
services:
  jaeger:
    image: jaegertracing/all-in-one:1.62
    ports:
      - "5775:5775/udp"
      - "6831:6831/udp"
      - "6832:6832/udp"
      - "5778:5778"
      - "16686:16686" # UI
      - "14268:14268" # Collector
      - "14250:14250" # gRPC
      - "9411:9411" # Zipkin
    environment:
      - COLLECTOR_ZIPKIN_HOST_PORT=:9411

Reference: See references/jaeger-setup.md

Application Instrumentation

OpenTelemetry (Recommended)

Python (Flask)

from opentelemetry import trace
from opentelemetry.exporter.jaeger.thrift import JaegerExporter
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from flask import Flask

# Initialize tracer
resource = Resource(attributes={SERVICE_NAME: "my-service"})
provider = TracerProvider(resource=resource)
processor = BatchSpanProcessor(JaegerExporter(
    agent_host_name="jaeger",
    agent_port=6831,
))
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)

# Instrument Flask
app = Flask(__name__)
FlaskInstrumentor().instrument_app(app)

@app.route('/api/users')
def get_users():
    tracer = trace.get_tracer(__name__)

    with tracer.start_as_current_span("get_users") as span:
        # Business logic
        users = fetch_users_from_db()
        # Record the actual result size rather than a hard-coded value
        span.set_attribute("user.count", len(users))
        return {"users": users}

def fetch_users_from_db():
    tracer = trace.get_tracer(__name__)

    with tracer.start_as_current_span("database_query") as span:
        span.set_attribute("db.system", "postgresql")
        span.set_attribute("db.statement", "SELECT * FROM users")
        # Database query (query_database() is a placeholder for your data-access call)
        return query_database()

Node.js (Express)

const { trace } = require("@opentelemetry/api");
const { Resource } = require("@opentelemetry/resources");
const { NodeTracerProvider } = require("@opentelemetry/sdk-trace-node");
const { JaegerExporter } = require("@opentelemetry/exporter-jaeger");
const { BatchSpanProcessor } = require("@opentelemetry/sdk-trace-base");
const { registerInstrumentations } = require("@opentelemetry/instrumentation");
const { HttpInstrumentation } = require("@opentelemetry/instrumentation-http");
const {
  ExpressInstrumentation,
} = require("@opentelemetry/instrumentation-express");

// Initialize tracer (SDK 1.x-style API, consistent with addSpanProcessor below)
const provider = new NodeTracerProvider({
  resource: new Resource({ "service.name": "my-service" }),
});

const exporter = new JaegerExporter({
  endpoint: "http://jaeger:14268/api/traces",
});

provider.addSpanProcessor(new BatchSpanProcessor(exporter));
provider.register();

// Instrument libraries
registerInstrumentations({
  instrumentations: [new HttpInstrumentation(), new ExpressInstrumentation()],
});

const express = require("express");
const app = express();

app.get("/api/users", async (req, res) => {
  const tracer = trace.getTracer("my-service");
  const span = tracer.startSpan("get_users");

  try {
    const users = await fetchUsers();
    span.setAttributes({ "user.count": users.length });
    res.json({ users });
  } finally {
    span.end();
  }
});

Go

package main

import (
    "context"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute" // needed for attribute.String / attribute.Int below
    "go.opentelemetry.io/otel/exporters/jaeger"
    "go.opentelemetry.io/otel/sdk/resource"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
    semconv "go.opentelemetry.io/otel/semconv/v1.4.0"
)

func initTracer() (*sdktrace.TracerProvider, error) {
    exporter, err := jaeger.New(jaeger.WithCollectorEndpoint(
        jaeger.WithEndpoint("http://jaeger:14268/api/traces"),
    ))
    if err != nil {
        return nil, err
    }

    tp := sdktrace.NewTracerProvider(
        sdktrace.WithBatcher(exporter),
        sdktrace.WithResource(resource.NewWithAttributes(
            semconv.SchemaURL,
            semconv.ServiceNameKey.String("my-service"),
        )),
    )

    otel.SetTracerProvider(tp)
    return tp, nil
}

func getUsers(ctx context.Context) ([]User, error) {
    tracer := otel.Tracer("my-service")
    ctx, span := tracer.Start(ctx, "get_users")
    defer span.End()

    span.SetAttributes(attribute.String("user.filter", "active"))

    users, err := fetchUsersFromDB(ctx)
    if err != nil {
        span.RecordError(err)
        return nil, err
    }

    span.SetAttributes(attribute.Int("user.count", len(users)))
    return users, nil
}

Reference: See references/instrumentation.md

Context Propagation

HTTP Headers

traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01
tracestate: congo=t61rcWkgMzE
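
The `traceparent` value has four dash-separated fields: version, trace ID (32 hex characters), parent span ID (16 hex characters), and flags, where the lowest flag bit marks the trace as sampled. The sketch below parses the version `00` layout with the standard library. `parse_traceparent` and `TraceParent` are hypothetical names, and the parser is deliberately strict; OpenTelemetry's propagators handle this for you in practice.

```python
import re
from typing import NamedTuple, Optional


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    sampled: bool


_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Parse a version-00 traceparent header, returning None if it is invalid."""
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero IDs are not valid.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    sampled = bool(int(flags, 16) & 0x01)  # lowest bit is the sampled flag
    return TraceParent(version, trace_id, parent_id, sampled)


parsed = parse_traceparent("00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01")
print(parsed)
```

For the example header above, the trailing `01` means the upstream service sampled this trace, so downstream services using parent-based sampling would record it too.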

Propagation in HTTP Requests

Python

import requests
from opentelemetry.propagate import inject

headers = {}
inject(headers)  # Injects trace context (traceparent/tracestate) into the dict

response = requests.get('http://downstream-service/api', headers=headers)

Node.js

const { context, propagation } = require("@opentelemetry/api");
const axios = require("axios");

const headers = {};
propagation.inject(context.active(), headers); // Injects trace context

axios.get("http://downstream-service/api", { headers });

Tempo Setup (Grafana)

Kubernetes Deployment

apiVersion: v1
kind: ConfigMap
metadata:
  name: tempo-config
data:
  tempo.yaml: |
    server:
      http_listen_port: 3200

    distributor:
      receivers:
        jaeger:
          protocols:
            thrift_http:
            grpc:
        otlp:
          protocols:
            http:
            grpc:

    storage:
      trace:
        backend: s3
        s3:
          bucket: tempo-traces
          endpoint: s3.amazonaws.com

    querier:
      frontend_worker:
        frontend_address: tempo-query-frontend:9095
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tempo
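spec:
  replicas: 1
  selector:
    matchLabels:
      app: tempo # illustrative label; must match the template labels below
  template:
    metadata:
      labels:
        app: tempo
    spec: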
      containers:
        - name: tempo
          image: grafana/tempo:2.7
          args:
            - -config.file=/etc/tempo/tempo.yaml
          volumeMounts:
            - name: config
              mountPath: /etc/tempo
      volumes:
        - name: config
          configMap:
            name: tempo-config

Reference: See assets/jaeger-config.yaml.template

Sampling Strategies

Probabilistic Sampling

# Sample 1% of traces
sampler:
  type: probabilistic
  param: 0.01
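
A probabilistic sampler is often made deterministic by deriving the decision from the trace ID, so every service reaches the same verdict for the same trace. The sketch below illustrates the idea with the standard library; `RatioSampler` is a hypothetical class, not Jaeger's or OpenTelemetry's implementation, and comparing the low 64 bits of the ID against a threshold is one common way to do it.

```python
import random


class RatioSampler:
    """Keep a fixed fraction of traces, decided purely by the trace ID."""

    def __init__(self, ratio: float):
        if not 0.0 <= ratio <= 1.0:
            raise ValueError("ratio must be between 0 and 1")
        # Threshold over the 64-bit space: IDs below it are sampled.
        self._bound = round(ratio * (1 << 64))

    def should_sample(self, trace_id: int) -> bool:
        low_64_bits = trace_id & ((1 << 64) - 1)
        return low_64_bits < self._bound


sampler = RatioSampler(0.01)

# Simulated 128-bit trace IDs (seeded so the run is repeatable).
rng = random.Random(42)
trace_ids = [rng.getrandbits(128) for _ in range(100_000)]
kept = sum(sampler.should_sample(t) for t in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces")
```

With uniformly random trace IDs the kept fraction should land close to 1%, and re-running the decision for the same ID always gives the same answer.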

Rate Limiting Sampling

# Sample max 100 traces per second
sampler:
  type: ratelimiting
  param: 100
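
Rate limiting is commonly implemented as a token bucket: tokens refill at the configured rate, and a trace is sampled only when a token is available. The following is a minimal sketch of that technique, not Jaeger's actual code. `RateLimitingSampler` and its `clock` parameter are hypothetical; the injectable clock just makes the behaviour easy to test.

```python
import time
from typing import Callable


class RateLimitingSampler:
    """Token bucket that allows at most max_traces_per_second on average."""

    def __init__(
        self,
        max_traces_per_second: float,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._rate = max_traces_per_second
        self._capacity = max(max_traces_per_second, 1.0)  # allow at least one trace
        self._tokens = self._capacity
        self._clock = clock
        self._last = clock()

    def should_sample(self) -> bool:
        now = self._clock()
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        self._tokens = min(self._capacity, self._tokens + (now - self._last) * self._rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


# A burst of 250 requests against a 100-per-second limit.
sampler = RateLimitingSampler(100)
decisions = [sampler.should_sample() for _ in range(250)]
print(f"sampled {sum(decisions)} of {len(decisions)} requests in the burst")
```

Unlike probabilistic sampling, this caps trace volume regardless of traffic, which is useful for low-traffic services where a 1% rate would capture almost nothing and for bursty services where it would capture too much.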

Adaptive Sampling

from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

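# Fixed-ratio sampling keyed on the trace ID (deterministic); child spans follow
# the parent's decision. The 1% rate is static and does not adapt at runtime.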
sampler = ParentBased(root=TraceIdRatioBased(0.01))

Trace Analysis

Finding Slow Requests

Jaeger Query:

service=my-service
minDuration=1s

Finding Errors

Jaeger Query:

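service=my-service
tags: error=true http.status_code=500
(Jaeger tag search is exact-match; to cover all 5xx responses, filter on error=true or query each status code separately.)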

Service Dependency Graph

Jaeger automatically generates service dependency graphs showing:

Service relationships
Request rates
Error rates
Average latencies
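
Conceptually, a dependency graph can be derived from parent/child span relationships: every time a span's parent belongs to a different service, that is one call along an edge. The sketch below shows that derivation on the example trace from earlier in this document. The tuple layout and the `dependency_edges` helper are assumptions for illustration, not Jaeger's storage format or algorithm.

```python
from collections import Counter

# (span_id, parent_span_id, service) rows for one trace; IDs are made up.
spans = [
    ("s1", None, "frontend"),
    ("s2", "s1", "api-gateway"),
    ("s3", "s2", "auth-service"),
    ("s4", "s2", "user-service"),
    ("s5", "s4", "database"),
]


def dependency_edges(spans) -> Counter:
    """Count caller -> callee edges between distinct services."""
    service_of = {span_id: service for span_id, _, service in spans}
    edges: Counter = Counter()
    for _, parent_id, service in spans:
        parent_service = service_of.get(parent_id)
        # Skip root spans and calls that stay inside one service.
        if parent_service is not None and parent_service != service:
            edges[(parent_service, service)] += 1
    return edges


edges = dependency_edges(spans)
for (caller, callee), calls in edges.items():
    print(f"{caller} -> {callee}: {calls} call(s)")
```

Aggregating these counts over many traces gives the edge weights shown in a dependency view.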

How it compares

Reach for distributed-tracing when requests cross service boundaries; use metrics-only monitoring when failures are local to one process.
