
Distributed Tracing
Stand up Jaeger-style distributed tracing and instrument multi-service requests so you can debug latency in production.
Overview
distributed-tracing is an agent skill used mostly in the Operate phase (and also in Build/integrations and Ship/perf). It documents Jaeger deployment and trace/span patterns for debugging requests that cross multiple services.
Install
npx skills add https://github.com/wshobson/agents --skill distributed-tracing
What is this skill?
- Explains trace vs span hierarchy with nested frontend, gateway, auth, user-service, and database timings
- Jaeger Operator Kubernetes install with production strategy and Elasticsearch storage options
- Docker Compose all-in-one Jaeger image with UI (16686), collector, gRPC, and Zipkin ports enumerated
- Defines context propagation, tags, and in-span logs for filterable diagnostics
- Includes kubectl apply patterns and namespace setup for observability
- Example trace tree shows 5 nested spans (frontend through database)
- Docker Compose lists 8 published Jaeger-related host ports including 16686 UI
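The tags and in-span logs described above can be sketched with plain Python data classes. This is a simplified, hypothetical model: the `Span` class, the `find_spans` helper, and the tag values are illustrative assumptions, not Jaeger's or OpenTelemetry's API. Tags are key-value pairs you filter on across many spans, while logs are timestamped events attached to a single span.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Span:
    """Simplified span record (hypothetical model, not Jaeger's or OpenTelemetry's API)."""
    trace_id: str
    name: str
    duration_ms: float
    tags: dict[str, Any] = field(default_factory=dict)  # filterable key-value pairs
    logs: list[tuple[float, str]] = field(default_factory=list)  # (offset_ms, event)

    def log(self, offset_ms: float, event: str) -> None:
        """Attach a timestamped event to this span."""
        self.logs.append((offset_ms, event))


def find_spans(spans: list[Span], tag_filter: dict[str, Any]) -> list[Span]:
    """Return the spans whose tags match every key/value pair in tag_filter."""
    return [s for s in spans if all(s.tags.get(k) == v for k, v in tag_filter.items())]


spans = [
    Span("abc123", "auth-service", 10, {"http.status_code": 200}),
    Span("abc123", "user-service", 60, {"http.status_code": 500, "error": True}),
    Span("abc123", "database", 40, {"db.system": "postgresql"}),
]
spans[1].log(35.0, "upstream call retried")

errors = find_spans(spans, {"error": True})
for span in errors:
    print(span.name, span.tags, span.logs)
```

Filtering on `error=True` returns only the user-service span, and its log entry records what happened inside it. Searching by tags in the Jaeger UI follows the same idea.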
Adoption & trust: 6.9k installs on skills.sh; 36.5k GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You cannot see which service or database call dominates latency when requests cross several hops and logs are not correlated.
Who is it for?
Solo builders running multiple services on Kubernetes or Docker who need OpenTelemetry/Jaeger-style visibility without hiring an SRE team.
Skip if: Single-process monoliths with no cross-service calls, or teams that only need basic uptime pings without trace instrumentation.
When should I use this skill?
Setting up or explaining Jaeger distributed tracing, trace/span structure, and Kubernetes or Docker deployment.
What do I get? / Deliverables
You can deploy Jaeger (K8s or Compose), model traces and spans, and propagate context so production issues become searchable timelines instead of guesswork.
- Jaeger Operator or Compose deployment manifests and commands
- Trace/span instrumentation guidance with tags, logs, and context propagation
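Context propagation comes down to serializing the caller's trace ID and span ID into request headers so that the next service can continue the same trace. The standard-library sketch below builds and parses a W3C Trace Context `traceparent` header (`version-traceid-parentid-flags`), which is the format OpenTelemetry propagators use by default. The `inject` and `extract` helper names are hypothetical. In practice the OpenTelemetry SDK does this work for you.

```python
import re
import secrets
from typing import Optional

# W3C Trace Context header: version-trace_id-parent_id-flags, all lowercase hex
TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def inject(headers: dict[str, str], trace_id: str, span_id: str, sampled: bool = True) -> None:
    """Write the caller's span context into outgoing request headers."""
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def extract(headers: dict[str, str]) -> Optional[tuple[str, str, bool]]:
    """Return (trace_id, parent_span_id, sampled) from incoming headers, or None."""
    match = TRACEPARENT_RE.fullmatch(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:  # all-zero IDs are invalid
        return None
    return trace_id, parent_id, (int(flags, 16) & 0x01) == 1


# API gateway: start a trace and call the auth service
trace_id = secrets.token_hex(16)        # 128-bit trace ID
gateway_span_id = secrets.token_hex(8)  # 64-bit span ID
outgoing: dict[str, str] = {}
inject(outgoing, trace_id, gateway_span_id)

# Auth service: continue the same trace as a child of the gateway span
context = extract(outgoing)
if context is not None:
    incoming_trace_id, parent_span_id, sampled = context
    auth_span_id = secrets.token_hex(8)
    print(f"trace={incoming_trace_id} parent={parent_span_id} child={auth_span_id} sampled={sampled}")
```

OpenTelemetry's HTTP client and server instrumentations inject and extract this header automatically, and that is how spans from the gateway and the auth service end up in the same trace.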
Journey fit
Spans multiple journey phases: the primary shelf is listed first, with alternate fits below.
Operate/monitoring is the canonical fit because the skill centers on traces, spans, collectors, and production observability stacks rather than on feature coding. End-to-end trace visualization, span tags and logs, and collector deployment are the core deliverables for running systems.
Where it fits
Add outbound HTTP client spans and propagate trace headers between your API gateway and auth service.
Use nested span timings to find a 40ms database segment inside a 60ms user-service call before release.
Deploy Jaeger to the observability namespace and open the UI on port 16686 to trace failing customer requests.
Wire Elasticsearch as Jaeger storage in a production strategy CR instead of losing traces on pod restarts.
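You can find the segment that dominates latency, such as the 40ms database call inside the 60ms user-service call above, by subtracting each span's direct children from its own duration. The sketch below encodes the example trace from SKILL.md as flat `(span_id, parent_id, name, duration_ms)` records and reports each span's self time. The span IDs are hypothetical. The sketch assumes that child spans run one after another and fit entirely inside their parent. When children run concurrently, subtracting their durations undercounts the parent's self time and can even make it negative, so a real analysis would work from span start and end timestamps.

```python
from collections import defaultdict

# Example trace from SKILL.md: (span_id, parent_id, name, duration_ms)
SPANS = [
    ("s1", None, "frontend", 100),
    ("s2", "s1", "api-gateway", 80),
    ("s3", "s2", "auth-service", 10),
    ("s4", "s2", "user-service", 60),
    ("s5", "s4", "database", 40),
]


def self_times(spans):
    """Return {span_id: own duration minus the summed durations of its direct children}."""
    child_total = defaultdict(float)
    for _span_id, parent_id, _name, duration in spans:
        if parent_id is not None:
            child_total[parent_id] += duration
    return {span_id: duration - child_total[span_id] for span_id, _p, _n, duration in spans}


def print_tree(spans, own, parent_id=None, depth=0):
    """Print spans as an indented tree showing total and self time."""
    for span_id, pid, name, duration in spans:
        if pid == parent_id:
            print(f"{'  ' * depth}{name} [{duration}ms, self {own[span_id]:g}ms]")
            print_tree(spans, own, span_id, depth + 1)


own = self_times(SPANS)
print_tree(SPANS, own)
names = {span_id: name for span_id, _p, name, _d in SPANS}
hotspot = max(own, key=own.get)
print(f"Largest self time: {names[hotspot]} ({own[hotspot]:g}ms)")
```

On the example trace the database span has the largest self time at 40ms. Of the user-service span's 60ms, only 20ms is spent in the service itself.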
How it compares
Observability implementation patterns for traces, not a hosted APM product or a frontend analytics skill.
Common Questions / FAQ
Who is distributed-tracing for?
Indie developers and small teams operating API gateways, auth, and data services who need Jaeger-oriented tracing setup and mental models.
When should I use distributed-tracing?
In Operate when standing up monitoring, during Build/integrations when adding instrumentation, and in Ship/perf when investigating cross-service latency before launch.
Is distributed-tracing safe to install?
The skill includes kubectl and Docker deployment commands you must run deliberately; review the Security Audits panel on this Prism page and validate images and cluster access before applying manifests.
SKILL.md - Distributed Tracing
# distributed-tracing - detailed patterns and worked examples

## Distributed Tracing Concepts

### Trace Structure

```
Trace (Request ID: abc123)
  ↓ Span (frontend) [100ms]
    ↓ Span (api-gateway) [80ms]
      ├→ Span (auth-service) [10ms]
      └→ Span (user-service) [60ms]
           └→ Span (database) [40ms]
```

### Key Components

- **Trace** - End-to-end request journey
- **Span** - Single operation within a trace
- **Context** - Metadata propagated between services
- **Tags** - Key-value pairs for filtering
- **Logs** - Timestamped events within a span

## Jaeger Setup

### Kubernetes Deployment

```bash
# Prerequisite: cert-manager must already be installed in the cluster
# (the Jaeger Operator's admission webhooks depend on it)

# Deploy Jaeger Operator
kubectl create namespace observability
kubectl create -f https://github.com/jaegertracing/jaeger-operator/releases/download/v1.51.0/jaeger-operator.yaml -n observability

# Deploy Jaeger instance (production strategy: separate collector and query
# components, backed by an Elasticsearch cluster reachable at the URL below)
kubectl apply -f - <<EOF
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger
  namespace: observability
spec:
  strategy: production
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: http://elasticsearch:9200
  ingress:
    enabled: true
EOF
```

### Docker Compose

```yaml
version: "3.8"
services:
  jaeger:
    image: jaegertracing/all-in-one:1.62
    ports:
      - "5775:5775/udp"   # Agent: zipkin.thrift compact (legacy clients)
      - "6831:6831/udp"   # Agent: jaeger.thrift compact
      - "6832:6832/udp"   # Agent: jaeger.thrift binary
      - "5778:5778"       # Agent: sampling configs
      - "16686:16686"     # UI
      - "14268:14268"     # Collector (jaeger.thrift over HTTP)
      - "14250:14250"     # Collector gRPC
      - "9411:9411"       # Zipkin-compatible endpoint
    environment:
      - COLLECTOR_ZIPKIN_HOST_PORT=:9411
```

**Reference:** See `references/jaeger-setup.md`

## Application Instrumentation

### OpenTelemetry (Recommended)

#### Python (Flask)

```python
# Requires: flask, opentelemetry-sdk, opentelemetry-exporter-jaeger,
# opentelemetry-instrumentation-flask
# Note: opentelemetry-exporter-jaeger is deprecated upstream; newer OpenTelemetry
# releases export OTLP, which Jaeger also accepts.
from flask import Flask
from opentelemetry import trace
from opentelemetry.exporter.jaeger.thrift import JaegerExporter
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Initialize tracer: spans are batched and sent to the Jaeger agent on UDP 6831
resource = Resource(attributes={SERVICE_NAME: "my-service"})
provider = TracerProvider(resource=resource)
processor = BatchSpanProcessor(JaegerExporter(
    agent_host_name="jaeger",
    agent_port=6831,
))
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)

# Instrument Flask: creates a server span for every incoming request
app = Flask(__name__)
FlaskInstrumentor().instrument_app(app)


@app.route('/api/users')
def get_users():
    # Child span of the request span created by FlaskInstrumentor
    with tracer.start_as_current_span("get_users") as span:
        users = fetch_users_from_db()
        span.set_attribute("user.count", len(users))
        return {"users": users}


def fetch_users_from_db():
    with tracer.start_as_current_span("database_query") as span:
        span.set_attribute("db.system", "postgresql")
        span.set_attribute("db.statement", "SELECT * FROM users")
        return query_database()


def query_database():
    # Hypothetical placeholder for the real database call
    return [{"id": 1, "name": "alice"}]
```

#### Node.js (Express)

```javascript
const { NodeTracerProvider } = require("@opentelemetry/sdk-trace-node");
const { JaegerExporter } = require("@opentelemetry/exporter-jaeger");
const { BatchSpanProcessor } = require("@opentelemetry/sdk-trace-base");
const { Resource } = require("@opentelemetry/resources");
const { registerInstrumentations } = require("@opentelemetry/instrumentation");
const { HttpInstrumentation } = require("@opentelemetry/instrumentation-http");
const {
  ExpressInstrumentation,
} = require("@opentelemetry/instrumentation-express");

// Initialize tracer (the resource must be a Resource instance, not a plain object)
const provider = new NodeTracerProvider({
  resource: new Resource({ "service.name": "my-service" }),
});
// Send spans to the Jaeger collector's Thrift-over-HTTP endpoint (port 14268)
const exporter = new JaegerExporter({
  endpoint: "http://jaeger:14268/api/traces",
});
provider.addSpanProcessor(new BatchSpanProcessor(exporter));
provider.register();

// Instrument libraries before requiring express so the patches apply
registerInstrumentations({
  instrumentations: [new HttpInstrumentation(), new ExpressInstrumentation()],
});

const express = require("express");
const app = expres