Python Observability

Name: Python Observability
Author: wshobson

wshobson/agents

8.8k installs
38.3k repo stars
Updated July 22, 2026
python-observability is an agent skill that provides Python observability patterns including structured logging, metrics, and distributed tracing. Use it when adding logging, implementing metrics collection, setting up tracing, or debugging production systems.

About

Python observability patterns including structured logging, metrics, and distributed tracing. Use when adding logging, implementing metrics collection, setting up tracing, or debugging production systems.

Adding structured logging to applications
Implementing metrics collection with Prometheus
Setting up distributed tracing across services
Propagating correlation IDs through request chains
Debugging production issues

Python Observability by the numbers

8,802 all-time installs (skills.sh)
+209 installs in the week ending Jul 28, 2026 (Skillselion tracking)
Ranked #135 of 2,184 Testing & QA skills by installs in the Skillselion catalog
Security screen: HIGH risk (skills.sh audit)
Data as of Jul 28, 2026 (Skillselion catalog sync)

At a glance

python-observability capabilities & compatibility

Capabilities: adding structured logging to applications · implementing metrics collection with Prometheus · setting up distributed tracing across services · propagating correlation IDs through request chains · debugging production issues
Use cases: documentation

npx skills add https://github.com/wshobson/agents --skill python-observability

Installs	8.8k
repo stars	★ 38.3k
Security audit	2 / 3 scanners passed
Last updated	July 22, 2026
Repository	wshobson/agents ↗

Who is it for?

Developers who need the Python observability patterns documented in this skill's SKILL.md.

Skip if: the task falls outside the skill's documented scope.

What you get

Actionable workflows and conventions from SKILL.md for python-observability.

Prometheus metric definitions
Labeled Counter and Histogram modules
/metrics endpoint wiring

By the numbers

Histogram defines 10 latency buckets from 0.01s through 10s
Implements four Golden Signals: latency, traffic, errors, and saturation

Files

SKILL.md (Markdown)

Python Observability

Instrument Python applications with structured logs, metrics, and traces. When something breaks in production, you need to answer "what, where, and why" without deploying new code.

When to Use This Skill

Adding structured logging to applications
Implementing metrics collection with Prometheus
Setting up distributed tracing across services
Propagating correlation IDs through request chains
Debugging production issues
Building observability dashboards

Core Concepts

1. Structured Logging

Emit logs as JSON with consistent fields for production environments. Machine-readable logs enable powerful queries and alerts. For local development, consider human-readable formats.

2. The Four Golden Signals

Track latency, traffic, errors, and saturation for every service boundary.

3. Correlation IDs

Thread a unique ID through all logs and spans for a single request, enabling end-to-end tracing.

4. Bounded Cardinality

Keep metric label values bounded. Unbounded labels (like user IDs) explode storage costs.

Quick Start

import structlog

structlog.configure(
    processors=[
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.JSONRenderer(),
    ],
)

logger = structlog.get_logger()
logger.info("Request processed", user_id="123", duration_ms=45)
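The Quick Start always renders JSON, while the Structured Logging concept suggests a human-readable format for local development. A minimal sketch of switching between the two is shown here, using only the standard library `logging` module rather than structlog so it runs without dependencies. The `APP_ENV` variable name, the `"app"` logger name, and the `extra={"fields": {...}}` convention are assumptions made for illustration.

```python
import json
import logging
import os


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with a fixed set of fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname.lower(),
            "logger": record.name,
            "event": record.getMessage(),
        }
        # Extra fields arrive via extra={"fields": {...}}; this is a
        # convention assumed for this sketch, not a logging built-in.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, default=str)


def build_formatter(env: str) -> logging.Formatter:
    """JSON for production, a readable single line everywhere else."""
    if env == "production":
        return JsonFormatter()
    return logging.Formatter("%(asctime)s %(levelname)-8s %(name)s: %(message)s")


def configure(env: str = "") -> logging.Logger:
    """Attach one stream handler whose format depends on the environment."""
    # APP_ENV is a hypothetical variable name; use whatever your deployment sets.
    env = env or os.environ.get("APP_ENV", "development")
    handler = logging.StreamHandler()
    handler.setFormatter(build_formatter(env))
    logger = logging.getLogger("app")
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    return logger


if __name__ == "__main__":
    log = configure("production")
    log.info("Request processed", extra={"fields": {"user_id": "123", "duration_ms": 45}})
```

With structlog the same switch would typically be made by choosing a different final processor in `structlog.configure`.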

Fundamental Patterns

Pattern 1: Structured Logging with Structlog

Configure structlog for JSON output with consistent fields.

import logging
import structlog

def configure_logging(log_level: str = "INFO") -> None:
    """Configure structured logging for the application."""
    structlog.configure(
        processors=[
            structlog.contextvars.merge_contextvars,
            structlog.processors.add_log_level,
            structlog.processors.TimeStamper(fmt="iso"),
            structlog.processors.StackInfoRenderer(),
            structlog.processors.format_exc_info,
            structlog.processors.JSONRenderer(),
        ],
        wrapper_class=structlog.make_filtering_bound_logger(
            getattr(logging, log_level.upper())
        ),
        context_class=dict,
        logger_factory=structlog.PrintLoggerFactory(),
        cache_logger_on_first_use=True,
    )

# Initialize at application startup
configure_logging("INFO")
logger = structlog.get_logger()

Pattern 2: Consistent Log Fields

Every log entry should include standard fields for filtering and correlation.

from __future__ import annotations

import time
from contextvars import ContextVar

import structlog

# Store correlation ID in context
correlation_id: ContextVar[str] = ContextVar("correlation_id", default="")

logger = structlog.get_logger()

# Request, Response, and handle_request stand in for your framework's
# request/response types and your own handler.
def process_request(request: Request) -> Response:
    """Process request with structured logging."""
    start = time.perf_counter()
    logger.info(
        "Request received",
        correlation_id=correlation_id.get(),
        method=request.method,
        path=request.path,
        user_id=request.user_id,
    )

    try:
        result = handle_request(request)
        # Measure elapsed wall time in milliseconds for the completion log
        elapsed = (time.perf_counter() - start) * 1000
        logger.info(
            "Request completed",
            correlation_id=correlation_id.get(),
            status_code=200,
            duration_ms=round(elapsed, 2),
        )
        return result
    except Exception as e:
        logger.error(
            "Request failed",
            correlation_id=correlation_id.get(),
            error_type=type(e).__name__,
            error_message=str(e),
        )
        raise

Pattern 3: Semantic Log Levels

Use log levels consistently across the application.

Level	Purpose	Examples
`DEBUG`	Development diagnostics	Variable values, internal state
`INFO`	Request lifecycle, operations	Request start/end, job completion
`WARNING`	Recoverable anomalies	Retry attempts, fallback used
`ERROR`	Failures needing attention	Exceptions, service unavailable

# DEBUG: Detailed internal information
logger.debug("Cache lookup", key=cache_key, hit=cache_hit)

# INFO: Normal operational events
logger.info("Order created", order_id=order.id, total=order.total)

# WARNING: Abnormal but handled situations
logger.warning(
    "Rate limit approaching",
    current_rate=950,
    limit=1000,
    reset_seconds=30,
)

# ERROR: Failures requiring investigation
logger.error(
    "Payment processing failed",
    order_id=order.id,
    error=str(e),
    payment_provider="stripe",
)

Never log expected behavior at ERROR. A user entering a wrong password is INFO, not ERROR.
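One way to apply this rule consistently is to decide the level from the exception type at the boundary where failures are logged. The sketch below assumes a hypothetical `ExpectedError` base class for failures that are part of normal operation, and uses the standard library `logging` module in place of structlog to stay dependency-free.

```python
import logging

logger = logging.getLogger("auth")


class ExpectedError(Exception):
    """Hypothetical base class for failures that are part of normal operation."""


class WrongPassword(ExpectedError):
    """A user typed the wrong password; nothing is broken."""


def level_for(exc: Exception) -> int:
    """Expected outcomes log at INFO; anything else is an ERROR."""
    return logging.INFO if isinstance(exc, ExpectedError) else logging.ERROR


def log_failure(exc: Exception, operation: str) -> int:
    """Log a failure at the level its type calls for and return that level."""
    level = level_for(exc)
    logger.log(level, "%s failed: %s", operation, type(exc).__name__)
    return level
```

Calling `log_failure(WrongPassword(), "login")` records an INFO event, while `log_failure(ConnectionError(), "login")` records an ERROR.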

Pattern 4: Correlation ID Propagation

Generate a unique ID at ingress and thread it through all operations.

from contextvars import ContextVar
import uuid
import structlog

correlation_id: ContextVar[str] = ContextVar("correlation_id", default="")

def set_correlation_id(cid: str | None = None) -> str:
    """Set correlation ID for current context."""
    cid = cid or str(uuid.uuid4())
    correlation_id.set(cid)
    structlog.contextvars.bind_contextvars(correlation_id=cid)
    return cid

# FastAPI middleware example
from fastapi import Request

async def correlation_middleware(request: Request, call_next):
    """Middleware to set and propagate correlation ID."""
    # Use incoming header or generate new
    cid = request.headers.get("X-Correlation-ID") or str(uuid.uuid4())
    set_correlation_id(cid)

    response = await call_next(request)
    response.headers["X-Correlation-ID"] = cid
    return response

Propagate to outbound requests:

import httpx

async def call_downstream_service(endpoint: str, data: dict) -> dict:
    """Call downstream service with correlation ID."""
    async with httpx.AsyncClient() as client:
        response = await client.post(
            endpoint,
            json=data,
            headers={"X-Correlation-ID": correlation_id.get()},
        )
        return response.json()
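The patterns above rely on `ContextVar` rather than a module-level global because each asyncio task runs in its own copy of the context, so concurrent requests do not overwrite each other's ID. A small standard library sketch of that behavior follows; the `handle` and `main` names and the `req-a`/`req-b` IDs are made up for the demonstration.

```python
import asyncio
from contextvars import ContextVar

correlation_id: ContextVar[str] = ContextVar("correlation_id", default="")


async def handle(cid: str, delay: float) -> tuple[str, str]:
    """Set the ID, yield to other tasks, then read it back."""
    correlation_id.set(cid)
    await asyncio.sleep(delay)  # other "requests" run while this one waits
    return cid, correlation_id.get()


async def main() -> list[tuple[str, str]]:
    # gather() wraps each coroutine in its own Task, and each Task runs in a
    # copy of the current context, so the set() calls do not interfere.
    return await asyncio.gather(
        handle("req-a", 0.02),
        handle("req-b", 0.01),
    )


if __name__ == "__main__":
    for expected, seen in asyncio.run(main()):
        print(expected, seen)
```

Each task reads back the ID it set, even though the two tasks interleave. The value set inside a task also does not leak into the context that started it.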

Detailed worked examples and patterns

Detailed sections (starting with ## Advanced Patterns) live in references/details.md. Read that file when the navigation summary above is insufficient.

Best Practices Summary

1. Use structured logging - JSON logs with consistent fields
2. Propagate correlation IDs - Thread through all requests and logs
3. Track the four golden signals - Latency, traffic, errors, saturation
4. Bound label cardinality - Never use unbounded values as metric labels
5. Log at appropriate levels - Don't cry wolf with ERROR
6. Include context - User ID, request ID, operation name in logs
7. Use context managers - Consistent timing and error handling
8. Separate concerns - Observability code shouldn't pollute business logic
9. Test your observability - Verify logs and metrics in integration tests
10. Set up alerts - Metrics are useless without alerting
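The "test your observability" practice can be sketched as a test that captures log records in memory and asserts on their level and fields. This example uses only the standard library; `ListHandler`, `create_order`, and the `order_id` field are hypothetical names for illustration. For structlog-based code, the `structlog.testing.capture_logs` helper serves a similar purpose.

```python
import logging


class ListHandler(logging.Handler):
    """Collect records in memory so a test can assert on them."""

    def __init__(self) -> None:
        super().__init__()
        self.records = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


def create_order(order_id: str, logger: logging.Logger) -> None:
    """Hypothetical unit under test: logs one INFO event with an order_id field."""
    logger.info("Order created", extra={"order_id": order_id})


def test_create_order_logs_order_id() -> None:
    logger = logging.getLogger("test.orders")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep test output out of the root logger
    handler = ListHandler()
    logger.addHandler(handler)
    try:
        create_order("ord-1", logger)
    finally:
        logger.removeHandler(handler)

    assert len(handler.records) == 1
    record = handler.records[0]
    assert record.levelno == logging.INFO
    assert record.getMessage() == "Order created"
    assert record.order_id == "ord-1"
```

A test runner such as pytest would collect the `test_` function automatically; it can also be called directly.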

python-observability — detailed worked examples

Advanced Patterns

Pattern 5: The Four Golden Signals with Prometheus

Track these metrics for every service boundary:

from prometheus_client import Counter, Histogram, Gauge

# Latency: How long requests take
REQUEST_LATENCY = Histogram(
    "http_request_duration_seconds",
    "Request latency in seconds",
    ["method", "endpoint", "status"],
    buckets=[0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10],
)

# Traffic: Request rate
REQUEST_COUNT = Counter(
    "http_requests_total",
    "Total HTTP requests",
    ["method", "endpoint", "status"],
)

# Errors: Error rate
ERROR_COUNT = Counter(
    "http_errors_total",
    "Total HTTP errors",
    ["method", "endpoint", "error_type"],
)

# Saturation: Resource utilization
DB_POOL_USAGE = Gauge(
    "db_connection_pool_used",
    "Number of database connections in use",
)

Instrument your endpoints:

import time
from functools import wraps

def track_request(func):
    """Decorator to track request metrics."""
    @wraps(func)
    async def wrapper(request: Request, *args, **kwargs):
        method = request.method
        endpoint = request.url.path
        start = time.perf_counter()

        try:
            response = await func(request, *args, **kwargs)
            status = str(response.status_code)
            return response
        except Exception as e:
            status = "500"
            ERROR_COUNT.labels(
                method=method,
                endpoint=endpoint,
                error_type=type(e).__name__,
            ).inc()
            raise
        finally:
            duration = time.perf_counter() - start
            REQUEST_COUNT.labels(method=method, endpoint=endpoint, status=status).inc()
            REQUEST_LATENCY.labels(method=method, endpoint=endpoint, status=status).observe(duration)

    return wrapper
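The latency histogram defined earlier uses cumulative buckets: each bucket counts every observation less than or equal to its upper bound, and an implicit `+Inf` bucket counts everything. The following is a standard library illustration of those semantics only, not how `prometheus_client` is implemented; `cumulative_counts` is a hypothetical helper name.

```python
from bisect import bisect_left

# Same bucket bounds as REQUEST_LATENCY above, in seconds.
BUCKETS = [0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]


def cumulative_counts(observations):
    """Count observations per `le` (less-than-or-equal) bucket, cumulatively."""
    per_bucket = [0] * (len(BUCKETS) + 1)  # last slot is the +Inf bucket
    for value in observations:
        # bisect_left finds the first bound >= value, matching `le` semantics.
        per_bucket[bisect_left(BUCKETS, value)] += 1

    counts = {}
    running = 0
    labels = [str(bound) for bound in BUCKETS] + ["+Inf"]
    for label, n in zip(labels, per_bucket):
        running += n
        counts[label] = running
    return counts


if __name__ == "__main__":
    for label, count in cumulative_counts([0.005, 0.3, 0.5, 12]).items():
        print(f"le={label}: {count}")
```

One consequence is that any request slower than the largest bound (10 seconds here) is only visible in the `+Inf` bucket, so bucket bounds are worth choosing around the latencies you actually care about.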

Pattern 6: Bounded Cardinality

Avoid labels with unbounded values to prevent metric explosion.

# BAD: User ID has potentially millions of values
REQUEST_COUNT.labels(method="GET", user_id=user.id)  # Don't do this!

# GOOD: Bounded values only
REQUEST_COUNT.labels(method="GET", endpoint="/users", status="200")

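# If you need per-user metrics, use a different approach:
# - Log the user_id and query logs
# - Use a separate analytics system
# - Bucket users by type/tier
# Label names are fixed when a metric is created, so a tier label needs its
# own metric (the name below is illustrative; Counter is imported in Pattern 5).
REQUESTS_BY_TIER = Counter(
    "http_requests_by_tier_total",
    "Total HTTP requests by user tier",
    ["method", "endpoint", "user_tier"],
)
REQUESTS_BY_TIER.labels(
    method="GET",
    endpoint="/users",
    user_tier="premium",  # Bounded set of values
).inc()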
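Unbounded values can also slip in through the `endpoint` label when it is filled from the raw URL path, as `request.url.path` is in the Pattern 5 decorator: `/users/1` and `/users/2` become separate time series. A minimal sketch of collapsing ID-like path segments into placeholders follows. `normalize_endpoint` and the placeholder names are assumptions for illustration; if your framework exposes the matched route template, that is usually a better source for the label.

```python
import re

# Matches a canonical 8-4-4-4-12 hexadecimal UUID.
_UUID = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)


def normalize_endpoint(path: str) -> str:
    """Replace ID-like path segments with placeholders to bound the label."""
    parts = []
    for segment in path.split("/"):
        if segment.isdigit():
            parts.append("{id}")
        elif _UUID.match(segment):
            parts.append("{uuid}")
        else:
            parts.append(segment)
    return "/".join(parts)


if __name__ == "__main__":
    print(normalize_endpoint("/users/42/orders/7"))
```

The result can be passed as the `endpoint` label in place of the raw path.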

Pattern 7: Timed Operations with Context Manager

Create a reusable timing context manager for operations.

from contextlib import contextmanager
import time
import structlog

logger = structlog.get_logger()

@contextmanager
def timed_operation(name: str, **extra_fields):
    """Context manager for timing and logging operations."""
    start = time.perf_counter()
    logger.debug("Operation started", operation=name, **extra_fields)

    try:
        yield
    except Exception as e:
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.error(
            "Operation failed",
            operation=name,
            duration_ms=round(elapsed_ms, 2),
            error=str(e),
            **extra_fields,
        )
        raise
    else:
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.info(
            "Operation completed",
            operation=name,
            duration_ms=round(elapsed_ms, 2),
            **extra_fields,
        )

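# Usage: `await` is only valid inside an async function, and a plain
# `with` block may contain awaits. load_orders is an illustrative wrapper.
async def load_orders(user):
    with timed_operation("fetch_user_orders", user_id=user.id):
        return await order_repository.get_by_user(user.id)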

Pattern 8: OpenTelemetry Tracing

Set up distributed tracing with OpenTelemetry.

Note: OpenTelemetry is actively evolving. Check the official Python documentation for the latest API patterns and best practices.

from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

def configure_tracing(service_name: str, otlp_endpoint: str) -> None:
    """Configure OpenTelemetry tracing."""
    # Attach the service name so the backend can group spans by service
    resource = Resource.create({"service.name": service_name})
    provider = TracerProvider(resource=resource)
    processor = BatchSpanProcessor(OTLPSpanExporter(endpoint=otlp_endpoint))
    provider.add_span_processor(processor)
    trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

async def process_order(order_id: str) -> Order:
    """Process order with tracing."""
    with tracer.start_as_current_span("process_order") as span:
        span.set_attribute("order.id", order_id)

        # Each nested span becomes a child of "process_order"
        with tracer.start_as_current_span("validate_order"):
            # Assumed here to return the validated Order
            order = validate_order(order_id)

        with tracer.start_as_current_span("charge_payment"):
            charge_payment(order_id)

        with tracer.start_as_current_span("send_confirmation"):
            send_confirmation(order_id)

        return order

How it compares

Use python-observability for in-app Prometheus instrumentation; pair with prometheus-configuration for server-side scrape and alert YAML.
