Sentry Instrumentation Guide

Name: Sentry Instrumentation Guide
Author: getsentry

getsentry/sentry-for-ai

699 installs
243 repo stars
Updated July 27, 2026

Decide which Sentry signal to reach for when instrumenting code — error, span, span attribute, log, or metric.

About

Decide which Sentry signal to reach for when instrumenting code: error, span, span attribute, log, or metric. Use when adding instrumentation and unsure whether something should be a log vs a span vs a metric, when deciding "what to instrument where", when reviewing instrumentation for gaps, or when a coding agent needs a rule for choosing between errors, traces, logs, and metrics. This skill decides WHAT to emit; the sentry-*-sdk skills handle HOW to set each pillar up.

Sentry Instrumentation Guide by the numbers

699 all-time installs (skills.sh)
Ranked #206 of 1,453 DevOps & CI/CD skills by installs in the Skillselion catalog
Data as of Jul 28, 2026 (Skillselion catalog sync)

SKILL.md

npx skills add https://github.com/getsentry/sentry-for-ai --skill sentry-instrumentation-guide

Who is it for?

Developers following the sentry-instrumentation-guide skill for the tasks it documents.

Skip if: Tasks outside the sentry-instrumentation-guide scope described in SKILL.md.

When should I use this skill?

User mentions sentry-instrumentation-guide or related triggers from the skill description.

What you get

Working sentry-instrumentation-guide setup aligned with the documented patterns and constraints.

Instrumentation plan
Signal routing decisions

Sentry Instrumentation Guide: When to Reach for What

Errors, traces, logs, and metrics are the four kinds of telemetry most apps run on, and they overlap enough that the choice is rarely obvious. You can stuff context into a span attribute instead of logging it. You can count log lines instead of emitting a metric. You can add a duration to a log and call it a span.

But each signal exists because it answers a different question and feeds a different workflow once it lands. Reaching for the wrong one means the data is technically there but useless for the job you actually have later. This skill is the decision framework: given a value or an event in front of you, which signal should carry it, and why.

It decides what to emit. For how to turn each pillar on for a given stack, hand off to the sentry-*-sdk skills and sentry-setup-ai-monitoring.

Invoke This Skill When

You're instrumenting a piece of code and unsure whether something should be a log, a span, a span attribute, or a metric
You're deciding "what to instrument where" across a service or request handler
You're reviewing existing instrumentation for gaps (e.g. an error feed that's empty while users report problems)
A coding agent needs a consistent rule for choosing between errors, traces, logs, and metrics

Important: The SDK APIs and code samples here are illustrative. Verify exact signatures and minimum versions against docs.sentry.io and the relevant sentry-*-sdk skill before implementing.

The Four Signals, One Question Each

| Signal | The question it answers | Docs |
| --- | --- | --- |
| Errors | "What just broke?" A stack trace and exception type, grouped into a deduplicated Issue that gets assigned and tracked to resolution. If your code threw, it's an error. | Issues |
| Traces | "Did the request flow the way it was supposed to?" A waterfall of timed spans. Mostly auto-instrumented. | Trace Explorer |
| Logs | "What was true at this point in the code, and why?" The system's state at one moment as a structured event: config, flags, inputs/outputs, the decision that was made. | Logs |
| Metrics | "How's this trending over time?" Counters, gauges, and distributions you can slice by attribute and chart, alert on, or compare across a deploy. | Metrics |

A useful mental split: a log is one request's story (the needle), a metric is the aggregate (whether the haystack is normal), a trace is where the time went, and an error is the thing that needs a stack trace and an owner.

The Decision Table

Use this as a gut check:

| What you want to know | Reach for |
| --- | --- |
| Something crashed, show the stack trace | Error |
| How long did this take? Which step is slow? | Traces / Spans |
| Did the request flow through the steps I expected? | Traces / Spans |
| What was the state when the code made this decision? | Log |
| What did this function receive and return? | Log |
| How often does X happen? Is the rate normal? | Metric |
| Did something change after the deploy? | Metric |
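
The gut check above can also be written down as a lookup, which is handy when a coding agent needs one consistent rule. This is a minimal sketch: the intent labels, the `Signal` enum, and `choose_signal` are hypothetical names invented for illustration and are not part of any Sentry SDK.

```python
from enum import Enum


class Signal(Enum):
    ERROR = "error"
    SPAN = "trace/span"
    LOG = "log"
    METRIC = "metric"


# Hypothetical intent labels, one per row of the decision table.
DECISION_TABLE = {
    "crash_with_stack_trace": Signal.ERROR,
    "duration_or_slow_step": Signal.SPAN,
    "expected_request_flow": Signal.SPAN,
    "state_at_decision": Signal.LOG,
    "function_inputs_outputs": Signal.LOG,
    "rate_or_frequency": Signal.METRIC,
    "change_after_deploy": Signal.METRIC,
}


def choose_signal(intent: str) -> Signal:
    """Return the signal the decision table suggests for an intent label."""
    try:
        return DECISION_TABLE[intent]
    except KeyError:
        # Unknown intents are surfaced rather than silently mapped to a log.
        raise ValueError(f"unknown intent: {intent!r}") from None
```

The table is deliberately a plain dictionary: when two intents apply to the same value (a rate and a per-request story, say), look both up and emit both signals.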

Resolving the Overlaps

The same value can legitimately appear in more than one signal. These four tiebreakers cover almost every real case. (Full reasoning, gotchas, and the "why not just log everything / emit one wide event?" arguments live in `references/choosing-signals.md`.)

Span attribute or metric? Context about one request's flow that you want while reading that trace → span attribute (it rides on the span in the waterfall). A standalone value you want to chart, alert on, or slice over time across all requests → metric. The same number can warrant both: candidate_count on the span to read one request, recommendations.served as a metric to watch the rate.

Log or span? The span is the timed node in the flow (mostly auto-instrumented, you rarely write it). The log is the decision-point state inside that node (you always write it on purpose). Span answers where and how long; log answers what was true and why.

Log or metric? A log finds the one specific request that went wrong (the needle). A metric tells you how many requests went wrong (the haystack). Don't derive a rate by counting log lines; emit the metric directly.

Error or log? Needs a stack trace and should be tracked as an Issue → error. An unexpected-but-handled condition worth recording → log. Truly non-critical with a traceback → logger.warning(message, exc_info=True) keeps the trace in logs without creating noise in the error feed.

Sampling vs Filtering — Match Retention to the Question

Each signal's retention falls out of the question it answers:

Traces are sampled. You don't need every request to understand where time goes, so keep a representative slice via traces_sample_rate (higher in dev, lower in production).
Errors are captured by default. No sampling to think about for the baseline.
Logs and metrics are NOT sampled. You keep every one and filter instead, with before_send_log and before_send_metric. This is the point: the whole reason for a log is to find the one rare request that went sideways, and you can't find what you sampled away.

(For the exact sampling and filtering config in your language, see the matching SDK skill's references/tracing.md and references/metrics.md.)
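
As a rough illustration of "filter, don't sample", the two callbacks might look like the sketch below. It assumes the Python SDK passes each log and metric to the callback as a dict-like object (with a `body` key for logs and a `name` key for metrics) plus a hint, and that returning `None` drops the item; confirm the payload shape and option names in the SDK skill's references before relying on it. The noise lists are hypothetical.

```python
from typing import Any, Optional

# Hypothetical noise lists for this sketch; pick your own.
NOISY_LOG_BODIES = {"healthcheck ok"}
NOISY_METRIC_NAMES = {"debug.cache_probe"}


def before_send_log(log: dict[str, Any], hint: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Drop known-noisy logs; everything else is kept, unsampled."""
    if log.get("body") in NOISY_LOG_BODIES:
        return None  # returning None discards the log
    return log


def before_send_metric(metric: dict[str, Any], hint: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Drop metrics by name, never by random fraction."""
    if metric.get("name") in NOISY_METRIC_NAMES:
        return None
    return metric


# Assumed wiring; verify option names and minimum versions for your SDK:
# sentry_sdk.init(dsn="...", traces_sample_rate=0.1, enable_logs=True,
#                 before_send_log=before_send_log,
#                 before_send_metric=before_send_metric)
```

Both filters are deterministic: a given log or metric is either always kept or always dropped, so the rare request you need later is never lost to chance.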

Because all four signals come from one SDK, they share a trace_id and correlate on their own — every log and metric is tied to its trace, so you can drill from a metric spike straight into the samples behind it.

What Deliberate Instrumentation Looks Like

Roughly 80% of spans are auto-instrumented by your framework and database integrations — you write almost none of them. The deliberate work is the other 20%: a span attribute or two to enrich the flow, a decision-point log, and a metric, placed at the spots where your code makes a choice worth questioning later.

`references/instrumentation-examples.md` walks through a single request handler instrumented end to end, in both Python and JavaScript/TypeScript, showing the span attribute, the log, and the metric side by side on the same decision.

Handing Off to Setup

This skill tells you what to emit. To actually wire a pillar up:

Install the SDK and turn on tracing, logs, and metrics → the matching sentry-<platform>-sdk skill (e.g. sentry-python-sdk, sentry-nextjs-sdk, sentry-node-sdk). Each has per-feature reference files for tracing, logging, metrics, and more.

Instrument LLM / agent calls → sentry-setup-ai-monitoring.

Logs and metrics are the two pillars most projects haven't turned on yet, and both are included on every plan. If they aren't enabled, route to the SDK skill first, then come back here to decide what to put where.

Choosing Signals — Deep Dive

The main SKILL.md gives the decision table and the four tiebreakers. This file is the reasoning behind them: what each signal is for, how to resolve the overlaps when the same value could go more than one place, why retention differs per signal, and the answer to "can't I just log everything / emit one wide event and derive the rest?"

Each Signal, In Depth

Errors — "What just broke?"

A stack trace and an exception type, grouped into an Issue that gets deduplicated, assigned, and tracked until it's resolved. The defining trait is the workflow: errors aren't just recorded, they become work items with an owner and a lifecycle.

Reach for it when: code threw an exception, or you have a condition serious enough that it should halt and be tracked to resolution.

Workflow it feeds: the Issues feed, with grouping, assignment, regression detection, and alerting on new/regressed issues.

Gotcha: a successful request is not an error. A query that returns zero rows succeeded. If nothing threw, the error feed will be empty even while users are unhappy, which is exactly the case where you need the other three signals.

Traces and spans — "Did the request flow the way it was supposed to?"

Timed operations nested inside a trace, rendered as a waterfall. This is how you follow a request across services and see the DB query that dragged, the API call that timed out, the LLM tool call that took 8 seconds instead of 200ms.

Reach for it when: you want timing, or you want to confirm the request took the path you expected.

Workflow it feeds: the trace waterfall, a structured dependency tree with timing on every node. Critically, this is a format a coding agent can reason about directly: it can read the spans, find work that could run in parallel, and rewrite the code. Hand it the same information as a stream of log lines and it has to reconstruct the call graph from timestamps first.

Gotcha: most spans are auto-instrumented (framework + DB integrations). You rarely write one by hand, and a clean trace can still hide a quietly wrong outcome. A span tells you the request flowed as designed; it can't tell you the design just failed this user.

Logs — "What was true at this point, and why?"

The state of the system at one specific moment, captured as a structured event: config values, feature flags, the inputs and outputs of a function, the user ID. Logs are the trail through a function's decision tree — the markers you drop where the code makes a choice, so a human or an agent can later follow the reasoning. They fill in the why once errors and traces have told you what broke and where the time went.

Reach for it when: you need to reconstruct one specific request's decisions after the fact, especially the request from a support ticket.

Workflow it feeds: searchable structured records you can pull up by user_id, trace_id, or any attribute.

Gotcha: logs are most valuable when they're wide: a structured event packed with context (the flag that was on, the inputs, the outcome), not a bare one-line string.

Metrics — "How have the key parts behaved over time?"

Counters, gauges, and distributions, each kept as an individual measurement you can slice by any attribute and drill from an aggregate back into the samples (and the trace) behind it. Not just "12,000 checkouts this week," but how that splits by region and how the line moved across the last deploy.

Reach for it when: you want a rate, a trend, a threshold to alert on, or a number to chart on a dashboard. Metrics are a historical signal as much as a right-now one.

Workflow it feeds: charts, dashboards, and alerts, plus drill-down from an aggregate into the individual samples behind it.

Gotcha: keep attribute cardinality low. High-cardinality attributes (like raw user_id) degrade backend performance; that level of detail belongs on a log, not a metric dimension.
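
One way to enforce the cardinality gotcha is to split a single context dict at the call site, so the metric only ever sees an allow-list of dimensions while the log keeps everything. A minimal sketch; `ALLOWED_METRIC_KEYS` and `split_context` are hypothetical helpers, not SDK APIs.

```python
# Hypothetical allow-list of low-cardinality metric dimensions.
ALLOWED_METRIC_KEYS = {"region", "plan", "ranking_version", "outcome"}


def split_context(context: dict) -> tuple[dict, dict]:
    """Return (metric_attributes, log_attributes) for one event.

    The metric gets only allow-listed, low-cardinality keys;
    the log keeps the full context, including identifiers like user_id.
    """
    metric_attrs = {k: v for k, v in context.items() if k in ALLOWED_METRIC_KEYS}
    log_attrs = dict(context)  # copy so callers can mutate either side safely
    return metric_attrs, log_attrs
```

An allow-list fails safe: a new high-cardinality field added to the context later lands on the log automatically and stays off the metric until someone opts it in.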

The Four Overlaps, In Full

Span attribute or metric?

If it's context about one request's flow through the system and you want it while reading that trace, it's a span attribute — it rides on the span in the waterfall. If it's a standalone value you want to chart, alert on, or slice over time across all requests, it's a metric.

The same number can warrant both. candidate_count as a span attribute lets you read one request; recommendations.served as a metric lets you watch the rate. One inspects a single flow, the other watches the aggregate. The rule of thumb: if a value only makes sense in the context of a specific span, it lives on the span.

Log or span?

The span is the timed node in the flow, and most are auto-instrumented, so you rarely write them. The log is the decision-point state inside that node, and you always write it on purpose. Span answers where and how long; log answers what was true and why.

Why not just attach the decision state as span attributes instead of logging it? Because traces are sampled, and the one request a customer is complaining about usually turns out to be the one that got sampled out. A span attribute is great for reading a trace you've found; it can't help you find one. Logs aren't sampled, so you can always pull up the specific request.
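
The sampling argument is easy to see in a toy simulation. The sketch below is not Sentry's sampler; it assumes a simple per-request random draw against a rate, with a seeded generator so the run is repeatable, and treats "logged" as "always kept".

```python
import random


def simulate(num_requests: int, traces_sample_rate: float, seed: int = 0):
    """Return (sampled_trace_ids, logged_ids) for a run of requests."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    sampled_traces = set()
    logged = set()
    for request_id in range(num_requests):
        logged.add(request_id)  # logs are not sampled: every request is kept
        if rng.random() < traces_sample_rate:
            sampled_traces.add(request_id)  # only a slice of traces survives
    return sampled_traces, logged


traces, logs = simulate(num_requests=1000, traces_sample_rate=0.1)
missing_traces = logs - traces  # findable in logs, absent from traces
```

Any request in `missing_traces` is one where span attributes alone would have left nothing to look up, while the log is still there.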

Log or metric?

A log is one request's story — the needle. A metric is the aggregate — the question of whether the haystack is normal. When you want to find the specific request that went wrong, that's a log. When you want to know how many requests went wrong, that's a metric.

Don't derive a rate by counting log lines: that means paying to store every line just to compute a number you could have emitted directly and cheaply as a metric. Emit the metric and the log when you need both the rate and the ability to reconstruct individual requests — same decision, two shapes.

Error or log?

If it needs a stack trace and should be tracked as an Issue, it's an error. If it's an unexpected-but-handled condition worth recording, it's a log. If it's truly non-critical but you still want the traceback, logger.warning(message, exc_info=True) (Python) captures the traceback into logs without creating noise in your error feed.
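
A small sketch of the handled, non-critical case using only the standard library `logging` module. `CacheUnavailable`, `read_cache`, and `get_value` are hypothetical stand-ins; whether a warning-level record stays out of the Issues feed depends on how the SDK's logging integration is configured, so treat that part as an assumption to verify.

```python
import logging

logger = logging.getLogger("recommendations")


class CacheUnavailable(Exception):
    """Hypothetical exception for a non-critical dependency."""


def read_cache(key: str) -> str:
    # Stand-in for a real cache client; always fails in this sketch.
    raise CacheUnavailable(f"cache lookup failed for {key}")


def get_value(key: str) -> str:
    try:
        return read_cache(key)
    except CacheUnavailable:
        # Handled and non-critical: keep the traceback in the log record
        # and carry on with a default instead of raising.
        logger.warning("cache unavailable, using default", exc_info=True)
        return "default"
    # Any other exception propagates, so it is reported as an error.
```

The split is made at the `except` clause: only the exception type you have decided is tolerable gets downgraded to a log, and everything else still surfaces with a stack trace and an owner.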

"Can't I Just Log Everything / Emit One Wide Event?"

There's a popular argument that four signals are overkill: emit one rich, wide event per request and derive the rest later. It's half right.

Emit wide, absolutely. The best version of any signal is a structured event packed with context (the flag that was on, the user, the inputs and outputs), not a bare number or a one-line string.

But the shape you emit is the shape you get to work with. One fat event in a columnar store charts fine after the fact, but it can't group itself into a deduplicated Issue, render itself as a waterfall, or fire a real-time alert on a threshold you haven't defined yet. Those are workflows, and each needs its data in a particular shape. The APIs reflect this: the metrics API is built for counts and measures you'll aggregate, the span API for durations and the shape of a request, the log API integrates with your structured logging library so the lines you already write become queryable events.

So emit wide — into the signal whose workflow you actually need. That's why a single decision often warrants both a metric and a log: same decision, same trace, two shapes, because watching a rate and reconstructing one request are different jobs.

Why Retention Differs Per Signal

Match the retention to the question:

| Signal | Retention model | Why |
| --- | --- | --- |
| Traces | Sampled (`traces_sample_rate`) | A representative slice is enough to understand where time goes, and it's cheaper. Higher rate in dev, lower in production. |
| Errors | Captured by default | The baseline; you want to know about every distinct crash. |
| Logs | Not sampled; filtered with `before_send_log` | The whole point is finding the one rare request that went sideways; you can't find what you sampled away. |
| Metrics | Not sampled; filtered with `before_send_metric` | Aggregates must count every event to be accurate; you drop noisy metrics by name, not by random fraction. |

Because all four come from the same SDK, they share a trace_id and correlate automatically: every log and metric carries the trace it belongs to, so you can jump from a metric spike straight into the traces and logs behind it without gluing separate tools together.
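
Conceptually, the drill-down is a join on trace_id. The sketch below works on hand-written, hypothetical records standing in for stored telemetry; in practice Sentry's UI does this for you, and the field names here are illustrative only.

```python
# Hypothetical records; field names are illustrative, not a storage schema.
metric_samples = [
    {"name": "recommendations.served", "outcome": "fallback", "trace_id": "t1"},
    {"name": "recommendations.served", "outcome": "personalized", "trace_id": "t2"},
    {"name": "recommendations.served", "outcome": "fallback", "trace_id": "t3"},
]
logs = [
    {"body": "recommendations lookup", "trace_id": "t1", "candidate_count": 0},
    {"body": "recommendations lookup", "trace_id": "t2", "candidate_count": 7},
    {"body": "recommendations lookup", "trace_id": "t3", "candidate_count": 0},
]


def logs_behind(samples, all_logs, **where):
    """Return logs sharing a trace_id with metric samples matching `where`."""
    trace_ids = {
        s["trace_id"]
        for s in samples
        if all(s.get(key) == value for key, value in where.items())
    }
    return [log for log in all_logs if log["trace_id"] in trace_ids]


fallback_logs = logs_behind(metric_samples, logs, outcome="fallback")
```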

A Worked Investigation (Why the Mix Matters)

A storefront serves generic recommendations to a chunk of logged-in users. No exception is thrown (every /recommendations/{user_id} returns 200), so the error feed is empty. A trace shows the request flowed exactly as designed — load user, check the ranking_v2 flag, query the new table, fall back to popular items — because returning zero rows is a perfectly successful query. A metric (recommendations.served tagged by ranking_version and outcome) reveals the v2 cohort is serving almost nothing but fallbacks and that the drop lines up with the flag rollout — scope and trigger, without opening a trace. A log, pulled up by the user_id from the ticket, finally says why: the flag is on, the source table is recommendations_v2, candidate_count is 0, outcome is fallback — the table shipped but the rows were never backfilled.
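
The metric step of that investigation amounts to slicing one counter by two attributes. A minimal sketch with made-up samples (the numbers are hypothetical and exist only to show the calculation):

```python
from collections import Counter

# Hypothetical counter samples: (ranking_version, outcome) per request.
samples = [
    ("v1", "personalized"), ("v1", "personalized"), ("v1", "fallback"),
    ("v2", "fallback"), ("v2", "fallback"), ("v2", "fallback"),
]


def fallback_share(samples):
    """Return {ranking_version: fraction of requests that fell back}."""
    totals = Counter(version for version, _ in samples)
    fallbacks = Counter(
        version for version, outcome in samples if outcome == "fallback"
    )
    return {version: fallbacks[version] / totals[version] for version in totals}


shares = fallback_share(samples)
```

This slice is only possible because ranking_version and outcome were attached as attributes when the counter was emitted; a bare count would show the total and nothing about which cohort moved.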

No single signal cracked it; each ruled something out. That's the case for instrumenting with the right signal at each decision point, ahead of time.

Instrumentation Examples — Span Attribute vs Log vs Metric

One request handler, instrumented end to end, showing the three deliberate signals side by side on the same decision. The handler loads a user, checks the ranking_v2 feature flag, queries a personalized-recommendations table, and falls back to popular items when the query comes back empty.

The route span and the database spans are auto-instrumented — you write none of them. What you place by hand are exactly three things:

a span attribute: context about this request's flow, read inside the trace
a decision-point log: the state at the moment the code chose personalized vs. fallback (the only signal that records why), not sampled, so you can always find this request
a metric: the rate across all requests, sliceable by version and outcome

These examples show the shape of deliberate instrumentation, not how to set the SDK up. For exact API signatures, the init flags that enable logs and metrics, and current minimum versions, follow the matching SDK skill (sentry-python-sdk, sentry-nextjs-sdk, sentry-node-sdk, etc.) and its references/tracing.md, references/logging.md, and references/metrics.md. Those are the maintained source of truth; this guide intentionally doesn't duplicate them.

Python (FastAPI)

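import sentry_sdk
from fastapi import FastAPI
from sentry_sdk import logger

# `db` and `flag_enabled` are application helpers assumed to exist elsewhere.
app = FastAPI()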

# The route is auto-instrumented. FastAPI gives you the request span;
# the DB integration gives you a span for every query below. You write none of it.
@app.get("/recommendations/{user_id}")
def get_recommendations(user_id: int):
    user = db.get_user(user_id)                          # auto-instrumented db span
    use_v2 = flag_enabled("ranking_v2", user)
    ranking_version = "v2" if use_v2 else "v1"

    candidates = db.personalized_recs(user_id, version=ranking_version)  # auto db span
    outcome = "personalized" if candidates else "fallback"
    items = candidates or db.popular_items()             # auto db span on the fallback

    # SPAN ATTRIBUTE: context about THIS request's flow, read inside the trace.
    # It rides on the auto-instrumented request span; no new span needed.
    span = sentry_sdk.get_current_span()
    if span is not None:  # get_current_span() can return None when no span is active
        span.set_data("ranking_version", ranking_version)
        span.set_data("recommendation.outcome", outcome)

    # LOG: the trail through the decision tree, the state at the moment the
    # code chose personalized vs. fallback. The only signal that records *why*.
    logger.info(
        "recommendations lookup",
        attributes={
            "user_id": user_id,
            "ranking_version": ranking_version,
            "flag.ranking_v2": use_v2,
            "source_table": f"recommendations_{ranking_version}",
            "candidate_count": len(candidates),
            "outcome": outcome,
        },
    )

    # METRIC: the rate across all requests, sliceable by version and outcome.
    sentry_sdk.metrics.count(
        "recommendations.served",
        1,
        attributes={"ranking_version": ranking_version, "outcome": outcome},
    )

    return items

If you do want a sub-operation timed in the waterfall (say the ranking step, or a call to an external recommender), wrap it in a custom span with sentry_sdk.start_span (see sentry-python-sdk's references/tracing.md for the full custom-span API):

with sentry_sdk.start_span(op="rank", name="rank_candidates") as span:
    ranked = rank(candidates)
    span.set_data("candidate_count", len(candidates))

JavaScript / TypeScript (Express / Node)

The same three deliberate touches, with the Node SDK. The route and DB spans are auto-instrumented by the framework and database integrations; logs require enableLogs: true at init.

import * as Sentry from "@sentry/node";

// The route is auto-instrumented. The framework gives you the request span;
// the DB integration gives you a span for every query below. You write none of it.
app.get("/recommendations/:userId", async (req, res) => {
  const userId = Number(req.params.userId);

  const user = await db.getUser(userId);                 // auto-instrumented db span
  const useV2 = flagEnabled("ranking_v2", user);
  const rankingVersion = useV2 ? "v2" : "v1";

  const candidates = await db.personalizedRecs(userId, rankingVersion); // auto db span
  const outcome = candidates.length ? "personalized" : "fallback";
  const items = candidates.length ? candidates : await db.popularItems(); // auto db span

  // SPAN ATTRIBUTE: context about THIS request's flow, read inside the trace.
  // It rides on the auto-instrumented request span; no new span needed.
  Sentry.getActiveSpan()?.setAttributes({
    ranking_version: rankingVersion,
    "recommendation.outcome": outcome,
  });

  // LOG: the state at the moment the code chose personalized vs. fallback.
  // Requires `enableLogs: true` in Sentry.init(). The only signal that records *why*.
  Sentry.logger.info("recommendations lookup", {
    user_id: userId,
    ranking_version: rankingVersion,
    "flag.ranking_v2": useV2,
    source_table: `recommendations_${rankingVersion}`,
    candidate_count: candidates.length,
    outcome,
  });

  // METRIC: the rate across all requests, sliceable by version and outcome.
  Sentry.metrics.count("recommendations.served", 1, {
    attributes: { ranking_version: rankingVersion, outcome },
  });

  res.json(items);
});

For a sub-operation you want timed in the waterfall, wrap it in a custom span:

const ranked = await Sentry.startSpan(
  { name: "rank_candidates", op: "rank" },
  (span) => {
    span.setAttribute("candidate_count", candidates.length);
    return rank(candidates);
  },
);

Reading the Three Touches

Three deliberate touches, each carrying a piece the others can't:

| Touch | What it carries | When it earns its keep |
| --- | --- | --- |
| Span attribute (`ranking_version`, `outcome`) | Tags this request's flow so the path is right there when you open the trace | While reading a trace you've already found |
| Log (`recommendations lookup` + attributes) | What the function decided and why, at the instant it decided; never sampled | Pulling up the one request from a support ticket by `user_id` |
| Metric (`recommendations.served`) | The outcome counted with enough dimension to slice by version and outcome | Watching the rate, charting it, alerting when it moves after a deploy |

Beyond these, the SDK fills in the rest on its own: frontend SDKs tag everything with browser, OS, and release; one setUser() call follows the user across errors, spans, logs, and metrics; and because all four come from the same SDK they share a trace_id and correlate without any extra work. See `choosing-signals.md` for how to decide which touch a given value deserves.

How it compares

Use sentry-instrumentation-guide when deciding signal placement; use the parent Sentry skill for SDK setup and configuration.

FAQ

What does sentry-instrumentation-guide do?