
Golang Observability
Instrument Go services with slog, Prometheus, OpenTelemetry, profiling, and dashboards so production behavior is measurable before users complain.
Overview
Golang observability is an agent skill, most often used in Operate (also Build integrations and Ship perf), that instruments Go services with logs, metrics, traces, profiling, and dashboards for everyday production monitoring.
Install
npx skills add https://github.com/samber/cc-skills-golang --skill golang-observability
What is this skill?
- Structured logging with slog and migration guidance from zap, logrus, and zerolog
- Prometheus metrics plus OpenTelemetry distributed tracing and log-trace correlation
- Continuous profiling with pprof and Pyroscope for production Go services
- Server-side RUM event tracking with GDPR/CCPA-aware CDP patterns
- Alerting and Grafana dashboard patterns for everyday observability
- Skill metadata version 1.2.1
- Explicitly defers deep performance investigation to golang-benchmark and golang-performance skills
Adoption & trust: 3.7k installs on skills.sh; 2k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
Your Go service ships without correlated logs, metrics, or traces, so you cannot tell whether slowness is code, infra, or bad deploys.
Who is it for?
Solo builders shipping Go microservices or APIs who want standard observability baked in before scaling traffic.
Skip if: One-off flamegraph deep dives or micro-benchmark tuning sessions—use golang-benchmark and golang-performance skills instead.
When should I use this skill?
Instrumenting Go services for production monitoring, setting up metrics or alerting, adding OpenTelemetry tracing, correlating logs with traces, migrating legacy loggers to slog, or implementing GDPR/CCPA-compliant CDP t
What do I get? / Deliverables
You get a consistent slog + Prometheus + OpenTelemetry + profiling stack with alerting and dashboard guidance so incidents are diagnosable from signals, not guesswork.
- Structured slog logging with correlation fields
- Prometheus metric instrumentation
- OpenTelemetry trace wiring and dashboard or alert guidance
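For the Prometheus instrumentation deliverable, the skill's own guidance is to keep label cardinality low and never use unbounded values such as user IDs or full URLs as label values. The sketch below shows one way to bound a path label using only the standard library. The `routeLabel` and `isNumeric` helpers and the `:id` placeholder are hypothetical names chosen for illustration. In a real service the router's matched route pattern is usually a better source for this label.

```go
package main

import (
	"fmt"
	"strings"
)

// routeLabel (hypothetical helper) maps a raw request path to a bounded
// label value by replacing purely numeric segments with ":id".
// Assumption: IDs in paths are numeric; other ID formats need their own rule.
func routeLabel(path string) string {
	segs := strings.Split(strings.Trim(path, "/"), "/")
	for i, s := range segs {
		if isNumeric(s) {
			segs[i] = ":id"
		}
	}
	return "/" + strings.Join(segs, "/")
}

// isNumeric reports whether s is non-empty and contains only ASCII digits.
func isNumeric(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

func main() {
	// Both requests collapse into the same label value.
	fmt.Println(routeLabel("/users/42/orders/1001")) // /users/:id/orders/:id
	fmt.Println(routeLabel("/users/7/orders/3"))     // /users/:id/orders/:id
}
```

The returned string would then be passed as the route label of a latency histogram, so the number of time series grows with the number of routes rather than the number of users.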
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Operate monitoring is the canonical shelf because the skill targets always-on production signals, alerting, and correlation rather than one-off feature coding. Monitoring matches the metrics, traces, logs, RUM-style server events, Grafana, and alert design called out in the skill description.
Where it fits
Add RED metrics and trace spans when implementing a new HTTP handler before merge.
Enable pprof or Pyroscope hooks and baseline dashboards ahead of a traffic-heavy launch.
Tune alert thresholds and log-trace correlation after intermittent 5xx spikes.
Instrument error paths with structured slog fields so on-call can filter by trace_id.
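The last item, filtering logs by trace_id, can be sketched with the standard library alone. The example wraps a JSON `slog` handler so that every record logged through a `...Context` call picks up a `trace_id` attribute from the context. The `traceIDKey` context key, `traceHandler` type, and `logLine` and `hasTraceID` helpers are hypothetical. A real service would more likely read the ID from its tracing library's span context than from a custom key.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"log/slog"
	"strings"
)

// traceIDKey is a hypothetical context key used only for this sketch.
type traceIDKey struct{}

// traceHandler wraps a slog.Handler and adds trace_id from the context.
type traceHandler struct {
	slog.Handler
}

// Handle appends trace_id when the context carries a non-empty one.
func (h traceHandler) Handle(ctx context.Context, r slog.Record) error {
	if id, ok := ctx.Value(traceIDKey{}).(string); ok && id != "" {
		r.AddAttrs(slog.String("trace_id", id))
	}
	return h.Handler.Handle(ctx, r)
}

// WithAttrs and WithGroup re-wrap the result so derived loggers keep
// the trace_id behavior instead of falling back to the inner handler.
func (h traceHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
	return traceHandler{h.Handler.WithAttrs(attrs)}
}

func (h traceHandler) WithGroup(name string) slog.Handler {
	return traceHandler{h.Handler.WithGroup(name)}
}

// logLine logs one error on a simulated error path and returns the JSON line.
func logLine(traceID, msg string) string {
	ctx := context.Background()
	if traceID != "" {
		ctx = context.WithValue(ctx, traceIDKey{}, traceID)
	}
	var buf bytes.Buffer
	logger := slog.New(traceHandler{slog.NewJSONHandler(&buf, nil)})
	logger.ErrorContext(ctx, msg, "status", 502)
	return buf.String()
}

// hasTraceID mimics an on-call filter matching a specific trace_id field.
func hasTraceID(line, id string) bool {
	return strings.Contains(line, `"trace_id":"`+id+`"`)
}

func main() {
	line := logLine("4bf92f3577b34da6", "upstream call failed")
	fmt.Print(line)
	fmt.Println("filterable by trace_id:", hasTraceID(line, "4bf92f3577b34da6"))
}
```

Putting the lookup in the handler rather than at each call site means an error path only has to call `ErrorContext` with the request context, and the correlation field stays consistent across the codebase.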
How it compares
Skill package for holistic Go instrumentation, not a hosted APM product or a single-purpose linter rule.
Common Questions / FAQ
Who is golang-observability for?
Go developers using Claude Code or similar agents who own production services and need logging, metrics, tracing, and profiling in one workflow.
When should I use golang-observability?
In Build when adding observability to new handlers, in Ship when validating launch monitoring, and in Operate when fixing alerts, dashboards, or GDPR-aware event tracking.
Is golang-observability safe to install?
It may edit Go source, run go and golangci-lint, and fetch docs—review the Security Audits panel on this page and restrict bash in untrusted repos.
SKILL.md
SKILL.md - Golang Observability
**Persona:** You are a Go observability engineer. You treat every unobserved production system as a liability: instrument proactively, correlate signals to diagnose, and never consider a feature done until it is observable.

**Modes:**

- **Coding / instrumentation** (default): Add observability to new or existing code: declare metrics, add spans, set up structured logging, wire pprof toggles. Follow the sequential instrumentation guide.
- **Review mode**: reviewing a PR's instrumentation changes. Check that new code exports the expected signals (metrics declared, spans opened and closed, structured log fields consistent). Sequential.
- **Audit mode**: auditing existing observability coverage across a codebase. Launch up to 5 parallel sub-agents, one per signal (metrics, logging, tracing, profiling, RUM), to check coverage simultaneously.

> **Community default.** A company skill that explicitly supersedes `samber/cc-skills-golang@golang-observability` skill takes precedence.

# Go Observability Best Practices

Observability is the ability to understand a system's internal state from its external outputs. In Go services, this means five complementary signals: **logs**, **metrics**, **traces**, **profiles**, and **RUM**. Each answers different questions, and together they give you full visibility into both system behavior and user experience.

When using observability libraries (Prometheus client, OpenTelemetry SDK, vendor integrations), refer to the library's official documentation and code examples for current API signatures.

## Best Practices Summary

1. **Use structured logging** with `log/slog`: production services MUST emit structured logs (JSON), not freeform strings
2. **Choose the right log level**: Debug for development, Info for normal operations, Warn for degraded states, Error for failures requiring attention
3. **Log with context**: use `slog.InfoContext(ctx, ...)` to correlate logs with traces
4. **Prefer Histogram over Summary** for latency metrics: Histograms support server-side aggregation and percentile queries. Every HTTP endpoint MUST have latency and error rate metrics.
5. **Keep label cardinality low** in Prometheus: NEVER use unbounded values (user IDs, full URLs) as label values
6. **Track percentiles** (P50, P90, P99, P99.9) using Histograms + `histogram_quantile()` in PromQL
7. **Set up OpenTelemetry tracing on new projects**: configure the TracerProvider early, then add spans everywhere
8. **Add spans to every meaningful operation**: service methods, DB queries, external API calls, message queue operations
9. **Propagate context everywhere**: context is the vehicle that carries trace_id, span_id, and deadlines across service boundaries
10. **Enable p