
Aiconfig Ai Metrics
Wire LaunchDarkly AI Config metrics (duration, tokens, success/error, TTFT) around existing LLM calls using the lowest-ceremony tier of the official four-tier instrumentation ladder that fits the call shape.
Overview
AI Config AI Metrics is an agent skill most often used in Build (also Operate, Grow) that instruments LLM calls with LaunchDarkly AI Config using the official four-tier metrics ladder.
Install
npx skills add https://github.com/launchdarkly/agent-skills --skill aiconfig-ai-metrics
What is this skill?
- Four-tier ladder: managed runner → provider package → custom extractor + trackMetricsOf → raw manual
- Targets Monitoring-tab metrics: duration, input/output tokens, success/error, TTFT when streaming
- Defaults to the highest tier that fits the call shape to avoid drift and missed metrics
- Requires the LaunchDarkly server-side AI SDK at version 0.20.0 or later (launchdarkly-server-sdk-ai for Python or @launchdarkly/server-sdk-ai for Node) and an existing AI Config
- Apache-2.0, experimental skill authored by LaunchDarkly (per the skill metadata)
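The "walk from the top and stop at the first tier that fits" rule can be sketched as a small decision function. This is only one reading of the ladder: the function name `pick_tier` and its three boolean parameters are hypothetical, and whether a streaming chat loop should stay on Tier 1 is not stated on this page, so the sketch leaves that case on Tier 1 as an assumption.

```python
# Hypothetical helper: not part of any LaunchDarkly SDK.
TIER_NAMES = {
    1: "managed runner",
    2: "provider package + trackMetricsOf",
    3: "custom extractor + trackMetricsOf",
    4: "raw manual",
}


def pick_tier(
    is_chat_loop: bool,
    has_provider_package: bool,
    wraps_cleanly: bool = True,
) -> int:
    """Walk the ladder from the top and return the first tier that fits.

    wraps_cleanly is False for calls that a single wrapper cannot cover,
    e.g. streaming with TTFT, unusual response shapes, or partial tracking.
    """
    if is_chat_loop:
        return 1  # conversational call: the managed runner tracks everything
    if not wraps_cleanly:
        return 4  # Tier 2-3 wrappers do not fit: fall back to manual calls
    if has_provider_package:
        return 2  # the package ships its own metrics extractor
    return 3  # no package: write a small custom extractor


if __name__ == "__main__":
    tier = pick_tier(is_chat_loop=False, has_provider_package=False)
    print(tier, TIER_NAMES[tier])
```

Keeping the checks in ladder order mirrors the skill's default of preferring the highest tier and only dropping to manual tracking when a wrapper cannot cover the call.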
Adoption & trust: 1.2k installs on skills.sh; 16 GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
Your AI features run in production but LaunchDarkly Monitoring lacks duration, token, and error signals because instrumentation tier choice was guessed or left manual.
Who is it for?
Codebases with an existing AI Config and Python or Node LLM calls that need SDK-aligned metrics without rewriting the whole inference stack.
Skip if: Projects with no LaunchDarkly AI Config, no server-side AI SDK, or teams that only need feature flags unrelated to model telemetry.
When should I use this skill?
Instrument an existing codebase with LaunchDarkly AI Config tracking and pick the lowest-ceremony tier that still captures duration, tokens, success/error, and TTFT when streaming.
What do I get? / Deliverables
Provider calls emit the metrics the Monitoring tab expects via the lowest-ceremony tier that still captures duration, tokens, success/error, and streaming TTFT when applicable.
- Chosen instrumentation tier implemented on target provider calls
- Monitoring-tab-ready metrics for duration, tokens, success/error (and TTFT when streaming)
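As an illustration of what the raw manual tier has to cover for a streaming call, here is a minimal sketch. `RecordingTracker` is a hypothetical stand-in that only records calls; its method names are snake_case guesses at the tracker methods named in SKILL.md, not the SDK's confirmed signatures. `open_stream`, `count_tokens`, and the injectable `clock` are also assumptions made for the example.

```python
import time
from typing import Callable, Iterable, List, Optional, Tuple


class RecordingTracker:
    """Hypothetical stand-in for an AI Config tracker; it only records calls."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, object]] = []

    def track_duration(self, ms: int) -> None:
        self.events.append(("duration_ms", ms))

    def track_time_to_first_token(self, ms: int) -> None:
        self.events.append(("ttft_ms", ms))

    def track_tokens(self, input_tokens: int, output_tokens: int) -> None:
        self.events.append(("tokens", (input_tokens, output_tokens)))

    def track_success(self) -> None:
        self.events.append(("success", True))

    def track_error(self) -> None:
        self.events.append(("error", True))


def stream_with_manual_tracking(
    tracker: RecordingTracker,
    open_stream: Callable[[], Iterable[str]],
    count_tokens: Callable[[str], Tuple[int, int]],
    clock: Callable[[], float] = time.perf_counter,
) -> str:
    """Consume a text stream and emit every metric by hand."""
    start = clock()
    first_chunk_at: Optional[float] = None
    chunks: List[str] = []
    try:
        for chunk in open_stream():
            if first_chunk_at is None:
                # TTFT is measured once, when the first chunk arrives.
                first_chunk_at = clock()
                tracker.track_time_to_first_token(int((first_chunk_at - start) * 1000))
            chunks.append(chunk)
    except Exception:
        # On failure, still report duration, then the error, and re-raise.
        tracker.track_duration(int((clock() - start) * 1000))
        tracker.track_error()
        raise
    text = "".join(chunks)
    tracker.track_duration(int((clock() - start) * 1000))
    input_tokens, output_tokens = count_tokens(text)
    tracker.track_tokens(input_tokens, output_tokens)
    tracker.track_success()
    return text
```

Every metric here is a separate call, which is why this tier is the easiest one in which to silently drop a signal such as TTFT.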
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Build → integrations is the primary shelf because the skill changes application code and SDK wiring around provider calls, not dashboard-only ops. Integrations fits LaunchDarkly server-side AI SDK tiers (managed runner, provider package, custom extractor, manual tracker) added to an existing AI Config.
Where it fits
Wrap an existing OpenAI-style call with the provider-package tier so tokens and errors flow into AI Config automatically.
Fix a regression where manual trackers dropped TTFT after enabling streaming responses.
Compare duration and token usage across AI Config variants before raising spend limits.
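The first scenario, wrapping an existing call so that tokens and errors are captured in one place, can be sketched as below. Nothing here is the real SDK: `SketchTracker` and `Metrics` are hypothetical stand-ins for the tracker and `LDAIMetrics`, and the response is assumed to be a plain dict with an Anthropic-style `usage` field. The only detail taken from SKILL.md is the argument order of `track_metrics_of` (extractor first, provider call second).

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, TypeVar

T = TypeVar("T")


@dataclass
class Metrics:
    """Hypothetical stand-in for LDAIMetrics (tokens + success)."""

    success: bool
    input_tokens: int = 0
    output_tokens: int = 0


class SketchTracker:
    """Hypothetical tracker that mimics only the wrapper shape."""

    def __init__(self) -> None:
        self.recorded: List[Dict[str, Any]] = []

    def track_metrics_of(
        self, extractor: Callable[[T], Metrics], call: Callable[[], T]
    ) -> T:
        start = time.perf_counter()
        try:
            response = call()
        except Exception:
            # The wrapper owns duration and the error signal.
            elapsed_ms = int((time.perf_counter() - start) * 1000)
            self.recorded.append({"success": False, "duration_ms": elapsed_ms})
            raise
        elapsed_ms = int((time.perf_counter() - start) * 1000)
        metrics = extractor(response)  # tokens come from the extractor
        self.recorded.append(
            {
                "success": metrics.success,
                "duration_ms": elapsed_ms,
                "input_tokens": metrics.input_tokens,
                "output_tokens": metrics.output_tokens,
            }
        )
        return response


def anthropic_style_extractor(response: Dict[str, Any]) -> Metrics:
    """Custom extractor for a dict response with an assumed `usage` field."""
    usage = response.get("usage", {})
    return Metrics(
        success=True,
        input_tokens=usage.get("input_tokens", 0),
        output_tokens=usage.get("output_tokens", 0),
    )
```

With a provider package, the custom extractor would be replaced by the package's own extractor and the wrapper call would stay the same, which is the point of the shared shape.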
How it compares
Skill-guided SDK ladder, not a generic OpenTelemetry snippet that ignores LaunchDarkly AI Config provider packages.
Common Questions / FAQ
Who is aiconfig-ai-metrics for?
Indie builders and small teams already on LaunchDarkly AI Config who need trustworthy model metrics in the Monitoring tab without maintaining bespoke tracker glue.
When should I use aiconfig-ai-metrics?
During Build when integrating LaunchDarkly into new LLM paths, during Operate monitoring when dashboards are empty, or during Grow analytics when you need token and latency trends per config variant.
Is aiconfig-ai-metrics safe to install?
It guides code changes around API keys and production SDKs. Review the Security Audits panel on this page and verify how your repo handles LaunchDarkly credentials before merging.
SKILL.md
# AI Metrics Instrumentation

You're using a skill that wires LaunchDarkly AI metrics around an existing provider call. Your job is to audit what's already there, pick the right tier from the ladder below, and implement it with the **least ceremony that still captures the metrics the Monitoring tab needs** (duration, input/output tokens, success/error, plus TTFT when streaming).

The single most important thing to get right: **default to the highest tier that fits the shape of the call**. Going lower ("just write the manual tracker calls") looks flexible but costs you drift, missed metrics, and legacy patterns the SDKs have moved past.

## The four-tier ladder

This is the order the official SDK READMEs (Python core, Node core, and every provider package) recommend. Walk from the top and stop at the first tier that fits:

| Tier | Pattern | Use when | Tracks automatically |
|------|---------|----------|----------------------|
| **1 - Managed runner** | Python: `ai_client.create_model(...)` returning a `ManagedModel`, then `await model.run(...)`. <br>Node: `aiClient.createModel(...)` returning a `ManagedModel`, then `await model.run(...)`. | The call is conversational (chat history, turn-based). This is what the provider READMEs lead with. | Duration, tokens, success/error: **all of it, zero tracker calls**. |
| **2 - Provider package + `trackMetricsOf`** | `tracker.trackMetricsOf(Provider.getAIMetricsFromResponse, () => providerCall())`. Provider packages today: `@launchdarkly/server-sdk-ai-openai`, `-langchain`, `-vercel` (Node) and `launchdarkly-server-sdk-ai-openai`, `-langchain` (Python). | The shape isn't a chat loop (one-shot completion, structured output, agent step) but the framework or provider has a package. | Duration + success/error from the wrapper; tokens from the package's built-in `getAIMetricsFromResponse` extractor. |
| **3 - Custom extractor + `trackMetricsOf`** | Same `trackMetricsOf` wrapper, but you write a small function that maps the provider response to `LDAIMetrics` (tokens + success). | No provider package exists (Anthropic direct, Gemini, Cohere, custom HTTP). | Duration + success/error from the wrapper; tokens from your extractor. |
| **4 - Raw manual** | Separate calls to `trackDuration`, `trackTokens`, `trackSuccess` / `trackError`, plus `trackTimeToFirstToken` for streams. | Streaming with TTFT, unusual response shapes, partial tracking, anything Tier 2–3 can't cleanly wrap. | Only what you explicitly call; it's on you to not miss one. |

Every provider (OpenAI, LangChain, Vercel, Bedrock, Anthropic, Gemini, custom HTTP) uses the same generic shape: `tracker.trackMetricsOf(getAIMetricsFromResponse, () => providerCall())` in Node, `tracker.track_metrics_of(get_ai_metrics_from_response, provider_call)` in Python. The extractor is the only thing that changes per provider: import `getAIMetricsFromResponse` from the matching `@launchdarkly/server-sdk-ai-<provider>` (or `ldai_<provider>`) package, or write a small custom function that returns `LDAIMetrics`. There are no provider-specific tracker methods.

## Workflow

### 1. Explore the existing call site

Before picking a tier, find the provider call and answer these questions:

- [ ] **Shape?** Is it a chat loop (history + turn-based), a one-shot completion, an agent step, or something else? → drives Tier 1 vs 2.
- [ ] **Framework?** Raw provider SDK? LangCha