
Observability LLM Obs
Answer questions about LLM and agentic app health, token spend, quality, and trace chains using data already ingested into Elasticsearch.
Overview
Observability LLM Obs is an agent skill, used mostly in the Operate phase (and in Grow for analytics), that answers LLM monitoring questions by running ES|QL and Elasticsearch API queries against ingested trace and metric data.
Install
npx skills add https://github.com/elastic/agent-skills --skill observability-llm-obs
What is this skill?
- Scopes answers to Elastic-ingested traces and metrics (APM, OTLP, OpenLLMetry/OpenLIT/Langtrace paths)
- Uses ES|QL, Elasticsearch APIs, and Kibana APIs without requiring the Kibana UI
- Discovers which ingestion path exists (`traces-apm*`, OTel generic streams) before querying
- Covers LLM performance, token/cost utilization, response quality, and agent workflow orchestration
- Aligns with Elastic OpenTelemetry LLM use-case documentation for EDOT-style deployments
- Skill metadata version 0.1.0
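
The discovery step in the list above can be sketched in Python. This is a minimal illustration, not the skill's own implementation: it assumes you have already fetched the body of `GET _data_stream`, and the pattern table and the helper names `data_stream_names` and `classify_data_streams` are hypothetical, built from the stream names mentioned on this page.

```python
from fnmatch import fnmatchcase

# Assumed patterns, taken from the stream names mentioned on this page.
# Order matters: the first matching pattern wins.
INGESTION_PATTERNS = {
    "apm_traces": "traces-apm*",
    "otel_traces": "traces-generic.otel-*",
    "apm_metrics": "metrics-apm*",
    "other_metrics": "metrics-*",
    "logs": "logs-*",
}


def data_stream_names(response):
    """Extract stream names from a GET _data_stream response body."""
    return [ds["name"] for ds in response.get("data_streams", [])]


def classify_data_streams(names):
    """Group data stream names under the first ingestion path they match."""
    found = {label: [] for label in INGESTION_PATTERNS}
    for name in names:
        for label, pattern in INGESTION_PATTERNS.items():
            if fnmatchcase(name, pattern):
                found[label].append(name)
                break
    return found
```

An empty list for a label means that ingestion path is absent, so queries against it should be skipped rather than assumed to return data.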
Adoption & trust: 1k installs on skills.sh; 502 GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You run agents in production but cannot tie latency, token cost, and multi-hop workflows to real telemetry in your Elastic deployment.
Who is it for?
Indie builders and tiny teams standardized on Elastic/Kibana who export OpenTelemetry or APM spans from LLM and agent frameworks.
Skip if: Builders without Elasticsearch observability plumbing who need setup guides first—this skill queries existing data rather than installing agents from scratch.
When should I use this skill?
User asks about LLM monitoring, GenAI observability, AI cost/quality, or workflow orchestration on Elastic-ingested data.
What do I get? / Deliverables
You get ES|QL- and API-grounded visibility into LLM performance, spend, and orchestration chains from whatever trace sources your cluster already has.
- ES|QL or API query patterns for LLM traces
- Interpretation of performance, cost, and chain quality from cluster data
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Operate → Monitoring is the primary shelf because the skill assumes production telemetry and ES|QL/API investigation—not greenfield instrumentation design alone. Monitoring fits ongoing trace, metric, and cost visibility for GenAI workloads rather than launch-time distribution work.
Where it fits
Trace a sudden latency regression across multi-step agent spans in `traces*` indices.
Compare weekly token utilization and cost drivers per model from ingested OTel metrics.
Pre-launch sanity-check that production LLM spans and required fields land in Elasticsearch before go-live.
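
As a rough illustration of the token-utilization use case, the sketch below builds an ES|QL query string from span attributes. The helper name `token_usage_query` is hypothetical, and the `span.attributes` prefix is an assumption that depends on how your data was ingested; check the index mapping before running the query. The attribute names follow the OpenTelemetry GenAI semantic conventions.

```python
def token_usage_query(index_pattern="traces*", attr_prefix="span.attributes", days=7):
    """Build an ES|QL query summing input/output tokens per model.

    attr_prefix is an assumption: verify the real field path with
    GET traces*/_mapping, since it varies by ingestion path.
    """
    if not isinstance(days, int) or days < 1:
        raise ValueError("days must be a positive integer")

    model = f"{attr_prefix}.gen_ai.request.model"
    tokens_in = f"{attr_prefix}.gen_ai.usage.input_tokens"
    tokens_out = f"{attr_prefix}.gen_ai.usage.output_tokens"

    return "\n".join([
        f"FROM {index_pattern}",
        f"| WHERE @timestamp >= NOW() - {days} days",
        f"| WHERE {model} IS NOT NULL",
        f"| STATS input_tokens = SUM({tokens_in}), "
        f"output_tokens = SUM({tokens_out}) BY {model}",
        "| SORT input_tokens DESC",
    ])


print(token_usage_query())
```

ES|QL rejects queries that reference unmapped columns, so if the printed query fails with an unknown-column error, adjust `attr_prefix` to match the fields actually present in your cluster.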
How it compares
Elastic-data interrogation skill, not a generic prompt-quality coach or a standalone SaaS APM product.
Common Questions / FAQ
Who is observability-llm-obs for?
Developers operating LLM or agentic apps on Elastic who want an agent to query traces, costs, and quality from ingested observability data.
When should I use observability-llm-obs?
In Operate when incidents or spend spikes need trace-backed answers; in Grow when analyzing token trends; whenever users mention LLM monitoring, GenAI observability, or AI cost on Elastic.
Is observability-llm-obs safe to install?
It directs the agent to call APIs on your Elastic deployment; review the Security Audits panel on this page and scope cluster credentials and network access carefully.
SKILL.md
# LLM and Agentic Observability

Answer user questions about monitoring LLMs and agentic components using **data ingested into Elastic** only. Focus on LLM performance, cost and token utilization, response quality, and call chaining or agentic workflow orchestration. Use **ES|QL**, Elasticsearch APIs, and (where needed) Kibana APIs. Do not rely on Kibana UI; the skill works without it. A given deployment typically uses **one or more** ingestion paths (APM/OTLP traces **and/or** integration metrics/logs); discover what is available before querying.

## Where to look

- **Trace and metrics data (APM / OTel):** Trace data in Elastic is stored in **`traces-apm*`** when collected by the Elastic APM Agent, and in **`traces-generic.otel-default`** (and similar) when collected by OpenTelemetry. Use the generic pattern **`traces*`** to find all trace data regardless of source. When the application is instrumented with OpenTelemetry (e.g. Elastic [Distributions of OpenTelemetry (EDOT)](https://www.elastic.co/docs/solutions/observability/get-started/opentelemetry/use-cases/llms), OpenLLMetry, OpenLIT, Langtrace exporting to OTLP), LLM and agent spans land in these trace data streams; metrics may land in **`metrics-apm*`** or metrics-generic. Query **`traces*`** and **`metrics*`** data streams for per-request and aggregated LLM signals.
- **Integration metrics and logs:** When the user collects data via [Elastic LLM integrations](https://www.elastic.co/docs/solutions/observability/applications/llm-observability) (OpenAI, Azure OpenAI, Azure AI Foundry, Amazon Bedrock, Bedrock AgentCore, GCP Vertex AI, etc.), metrics and logs go to **integration data streams** (e.g. `metrics*`, `logs*` with dataset/namespace per integration). Check which data streams exist.
- **Discover first:** Use Elasticsearch to list data streams or indices (e.g. `GET _data_stream`, or `GET traces*/_mapping`, `GET metrics*/_mapping`) and optionally sample a document to see which LLM-related fields are present. Do not assume both APM and integration data exist.
- **ES|QL:** Use the **elasticsearch-esql** skill for ES|QL syntax, commands, and query patterns when building queries against `traces*` or metrics data streams.
- **Alerts and SLOs:** Use the [Observability APIs](https://www.elastic.co/docs/solutions/observability/apis) **SLOs API** ([Stack](https://www.elastic.co/docs/api/doc/kibana/group/endpoint-slo) | [Serverless](https://www.elastic.co/docs/api/doc/serverless/group/endpoint-slo)) and **Alerting API** ([Stack](https://www.elastic.co/docs/api/doc/kibana/group/endpoint-alerting) | [Serverless](https://www.elastic.co/docs/api/doc/serverless/group/endpoint-alerting)) to find SLOs and alerting rules that target LLM-related data (e.g. services backed by `traces*`, or integration metrics). Firing alerts or violated/degrading SLOs point to potential degraded performance.

## Data available in Elastic

### From traces and metrics (traces*, metrics-apm* / metrics-generic)

Spans from OTel/EDOT (and compatible SDKs) carry **span attributes** that may follow [OpenTelemetry GenAI semantic conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/) or provider-specific names. In Elasticsearch, attributes typically appear under `span.attributes` (exact key names depend on ingestion). Common attributes:

| Purpose | Example attribute names (OTel GenAI) |
| -------------------- | --------------------------------------------------------- |
| Operation / provider | `gen_ai.operation.name`, `gen_ai.provider.name` |
| Model | `gen_ai.request.model`, `gen_ai.response.model`