
Phoenix Tracing
Instrument LLM and agent workflows with OpenInference spans and Phoenix-compatible tracing so you can debug runs and score quality in production.
Overview
Phoenix Tracing is an agent skill, most often used in the Operate phase (and also in Build, for integrations and agent tooling), that teaches OpenInference span conventions and Phoenix instrumentation for LLM agent observability.
Install
npx skills add https://github.com/github/awesome-copilot --skill phoenix-tracing
What is this skill?
- Flat rules/ index with semantic prefixes: span-*, setup-*, instrumentation-*, fundamentals-*, attributes-*, annotations-*
- OpenInference span kinds (LLM, CHAIN, TOOL, and related conventions) aligned with the public spec
- Four annotation target types (span, trace, document, and session) for human and automated evaluation feedback
- Setup and instrumentation guides for Python OTEL, Python client, and TypeScript Phoenix APIs
- Links to Phoenix docs, OpenInference spec, and export/advanced annotation workflows
Adoption & trust: 850 installs on skills.sh; 34.6k GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
Your agent pipelines run opaque tool and LLM steps with no shared span schema, so you cannot compare runs or attach evaluation feedback in Phoenix.
Who is it for?
Solo builders instrumenting multi-step LLM apps who already use or plan to use Phoenix for trace review and annotations.
Skip if: Teams that only need generic application logs with no LLM-specific spans, or shops not willing to adopt OpenInference naming.
When should I use this skill?
You need OpenInference conventions, Phoenix instrumentation guides, span kinds, or annotation/export patterns for agent traces.
What do I get? / Deliverables
You get a consistent OpenInference-aligned tracing model (span kinds, attributes, setup paths, and annotation types) that is ready to implement in Python or TypeScript and export to Phoenix.
- Instrumented span taxonomy aligned to OpenInference
- Annotation strategy for spans/traces/sessions/documents
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Canonical shelf is Operate because tracing pays off once agents run in real environments and you need span-level visibility, though teams also reach for it while wiring instrumentation during Build. Monitoring is the right primary subphase: semantic spans, export, and annotations are observability primitives for live agent traffic, not greenfield UI work.
Where it fits
Map each tool call to TOOL spans before merging an agent PR.
Pick span kinds and attribute keys while scaffolding a RAG chain in TypeScript.
Verify exported traces include CHAIN and LLM spans before release.
Add session annotations after user-reported bad completions.
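The pre-release check above (confirming that exported traces contain CHAIN and LLM spans) can be sketched in a few lines of Python. This is a minimal illustration, not a Phoenix API: the only name taken from OpenInference is the `openinference.span.kind` attribute key, while the list-of-dicts span records, the span names, and the `missing_span_kinds` helper are assumptions made for the example.

```python
from collections import Counter

# Attribute key defined by the OpenInference semantic conventions.
SPAN_KIND_KEY = "openinference.span.kind"


def missing_span_kinds(spans, required=("CHAIN", "LLM")):
    """Return the required span kinds that never appear in `spans`.

    `spans` is assumed to be a list of dicts, each with an "attributes"
    mapping. This record shape is hypothetical, not a Phoenix export format.
    """
    seen = Counter(
        span.get("attributes", {}).get(SPAN_KIND_KEY) for span in spans
    )
    return [kind for kind in required if seen[kind] == 0]


# Hypothetical exported trace: a chain that calls a tool but never an LLM.
trace = [
    {"name": "rag_pipeline", "attributes": {SPAN_KIND_KEY: "CHAIN"}},
    {"name": "search_docs", "attributes": {SPAN_KIND_KEY: "TOOL"}},
]

print(missing_span_kinds(trace))  # → ['LLM']
```

A check like this could run in CI against a captured trace, failing the build when the returned list is non-empty.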
How it compares
Reference skill for semantic tracing conventions, not a drop-in MCP server or a one-click deploy to Phoenix Cloud.
Common Questions / FAQ
Who is phoenix-tracing for?
Indie and solo developers building agentic features who want Phoenix-compatible traces, span annotations, and OpenInference-aligned attributes without guessing from scattered docs.
When should I use phoenix-tracing?
During Build when wiring OTEL or Phoenix clients into chains and tools; during Ship when validating observability before launch; and during Operate when adding span or session annotations for evals and monitoring.
Is phoenix-tracing safe to install?
Treat it as documentation-only procedural knowledge in your agent; review the Security Audits panel on this Prism page and your org policy before pointing agents at production API keys or trace exporters.
SKILL.md
# Phoenix Tracing Skill

OpenInference semantic conventions and instrumentation guides for Phoenix.

## Usage

Start with `SKILL.md` for the index and quick reference.

## File Organization

All files in flat `rules/` directory with semantic prefixes:

- `span-*` - Span kinds (LLM, CHAIN, TOOL, etc.)
- `setup-*`, `instrumentation-*` - Getting started guides
- `fundamentals-*`, `attributes-*` - Reference docs
- `annotations-*`, `export-*` - Advanced features

## Reference

- [OpenInference Spec](https://github.com/Arize-ai/openinference/tree/main/spec)
- [Phoenix Documentation](https://docs.arize.com/phoenix)
- [Python OTEL API](https://arize-phoenix.readthedocs.io/projects/otel/en/latest/)
- [Python Client API](https://arize-phoenix.readthedocs.io/projects/client/en/latest/)
- [TypeScript API](https://arize-ai.github.io/phoenix/)

# Annotations Overview

Annotations allow you to add human or automated feedback to traces, spans, documents, and sessions. Annotations are essential for evaluation, quality assessment, and building training datasets.

## Annotation Types

Phoenix supports four types of annotations:

| Type | Target | Purpose | Example Use Case |
| ----------------------- | -------------------------------- | ---------------------------------------- | -------------------------------- |
| **Span Annotation** | Individual span | Feedback on a specific operation | "This LLM response was accurate" |
| **Document Annotation** | Document within a RETRIEVER span | Feedback on retrieved document relevance | "This document was not helpful" |
| **Trace Annotation** | Entire trace | Feedback on end-to-end interaction | "User was satisfied with result" |
| **Session Annotation** | User session | Feedback on multi-turn conversation | "Session ended successfully" |

## Annotation Fields

Every annotation has these fields:

### Required Fields

| Field | Type | Description |
| --------- | ------ | ----------------------------------------------------------------------------- |
| Entity ID | String | ID of the target entity (span_id, trace_id, session_id, or document_position) |
| `name` | String | Annotation name/label (e.g., "quality", "relevance", "helpfulness") |

### Result Fields (At Least One Required)

| Field | Type | Description |
| ------------- | ----------------- | ----------------------------------------------------------------- |
| `label` | String (optional) | Categorical value (e.g., "good", "bad", "relevant", "irrelevant") |
| `score` | Float (optional) | Numeric value (typically 0-1, but can be any range) |
| `explanation` | String (optional) | Free-text explanation of the annotation |

**At least one** of `label`, `score`, or `explanation` must be provided.

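
The required and result fields described above can be modeled as a small validating record. The sketch below is illustrative only: the `Annotation` dataclass and its `entity_id` attribute name are assumptions for this example and are not the Phoenix client's own types or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    """Illustrative model of the fields above; not the Phoenix client's type."""

    entity_id: str  # span_id, trace_id, session_id, or document_position
    name: str       # e.g. "quality", "relevance", "helpfulness"
    label: Optional[str] = None
    score: Optional[float] = None
    explanation: Optional[str] = None

    def __post_init__(self):
        if not self.entity_id or not self.name:
            raise ValueError("entity_id and name are required")
        # Compare against None so a legitimate score of 0.0 still counts.
        if self.label is None and self.score is None and self.explanation is None:
            raise ValueError("provide at least one of label, score, or explanation")


# A score of 0.0 is a valid result on its own.
ok = Annotation(entity_id="span-123", name="relevance", score=0.0)

# Omitting every result field is rejected.
try:
    Annotation(entity_id="span-123", name="relevance")
except ValueError as err:
    print(f"rejected: {err}")
```

Checking `is None` rather than truthiness is deliberate here: a score of `0.0` or an empty-looking label should not be mistaken for a missing result.
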
### Optional Fields

| Field | Type | Description |
| ---------------- | ------ | --------------------------------------------------------------------------------------- |
| `annotator_kind` | String | Who created this annotation: "HUMAN", "LLM", or "CODE" (default: "HUMAN") |
| `identifier` | String | Unique identifier for upsert behavior (updates existing if same name+entity+identifier) |
| `metadata` | Object | Custom metadata as key-value pairs |

## Annotator Kinds

| Kind | Description | Example |
| ------- | ------------------------------ | --------------------------------- |
| `HUMAN` | Manual feedback from a person | User ratings, expert labels |