
Phoenix CLI
Turn open-ended trace observations into MECE-style failure categories with counts to prioritize agent fixes and eval coverage.
Overview
phoenix-cli is an agent skill, most often used in Ship (also Build agent-tooling and Grow analytics), that performs axial coding: it structures trace observations into failure taxonomies for evals and prioritization.
Install
npx skills add https://github.com/github/awesome-copilot --skill phoenix-cli
What is this skill?
- Axial coding groups open observations into named failure categories with counts
- Designed after open coding but usable from any set of qualitative trace notes
- Reuses a single coding annotation identifier across annotate calls and Phoenix session JSONL
- Supports MECE breakdowns and fix prioritization grounded in real traces
- Explicit handoff vocabulary for eval design and downstream Phoenix CLI annotate workflow
Adoption & trust: 842 installs on skills.sh; 34.6k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You have piles of agent trace notes but no shared categories or counts to decide what to fix or measure first.
Who is it for?
Solo builders running Phoenix-style agent eval loops who already capture observations and need MECE categories before building tests.
Skip if: you only write unit tests and have no qualitative traces, or your team skips open coding (the skill assumes an existing annotation session).
When should I use this skill?
User has open-ended observations and needs structured failure categories, counts, MECE breakdowns, eval priorities, or fix ordering grounded in traces.
What do I get? / Deliverables
You produce a trace-backed failure taxonomy with counts and a stable coding session id ready for annotate-driven eval design, typically after open coding completes.
- Axial failure taxonomy with category names and counts
- Annotated session artifacts via Phoenix annotate workflow
- Prioritized list of failure themes for evals and fixes
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Ship is canonical because axial coding closes the loop on quality—structuring failures before release gates and eval hardening. Testing subphase matches eval design, failure taxonomy, and trace-grounded categories rather than greenfield coding.
Where it fits
- Cluster recurring agent mistakes from release-candidate traces before signing off.
- Define eval buckets and annotation rules for a new custom skill.
- Track which failure categories grow week-over-week after a model or prompt change.
- Re-axial-code production incidents to see if a hotfix actually removed a dominant failure mode.
How it compares
Qualitative coding workflow for agent failures, not a generic linter or single-shot “list bugs” prompt.
Common Questions / FAQ
Who is phoenix-cli for?
Indie AI builders and small teams improving copilots and coding agents who use Phoenix CLI annotate flows and qualitative trace review.
When should I use phoenix-cli?
During Ship testing when grouping failures for release readiness; during Build agent-tooling when designing eval suites; during Grow analytics when quantifying recurring agent failure themes.
Is phoenix-cli safe to install?
It may drive CLI annotate actions and read local .px/coding artifacts—review the Security Audits panel on this Prism page and treat session identifiers and trace data as sensitive.
Workflow Chain
Requires first: open coding
SKILL.md
# Axial Coding

Group open-ended observations into structured failure taxonomies. Axial coding turns notes, trace observations, or open-coding output into named categories with counts, supporting downstream work like eval design and fix prioritization.

It works well after [open coding](open-coding.md), but can start from any set of open-ended observations.

**Reach for this whenever** the user has observations and needs structure, e.g., "what categories of failures do we have", "what should I build evals for", "how do I prioritize fixes", "group these notes", "MECE breakdown", or any framing that asks for categories or counts grounded in real traces rather than invented top-down.

## Coding annotation identifier (reuse the open-coding value)

Reuse the **coding annotation identifier** chosen in open coding: every `annotate` call below passes `--identifier "$CODING_ANNOTATION_IDENTIFIER"` explicitly. In a fresh shell or fresh agent invocation, set `CODING_ANNOTATION_IDENTIFIER` to the same value (recoverable from the wrap-up UI URL or by listing `.px/coding/*.jsonl`); don't mint a new id. See [open-coding.md#coding-annotation-identifier-pick-this-first](open-coding.md#coding-annotation-identifier-pick-this-first) for the rationale and the sanitization rule.

> **Workflow term vs. server annotation name.** The skill calls this value the **coding annotation identifier**; the server annotation NAME used for the UI filter stays `coding_session_id` for data compatibility. Don't try to rename the server-side key.

```bash
# Reuse the value chosen in open coding; do not mint a new one.
CODING_ANNOTATION_IDENTIFIER="coding-run:chatbot-context-loss-2026-05-06"
# Replace every character outside [a-zA-Z0-9_-] with '-' to get a filename-safe slug.
SLUG=$(echo -n "$CODING_ANNOTATION_IDENTIFIER" | sed 's/[^a-zA-Z0-9_-]/-/g')
# Sidecar paths derived from the slug.
NOTES_SIDECAR=".px/coding/${SLUG}.jsonl"
AXIAL_SIDECAR=".px/coding/${SLUG}-axial.jsonl"
```

## Choosing the unit

Open coding's diagnostic in [open-coding.md#choosing-the-unit-of-analysis](open-coding.md#choosing-the-unit-of-analysis) commits to a unit (trace, span, or session).
Axial coding inherits that unit by default: if open coding ran at the session level, axial labels will too; same for trace and span.

**An axial label can live at a different level than the note that informed it.** That's a feature, and it works in every direction:

- *Trace → span*: a trace-level note "answered shipping when asked about returns" can produce a span-level annotation on the retrieval span once a pattern reveals retrieval as the consistent culprit.
- *Trace → session*: a batch of trace-level notes describing single-turn confusion can produce a session-level annotation once you see the pattern is "the agent doesn't track the user's stated context across turns."
- *Session → trace*: a session-level note about cross-turn drift may, on closer reading, attribute to one specific turn where the agent dropped the thread; a trace-level annotation can name that turn.

Whichever level you write the axial label on, write the matching `coding_session_id` UI-filter annotation on the same entity (see [UI-filter annotation](#ui-filter-annotation) below) so the UI link picks it up.

## Process

1. **Set the coding annotation identifier**: set `CODING_ANNOTATION_IDENTIFIER` to the value used in open coding and re-derive `SLUG`, `NOTES_SIDECAR`, `AXIAL_SIDECAR` (see [Coding annotation identifier](#coding-annotation-identifier-reuse-the-open-coding-value))
2. **Gather**: read open-coding notes from `$NOTES_SIDECAR` (at the unit committed in open coding); no server round-trip
3. **Pattern**: group notes with common themes
4. **Name**: create actionable category names
5. **Attribute**: decide what level each category lives at; an axial label can move up (trace → session) or down (trace → span) from the source note's level to the level the pattern actually implicates
6. **Record**: `px {trace,span,session} annotate ... --name axial_coding_category --label <cat> --identifier "$CODING_ANNOTATION_IDENTIFIER"`, add/update one JSONL sidecar row for the label, then write the matching `coding_sessi