
arize-ai/phoenix
3 skills · 1.9k installs · 30.1k stars · GitHub
Install
npx skills add https://github.com/arize-ai/phoenix

Skills in this repo

1. Phoenix Cli
Axial Coding is a Phoenix CLI workflow skill for solo and indie builders who already have open-ended notes, trace observations, or open-coding output and need a structured failure taxonomy before investing in evals or refactors. You reuse the same coding annotation identifier from open coding (or recover it from the wrap-up UI or `.px/coding/*.jsonl`), then drive `annotate` calls with `--identifier` so categories stay tied to Phoenix sessions and filters. The method emphasizes grounded grouping (what actually failed in traces) rather than brainstorming categories from memory, which makes counts and MECE-style breakdowns credible for prioritization. Reach for it when questions sound like “what kinds of failures do we have,” “what evals should I build first,” or “how do I prioritize fixes.” It is of intermediate complexity: you need Phoenix wired up and comfort with annotation identifiers, but you do not need a separate spreadsheet workflow. The outcome is a set of named categories with supporting counts that you can hand to eval design, ship checklists, or iteration backlogs.
674 installs

2. Phoenix Tracing
Phoenix Tracing is an agent skill that indexes OpenInference semantic conventions and Phoenix instrumentation guidance for solo builders shipping AI agents and LLM-backed APIs. It points you through span kinds, attribute fundamentals, setup paths, and annotation patterns so production traces are consistent enough to debug failures, compare runs, and attach human or automated feedback. Use it when you are adding or fixing tracing in Python or TypeScript stacks rather than guessing span names and attributes. The material is organized as flat prefixed rule files under rules/, which makes it easy for an agent to pull the right slice during a coding session. It pairs naturally with Phoenix as the observability backend and OpenTelemetry as the plumbing layer, so the outcome is traceable agent workflows instead of opaque chat logs.
668 installs

3. Phoenix Evals
Phoenix Evals (axial coding guidance in this skill) helps solo builders and small teams turn qualitative review notes from LLM or agent runs into actionable, countable failure categories, then attach those labels to traces in Arize Phoenix. The workflow mirrors classic qualitative research: collect open codes, cluster patterns, name categories clearly enough to drive fixes, and count incidence per bucket so prioritization is data-backed rather than anecdotal. Provided examples cover hallucination, incompleteness, tone mismatch, ignored context, and missing disclaimers, all common pain points for solo builders shipping agent features. Python and TypeScript snippets show how to write span annotations synchronously for human review pipelines. Use it when you are iterating on prompts, tools, or RAG and need a repeatable taxonomy instead of one-off Slack threads about “bad answers.”
589 installs
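
The counting step in the axial coding workflow (count incidence per category, then prioritize) can be sketched in plain Python. This is a minimal illustration, not code from the skill: the trace IDs, the `labeled_traces` list, and the `rank_categories` helper are all hypothetical, and no Phoenix API is called. The category names are borrowed from the examples the Phoenix Evals description mentions.

```python
from collections import Counter

# Hypothetical axial-coding output: one (trace_id, category) pair per reviewed trace.
# In a real workflow these labels would come from your own review notes or annotations.
labeled_traces = [
    ("trace-001", "hallucination"),
    ("trace-002", "incompleteness"),
    ("trace-003", "hallucination"),
    ("trace-004", "tone mismatch"),
    ("trace-005", "ignored context"),
    ("trace-006", "hallucination"),
    ("trace-007", "missing disclaimer"),
    ("trace-008", "incompleteness"),
]


def rank_categories(labels):
    """Return (category, count, share) tuples sorted by count, highest first."""
    counts = Counter(category for _, category in labels)
    total = sum(counts.values())
    return [(category, n, n / total) for category, n in counts.most_common()]


# Print a small prioritization table: most frequent failure category first.
for category, n, share in rank_categories(labeled_traces):
    print(f"{category:<20} {n:>2}  {share:.0%}")
```

A ranked table like this is one way to decide which failure category deserves an eval or a fix first; the share column helps when the number of reviewed traces changes between review rounds.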