
LLM Council
Run parallel planner agents plus an anonymized judge to produce a validated, merged JSON plan from a task_spec with retries and structured logging.
Overview
LLM Council is an agent skill that orchestrates parallel planners, an anonymized judge pass, and a schema-checked merge into a single final plan. It is most often used in Ship, and also fits Validate (scope) and Build (agent-tooling).
Install
npx skills add https://github.com/am-will/codex-skills --skill llm-council
What is this skill?
- End-to-end pipeline: task_spec → N parallel planners → schema validation → anonymized judge → final_plan
- Dedicated components: Orchestrator, AgentRunner, Validator, Anonymizer, JudgeRunner, Merger, Logger
- Retries invalid, empty, or timed-out planner outputs up to 2 times; on judge failure, falls back to the heuristic best plan
- Treats plan content as untrusted; documents redacted structured logging and a local-only UI protocol
- JSON schema enforcement for council_plan, judge_output, and the merged final output
- Parallel planner processes with post-hoc JSON extraction and schema validation
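
The extraction, validation, and retry behavior listed above can be illustrated with a minimal Python sketch. It is not the skill's actual code: `REQUIRED_KEYS`, `extract_json`, `is_valid_plan`, and `run_planner_with_retries` are hypothetical names, and the single required `steps` key stands in for the real `council_plan` schema, which is not shown on this page.

```python
import json
from typing import Callable, List, Optional, Tuple

MAX_RETRIES = 2  # mirrors the "up to 2 retries" figure in the list above

# Hypothetical stand-in for the council_plan schema (assumption, not the real schema).
REQUIRED_KEYS = {"steps"}


def extract_json(raw: str) -> Optional[dict]:
    """Return the first JSON object embedded in noisy text, or None."""
    decoder = json.JSONDecoder()
    for index, char in enumerate(raw):
        if char != "{":
            continue
        try:
            obj, _end = decoder.raw_decode(raw, index)
        except json.JSONDecodeError:
            continue  # not a valid object at this brace; keep scanning
        if isinstance(obj, dict):
            return obj
    return None


def is_valid_plan(plan: Optional[dict]) -> bool:
    """Treat a plan as valid only if it has every required key and non-empty steps."""
    return plan is not None and REQUIRED_KEYS.issubset(plan) and bool(plan["steps"])


def run_planner_with_retries(
    run_once: Callable[[int], str],
) -> Tuple[Optional[dict], List[str]]:
    """Call a planner up to 1 + MAX_RETRIES times and collect warnings."""
    warnings: List[str] = []
    for attempt in range(1 + MAX_RETRIES):
        plan = extract_json(run_once(attempt))
        if is_valid_plan(plan):
            return plan, warnings
        warnings.append(f"attempt {attempt}: invalid or empty plan")
    return None, warnings


if __name__ == "__main__":
    # Simulated planner outputs: one refusal, then a plan wrapped in chatter.
    outputs = ["Sorry, still thinking.", 'Here is the plan: {"steps": ["a", "b"]} Done.']
    plan, warnings = run_planner_with_retries(lambda attempt: outputs[attempt])
    print(plan, warnings)
```

In a real run, `run_once` would launch a planner CLI and return its captured output; here it is a plain callable so the retry logic can be exercised without any external process.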
Adoption & trust: 1.2k installs on skills.sh; 941 GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You get one model’s plan for a hard task and have no structured way to compare alternatives, validate JSON, or merge a consensus outcome.
Who is it for?
Indie agent authors who already run CLI-based models and want a repeatable council + judge ritual for high-stakes planning JSON.
Skip if: you only need a quick single-shot chat answer with no schemas, logging, or local process orchestration.
When should I use this skill?
You have a task_spec JSON and need multiple planner outputs validated, anonymized, judged, and merged into one final_plan.
What do I get? / Deliverables
You receive a schema-validated final_plan plus metadata and warnings after parallel planners and a blinded judge run, with retries when plans are empty or invalid.
- final_plan JSON with merge metadata and warnings
- Structured logs with provider details redacted
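
To make the blinded judge step more concrete, here is a small Python sketch of redaction plus neutral labeling. Everything in it is an assumption for illustration: the provider name list, the `[redacted]` marker, and the `redact` and `blind_plans` helpers are hypothetical, and the sketch shuffles before labeling so the label carries no ordering information, which may differ from the order the skill itself uses.

```python
import random
import re
from typing import Dict, List, Tuple

# Hypothetical provider names to scrub; the skill's real redaction rules are not shown here.
PROVIDER_PATTERN = re.compile(r"\b(codex|claude|gemini|openai|anthropic)\b", re.IGNORECASE)


def redact(text: str) -> str:
    """Replace known provider names with a neutral marker."""
    return PROVIDER_PATTERN.sub("[redacted]", text)


def blind_plans(
    plans: Dict[str, str], rng: random.Random
) -> Tuple[List[Tuple[str, str]], Dict[str, str]]:
    """Shuffle plans, label them Plan 1..N, and keep a private label-to-planner map."""
    planner_ids = list(plans)
    rng.shuffle(planner_ids)
    judge_input: List[Tuple[str, str]] = []
    label_to_planner: Dict[str, str] = {}
    for number, planner_id in enumerate(planner_ids, start=1):
        label = f"Plan {number}"
        judge_input.append((label, redact(plans[planner_id])))
        label_to_planner[label] = planner_id  # never sent to the judge
    return judge_input, label_to_planner


if __name__ == "__main__":
    raw_plans = {"codex": "Codex suggests step A", "claude": "Claude suggests step B"}
    for label, text in blind_plans(raw_plans, random.Random())[0]:
        print(label, "->", text)
```

Keeping `label_to_planner` outside the judge input is what lets the merged result be traced back to a planner afterwards without revealing identities during judging.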
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
The council workflow is most valuable when you want multiple model opinions reviewed and merged before you ship an implementation, so its canonical shelf is Ship review. Orchestrator, JudgeRunner, Validator, and Merger mirror a formal review gate: schema checks, anonymization, and consensus on a final_plan.
Where it fits
Load a task_spec for a new feature and council competing scope plans before you commit sprint work.
Design AgentRunner and Validator boundaries before wiring planner CLIs into your repo automation.
Run the judge pass on anonymized plans so shipping a large refactor is backed by a merged final_plan.
Replay council runs with stricter prompts when production incidents trace back to a weak original plan.
How it compares
LLM Council is a multi-agent review orchestration pattern, which is not the same as installing a single documentation MCP or a one-shot brainstorming skill.
Common Questions / FAQ
Who is llm-council for?
Solo builders and small teams wiring Codex- or CLI-style planners who want validated, anonymized multi-plan review before committing to implementation.
When should I use llm-council?
At Validate when scoping ambiguous work, during Build when designing agent pipelines, or at Ship review when you need N plans judged and merged into one final_plan JSON.
Is llm-council safe to install?
The design assumes untrusted plan text and local-only UI; review the Security Audits panel on this page and lock down shell access for background agent processes.
SKILL.md
# LLM Council Architecture

## Components

- Orchestrator: coordinates the end-to-end run, retries, and final output assembly.
- AgentRunner: launches planner and judge CLIs in background shells and captures output.
- Validator: validates JSON payloads against schemas, extracts JSON from noisy output.
- Anonymizer: removes provider names, system prompts, IDs, file paths, and tool traces.
- JudgeRunner: formats judge input, runs judge, validates response.
- Merger: reconciles judge output into a single final plan schema.
- Logger: structured logging with redaction.

## Data Flow

1. Load `task_spec` JSON.
2. Render planner prompts and spawn N planner processes in parallel (background shells).
3. Collect raw outputs and extract JSON.
4. Validate each `council_plan` against schema.
5. Retry invalid/empty/timeout plans up to 2 times.
6. Anonymize and label as Plan 1/2/3, then randomize order.
7. Build `judge_input` and run judge.
8. Validate `judge_output` and merge into `final_plan`.
9. Emit final output + metadata + warnings.

## Failure Handling

- Timeout: mark plan as failed, retry, or proceed with fewer valid plans.
- Invalid JSON: attempt extraction, then retry with stricter prompt.
- Refusal/empty: record warning, retry once with reduced prompt size.
- Judge failure: fall back to best-scoring plan by heuristic or return top valid plan.

## UI Protocol (Local Only)

This protocol is for local UI/server integration. Treat plan content as untrusted text.

### Endpoints

- `GET /ui/state`: returns the current UI state snapshot.
- `GET /ui/events`: Server-Sent Events (SSE) stream of updates.
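
A client consuming `GET /ui/events` needs to turn the SSE text stream into structured events. The following Python sketch shows one way to do that, assuming the server puts a JSON object in each event's `data:` field (the document does not specify the wire encoding). `iter_sse_events` is a hypothetical helper, and the sample stream and its `phase` value are made up for illustration.

```python
import json
from typing import Dict, Iterable, Iterator, List


def iter_sse_events(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one decoded JSON object per SSE event.

    In the SSE format, an event's `data:` lines are joined with newlines and
    the event is dispatched when a blank line is reached.
    """
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data_lines:
                try:
                    event = json.loads("\n".join(data_lines))
                except json.JSONDecodeError:
                    event = None  # skip malformed payloads instead of crashing
                if isinstance(event, dict):
                    yield event
            data_lines = []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the data.
            data_lines.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.


if __name__ == "__main__":
    sample_stream = [
        ": keep-alive\n",
        'data: {"type": "phase_change", "payload": {"run_id": "r1", "phase": "example"}}\n',
        "\n",
    ]
    for event in iter_sse_events(sample_stream):
        # Payload fields are untrusted text: display as plain text only.
        print(event["type"], event["payload"])
```

The parser works on any iterable of lines, so it can be fed from a streaming HTTP response or from a recorded log when replaying a run.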
### State Schema (JSON)

```json
{
  "run_id": "string",
  "task_brief": "string",
  "phase": "string",
  "planners": [
    {
      "id": "string",
      "status": "string",
      "summary": "string",
      "errors": ["string"]
    }
  ],
  "judge": {
    "status": "string",
    "summary": "string",
    "errors": ["string"]
  },
  "final_plan": "string",
  "errors": ["string"],
  "timestamps": {
    "started_at": "string",
    "updated_at": "string",
    "completed_at": "string"
  }
}
```

### SSE Events

All SSE events include a `type` and `payload` field, with the payload matching the shapes below.

#### `phase_change`

```json
{
  "type": "phase_change",
  "payload": {
    "run_id": "string",
    "phase": "string",
    "timestamp": "string"
  }
}
```

#### `planner_update`

```json
{
  "type": "planner_update",
  "payload": {
    "run_id": "string",
    "planner": {
      "id": "string",
      "status": "string",
      "summary": "string",
      "errors": ["string"]
    },
    "timestamp": "string"
  }
}
```

#### `judge_update`

```json
{
  "type": "judge_update",
  "payload": {
    "run_id": "string",
    "judge": {
      "status": "string",
      "summary": "string",
      "errors": ["string"]
    },
    "timestamp": "string"
  }
}
```

#### `final_plan`

```json
{
  "type": "final_plan",
  "payload": {
    "run_id": "string",
    "final_plan": "string",
    "errors": ["string"],
    "timestamp": "string"
  }
}
```

### Safety Rules

- Treat `task_brief`, planner summaries, judge summaries, and `final_plan` as untrusted text.
- Do not render untrusted HTML; render as plain text only.
- Never execute scripts or inline event handlers from any payload content.

# CLI Notes (Context7)

## Codex CLI

- Non-interactive execution: `codex exec` (or `codex e`).
- JSON streaming: `codex exec --json "..."` outputs JSON Lines events to stdout.
- Structured output: `codex exec --output-schema ./schema.json -o ./output.json "..."`.
- Final message is written to stdout; streaming activity goes to stderr.
- Model override: `codex exec -m gpt-5.2-codex -c model_reasoning_effort=xhigh "..."`.

## Claude Code

- Launch interactive agent: `claude` in the repo directory.
- Non-interactive print mode: `claude -p "query"` (prints response and exits).
- JSON output: `claude -p "query" --output-format json`.
- Schema-validated JSON: `claude -p --json-schema '<schema>' "q