LLM Council

Name: LLM Council
Author: am-will

am-will/codex-skills

Run parallel planner agents plus an anonymized judge to produce a validated, merged JSON plan from a task_spec with retries and structured logging.

Overview

LLM Council is an agent skill that orchestrates parallel planners, schema validation, a judge that reviews anonymized plans, and a merge step that produces a single final plan. It is most often used in Ship, and also fits Validate (scope) and Build (agent tooling).

Install

npx skills add https://github.com/am-will/codex-skills --skill llm-council

What is this skill?

End-to-end pipeline: task_spec → N parallel planners → schema validation → anonymized judge → final_plan
Dedicated components: Orchestrator, AgentRunner, Validator, Anonymizer, JudgeRunner, Merger, Logger
Parallel planner processes with post-hoc JSON extraction and schema validation
JSON schema enforcement for council_plan, judge_output, and the merged final output
Up to 2 retries for invalid, empty, or timed-out planner outputs; a judge failure falls back to the heuristic best plan
Plan content treated as untrusted; structured logging with redaction and a documented local-only UI protocol

Compatible agents: Codex, Claude Code, Cursor, any compatible agent

Adoption & trust: 1.2k installs on skills.sh; 941 GitHub stars; 2/3 security scanners passed (skills.sh audits).

What problem does it solve?

You get one model’s plan for a hard task and have no structured way to compare alternatives, validate JSON, or merge a consensus outcome.

Who is it for?

Indie agent authors who already run CLI-based models and want a repeatable council + judge ritual for high-stakes planning JSON.

Skip if: you only need a quick single-shot chat answer with no schemas, logging, or local process orchestration.

When should I use this skill?

You have a task_spec JSON and need multiple planner outputs validated, anonymized, judged, and merged into one final_plan.

What do I get? / Deliverables

You receive a schema-validated final_plan plus metadata and warnings after parallel planners and a blinded judge run, with retries when plans are empty or invalid.

final_plan JSON with merge metadata and warnings
Structured logs with provider details redacted

Journey fit

Spans multiple journey phases: a primary shelf plus the alternate fits below.

Primary fit: Ship (Code review)

The council workflow is most valuable when you want multiple model opinions reviewed and merged before you ship an implementation, so the canonical shelf is Ship review. Orchestrator, JudgeRunner, Validator, and Merger mirror a formal review gate: schema checks, anonymization, and consensus on a final_plan.

Also useful

Validate: Scope & plan
Build: Agent skills & templates

Where it fits

Example use: Validate (Scope & plan)

Load a task_spec for a new feature and run competing scope plans through the council before you commit sprint work.

Example use: Build (Agent skills & templates)

Design AgentRunner and Validator boundaries before wiring planner CLIs into your repo automation.

Example use: Ship (Code review)

Run the judge pass on anonymized plans so shipping a large refactor is backed by a merged final_plan.

Example use: Operate (Iteration & experiments)

Replay council runs with stricter prompts when production incidents trace back to a weak original plan.

How it compares

llm-council is a multi-agent review orchestration pattern. It is not comparable to installing a single documentation MCP or a one-shot brainstorming skill.

Common Questions / FAQ

Who is llm-council for?

Solo builders and small teams wiring Codex- or CLI-style planners who want validated, anonymized multi-plan review before committing to implementation.

When should I use llm-council?

At Validate when scoping ambiguous work, during Build when designing agent pipelines, or at Ship review when you need N plans judged and merged into one final_plan JSON.

Is llm-council safe to install?

The design assumes untrusted plan text and local-only UI; review the Security Audits panel on this page and lock down shell access for background agent processes.

SKILL.md

# LLM Council Architecture

## Components
- Orchestrator: coordinates the end-to-end run, retries, and final output assembly.
- AgentRunner: launches planner and judge CLIs in background shells and captures output.
- Validator: validates JSON payloads against schemas, extracts JSON from noisy output.
- Anonymizer: removes provider names, system prompts, IDs, file paths, and tool traces.
- JudgeRunner: formats judge input, runs judge, validates response.
- Merger: reconciles judge output into a single final plan schema.
- Logger: structured logging with redaction.
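
The Validator's "extracts JSON from noisy output" duty can be sketched as below. This is a minimal illustration, not the skill's actual implementation: the function name `extract_json_object` is hypothetical, and it assumes the payload of interest is the first decodable JSON object in the captured text.

```python
import json
from typing import Any, Optional


def extract_json_object(raw: str) -> Optional[dict[str, Any]]:
    """Return the first decodable JSON object found in `raw`, or None.

    Hypothetical helper: scans for each "{" and asks the standard
    decoder to parse from that offset, skipping positions that fail.
    """
    decoder = json.JSONDecoder()
    idx = raw.find("{")
    while idx != -1:
        try:
            obj, _end = decoder.raw_decode(raw, idx)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        idx = raw.find("{", idx + 1)
    return None
```

A caller would still run schema validation on the returned object, since a syntactically valid object is not necessarily a valid `council_plan`.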

## Data Flow
1. Load `task_spec` JSON.
2. Render planner prompts and spawn N planner processes in parallel (background shells).
3. Collect raw outputs and extract JSON.
4. Validate each `council_plan` against schema.
5. Retry invalid/empty/timeout plans up to 2 times.
6. Anonymize and label as Plan 1/2/3, then randomize order.
7. Build `judge_input` and run judge.
8. Validate `judge_output` and merge into `final_plan`.
9. Emit final output + metadata + warnings.
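
Steps 2 through 6 above can be sketched roughly as follows. Everything here is an assumption made for illustration: planners are modeled as plain Python callables run in threads rather than CLI processes in background shells, `is_valid_plan` is a stand-in for real schema validation (the `steps` key is invented), and "up to 2 times" is read as one initial attempt plus two retries.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

MAX_RETRIES = 2  # Data Flow step 5: retry up to 2 times

# Hypothetical planner type: returns a plan dict, or None on timeout/refusal.
Planner = Callable[[dict], Optional[dict]]


def is_valid_plan(plan: Optional[dict]) -> bool:
    # Stand-in for schema validation; the "steps" key is an assumption.
    return isinstance(plan, dict) and bool(plan.get("steps"))


def run_with_retries(planner: Planner, task_spec: dict) -> Optional[dict]:
    # One initial attempt plus up to MAX_RETRIES retries.
    for _attempt in range(1 + MAX_RETRIES):
        plan = planner(task_spec)
        if is_valid_plan(plan):
            return plan
    return None


def run_council(planners: list[Planner], task_spec: dict,
                seed: Optional[int] = None) -> list[dict]:
    # Run all planners in parallel; results keep the input order.
    with ThreadPoolExecutor(max_workers=max(1, len(planners))) as pool:
        results = list(pool.map(lambda p: run_with_retries(p, task_spec), planners))
    # Proceed with fewer plans if some never validated.
    valid = [plan for plan in results if plan is not None]
    # Label as Plan 1/2/3, then randomize order (step 6).
    labeled = [{"label": f"Plan {i}", "plan": plan}
               for i, plan in enumerate(valid, start=1)]
    random.Random(seed).shuffle(labeled)
    return labeled
```

The returned list is what a judge prompt would be built from; stripping provider names, paths, and tool traces from each plan is left out of this sketch.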

## Failure Handling
- Timeout: mark plan as failed, retry, or proceed with fewer valid plans.
- Invalid JSON: attempt extraction, then retry with stricter prompt.
- Refusal/empty: record warning, retry once with reduced prompt size.
- Judge failure: fall back to best-scoring plan by heuristic or return top valid plan.
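
The judge-failure fallback could look something like the sketch below. The source does not say which heuristic is used, so `heuristic_score` (count of entries under an assumed `steps` key) is purely illustrative, as are the function names and the warning text.

```python
from typing import Any, Optional


def heuristic_score(plan: dict[str, Any]) -> int:
    # Hypothetical heuristic: prefer plans with more steps.
    steps = plan.get("steps")
    return len(steps) if isinstance(steps, list) else 0


def choose_final_plan(
    valid_plans: list[dict[str, Any]],
    merged_plan: Optional[dict[str, Any]],
) -> tuple[dict[str, Any], list[str]]:
    """Return (final_plan, warnings).

    `merged_plan` is the already-merged judge result, or None when the
    judge failed or its output did not validate.
    """
    if merged_plan is not None:
        return merged_plan, []
    if not valid_plans:
        raise ValueError("no valid plans and no judge output")
    # max() returns the first plan on ties, i.e. the top valid plan.
    best = max(valid_plans, key=heuristic_score)
    return best, ["judge failed; fell back to heuristic best plan"]
```

Returning the warning alongside the plan keeps the fallback visible in the final output's metadata instead of hiding it.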

## UI Protocol (Local Only)
This protocol is for local UI/server integration. Treat plan content as untrusted text.

### Endpoints
- `GET /ui/state`: returns the current UI state snapshot.
- `GET /ui/events`: Server-Sent Events (SSE) stream of updates.
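
A client consuming `GET /ui/events` needs to split the stream into events. The sketch below parses Server-Sent Events framing from any iterable of text lines; it assumes (the source does not say) that each event's JSON envelope is carried in the `data:` field. The function name `iter_sse_json` is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_sse_json(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON object per SSE event.

    Follows basic SSE framing: `data:` lines accumulate, a blank line
    dispatches the event, and lines starting with ":" are comments.
    """
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data_lines:
                yield json.loads("\n".join(data_lines))
                data_lines = []
            continue
        if line.startswith(":"):
            continue  # comment / keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "data":
            data_lines.append(value)
```

Because the parser only needs an iterable of lines, it can be fed from a decoded HTTP response body or from a recorded log when replaying a run. The host and port of the local server are not specified in the source.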

### State Schema (JSON)
```json
{
  "run_id": "string",
  "task_brief": "string",
  "phase": "string",
  "planners": [
    {
      "id": "string",
      "status": "string",
      "summary": "string",
      "errors": ["string"]
    }
  ],
  "judge": {
    "status": "string",
    "summary": "string",
    "errors": ["string"]
  },
  "final_plan": "string",
  "errors": ["string"],
  "timestamps": {
    "started_at": "string",
    "updated_at": "string",
    "completed_at": "string"
  }
}
```
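
A UI that polls `GET /ui/state` may want a cheap shape check before rendering. The sketch below mirrors the schema above; it assumes every field is required and never null, which the source does not confirm (for example, `completed_at` might be empty during a run). All function names are hypothetical.

```python
from typing import Any


def _is_str_list(value: Any) -> bool:
    return isinstance(value, list) and all(isinstance(v, str) for v in value)


def _check_agent(name: str, agent: Any, keys: tuple[str, ...]) -> list[str]:
    # Shared check for planner entries and the judge object.
    if not isinstance(agent, dict):
        return [f"{name}: expected object"]
    problems = [f"{name}.{k}: expected string"
                for k in keys if not isinstance(agent.get(k), str)]
    if not _is_str_list(agent.get("errors")):
        problems.append(f"{name}.errors: expected list of strings")
    return problems


def check_ui_state(state: dict[str, Any]) -> list[str]:
    """Return a list of shape problems; an empty list means the state matches."""
    problems = [f"{k}: expected string"
                for k in ("run_id", "task_brief", "phase", "final_plan")
                if not isinstance(state.get(k), str)]
    planners = state.get("planners")
    if isinstance(planners, list):
        for i, planner in enumerate(planners):
            problems += _check_agent(f"planners[{i}]", planner,
                                     ("id", "status", "summary"))
    else:
        problems.append("planners: expected list")
    problems += _check_agent("judge", state.get("judge"), ("status", "summary"))
    if not _is_str_list(state.get("errors")):
        problems.append("errors: expected list of strings")
    timestamps = state.get("timestamps")
    if isinstance(timestamps, dict):
        problems += [f"timestamps.{k}: expected string"
                     for k in ("started_at", "updated_at", "completed_at")
                     if not isinstance(timestamps.get(k), str)]
    else:
        problems.append("timestamps: expected object")
    return problems
```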

### SSE Events
All SSE events include a `type` and `payload` field, with the payload matching the shapes below.

#### `phase_change`
```json
{
  "type": "phase_change",
  "payload": {
    "run_id": "string",
    "phase": "string",
    "timestamp": "string"
  }
}
```

#### `planner_update`
```json
{
  "type": "planner_update",
  "payload": {
    "run_id": "string",
    "planner": {
      "id": "string",
      "status": "string",
      "summary": "string",
      "errors": ["string"]
    },
    "timestamp": "string"
  }
}
```

#### `judge_update`
```json
{
  "type": "judge_update",
  "payload": {
    "run_id": "string",
    "judge": {
      "status": "string",
      "summary": "string",
      "errors": ["string"]
    },
    "timestamp": "string"
  }
}
```

#### `final_plan`
```json
{
  "type": "final_plan",
  "payload": {
    "run_id": "string",
    "final_plan": "string",
    "errors": ["string"],
    "timestamp": "string"
  }
}
```
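
One way a local UI might fold these four events into the state snapshot is sketched below. The mapping from events to state fields is an assumption: the source defines the payload shapes but not how a client should apply them. In particular, matching planners by `id`, copying the event `timestamp` into `timestamps.updated_at`, and ignoring unknown event types are illustrative choices.

```python
import copy
from typing import Any


def apply_event(state: dict[str, Any], event: dict[str, Any]) -> dict[str, Any]:
    """Return a new state with one SSE event applied (input is not mutated)."""
    new_state = copy.deepcopy(state)
    kind, payload = event["type"], event["payload"]
    if kind == "phase_change":
        new_state["phase"] = payload["phase"]
    elif kind == "planner_update":
        planner = payload["planner"]
        planners = new_state.setdefault("planners", [])
        for i, existing in enumerate(planners):
            if existing.get("id") == planner["id"]:
                planners[i] = planner  # replace the matching planner entry
                break
        else:
            planners.append(planner)  # first update for this planner
    elif kind == "judge_update":
        new_state["judge"] = payload["judge"]
    elif kind == "final_plan":
        new_state["final_plan"] = payload["final_plan"]
        new_state["errors"] = payload["errors"]
    else:
        return new_state  # unknown event type: leave state unchanged
    new_state.setdefault("timestamps", {})["updated_at"] = payload["timestamp"]
    return new_state
```

All string fields carried into the state remain untrusted text and should be rendered accordingly.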

### Safety Rules
- Treat `task_brief`, planner summaries, judge summaries, and `final_plan` as untrusted text.
- Do not render untrusted HTML; render as plain text only.
- Never execute scripts or inline event handlers from any payload content.
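
As a minimal sketch of the "plain text only" rule, a server-side renderer can escape every untrusted field before it reaches an HTML template. The helper name `render_plain_text` is hypothetical; `html.escape` is from the Python standard library.

```python
import html


def render_plain_text(untrusted: str) -> str:
    """Escape &, <, >, and quotes so the text cannot introduce markup."""
    return html.escape(untrusted, quote=True)
```

For example, `render_plain_text('<img src=x onerror="alert(1)">')` returns `&lt;img src=x onerror=&quot;alert(1)&quot;&gt;`, which a browser displays as literal text. In a browser-side UI the equivalent is to assign to `textContent` rather than `innerHTML`.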


# CLI Notes (Context7)

## Codex CLI
- Non-interactive execution: `codex exec` (or `codex e`).
- JSON streaming: `codex exec --json "..."` outputs JSON Lines events to stdout.
- Structured output: `codex exec --output-schema ./schema.json -o ./output.json "..."`.
- Final message is written to stdout; streaming activity goes to stderr.
- Model override: `codex exec -m gpt-5.2-codex -c model_reasoning_effort=xhigh "..."`.
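
An AgentRunner-style wrapper around the commands above might assemble the argument list and apply a timeout as sketched here. Only the flags listed in these notes are used; the function names, the timeout handling, and the choice to treat a non-zero exit code as a failed plan are assumptions, not the skill's documented behavior.

```python
import subprocess
from typing import Optional


def build_codex_argv(prompt: str, schema_path: str, output_path: str,
                     model: Optional[str] = None) -> list[str]:
    """Build a `codex exec` command using only the flags listed above."""
    argv = ["codex", "exec", "--output-schema", schema_path, "-o", output_path]
    if model:
        argv += ["-m", model]
    argv.append(prompt)  # prompt goes last, as a single argument
    return argv


def run_planner(argv: list[str], timeout_s: float) -> Optional[str]:
    """Run one planner process; return stdout, or None on timeout or failure."""
    try:
        done = subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout_s)
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return None  # timed out, or the CLI is not installed
    return done.stdout if done.returncode == 0 else None
```

Passing the command as a list avoids shell interpolation of the prompt, which matters because prompts are built from task content.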

## Claude Code
- Launch interactive agent: `claude` in the repo directory.
- Non-interactive print mode: `claude -p "query"` (prints response and exits).
- JSON output: `claude -p "query" --output-format json`.
- Schema-validated JSON: `claude -p --json-schema '<schema>' "q
