Peer Review Loop

Name: Peer Review Loop
Author: juliusbrussee

juliusbrussee/cavekit

Run a Cavekit spec through a Ralph build loop where Codex adversarially reviews Claude’s implementation each iteration.

Install

npx skills add https://github.com/juliusbrussee/cavekit --skill peer-review-loop

What is this skill?

Claude implements from Cavekit specs while Codex acts as an adversarial peer reviewer
Primary path: Codex CLI via codex-review.sh; MCP server fallback when CLI delegation is unavailable
Explicit iteration, convergence detection, and completion criteria for Ralph-style loops
Peer reviewer enforces cavekit compliance to reduce silent spec drift across iterations
Two-model setup targets different blind spots than single-model self-review

Adoption & trust: 10 installs on skills.sh; 1k GitHub stars; 3/3 security scanners passed (skills.sh audits).

Journey fit

Primary fit

Cross-model review belongs on the Ship/review shelf because quality gates come after implementation work. Peer review loops are structured code review and spec compliance checks, not one-off chat opinions.

Common Questions / FAQ

Is Peer Review Loop safe to install?

skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.

SKILL.md


# Peer Review Loop: Cavekit + Ralph Loop + Codex Peer Reviewer

Run a Cavekit spec (a "cavekit") through a Ralph Loop where Claude builds and Codex adversarially reviews.
This is a more rigorous automated quality process than a single-model loop: every second iteration
by default, a completely different model (different training data, different biases, different
blind spots) challenges your implementation.

---

## Why This Works

| Factor | Single-Model Loop | Peer Review Loop |
|--------|-------------------|------------------|
| Blind spots | Same model, same blind spots every iteration | Two models catch different classes of issues |
| Cavekit drift | Builder may silently deviate from cavekit | Peer reviewer checks cavekit compliance explicitly |
| Quality floor | Converges to "good enough for one model" | Converges to "survives cross-examination" |
| Dead ends | May retry failed approaches | Peer reviewer flags repeated patterns |

---

## Architecture

```
┌─────────────────────────────────────────────────────┐
│                   Ralph Loop                         │
│  (Stop hook feeds same prompt each iteration)        │
│                                                      │
│  ┌──────────┐    ┌──────────────┐    ┌────────────┐ │
│  │  Claude   │───▶│ Build from   │───▶│  Commit    │ │
│  │  (Build)  │    │ cavekit +    │    │  changes   │ │
│  └──────────┘    └──────────────┘    └──────┬─────┘ │
│       ▲                                      │       │
│       │                                      ▼       │
│  ┌──────────┐    ┌──────────────┐    ┌────────────┐ │
│  │  Fix      │◀──│ Parse        │◀──│  Codex CLI │ │
│  │  findings │    │ findings     │    │  (Review)  │ │
│  └──────────┘    └──────────────┘    └────────────┘ │
│                                                      │
│  Completion: all cavekit requirements met +          │
│              no CRITICAL/HIGH findings               │
└─────────────────────────────────────────────────────┘
```
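The "no CRITICAL/HIGH findings" half of the completion rule can be sketched as a small shell check. This is a minimal sketch under stated assumptions, not the skill's actual implementation: the function names `has_blocking_findings` and `review_gate` are hypothetical, and it assumes each open finding in the findings file carries its severity as a plain word such as `CRITICAL` or `HIGH`, which the source does not specify.

```bash
#!/usr/bin/env bash
# Hypothetical helper: exits 0 if the findings file still contains a
# CRITICAL or HIGH finding. Assumes severities appear as whole words and
# that resolved findings are no longer listed with those words.
has_blocking_findings() {
  local findings_file="$1"
  # A missing file means no review has written findings yet.
  [ -f "$findings_file" ] || return 1
  grep -Eqw 'CRITICAL|HIGH' "$findings_file"
}

# Hypothetical gate: prints "continue" while blocking findings remain.
# Checking that all cavekit requirements are met is a separate step
# that this sketch does not cover.
review_gate() {
  if has_blocking_findings "$1"; then
    echo "continue"
  else
    echo "no-blocking-findings"
  fi
}
```

A caller might run `review_gate context/impl/impl-review-findings.md` after each review iteration and keep looping while it prints `continue`.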

### Review Invocation: Codex CLI (primary) vs MCP (legacy)

The peer review loop supports two invocation paths:

1. **Codex CLI delegation (primary):** Uses `scripts/codex-review.sh`, which
   calls `codex` directly in `--approval-mode full-auto` with a structured
   review prompt. It is faster and has no MCP server overhead, and findings
   are parsed and appended to `context/impl/impl-review-findings.md`
   automatically.

2. **MCP server (legacy fallback):** Configures Codex as an MCP server in
   `.mcp.json`. Claude calls the MCP tool on review iterations. Used only when
   Codex CLI delegation is unavailable (e.g., older Codex versions).

The build script (`setup-build.sh`) auto-detects which path to use: if
`codex-review.sh` is present and the `codex` CLI is available, it uses CLI
delegation. Otherwise, it falls back to MCP configuration.
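The detection rule described above could look roughly like the following. This is an illustrative sketch only; the real logic lives in `setup-build.sh` and is not shown in this document. The function name `detect_review_path` and its two optional arguments are hypothetical, and "present" is interpreted here as "the file exists".

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the path selection: prints "cli" when both the
# review script and the codex binary are found, otherwise prints "mcp".
detect_review_path() {
  local review_script="${1:-scripts/codex-review.sh}"
  local codex_bin="${2:-codex}"
  if [ -f "$review_script" ] && command -v "$codex_bin" >/dev/null 2>&1; then
    echo "cli"
  else
    echo "mcp"
  fi
}
```

Called with no arguments from the repository root, `detect_review_path` checks the default script location and looks for `codex` on `PATH`.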

---

## Quick Start

```bash
# Basic: implement a cavekit with peer review
/ck:peer-review-loop context/kits/cavekit-auth.md

# With options
/ck:peer-review-loop context/kits/cavekit-api.md --max-iterations 20 --codex-model gpt-5.4-mini

# Review-only mode (review existing code, don't build new)
/ck:peer-review-loop context/kits/cavekit-api.md --review-only

# Review every iteration instead of every 2nd
/ck:peer-review-loop context/kits/cavekit-auth.md --review-interval 1
```
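The last example implies that reviews run every second iteration by default. How `--review-interval` maps to iteration numbers is not spelled out here, so the sketch below assumes iterations are numbered from 1 and a review fires whenever the iteration number is divisible by the interval. The function names `should_review` and `review_schedule` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical: exits 0 when a review should run on this iteration.
# Assumes 1-based iteration numbers and a default interval of 2.
should_review() {
  local iteration="$1"
  local interval="${2:-2}"
  # Reject intervals below 1 to avoid a modulo-by-zero error.
  [ "$interval" -ge 1 ] || return 1
  [ $(( iteration % interval )) -eq 0 ]
}

# Hypothetical: prints the review iterations for a run as a
# space-separated list.
review_schedule() {
  local max_iterations="$1"
  local interval="${2:-2}"
  local i=1
  local out=""
  while [ "$i" -le "$max_iterations" ]; do
    if should_review "$i" "$interval"; then
      out="${out:+$out }$i"
    fi
    i=$(( i + 1 ))
  done
  echo "$out"
}
```

Under these assumptions, `review_schedule 6` prints `2 4 6`, and `review_schedule 3 1` prints `1 2 3`.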

---

## What the Command Does

1. **Validates** the cavekit file exists and Codex CLI is

