Peer Review

Name: Peer Review
Author: juliusbrussee

juliusbrussee/cavekit

Run a second model as a critical reviewer so your primary coding agent does not rubber-stamp its own output.

Install

npx skills add https://github.com/juliusbrussee/cavekit --skill peer-review

What is this skill?

Six review modes: Diff Critique, Design Challenge, Threaded Debate, Delegated Scrutiny, Deciding Vote, Coverage Audit
Explicit mandate: the reviewer looks for what the builder missed, not for agreement or politeness
MCP-based peer setup so any model can act as reviewer against the builder agent
Peer review iteration loops alternating builder fixes and reviewer passes
Codex Loop Mode combining Cavekit, Ralph Loop, and Codex as reviewer via CLI or MCP fallback

Adoption & trust: 15 installs on skills.sh; 1k GitHub stars; 2/3 security scanners passed (skills.sh audits).

Recommended Skills

Improve Codebase Architecture (mattpocock/skills)

Improve Codebase Architecture is an agent skill that teaches how to deepen a cluster of shallow modules without breaking… (226k installs · 121k stars)

Zoom Out (mattpocock/skills)

Lightweight meta-prompt skill that tells the agent to zoom out and deliver a domain-aligned overview of modules and call… (181k installs · 121k stars)

Caveman Review (juliusbrussee/caveman)

Formats code review as single actionable lines: location, problem, fix, with minimal noise. (139k installs · 70k stars)

Requesting Code Review (obra/superpowers)

Requesting Code Review is an agent skill from the Superpowers collection that gives solo and indie builders a copy-ready… (119k installs · 221k stars)

Receiving Code Review (obra/superpowers)

Superpowers methodology for agents receiving code review: prioritize technical correctness over social comfort, verify e… (96.2k installs · 221k stars)

Request Refactor Plan (mattpocock/skills)

request-refactor-plan is a structured agent workflow for solo and small-team maintainers who want refactors filed as act… (30.5k installs · 121k stars)

Journey fit

Primary fit

The canonical shelf is Ship review, because peer review is the quality gate before merge and launch, even though the same loop also applies while building features. Diff Critique, Design Challenge, and Coverage Audit map directly to human code review and to the pre-release scrutiny subphase.

Common Questions / FAQ

Is Peer Review safe to install?

skills.sh reports 2 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.

SKILL.md

# Peer Review

Use a second AI agent to review and challenge the first agent's work. The peer reviewer exists to find
what the builder missed -- not to agree, not to be polite, and not to rubber-stamp. This is the
single most effective quality gate you can add beyond automated tests.

## Core Principle

> **The peer reviewer's job is to find what the builder missed, not to agree.**

A review that says "looks good" is a wasted review. The peer review model should be given explicit
instructions to be critical, to challenge assumptions, and to look for what is *not* there rather
than what is.
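
One way to make the mandate explicit is to bake it into the prompt handed to the reviewing model. The sketch below is illustrative only: `REVIEWER_MANDATE`, `build_reviewer_prompt`, and `is_rubber_stamp` are hypothetical names that are not part of this skill, and the approval-phrase heuristic is an assumption, not a rule the skill defines.

```python
from textwrap import dedent

# Hypothetical mandate text paraphrasing the Core Principle; not the skill's own wording.
REVIEWER_MANDATE = dedent("""\
    You are a peer reviewer. Your job is to find what the builder missed.
    Do not agree by default, do not soften findings, and do not rubber-stamp.
    Challenge assumptions and look for what is NOT there:
    missing error handling, untested paths, unstated requirements.
    If you find nothing, say which areas you checked and why they hold up.
""")

# Assumed approval phrases; extend or replace these for your own workflow.
APPROVAL_PHRASES = ("looks good", "lgtm", "no issues")


def build_reviewer_prompt(changeset: str, spec_excerpt: str = "") -> str:
    """Assemble a fault-finding prompt for the reviewing model."""
    sections = [REVIEWER_MANDATE.strip()]
    if spec_excerpt:
        sections.append("## Requirements under review\n" + spec_excerpt.strip())
    sections.append("## Changeset\n" + changeset.strip())
    return "\n\n".join(sections)


def is_rubber_stamp(review: str, min_words: int = 20) -> bool:
    """Flag a short review that contains an approval phrase (a rough heuristic)."""
    text = review.lower()
    has_approval = any(phrase in text for phrase in APPROVAL_PHRASES)
    return has_approval and len(text.split()) < min_words
```

A builder loop could re-request the review whenever `is_rubber_stamp` returns `True`, rather than accepting a bare approval as a finished pass.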

---

## Why Peer Review Works

LLMs have blind spots. Every model has patterns it over-relies on, edge cases it misses, and
architectural assumptions it makes implicitly. A second model -- or the same model with a different
prompt and role -- catches a different set of issues.

**The analogy:** In traditional engineering, code review exists because the author has cognitive
blind spots about their own work. The same principle applies to AI agents, but the blind spots are
different: they are systematic patterns in training data, context window limitations, and prompt
interpretation biases.

**What peer review catches that automated tests miss:**
- Architectural over-engineering or under-engineering
- Missing error handling patterns
- Security vulnerabilities the builder didn't consider
- Cavekit requirements that were technically met but poorly implemented
- Dead code, unused imports, and unnecessary complexity
- Performance pitfalls that only manifest at scale
- Missing edge cases not covered by the cavekit
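
If the reviewer's output is to be tracked across passes, the categories in the list above can serve as a fixed vocabulary for findings. This is a minimal sketch under that assumption: the `Category` and `Finding` types and the `group_by_category` helper are hypothetical, and the skill does not prescribe any findings format.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """Finding categories mirroring the list above (labels are illustrative)."""
    ARCHITECTURE = "over- or under-engineering"
    ERROR_HANDLING = "missing error handling"
    SECURITY = "security vulnerability"
    REQUIREMENT_QUALITY = "requirement met but poorly implemented"
    DEAD_CODE = "dead code or unnecessary complexity"
    PERFORMANCE = "performance pitfall at scale"
    EDGE_CASE = "edge case not covered by the cavekit"


@dataclass(frozen=True)
class Finding:
    category: Category
    location: str  # hypothetical format, e.g. "src/parser.py:42"
    problem: str


def group_by_category(findings: list[Finding]) -> dict[Category, list[Finding]]:
    """Group findings so the builder can address one class of issue at a time."""
    grouped: dict[Category, list[Finding]] = defaultdict(list)
    for finding in findings:
        grouped[finding.category].append(finding)
    return dict(grouped)
```

Grouping by category makes it easier to see whether a review pass only surfaced surface-level issues (dead code, unused imports) or also probed security and edge cases.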

---

## Review Modes

| Mode | Timing | Mechanism |
|------|--------|-----------|
| **Diff Critique** | After implementation completes | A second model inspects the changeset with a fault-finding prompt; the builder incorporates valid fixes |
| **Design Challenge** | During the planning phase | A second model proposes alternative designs; the builder evaluates both against spec requirements and selects the stronger option |
| **Threaded Debate** | When exploring complex trade-offs | Multiple exchanges occur on a persistent conversation thread so context accumulates across turns |
| **Delegated Scrutiny** | For substantial review tasks | A dedicated teammate agent manages the full peer review interaction and delivers a consolidated findings report to the lead |
| **Deciding Vote** | When two approaches conflict | The lead presents both options to the peer review model, which analyzes trade-offs and recommends a path forward |
| **Coverage Audit** | During the validation phase | Test coverage data and gap analysis are fed to the peer review model for independent assessment of testing thoroughness |
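
The table's Timing column can be read as a lookup from situation to mode. The sketch below encodes that reading. The situation keys (`"planning"`, `"validation"`, and so on) are hypothetical labels chosen for illustration; only the six mode names come from the table.

```python
from enum import Enum


class Mode(Enum):
    DIFF_CRITIQUE = "Diff Critique"
    DESIGN_CHALLENGE = "Design Challenge"
    THREADED_DEBATE = "Threaded Debate"
    DELEGATED_SCRUTINY = "Delegated Scrutiny"
    DECIDING_VOTE = "Deciding Vote"
    COVERAGE_AUDIT = "Coverage Audit"


# Keys are assumed labels for the table's Timing column, not identifiers from the skill.
MODE_BY_SITUATION = {
    "implementation_complete": Mode.DIFF_CRITIQUE,
    "planning": Mode.DESIGN_CHALLENGE,
    "complex_tradeoffs": Mode.THREADED_DEBATE,
    "substantial_review": Mode.DELEGATED_SCRUTINY,
    "conflicting_approaches": Mode.DECIDING_VOTE,
    "validation": Mode.COVERAGE_AUDIT,
}


def pick_mode(situation: str) -> Mode:
    """Return the review mode the table associates with a situation label."""
    try:
        return MODE_BY_SITUATION[situation]
    except KeyError:
        known = ", ".join(sorted(MODE_BY_SITUATION))
        raise ValueError(f"unknown situation {situation!r}; expected one of: {known}") from None
```

For example, `pick_mode("validation")` returns `Mode.COVERAGE_AUDIT`, and an unrecognized label raises `ValueError` instead of silently falling back to a default mode.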

### Choosing the Right Mode

```
Need peer review
├─ Reviewing completed code?
│   ├─ Small changeset (< 500 lines) → Diff Critique
│   └─ Large changeset or full feature → Delegated Scrutiny
├─ Designing architecture?
│   ├─ Single decision point → Deciding Vote
│   └─ Full system design → Design Challenge
├─ Debating trade-offs?
│   ├─ Need extended back-and-forth → Threaded Debate
│   └─ Need a decisive answer → Dec
```
