
Santa Method
Adversarial dual-agent review with a convergence loop before user-facing or production output ships.
Overview
Santa Method is an agent skill most often used in the Ship phase (review and security), and also in Launch distribution. It runs two independent adversarial reviews and repeats them until both pass before output ships.
Install
npx skills add https://github.com/affaan-m/everything-claude-code --skill santa-method
What is this skill?
- Two independent review agents with no shared generator context
- Convergence loop: both reviewers must pass or output is sent back for fixes
- Phase 1 generator (make a list) then Phase 2 dual check (check it twice)
- Targets hallucination, compliance, brand, and factual accuracy at scale
- Explicitly skip when build/test/lint already gives deterministic verification
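The convergence loop in the bullets above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the skill's actual implementation: `Verdict`, `santa_loop`, `max_rounds`, and the toy reviewer and fixer functions are hypothetical names, and real reviewers would be isolated agents rather than plain functions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Verdict:
    """Hypothetical structured result returned by one reviewer."""
    passed: bool
    issues: List[str] = field(default_factory=list)


Reviewer = Callable[[str], Verdict]
Fixer = Callable[[str, List[str]], str]


def santa_loop(output: str, reviewer_b: Reviewer, reviewer_c: Reviewer,
               fix: Fixer, max_rounds: int = 3) -> Tuple[str, int]:
    """Re-review until both reviewers pass, or raise after max_rounds."""
    for round_no in range(1, max_rounds + 1):
        # Each reviewer sees only the output, never the other's verdict.
        verdict_b = reviewer_b(output)
        verdict_c = reviewer_c(output)
        if verdict_b.passed and verdict_c.passed:
            return output, round_no
        # Merge flags from both reviewers and send the output back for fixes.
        output = fix(output, verdict_b.issues + verdict_c.issues)
    raise RuntimeError("no convergence within max_rounds; escalate to a human")


# Toy stand-ins for real agents (assumptions for illustration only).
def no_todo(text: str) -> Verdict:
    bad = "TODO" in text
    return Verdict(not bad, ["contains TODO"] if bad else [])


def no_double_space(text: str) -> Verdict:
    bad = "  " in text
    return Verdict(not bad, ["double space"] if bad else [])


def toy_fix(text: str, issues: List[str]) -> str:
    return text.replace("TODO", "done").replace("  ", " ")


final, rounds = santa_loop("Ship  the TODO", no_todo, no_double_space, toy_fix)
```

The round cap is the important design choice in this sketch: without it, two reviewers that never agree would loop forever, so the function raises and hands the decision to a person instead.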
Adoption & trust: 3.7k installs on skills.sh; 210k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
A single agent reviewing its own output repeats the same blind spots that created wrong claims, unsafe code, or off-brand copy you are about to publish.
Who is it for?
Teams shipping production-bound code, customer-facing docs, regulated copy, or large batch generations, where one reviewer is not enough.
Skip if: Internal drafts, exploratory research, or tasks where test, lint, and build pipelines already deterministically verify correctness.
When should I use this skill?
Output will be published, deployed, or end-user facing; compliance, regulatory, or brand rules apply; production ships without human review; hallucination or accuracy risk is elevated.
What do I get? / Deliverables
Deliverables ship only after two isolated reviewers both pass them. Findings flagged as naughty are fixed in a loop rather than waved through by a single self-review.
- Converged approved artifact after dual independent review
- Iteration log of issues fixed between review rounds
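One way to picture the iteration log deliverable is as a list of per-round records. The sketch below is an assumption about shape only: `RoundRecord`, `IterationLog`, and their fields are hypothetical and not defined by the skill, and it treats a round with no flags from either reviewer as a pass.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RoundRecord:
    """Hypothetical record of one review round."""
    round_no: int
    issues_b: List[str]  # flags raised by the first reviewer
    issues_c: List[str]  # flags raised by the second reviewer

    @property
    def nice(self) -> bool:
        # A round passes only when neither reviewer raised a flag.
        return not self.issues_b and not self.issues_c


@dataclass
class IterationLog:
    rounds: List[RoundRecord] = field(default_factory=list)

    def record(self, issues_b: List[str], issues_c: List[str]) -> RoundRecord:
        """Append the next round, numbering rounds from 1."""
        entry = RoundRecord(len(self.rounds) + 1, issues_b, issues_c)
        self.rounds.append(entry)
        return entry

    def flagged_issues(self) -> List[str]:
        """Distinct issues flagged across all rounds, in first-seen order."""
        seen: List[str] = []
        for entry in self.rounds:
            for issue in entry.issues_b + entry.issues_c:
                if issue not in seen:
                    seen.append(issue)
        return seen

    @property
    def converged(self) -> bool:
        # Converged means the most recent round passed both reviewers.
        return bool(self.rounds) and self.rounds[-1].nice


log = IterationLog()
log.record(["unsupported statistic"], ["unsupported statistic", "off-brand tone"])
log.record([], [])
```

In a converged log, `flagged_issues()` doubles as the list of issues fixed between rounds, since the final round raised none of them.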
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Ship is where unreviewed agent output becomes production risk; Santa Method is shelved under review as the last structured gate before release. Dual independent reviewers and an iterate-until-pass loop map directly to the code review and content QA subphase, not to exploratory drafting.
Where it fits
Run dual reviewers on a PR-sized agent diff before merging to main without a human reviewer on call.
Stress-test production config or policy text for unsafe defaults before unattended deploy.
Verify landing page claims, pricing copy, and API references before public launch.
Batch-check educational or support articles for factual drift and off-brand tone at scale.
How it compares
Multi-agent adversarial QA workflow—not a substitute for CI and not a single-pass code-review checklist skill.
Common Questions / FAQ
Who is santa-method for?
Solo builders and indie teams using multiple agent roles who need publication- or deployment-grade verification without a full human editorial bench.
When should I use santa-method?
In Ship (review/security) before production deploy or unattended releases; in Launch (distribution) for customer-facing accuracy; in Grow (content) for factual marketing or docs—when compliance, brand, or hallucination risk is high.
Is santa-method safe to install?
The skill orchestrates review behavior and does not by itself grant arbitrary shell access. Check the Security Audits panel on this Prism page for the hosting repo before installing.
SKILL.md
# Santa Method

Multi-agent adversarial verification framework. Make a list, check it twice. If it's naughty, fix it until it's nice.

The core insight: a single agent reviewing its own output shares the same biases, knowledge gaps, and systematic errors that produced the output. Two independent reviewers with no shared context break this failure mode.

## When to Activate

Invoke this skill when:

- Output will be published, deployed, or consumed by end users
- Compliance, regulatory, or brand constraints must be enforced
- Code ships to production without human review
- Content accuracy matters (technical docs, educational material, customer-facing copy)
- Batch generation at scale where spot-checking misses systemic patterns
- Hallucination risk is elevated (claims, statistics, API references, legal language)

Do NOT use for internal drafts, exploratory research, or tasks with deterministic verification (use build/test/lint pipelines for those).

## Architecture

```
┌─────────────┐
│  GENERATOR  │  Phase 1: Make a List
│  (Agent A)  │  Produce the deliverable
└──────┬──────┘
       │ output
       ▼
┌──────────────────────────────┐
│   DUAL INDEPENDENT REVIEW    │  Phase 2: Check It Twice
│                              │
│ ┌───────────┐  ┌───────────┐ │  Two agents, same rubric,
│ │Reviewer B │  │Reviewer C │ │  no shared context
│ └─────┬─────┘  └─────┬─────┘ │
│       │              │       │
└───────┼──────────────┼───────┘
        │              │
        ▼              ▼
┌──────────────────────────────┐
│         VERDICT GATE         │  Phase 3: Naughty or Nice
│                              │
│ B passes AND C passes → NICE │  Both must pass.
│ Otherwise → NAUGHTY          │  No exceptions.
└───────┬──────────────┬───────┘
        │              │
      NICE          NAUGHTY
        │              │
        ▼              ▼
    [ SHIP ]    ┌─────────────┐
                │  FIX CYCLE  │  Phase 4: Fix Until Nice
                │             │
                │ iteration++ │  Collect all flags.
                │ if i > MAX: │  Fix all issues.
                │   escalate  │  Re-run both reviewers.
                │ else:       │  Loop until convergence.
                │   goto Ph.2 │
                └─────────────┘
```

## Phase Details

### Phase 1: Make a List (Generate)

Execute the primary task. No changes to your normal generation workflow. Santa Method is a post-generation verification layer, not a generation strategy.

```python
# The generator runs as normal
output = generate(task_spec)
```

### Phase 2: Check It Twice (Independent Dual Review)

Spawn two review agents in parallel. Critical invariants:

1. **Context isolation**: neither reviewer sees the other's assessment
2. **Identical rubric**: both receive the same evaluation criteria
3. **Same inputs**: both receive the original spec AND the generated output
4. **Structured output**: each returns a typed verdict, not prose

```python
REVIEWER_PROMPT = """
You are an independent quality reviewer. You have NOT seen any other review of this output.

## Task Specification
{task_spec}

## Output Under Review
{output}

## Evaluation Rubric
{rubric}

## Instructions
Evaluate the output against EACH rubric criterion. For each:
- PASS: criterion fully met, no issues
- FAIL: specific issue found (cite the exact problem)

Return your assessment as structured JSON:
{
  "verdict": "PASS" | "FAIL",
  "checks": [
    {"criterion": "...", "result": "PASS|FAIL", "detail": "..."}
  ],
  "critical_issues": ["..."],  // blockers that must be fixed
  "suggestions": ["..."]       // non-blocking improvements
}

Be rigorous. Your job is to find problems, not to approve.
"""
```

```python
# Spawn reviewers in parallel (Claude Code suba