
GAN-Style Harness
Run a strict Generator–Evaluator agent loop to build full applications with higher visual and functional quality than a single optimistic agent.
Overview
GAN-Style Harness is an agent skill most often used in Build (also Ship review, Build frontend) that runs a Generator–Evaluator loop for high-quality autonomous application development.
Install
npx skills add https://github.com/affaan-m/everything-claude-code --skill gan-style-harness
What is this skill?
- Separates the Generator from a ruthlessly strict Evaluator to break agent self-praise loops
- Inspired by Anthropic's harness design for long-running application development (March 24, 2026)
- Targets full-stack and frontend design tasks where “AI slop” aesthetics fail the bar
- Declares 7 agent tools for autonomous multi-step runs: Read, Write, Edit, Bash, Grep, Glob, and Task
- Includes an explicit when-NOT-to-use guardrail for quick single-file fixes
Adoption & trust: 3.2k installs on skills.sh; 210k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
A single coding agent evaluates its own work too optimistically, so full-app builds stall at mediocre UI and missed issues.
Who is it for?
Indie builders launching full-stack or design-forward apps from a one-line spec who can fund longer agent runs.
Skip if: Quick single-file fixes, tiny config tweaks, or tasks where a standard single-agent session is sufficient.
When should I use this skill?
Building complete applications from a one-line prompt; frontend design requiring high visual quality; full-stack projects where AI slop aesthetics are unacceptable.
What do I get? / Deliverables
You get iterated application output driven by strict evaluator feedback until features and visual quality meet a higher bar.
- Iterated application codebase after evaluator passes
- Documented generator/evaluator turn history or review notes
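The loop behind these deliverables can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the skill's actual implementation: `run_harness`, `Review`, `Turn`, the 0-10 score scale, and the `pass_score` threshold are hypothetical names and values, and the generator and evaluator are plain callables standing in for real agents. Only the 15-iteration cap mirrors the upper end of the 5-15 iterations mentioned in the skill's SKILL.md.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Review:
    score: float        # hypothetical 0-10 scale, higher is better
    issues: List[str]   # concrete problems the generator must fix next


@dataclass
class Turn:
    iteration: int
    build: str
    review: Review


def run_harness(
    spec: str,
    generate: Callable[[str, List[str]], str],
    evaluate: Callable[[str, str], Review],
    pass_score: float = 8.0,    # assumed threshold, not taken from the skill
    max_iterations: int = 15,   # upper end of the 5-15 iteration range
) -> List[Turn]:
    """Alternate generation and evaluation until the evaluator passes the build."""
    history: List[Turn] = []
    feedback: List[str] = []
    for iteration in range(1, max_iterations + 1):
        build = generate(spec, feedback)   # the generator never grades itself
        review = evaluate(spec, build)     # a separate, strict judge does
        history.append(Turn(iteration, build, review))
        if review.score >= pass_score and not review.issues:
            break
        feedback = review.issues           # rejected: feed issues into the next build
    return history


def stub_generate(spec: str, feedback: List[str]) -> str:
    """Stand-in generator: records how many issues it was asked to fix."""
    return f"{spec} (addressed {len(feedback)} issues)"


def make_stub_evaluator() -> Callable[[str, str], Review]:
    """Stand-in evaluator that reports fewer issues on each successive call."""
    rounds = iter([["flat layout", "broken login"], ["weak typography"], []])

    def evaluate(spec: str, build: str) -> Review:
        issues = next(rounds)
        return Review(score=10.0 - 3.0 * len(issues), issues=issues)

    return evaluate


if __name__ == "__main__":
    for turn in run_harness("todo app", stub_generate, make_stub_evaluator()):
        print(turn.iteration, turn.review.score, turn.review.issues)
```

The returned list doubles as the turn history: each entry pairs a build with the review that accepted or rejected it, so the same structure can back the review notes listed above.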
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Long-running harnesses are adopted when you are actively building product with agents, not during initial idea sketches alone. Agent-tooling is the shelf for orchestration patterns—separate generator and evaluator roles and tool-backed iteration.
Where it fits
Spin a clickable full-stack prototype where evaluator rejects shallow UI before you commit to the stack.
Orchestrate Generator and Evaluator agents across repo tools for a multi-feature milestone.
Iterate marketing pages and app chrome until the evaluator no longer flags layout, typography, or interaction issues.
Use the evaluator role as a pre-ship gate on completeness and regressions before tagging a release.
How it compares
Use as a multi-agent quality harness instead of one self-reviewing Claude session for whole-app generation.
Common Questions / FAQ
Who is gan-style-harness for?
Solo builders using agentic IDEs who need autonomous, long-running app builds with a separate strict reviewer—not casual snippet edits.
When should I use gan-style-harness?
In Build (agent-tooling) for full applications from a prompt; in Build (frontend) when high visual quality is required; in Ship (review) when an evaluator pass should gate quality before you ship.
Is gan-style-harness safe to install?
It declares broad tool use (shell, writes, subtasks); treat it as high-permission orchestration and review the Security Audits panel on this Prism page before enabling in production repos.
SKILL.md
# GAN-Style Harness Skill

> Inspired by [Anthropic's Harness Design for Long-Running Application Development](https://www.anthropic.com/engineering/harness-design-long-running-apps) (March 24, 2026)

A multi-agent harness that separates **generation** from **evaluation**, creating an adversarial feedback loop that drives quality far beyond what a single agent can achieve.

## Core Insight

> When asked to evaluate their own work, agents are pathological optimists: they praise mediocre output and talk themselves out of legitimate issues. But engineering a **separate evaluator** to be ruthlessly strict is far more tractable than teaching a generator to self-critique.

This is the same dynamic as GANs (Generative Adversarial Networks): the Generator produces, the Evaluator critiques, and that feedback drives the next iteration.

## When to Use

- Building complete applications from a one-line prompt
- Frontend design tasks requiring high visual quality
- Full-stack projects that need working features, not just code
- Any task where "AI slop" aesthetics are unacceptable
- Projects where you want to invest $50-200 for production-quality output

## When NOT to Use

- Quick single-file fixes (use standard `claude -p`)
- Tasks with tight budget constraints (<$10)
- Simple refactoring (use de-sloppify pattern instead)
- Tasks that are already well-specified with tests (use TDD workflow)

## Architecture

```
                    ┌─────────────┐
                    │   PLANNER   │
                    │  (Opus 4.6) │
                    └──────┬──────┘
                           │ Product Spec
                           │ (features, sprints, design direction)
                           ▼
              ┌────────────────────────┐
              │                        │
              │   GENERATOR-EVALUATOR  │
              │      FEEDBACK LOOP     │
              │                        │
              │  ┌──────────┐          │
              │  │GENERATOR │--build-->│──┐
              │  │(Opus 4.6)│          │  │
              │  └────▲─────┘          │  │
              │       │                │  │ live app
              │    feedback            │  │
              │       │                │  │
              │  ┌────┴─────┐          │  │
              │  │EVALUATOR │<-test----│──┘
              │  │(Opus 4.6)│          │
              │  │+Playwright│         │
              │  └──────────┘          │
              │                        │
              │   5-15 iterations      │
              └────────────────────────┘
```

## The Three Agents

### 1. Planner Agent

**Role:** Product manager. Expands a brief prompt into a full product specification.

**Key behaviors:**

- Takes a one-line prompt and produces a 16-feature, multi-sprint specification
- Defines user stories, technical requirements, and visual design direction
- Is deliberately **ambitious**, because conservative planning leads to underwhelming results
- Produces evaluation criteria that the Evaluator will use later

**Model:** Opus 4.6 (needs deep reasoning for spec expansion)

### 2. Generator Agent

**Role:** Developer. Implements features according to the spec.

**Key behaviors:**

- Works in structured sprints (or continuous mode with newer models)
- Negotiates a "sprint contract" with the Evaluator before writing code
- Uses full-stack tooling: React, FastAPI/Express, databases, CSS
- Manages git for version control between iterations
- Reads Evaluator feedback and incorporates it in the next iteration

**Model:** Opus 4.6 (needs strong coding capability)

### 3. Evaluator Agent

**Role:** QA engineer. Tests the live running application, not just code.

**Key behaviors:**

- Uses **Playwright MCP** to int