
Do Competitively
Run competitive parallel agent generations with meta-judge rubrics and synthesis when one-shot coding answers are not good enough for high-stakes work.
Overview
do-competitively is an agent skill most often used in Build (also Ship review, Validate scope) that runs competitive multi-agent generation, meta-judge evaluation, and synthesis for higher-quality outputs.
Install
npx skills add https://github.com/neolabhq/context-engineering-kit --skill do-competitively
What is this skill?
- Generate-Critique-Synthesize (GCS) with adaptive polish, synthesize, or redesign strategies
- Meta-judge builds tailored rubrics before multi-judge evaluation
- Constitutional AI self-critique in generation and Chain-of-Verification in evaluation
- Orchestrator must not read sub-agent context files or reports to avoid context bloat
- Claims ~15–20% average cost savings via adaptive strategy selection (per skill doc)
Adoption & trust: 524 installs on skills.sh; 1.1k GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
One agent draft is not trustworthy enough for a high-stakes task and you have no structured way to compare parallel solutions.
Who is it for?
Anyone producing high-stakes specs, designs, or implementations who can describe task criteria and optional output paths upfront.
Skip if: Quick typo fixes, trivial scripts, or solo work where reading every sub-agent artifact in the main thread is acceptable.
When should I use this skill?
High-stakes task where quality matters more than speed; provide task description and optional output path or criteria.
What do I get? / Deliverables
You get a synthesized or polished result backed by multi-judge evidence, with strategy chosen to save cost versus naive re-runs.
- Synthesized or polished deliverable from competitive runs
- Evidence-backed evaluation outcome
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Primary shelf is build agent-tooling because the command orchestrates multi-agent generation and synthesis for implementation-quality outputs. Matches agent-tooling as an orchestration pattern (GCS) rather than a single integration or test script.
Where it fits
Run parallel architecture sketches and synthesize the winner before you commit to a stack.
Compete two API designs then polish or merge based on judge scores.
Multi-judge a sensitive refactor when reviewers disagree on approach.
Synthesize competing launch copy variants against a tailored rubric.
How it compares
Orchestration workflow for multi-agent quality, not a single checker skill or a plain MCP tool call.
Common Questions / FAQ
Who is do-competitively for?
Indie builders and small teams using agent stacks who need repeatable, judge-backed quality on important tasks without manually merging three chat transcripts.
When should I use do-competitively?
In validate when scoping architecture, in build for competitive implementations, and in ship review when split judges need synthesis—not for speed-first chores.
Is do-competitively safe to install?
It spawns sub-agents and may touch files per your task; check the Security Audits panel on this Prism page and scope permissions before use.
SKILL.md - Do Competitively
# do-competitively

<task>
Execute tasks through competitive multi-agent generation, meta-judge evaluation specification, multi-judge evaluation, and evidence-based synthesis to produce superior results by combining the best elements from parallel implementations.
</task>

<context>
This command implements the Generate-Critique-Synthesize (GCS) pattern with adaptive strategy selection for high-stakes tasks where quality matters more than speed. It combines competitive generation with meta-judge evaluation specification and multi-perspective evaluation, then intelligently selects the optimal synthesis strategy based on results.

**Key features:**

- Self-critique loops in generation (Constitutional AI)
- Structured evaluation
- Meta-judge produces tailored rubrics before judging
- Verification loops in evaluation (Chain-of-Verification)
- Adaptive strategy: polish clear winners, synthesize split decisions, redesign failures
- Average 15-20% cost savings through intelligent strategy selection
</context>

CRITICAL: You are not an implementation agent or a judge, so you shouldn't read files that are provided as context for a sub-agent or task. You shouldn't read reports, and you shouldn't overwhelm your context with unnecessary information. You MUST follow the process step by step. Any deviations will be considered a failure and you will be killed!
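The adaptive strategy named in the key features (polish clear winners, synthesize split decisions, redesign failures) could be expressed roughly as follows. This is a minimal Python sketch, not the skill's actual logic: the 3.0 cutoff comes from the "All Flawed (<3.0)?" branch of the pattern diagram, while `select_strategy`, the score layout, and the rule that a "clear winner" means every judge ranks the same solution first are assumptions made for illustration.

```python
from collections import Counter
from statistics import mean

# Cutoff taken from the "All Flawed (<3.0)?" branch of the pattern diagram.
FLAWED_THRESHOLD = 3.0


def select_strategy(judge_scores: dict[str, list[float]]) -> str:
    """Pick a synthesis strategy from per-judge scores.

    judge_scores maps a solution name to one score per judge, with the
    judges in the same order for every solution (hypothetical layout).
    """
    averages = {name: mean(scores) for name, scores in judge_scores.items()}

    # Every solution scored below the cutoff on average: start over.
    if all(avg < FLAWED_THRESHOLD for avg in averages.values()):
        return "REDESIGN"

    # Count which solution each judge ranked highest.
    n_judges = len(next(iter(judge_scores.values())))
    votes = Counter(
        max(judge_scores, key=lambda name: judge_scores[name][j])
        for j in range(n_judges)
    )
    _, top_votes = votes.most_common(1)[0]

    # Assumption: a unanimous top pick counts as a "clear winner".
    if top_votes == n_judges:
        return "SELECT_AND_POLISH"
    return "FULL_SYNTHESIS"
```

For example, `select_strategy({"A": [4.5, 4.2, 4.8], "B": [3.1, 3.5, 3.0], "C": [2.0, 2.5, 2.2]})` takes the polish path, because all three judges rank solution A first.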
## Pattern: Generate-Critique-Synthesize (GCS)

This command implements a multi-phase adaptive competitive orchestration pattern:

```
Phase 1: Competitive Generation with Self-Critique + Meta-Judge (IN PARALLEL)
         ┌─ Meta-Judge → Evaluation Specification YAML ───────────┐
Task ────┼─ Agent 2 → Draft → Critique → Revise → Solution B ───┐ │
         ├─ Agent 3 → Draft → Critique → Revise → Solution C ───┼─┤
         └─ Agent 1 → Draft → Critique → Revise → Solution A ───┘ │
                                                                  │
Phase 2: Multi-Judge Evaluation with Verification                 │
         ┌─ Judge 1 → Evaluate → Verify → Revise → Report A ─┐    │
         ├─ Judge 2 → Evaluate → Verify → Revise → Report B ─┼────┤
         └─ Judge 3 → Evaluate → Verify → Revise → Report C ─┘    │
                                                                  │
Phase 2.5: Adaptive Strategy Selection                            │
         Analyze Consensus ───────────────────────────────────────┤
                ├─ Clear Winner? → SELECT_AND_POLISH              │
                ├─ All Flawed (<3.0)? → REDESIGN (return Phase 1) │
                └─ Split Decision? → FULL_SYNTHESIS               │
                                          │                       │
Phase 3: Evidence-Based Synthesis         │                       │
         (Only if FULL_SYNTHESIS)         │                       │
         Synthesizer ─────────────────────┴───────────────────────┴─→ Final Solution
```

## Process

### Setup: Create Reports Directory

Before starting, ensure the reports directory exists:

```bash
mkdir -p .specs/reports
```

**Report naming convention:** `.specs/reports/{solution-name}-{YYYY-MM-DD}.[1|2|3].md`

Where:

- `{solution-name}` - Derived from output path (e.g., `users-api` from output `specs/api/users.md`)
- `{YYYY-MM-DD}` - Current date
- `[1|2|3]` - Judge number

**Note:** Solutions remain in their specified output locations; only evaluation reports go to `.specs/reports/`

### Phase 1: Competitive Generation + Meta-Judge (IN PARALLEL)

Launch **3 independent generator agents AND 1 meta-judge agent in parallel** (4 agents total; Opus is recommended for all of them, for quality).

The meta-judge runs in parallel with the 3 generators because it does not need their output; it only needs the task description to generate evaluation criteria.
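The Phase 1 launch can be pictured as four concurrent calls that all receive only the task description. The sketch below uses Python's `asyncio.gather` to show that shape; `run_agent` and `phase_one` are hypothetical stand-ins, since the skill dispatches real sub-agents rather than coroutines, and the role names are illustrative.

```python
import asyncio


async def run_agent(role: str, task: str) -> str:
    """Hypothetical stand-in for dispatching one sub-agent."""
    await asyncio.sleep(0)  # placeholder for the real agent call
    return f"{role} finished: {task}"


async def phase_one(task: str) -> dict[str, str]:
    """Start the meta-judge and the three generators at the same time."""
    # The meta-judge needs only the task description, so it does not
    # have to wait for any generator output.
    roles = ["meta-judge", "generator-1", "generator-2", "generator-3"]
    results = await asyncio.gather(*(run_agent(role, task) for role in roles))
    return dict(zip(roles, results))


if __name__ == "__main__":
    for role, result in asyncio.run(phase_one("design users API")).items():
        print(role, "->", result)
```

`asyncio.gather` returns results in the order the coroutines were passed, so each result can be paired with its role without extra bookkeeping.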