
Do In Steps
Break a large refactor or multi-file change into ordered sub-agent steps with model routing and judge-gated verification before the next step runs.
Overview
do-in-steps is a journey-wide agent skill that executes complex work through sequential sub-agent orchestration with meta-judge verification. It is usable whenever a solo builder needs to decompose and gate multi-step agent work.
Install
npx skills add https://github.com/neolabhq/context-engineering-kit --skill do-in-steps
What is this skill?
- Supervisor/orchestrator pattern: decompose complex tasks into ordered subtasks with dependency-aware sequencing
- Parallel meta-judge and implementation agents per step with LLM-as-a-judge verification before proceeding
- Per-step model selection (Opus/Sonnet/Haiku) based on subtask complexity
- Isolated context windows plus structured handoff summaries from completed steps to the next
- Zero-shot chain-of-thought prefix and self-critique baked into each sub-agent run
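The bullets above can be sketched as a small control loop. This is a minimal illustration, not the skill's actual implementation: `Step`, `MODEL_BY_COMPLEXITY`, `run_steps`, and the three callables (`meta_judge`, `implement`, `judge`) are hypothetical names, and the complexity labels are assumed. The real skill dispatches sub-agents through the agent's Task tool, which is not modeled here.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

MAX_RETRIES = 3  # retry cap per step, as stated in the skill's SKILL.md

# Hypothetical complexity labels mapped to the model tiers the skill names.
MODEL_BY_COMPLEXITY = {"high": "opus", "medium": "sonnet", "low": "haiku"}


@dataclass
class Step:
    name: str
    complexity: str  # one of the hypothetical labels above


def run_steps(steps, meta_judge, implement, judge):
    """Run steps in order; a step must pass the judge before the next starts.

    Assumed callable shapes (illustrative only):
      meta_judge(step) -> evaluation spec
      implement(step, model, handoff, feedback) -> dict with a "summary" key
      judge(step, spec, output) -> (passed, feedback)
    """
    handoff = ""  # summary carried from the previous step into the next one
    report = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        for step in steps:
            model = MODEL_BY_COMPLEXITY[step.complexity]
            # Meta-judge is submitted first, then runs alongside the implementation.
            spec_future = pool.submit(meta_judge, step)
            impl_future = pool.submit(implement, step, model, handoff, None)
            # Wait for both before judging.
            spec, output = spec_future.result(), impl_future.result()

            passed, feedback = judge(step, spec, output)
            retries = 0
            while not passed and retries < MAX_RETRIES:
                retries += 1
                # Retries reuse the same spec; only the implementation reruns.
                output = implement(step, model, handoff, feedback)
                passed, feedback = judge(step, spec, output)
            if not passed:
                raise RuntimeError(f"step {step.name!r} failed verification")

            handoff = output["summary"]
            report.append((step.name, model, retries))
    return report
```

The design point this sketch tries to show is the gate: `handoff` is only updated, and the loop only advances, after the judge returns a pass for the current step.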
Adoption & trust: 534 installs on skills.sh; 1.1k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You have a large task that spans many files or dependent steps, and one monolithic agent run loses context, skips verification, or picks the wrong model for easy vs hard subtasks.
Who is it for?
Multi-step refactors, cross-service changes, or any complex agent job where per-step verification and model tiering matter more than a single-shot answer.
Skip if: Single-file edits, trivial Q&A, or tasks where you already have an approved plan and only need one focused implementation pass without orchestration overhead.
When should I use this skill?
User supplies a complex task description (argument-hint) that benefits from sequential sub-agent execution with verification between steps.
What do I get? / Deliverables
You get an ordered sequence of verified sub-agent results with context handoffs between steps, so the parent agent can continue only after each slice passes independent judge criteria.
- Ordered subtask plan
- Per-step verified outputs and context summaries for downstream steps
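One way to picture the context summaries handed to downstream steps is a small record per completed step plus a function that assembles the handoff text. This is a sketch under assumptions: `StepResult`, `build_handoff`, the field names, and the character budget are all hypothetical, and the skill itself may structure its summaries differently.

```python
from dataclasses import dataclass, field


@dataclass
class StepResult:
    step: str
    summary: str  # short handoff text, not full file contents
    files_changed: list[str] = field(default_factory=list)


def build_handoff(completed: list[StepResult], max_chars: int = 500) -> str:
    """Join prior summaries in order, dropping the oldest if over budget."""
    lines = [
        f"[{r.step}] {r.summary} (files: {', '.join(r.files_changed) or 'none'})"
        for r in completed
    ]
    # Keep at least the most recent summary, even if it exceeds the budget.
    while len(lines) > 1 and len("\n".join(lines)) > max_chars:
        lines.pop(0)
    return "\n".join(lines)
```

Dropping whole entries from the oldest end, rather than truncating text, keeps each remaining summary intact for the next sub-agent.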
Journey fit
Useful at every journey phase - explore requirements and options before committing to a direction.
Where it fits
Refactor a service class and update all consumers with a gated sub-agent per module.
Roll out an API change across client, server, and docs with ordered steps and judge checks.
Run a multi-part security or quality fix where each patch must pass verification before the next.
Triage and fix a production regression spanning logging, code, and config with isolated sub-agents.
How it compares
Use instead of asking one agent to do everything in one thread without step gates or isolated sub-agent contexts.
Common Questions / FAQ
Who is do-in-steps for?
Solo and indie builders who run Claude Code or similar agents on large, dependent tasks and want supervisor-style sequencing with judges between steps.
When should I use do-in-steps?
During build when refactoring across modules, during ship when validating multi-part fixes, or during operate when iterating on production issues—anytime the task needs ordered sub-agents with verification before continuing.
Is do-in-steps safe to install?
Review the Security Audits panel on this Prism page and treat orchestration skills as high-trust: they may spawn sub-agents and run extended agent loops on your repo.
SKILL.md - Do In Steps
# do-in-steps

<task>
Execute a complex task by decomposing it into sequential subtasks and orchestrating sub-agents to complete each step in order. Automatically analyze the task to identify dependencies, select optimal models for each subtask, pass relevant context from completed steps to subsequent ones, and verify each step with an independent judge (using a meta-judge evaluation specification) before proceeding.
</task>

<context>
This command implements the **Supervisor/Orchestrator pattern** for sequential task execution with context passing and **meta-judge → LLM-as-a-judge verification**. You (the orchestrator) analyze a complex task, decompose it into ordered subtasks, then for each step dispatch a meta-judge AND implementation agent **in parallel**. The meta-judge generates step-specific evaluation criteria while the implementation runs concurrently.

Each sub-agent receives:

- **Isolated context** - Clean context window for its specific subtask
- **Optimal model** - Selected based on subtask complexity (Opus/Sonnet/Haiku)
- **Previous step context** - Summary of relevant outputs from preceding steps
- **Structured reasoning** - Zero-shot CoT prefix for systematic thinking
- **Self-critique** - Internal verification before submission
- **Structured evaluation** - Meta-judge produces tailored rubrics and checklists per step before judging occurs
- **External judge** - LLM-as-a-judge verification using meta-judge specification with iteration loop
- **Parallel speed** - Meta-judge and implementation agent run in parallel per step; meta-judge specification reused across retries within that step
</context>

**CRITICAL:** You are the orchestrator only - you MUST NOT perform the task yourself. If you read, write, or run bash tools, you have failed the task immediately. This is the single most critical criterion for you. If you use anything except sub-agents, you will be killed immediately!!!!

Your role is to:

1. Analyze and decompose the task
2. Select optimal models and agents for each subtask
3. **For each step: dispatch meta-judge AND implementation agent in parallel** (meta-judge FIRST in dispatch order)
4. **Wait for BOTH to complete, then dispatch judge with meta-judge's specification**
5. **Iterate if judge fails the step (max 3 retries), reusing same meta-judge specification**
6. Collect outputs and pass context forward
7. Report final results

## RED FLAGS - Never Do These

**NEVER:**

- Read implementation files to understand code details (let sub-agents do this)
- Write code or make changes to source files directly
- Skip decomposition and jump to implementation
- Perform multiple steps yourself "to save time"
- Overflow your context by reading step outputs in detail
- Read judge reports in full (only parse structured headers)
- Skip judge verification and proceed to the next step
- Provide a score threshold to the judge in any format

**ALWAYS:**

- Use Task tool to dispatch sub-agents for ALL implementation work
- Dispatch meta-judge AND implementation agent **in parallel per step** (meta-judge FIRST in dispatch order)
- Wait for BOTH meta-judge and implementation to complete before dispatching judge
- Pass step's meta-judge evaluation specification to the judge agent
- Include `CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}` in prompts to meta-judge and judge agents
- Reuse same meta-judge specification across retries within a step (never re-run meta-judge for retries)
- Dispatch a NEW meta-judge for each new step (each step gets its own tailored specification)
- Use Task tool to dispatch **independent judges** for step verification
- Pass only necessary context summaries, not full file contents
- Get pass from judge verification before proceeding to next step
- Iterate with