Do In Steps

Name: Do In Steps
Author: neolabhq

neolabhq/context-engineering-kit

Break a large refactor or multi-file change into ordered sub-agent steps with model routing and judge-gated verification before the next step runs.

Overview

do-in-steps is a journey-wide agent skill that executes complex work through sequential sub-agent orchestration with meta-judge verification. It is usable whenever a solo builder needs to decompose and gate multi-step agent work.

Install

npx skills add https://github.com/neolabhq/context-engineering-kit --skill do-in-steps

What is this skill?

Supervisor/orchestrator pattern: decompose complex tasks into ordered subtasks with dependency-aware sequencing
Parallel meta-judge and implementation agents per step with LLM-as-a-judge verification before proceeding
Per-step model selection (Opus/Sonnet/Haiku) based on subtask complexity
Isolated context windows plus structured handoff summaries from completed steps to the next
Zero-shot chain-of-thought prefix and self-critique baked into each sub-agent run
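
Per-step model selection can be pictured as a simple lookup. The sketch below is illustrative only: the complexity labels, the `Subtask` type, and the `select_model` helper are hypothetical names, and the skill itself only states that Opus, Sonnet, or Haiku is chosen based on subtask complexity.

```python
from dataclasses import dataclass

# Hypothetical complexity labels and tier mapping (an assumption, not the
# skill's actual rule): harder subtasks go to the larger model.
MODEL_BY_COMPLEXITY = {
    "high": "opus",
    "medium": "sonnet",
    "low": "haiku",
}


@dataclass
class Subtask:
    name: str
    complexity: str  # "high", "medium", or "low" (assumed labels)


def select_model(subtask: Subtask) -> str:
    """Return the model tier for a subtask, defaulting to the middle tier."""
    return MODEL_BY_COMPLEXITY.get(subtask.complexity, "sonnet")


plan = [
    Subtask("redesign service interface", "high"),
    Subtask("update call sites", "medium"),
    Subtask("rename imports in docs", "low"),
]

# Map each subtask name to the tier it would be dispatched on.
routing = {s.name: select_model(s) for s in plan}
```

Falling back to the middle tier for unknown labels is a design choice of this sketch, not something the skill specifies.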

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 534 installs on skills.sh; 1.1k GitHub stars; 3/3 security scanners passed (skills.sh audits).

What problem does it solve?

You have a large task that spans many files or dependent steps, and one monolithic agent run loses context, skips verification, or picks the wrong model for easy vs hard subtasks.

Who is it for?

Multi-step refactors, cross-service changes, or any complex agent job where per-step verification and model tiering matter more than a single-shot answer.

Skip if: Single-file edits, trivial Q&A, or tasks where you already have an approved plan and only need one focused implementation pass without orchestration overhead.

When should I use this skill?

User supplies a complex task description (argument-hint) that benefits from sequential sub-agent execution with verification between steps.

What do I get? / Deliverables

You get an ordered sequence of verified sub-agent results with context handoffs between steps, so the parent agent can continue only after each slice passes independent judge criteria.

Ordered subtask plan
Per-step verified outputs and context summaries for downstream steps
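
One way to picture the ordered subtask plan and the context summaries is a dependency graph sorted so that every step runs after the steps it depends on. This is a minimal sketch using Python's standard `graphlib`. The step names, the `ordered_plan` and `handoff_context` helpers, and the rule of forwarding only the summaries of direct dependencies are assumptions for illustration, not the skill's documented behavior.

```python
from graphlib import TopologicalSorter

# Hypothetical subtasks: each key maps to the set of steps it depends on.
dependencies = {
    "update_server": set(),
    "update_client": {"update_server"},
    "update_docs": {"update_server", "update_client"},
}


def ordered_plan(deps):
    """Return step names in an order where every dependency comes first."""
    return list(TopologicalSorter(deps).static_order())


def handoff_context(step, summaries, deps):
    """Collect only the summaries of steps that `step` depends on (assumed rule)."""
    return {dep: summaries[dep] for dep in sorted(deps[step]) if dep in summaries}


plan = ordered_plan(dependencies)

# Summaries a finished step might hand forward (made-up text).
summaries = {
    "update_server": "added v2 endpoint",
    "update_client": "client now calls v2",
}
docs_context = handoff_context("update_docs", summaries, dependencies)
```

`TopologicalSorter` raises `graphlib.CycleError` when the dependencies are circular, which is a convenient early check that a plan can be sequenced at all.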

Journey fit

Useful at every journey phase - explore requirements and options before committing to a direction.

Where it fits

Build: Agent skills & templates
Example use: Refactor a service class and update all consumers with a gated sub-agent per module.

Build: Integrations & version control
Example use: Roll out an API change across client, server, and docs with ordered steps and judge checks.

Ship: Code review
Example use: Run a multi-part security or quality fix where each patch must pass verification before the next.

Operate: Iteration & experiments
Example use: Triage and fix a production regression spanning logging, code, and config with isolated sub-agents.

How it compares

Use instead of asking one agent to do everything in one thread without step gates or isolated sub-agent contexts.

Common Questions / FAQ

Who is do-in-steps for?

Solo and indie builders who run Claude Code or similar agents on large, dependent tasks and want supervisor-style sequencing with judges between steps.

When should I use do-in-steps?

During build when refactoring across modules, during ship when validating multi-part fixes, or during operate when iterating on production issues—anytime the task needs ordered sub-agents with verification before continuing.

Is do-in-steps safe to install?

Review the Security Audits panel on this Prism page and treat orchestration skills as high-trust: they may spawn sub-agents and run extended agent loops on your repo.

SKILL.md

# do-in-steps

<task>
Execute a complex task by decomposing it into sequential subtasks and orchestrating sub-agents to complete each step in order. Automatically analyze the task to identify dependencies, select optimal models for each subtask, pass relevant context from completed steps to subsequent ones, and verify each step with an independent judge (using a meta-judge evaluation specification) before proceeding.
</task>

<context>
This command implements the **Supervisor/Orchestrator pattern** for sequential task execution with context passing and **meta-judge → LLM-as-a-judge verification**. You (the orchestrator) analyze a complex task, decompose it into ordered subtasks, then for each step dispatch a meta-judge AND implementation agent **in parallel**. The meta-judge generates step-specific evaluation criteria while the implementation runs concurrently. Each sub-agent receives:
- **Isolated context** - Clean context window for its specific subtask
- **Optimal model** - Selected based on subtask complexity (Opus/Sonnet/Haiku)
- **Previous step context** - Summary of relevant outputs from preceding steps
- **Structured reasoning** - Zero-shot CoT prefix for systematic thinking
- **Self-critique** - Internal verification before submission
- **Structured evaluation** - Meta-judge produces tailored rubrics and checklists per step before judging occurs
- **External judge** - LLM-as-a-judge verification using meta-judge specification with iteration loop
- **Parallel speed** - Meta-judge and implementation agent run in parallel per step; meta-judge specification reused across retries within that step

</context>

**CRITICAL:** You are the orchestrator only - you MUST NOT perform the task yourself. If you read, write, or run bash tools, you have failed the task immediately. This is the single most critical criterion for you. If you use anything except sub-agents, you will be killed immediately!!!! Your role is to:

1. Analyze and decompose the task
2. Select optimal models and agents for each subtask
3. **For each step: dispatch meta-judge AND implementation agent in parallel** (meta-judge FIRST in dispatch order)
4. **Wait for BOTH to complete, then dispatch judge with meta-judge's specification**
5. **Iterate if judge fails the step (max 3 retries), reusing same meta-judge specification**
6. Collect outputs and pass context forward
7. Report final results
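
The role described in the steps above can be sketched as a small asyncio loop. This is only an illustration under stated assumptions. The three `run_*` coroutines are hypothetical stand-ins for sub-agent dispatches (the skill itself uses the Task tool). "Max 3 retries" is read as three retries after the first attempt. Feeding judge feedback into the retry is also an assumption.

```python
import asyncio
from dataclasses import dataclass

MAX_RETRIES = 3  # from the skill text; read here as retries after the first attempt


@dataclass
class Verdict:
    passed: bool
    feedback: str = ""


# Hypothetical stand-ins for sub-agent dispatches (not the skill's real API).
async def run_meta_judge(step: str) -> str:
    return f"evaluation spec for {step}"


async def run_implementation(step: str, context: str, feedback: str = "") -> str:
    suffix = " (revised)" if feedback else ""
    return f"output of {step}{suffix}"


async def run_judge(spec: str, output: str) -> Verdict:
    # Toy rule so the retry path is exercised: only revised output passes.
    if output.endswith("(revised)"):
        return Verdict(True)
    return Verdict(False, "needs revision")


async def run_step(step: str, context: str) -> tuple[str, int]:
    # Meta-judge is listed first and runs concurrently with the implementation.
    spec, output = await asyncio.gather(
        run_meta_judge(step), run_implementation(step, context)
    )
    for attempt in range(MAX_RETRIES + 1):
        verdict = await run_judge(spec, output)  # same spec reused on every retry
        if verdict.passed:
            return output, attempt
        if attempt < MAX_RETRIES:
            output = await run_implementation(step, context, verdict.feedback)
    raise RuntimeError(f"{step} failed verification after {MAX_RETRIES} retries")


async def orchestrate(steps: list[str]) -> list[str]:
    context, results = "", []
    for step in steps:  # strictly sequential: the next step waits for a pass
        output, _ = await run_step(step, context)
        results.append(output)
        context = f"{context}\n{step}: {output}".strip()  # summary handed forward
    return results


results = asyncio.run(orchestrate(["step-1", "step-2"]))
```

In this sketch the meta-judge runs once per step and its specification is reused for every retry, while the judge is only dispatched after both parallel tasks have finished.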

## RED FLAGS - Never Do These

**NEVER:**

- Read implementation files to understand code details (let sub-agents do this)
- Write code or make changes to source files directly
- Skip decomposition and jump to implementation
- Perform multiple steps yourself "to save time"
- Overflow your context by reading step outputs in detail
- Read judge reports in full (only parse structured headers)
- Skip judge verification and proceed to the next step
- Provide a score threshold to the judge in any format

**ALWAYS:**

- Use Task tool to dispatch sub-agents for ALL implementation work
- Dispatch meta-judge AND implementation agent **in parallel per step** (meta-judge FIRST in dispatch order)
- Wait for BOTH meta-judge and implementation to complete before dispatching judge
- Pass step's meta-judge evaluation specification to the judge agent
- Include `CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}` in prompts to meta-judge and judge agents
- Reuse same meta-judge specification across retries within a step (never re-run meta-judge for retries)
- Dispatch a NEW meta-judge for each new step (each step gets its own tailored specification)
- Use Task tool to dispatch **independent judges** for step verification
- Pass only necessary context summaries, not full file contents
- Get pass from judge verification before proceeding to next step
- Iterate with
