
Evolve
Run goal-driven autonomous improvement loops on your repo—measure fitness, fix the worst issue, and repeat until caps or kill switches fire.
Overview
evolve is an agent skill most often used in Operate (also Ship review and Build PM) that runs autonomous measure-fix-remeasure improvement loops against repo goals.
Install
npx skills add https://github.com/boshu2/agentops --skill evolve
What is this skill?
- Top-level `ao evolve` entrypoint wrapping RPI supervisor loops
- Measure what's wrong, fix the worst thing, measure again—explicit compounding model
- Integrates post-mortem harvest, repo analysis, and `/rpi` for research through validation
- Output contract: code changes plus GOALS.md fitness deltas
- Kill switches: max-cycle cap, regression breaker, and operator stop conditions
- Declared dependencies: rpi, post-mortem, compile
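A minimal Python sketch of the measure-fix-remeasure loop and the three kill switches listed above. The callables `measure`, `fix_worst`, and `should_stop` are hypothetical stand-ins, not AgentOps APIs. The regression breaker is also an assumption here: it trips whenever measured fitness drops below the previous cycle's score.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoopResult:
    cycles: int                 # cycles completed before stopping
    stop_reason: str            # which kill switch fired
    history: list[float] = field(default_factory=list)  # fitness per measurement


def run_loop(
    measure: Callable[[], float],
    fix_worst: Callable[[], None],
    max_cycles: int,
    should_stop: Callable[[], bool] = lambda: False,
) -> LoopResult:
    """Measure, fix the worst issue, re-measure, until a kill switch fires.

    Hypothetical callables: `measure` returns a fitness score (higher is
    better), `fix_worst` applies one fix, `should_stop` models an operator stop.
    """
    history = [measure()]  # baseline before any change
    for cycle in range(1, max_cycles + 1):
        if should_stop():
            return LoopResult(cycle - 1, "operator stop", history)
        fix_worst()
        history.append(measure())
        # Assumed regression-breaker rule: halt as soon as fitness drops.
        if history[-1] < history[-2]:
            return LoopResult(cycle, "regression breaker", history)
    return LoopResult(max_cycles, "max-cycle cap", history)


# Usage: fitness rises twice, then drops, so the regression breaker fires.
scores = iter([0.5, 0.6, 0.7, 0.65])
result = run_loop(lambda: next(scores), lambda: None, max_cycles=5)
print(result.stop_reason, result.cycles)  # regression breaker 3
```

Checking the operator stop before each fix means a stop request never interrupts a cycle halfway through.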
Adoption & trust: 865 installs on skills.sh; 384 GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
Shipped work leaves quality and goal drift unaddressed because no closed loop picks the next fix and validates it automatically.
Who is it for?
Operators running AgentOps locally who want supervised autonomous iteration after post-mortems with explicit fitness tracking.
Skip if: Beginners seeking a single prompt to brainstorm features, or projects without git-backed repos and defined improvement goals.
When should I use this skill?
Triggers include the phrases "evolve", "improve everything", "autonomous improvement", "run until done", "postmortem and continue", and "analyze repo and keep going".
What do I get? / Deliverables
The repo receives iterated code changes and updated GOALS.md fitness until a kill switch, cycle cap, or regression breaker stops the run. You can then invoke `rpi` on the harvested follow-up items.
- Iterative code changes from supervised cycles
- Updated GOALS.md fitness deltas
- Harvested follow-up work items for the next loop
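One way to picture a GOALS.md fitness delta, assuming each goal measures as simply pass or fail. The actual GOALS.md format and scoring are not shown on this page, so the goal names, the `fitness` rule (fraction of goals passing), and the `FitnessDelta` type are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FitnessDelta:
    before: float
    after: float
    fixed: list[str]       # goals that went from failing to passing
    regressed: list[str]   # goals that went from passing to failing

    @property
    def delta(self) -> float:
        return self.after - self.before


def fitness(goals: dict[str, bool]) -> float:
    """Fraction of goals passing (hypothetical scoring rule)."""
    return sum(goals.values()) / len(goals) if goals else 0.0


def fitness_delta(before: dict[str, bool], after: dict[str, bool]) -> FitnessDelta:
    """Compare two goal snapshots taken before and after one cycle."""
    fixed = sorted(g for g in after if after[g] and not before.get(g, False))
    regressed = sorted(g for g in before if before[g] and not after.get(g, False))
    return FitnessDelta(fitness(before), fitness(after), fixed, regressed)


# Usage with made-up goal names.
before = {"ci-green": True, "coverage-80": False, "no-high-cves": False}
after = {"ci-green": True, "coverage-80": True, "no-high-cves": False}
d = fitness_delta(before, after)
print(d.fixed, round(d.delta, 2))  # ['coverage-80'] 0.33
```

Tracking `regressed` separately from the net `delta` keeps a fix that breaks another goal visible even when the overall score stays flat.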
Journey fit
Spans multiple journey phases: a primary shelf plus the alternate fits below.
Operate/iterate is the canonical shelf because evolve is a compounding production-improvement loop, not a one-shot feature generator. Iterate fits continuous measure-fix-remeasure cycles tied to GOALS.md deltas and operator cadence after shipped work.
Where it fits
After deploy, run evolve to close the loop on the worst reliability or quality goal in GOALS.md.
Finish a post-mortem on a release and let evolve select the next fix validated through RPI.
When the backlog is stale, evolve analyzes the repo and spawns the highest-value RPI work item.
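The work selection order and dormancy rule from the SKILL.md excerpt on this page can be sketched as a priority ladder. Tier labels paraphrase that list. The queue dictionary, the `run_generators` callback, and the default threshold of three empty passes are hypothetical, because the excerpt only says "multiple consecutive passes".

```python
from typing import Callable, Optional

# Tier labels paraphrase the SKILL.md work selection order (highest first).
TIERS = (
    "harvested next-work",
    "ready beads",
    "failing goals",
    "testing improvements",
    "validation tightening",
    "complexity and drift",
    "feature suggestions",
)

Queues = dict[str, list[str]]


def select_work(queues: Queues) -> Optional[tuple[str, str]]:
    """Return (tier, item) from the highest-priority non-empty tier, else None."""
    for tier in TIERS:
        items = queues.get(tier, [])
        if items:
            return tier, items[0]
    return None


class DormancyTracker:
    """Counts consecutive passes that found no work (threshold is hypothetical)."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.empty_passes = 0

    def record(self, found_work: bool) -> bool:
        """Record one pass; return True once the loop should go dormant."""
        self.empty_passes = 0 if found_work else self.empty_passes + 1
        return self.empty_passes >= self.threshold


def next_item(
    queues: Queues,
    run_generators: Callable[[Queues], None],
    tracker: DormancyTracker,
) -> tuple[Optional[tuple[str, str]], bool]:
    """One selection pass: queues first, then generators, then dormancy check."""
    pick = select_work(queues)
    if pick is None:
        # Empty queues mean "run the generator layers", not "stop".
        run_generators(queues)
        pick = select_work(queues)
    return pick, tracker.record(pick is not None)


# Usage: ready beads outrank failing goals, so the bead is picked first.
print(select_work({"ready beads": ["bd-12"], "failing goals": ["coverage goal"]}))
# ('ready beads', 'bd-12')
```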
How it compares
AgentOps workflow skill for local compounding loops—not a hosted managed-agent grader and not a one-off code review checklist.
Common Questions / FAQ
Who is evolve for?
Developers using boshu2/agentops who want terminal-native autonomous improvement with RPI, post-mortem, and compile dependencies.
When should I use evolve?
In Operate when iterating on production-adjacent repos; after Ship when harvesting post-mortem findings; in Build PM when turning analysis into the next prioritized work item via supervised loops.
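A sketch of how the work-generator thresholds in the SKILL.md excerpt could turn analysis into queue items: coverage below 40%, CC above 20 (assumed here to mean cyclomatic complexity), and CVSS >= 7.0 or two or more major versions behind. It does not call the real `test`, `refactor`, or `deps` skills. The input shapes, the `Dep` type, and the item strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Dep:
    """Hypothetical dependency record fed to the threshold check."""
    name: str
    max_cvss: float      # highest CVSS score among known advisories
    current_major: int
    latest_major: int


def coverage_items(coverage: dict[str, float]) -> list[str]:
    """Files below 40% coverage (values in percent) become test work items."""
    return [f"add tests: {path}" for path, pct in sorted(coverage.items()) if pct < 40.0]


def complexity_items(complexity: dict[str, int]) -> list[str]:
    """Functions with CC > 20 (assumed cyclomatic complexity) become refactor items."""
    return [f"refactor: {fn}" for fn, cc in sorted(complexity.items()) if cc > 20]


def dependency_items(deps: list[Dep]) -> list[str]:
    """Deps with CVSS >= 7.0 or 2+ major versions behind become upgrade items."""
    return [
        f"upgrade: {d.name}"
        for d in deps
        if d.max_cvss >= 7.0 or d.latest_major - d.current_major >= 2
    ]
```

Each generator returns plain strings so the results can be appended to the same queues the selection ladder reads.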
Is evolve safe to install?
It declares code-changing output and shell-oriented operator flows—review permissions, repo backups, and the Security Audits panel on this page before enabling autonomous loops.
Workflow Chain
Requires first: rpi, post-mortem
Then invoke: rpi
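The first tier in the SKILL.md work selection order is harvested `.agents/rpi/next-work.jsonl` work, described there as the "freshest concrete follow-up". A minimal sketch of picking the freshest entry, assuming each line is a JSON object with an ISO-8601 `created_at` field. That schema and the helper names `freshest_item` and `read_next_work` are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterable, Optional

# Path taken from the SKILL.md excerpt; the record schema below is hypothetical.
NEXT_WORK = Path(".agents/rpi/next-work.jsonl")


def freshest_item(lines: Iterable[str]) -> Optional[dict]:
    """Return the newest record from JSONL lines, or None if there are none.

    Assumes each non-blank line is a JSON object with an ISO-8601 "created_at"
    string in a single timezone, so plain string comparison orders by time.
    """
    records = [json.loads(line) for line in lines if line.strip()]
    return max(records, key=lambda record: record["created_at"], default=None)


def read_next_work(path: Path = NEXT_WORK) -> Optional[dict]:
    """Load the harvested queue file if it exists and pick its freshest entry."""
    if not path.exists():
        return None
    return freshest_item(path.read_text(encoding="utf-8").splitlines())
```

Skipping blank lines keeps a trailing newline or an empty separator line from breaking the JSON parse.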
SKILL.md
# /evolve: Goal-Driven Compounding Loop

> **Cross-vendor analog:** Anthropic Managed Agents Outcomes (May 2026). Both close the loop "agent runs → grader scores against a rubric → agent retries"; AgentOps does it locally against any model.

> Measure what's wrong. Fix the worst thing. Measure again. Compound.

**V2 command surface:** keep the name `evolve`. Use `ao evolve` for the terminal-native loop. It is the top-level operator entrypoint for `ao rpi loop --supervisor`, preserving the old `/evolve` concept while reusing the v2 RPI loop engine.

**Operator cadence:** post-mortem finished work, analyze the current repo state, select or create the next highest-value work item, let `/rpi` handle research, planning, pre-mortem, implementation, and validation, then harvest follow-ups and repeat until a kill switch, max-cycle cap, regression breaker, or real dormancy stops the run.

Always-on autonomous loop over `/rpi`. Work selection order:

1. **Harvested `.agents/rpi/next-work.jsonl` work** (freshest concrete follow-up)
2. **Open ready beads work** (`bd ready`)
3. **Failing goals and directive gaps** (`ao goals measure`)
4. **Testing improvements** (missing/thin coverage, missing regression tests)
5. **Validation tightening and bug-hunt passes** (gates, audits, bug sweeps)
6. **Complexity / TODO / FIXME / drift / dead code / stale docs / stale research mining**
7. **Concrete feature suggestions** derived from repo purpose when no sharper work exists

**Work generators** that feed the selection ladder (auto-invoked, skip with `--no-lifecycle`):

- `Skill(skill="test", args="coverage")` → files with <40% coverage become queue items (Step 3.4)
- `Skill(skill="refactor", args="--sweep all --dry-run")` → functions with CC > 20 become queue items (Step 3.6)
- `Skill(skill="deps", args="audit")` → deps with CVSS >= 7.0 or 2+ major versions behind become queue items (Step 3.5)
- `Skill(skill="perf", args="profile --quick")` → perf findings become queue items when hot paths are detected (Step 3.5)

**Dormancy is last resort.** Empty current queues mean "run the generator layers", not "stop". Only go dormant after the queue layers and generator layers come up empty across multiple consecutive passes.

```bash
/evolve                              # Run until kill switch, max-cycles, or real dormancy
/evolve --max-cycles=5               # Cap at 5 cycles
/evolve --dry-run                    # Show what would be worked on, don't execute
/evolve --beads-only                 # Skip goals measurement, work beads backlog only
/evolve --quality                    # Quality-first mode: prioritize post-mortem findings
/evolve --quality --max-cycles=10    # Quality mode with cycle cap
/evolve --compile                    # Mine → Defrag warmup before first cycle
/evolve --compile --max-cycles=5     # Warm knowledge base then run 5 cycles
/evolve --test-first                 # Default strict-quality /rpi execution path
/evolve --no-test-first              # Explicit opt-out from test-first mode
```

## Delineation vs /dream

| Lane | Runs | Mutates code? | Mutates corpus? | Outer loop? | Budget |
|------|------|---------------|-----------------|-------------|--------|
| `/dream` | nightly, private local | **No** | **Yes (heavy)** | **Yes (convergence)** | wall-clock + plateau |
| `/evolve` | daytime, operator-driven | Yes (via `/rpi`) | Yes (light) | Yes | cycle cap |

Dream owns the knowledge compou