
Evolving AI Agents
Run benchmark-driven evolution cycles on coding or terminal agents with the A-Evolve `agent_evolve` API and built-in seeds like swe, terminal, and mcp.
Overview
evolving-ai-agents is an agent skill that documents the A-Evolve `Evolver` API for benchmark-driven agent evolution. It is most often used in the Build phase, and also fits Ship (testing) and Operate (iteration).
Install
npx skills add https://github.com/orchestra-research/ai-research-skills --skill evolving-ai-agents
What is this skill?
- `ae.Evolver` entry point with `run(cycles)` returning `EvolutionResult`
- Built-in agent seeds: swe, terminal, mcp—or custom `BaseAgent` / workspace paths
- Built-in benchmarks: swe-verified, mcp-atlas, terminal2, skill-bench, arc-agi-3 via `BenchmarkAdapter`
- Seed workspaces copied to a working directory with manifest checks for `entrypoint` and `evolvable_layers`
- Custom `EvolutionEngine` hook (default `AEvolveEngine`) and `EvolveConfig` tuning
- 5 built-in benchmark names documented
- 3 built-in agent seed names: swe, terminal, mcp
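The seed lookup and manifest check listed above can be sketched roughly as follows. This is a minimal illustration, not A-Evolve's own code: `classify_agent` and `missing_manifest_keys` are hypothetical helpers, and the manifest is assumed to be already parsed into a dict, since the on-disk manifest format is not described here.

```python
# Seed names and required manifest keys as listed in this skill's description.
BUILTIN_SEEDS = {"swe", "terminal", "mcp"}
REQUIRED_MANIFEST_KEYS = ("entrypoint", "evolvable_layers")


def classify_agent(agent: str) -> str:
    """Return "seed" for a built-in seed name, otherwise "path" (hypothetical helper)."""
    return "seed" if agent in BUILTIN_SEEDS else "path"


def missing_manifest_keys(manifest: dict) -> list[str]:
    """List the required keys absent from an already-parsed manifest (hypothetical helper)."""
    return [key for key in REQUIRED_MANIFEST_KEYS if key not in manifest]
```

An empty list from `missing_manifest_keys` means both required keys are present; any other result names what a custom workspace still needs before an evolution run.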
Adoption & trust: 1 install on skills.sh; 9.4k GitHub stars; 1/3 security scanners passed (skills.sh audits).
What problem does it solve?
Your agent plateaus on hand-tuned prompts and you lack a documented loop to evolve layers against SWE, terminal, or MCP benchmarks.
Who is it for?
Indie builders experimenting with self-improving agents who already run Python and want named benchmarks plus manifest-validated seed workspaces.
Skip if: you are a beginner who only needs a one-shot Claude skill and does not want to maintain local Python evolution infrastructure or benchmarks.
When should I use this skill?
Use it when you are implementing or debugging an A-Evolve (`agent_evolve`) Evolver setup, its seeds, benchmarks, or evolution cycles.
What do I get? / Deliverables
You can configure `ae.Evolver` with seeds, benchmarks, and cycles so evolved workspace state reflects measurable benchmark gains instead of guesswork.
- Configured Evolver run
- EvolutionResult after benchmark cycles
- Validated agent workspace with entrypoint and evolvable_layers
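A configured run and its result might look like the sketch below. The `ae.Evolver(agent=..., benchmark=...)` constructor, `run(cycles=...)`, and the `cycles_completed`, `final_score`, and `converged` fields come from the API reference in SKILL.md. `run_swe_evolution` and `summarize` are hypothetical wrappers, and the `SimpleNamespace` object is only a stand-in so the formatting helper can be tried without installing A-Evolve or running a benchmark.

```python
from types import SimpleNamespace


def run_swe_evolution(cycles: int = 3):
    """Run the built-in "swe" seed against "swe-verified" (needs A-Evolve installed)."""
    import agent_evolve as ae  # imported lazily so summarize() works without the package

    evolver = ae.Evolver(agent="swe", benchmark="swe-verified")
    return evolver.run(cycles=cycles)


def summarize(result) -> str:
    """Format the documented EvolutionResult fields as one line (hypothetical helper)."""
    return (
        f"{result.cycles_completed} cycles, "
        f"final score {result.final_score:.2f}, "
        f"converged={result.converged}"
    )


# Stand-in object with the same field names, so summarize() can be tried offline.
fake_result = SimpleNamespace(cycles_completed=3, final_score=0.5, converged=False)
summary = summarize(fake_result)
```

With A-Evolve installed, `summarize(run_swe_evolution())` would report on a real run instead of the stand-in.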
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Canonical shelf is Build / agent-tooling because the reference documents configuring Evolver, benchmarks, and evolvable layers—the core work of improving agent behavior. Agent-tooling is where you wire `ae.Evolver`, pick benchmarks (swe-verified, mcp-atlas, terminal2, skill-bench, arc-agi-3), and manage workspace copies from seed manifests.
Where it fits
Copy the swe seed workspace and point Evolver at swe-verified before merging evolvable prompt layers.
Re-run evolution cycles against terminal2 after a harness change to confirm scores still improve.
Schedule additional Evolver cycles when MCP-Atlas regressions show up in production traces.
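The re-run scenario above implies comparing a new run against an earlier one. One possible gate, assuming you keep the `score_history` list from each `EvolutionResult`, is sketched here. The `regressed` function and its `tolerance` default are illustrative choices, not part of A-Evolve.

```python
def regressed(
    baseline_history: list[float],
    new_history: list[float],
    tolerance: float = 0.01,
) -> bool:
    """Return True when the new run's best score is more than `tolerance` below the baseline's best."""
    if not baseline_history or not new_history:
        raise ValueError("both histories need at least one score")
    return max(new_history) < max(baseline_history) - tolerance
```

Comparing best scores rather than final scores keeps a single noisy last cycle from flagging a regression; swap in the last element of each list if the final score is what matters for your workflow.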
How it compares
API reference for an evolution engine—not a single-turn codegen skill or an MCP data connector.
Common Questions / FAQ
Who is evolving-ai-agents for?
Solo developers building or tuning autonomous agents who need the `agent_evolve` Evolver API, built-in seeds, and benchmark adapters spelled out in one place.
When should I use evolving-ai-agents?
In Build when wiring evolution into your agent repo; in Ship when regression-testing evolved layers against swe-verified or mcp-atlas; in Operate when iterating cycles after production failures.
Is evolving-ai-agents safe to install?
Evolution runs local workspaces and benchmarks that may execute agent code; review the Security Audits panel on this page and sandbox runs before trusting evolved entrypoints.
SKILL.md
# A-Evolve API Reference

## Top-Level Module: `agent_evolve`

```python
import agent_evolve as ae
```

### `ae.Evolver`

Main entry point for running evolution.

```python
class Evolver:
    def __init__(
        self,
        agent: str | BaseAgent,
        benchmark: str | BenchmarkAdapter,
        config: EvolveConfig | None = None,
        engine: EvolutionEngine | None = None,
        workspace_dir: str | None = None,
    ): ...

    def run(self, cycles: int | None = None) -> EvolutionResult: ...
```

**Parameters**:

- `agent`: One of:
  - Built-in seed name: `"swe"`, `"terminal"`, `"mcp"`
  - Path to workspace directory: `"./my-agent"`
  - `BaseAgent` instance
- `benchmark`: One of:
  - Built-in name: `"swe-verified"`, `"mcp-atlas"`, `"terminal2"`, `"skill-bench"`, `"arc-agi-3"`
  - `BenchmarkAdapter` instance
- `config`: Evolution configuration. Defaults to `EvolveConfig()`.
- `engine`: Custom evolution engine. Defaults to `AEvolveEngine`.
- `workspace_dir`: Override working directory for evolved state.

**Resolution logic**:

- String agent names are matched against built-in seed workspaces, then treated as paths
- Seed workspaces are copied to a working directory before evolution begins
- Manifest validation ensures `entrypoint` and `evolvable_layers` are present

---

## Core Types: `agent_evolve.types`

### `Task`

```python
@dataclass
class Task:
    id: str                                       # Unique identifier
    input: str                                    # Task description or input data
    metadata: dict = field(default_factory=dict)  # Extra context
```

### `Trajectory`

```python
@dataclass
class Trajectory:
    task_id: str                                          # Matches Task.id
    output: str                                           # Agent's final answer/patch/action
    steps: list[dict] = field(default_factory=list)       # Tool calls
    conversation: list[dict] = field(default_factory=list)  # Full messages
```

### `Feedback`

```python
@dataclass
class Feedback:
    success: bool                            # Binary pass/fail
    score: float                             # 0.0 to 1.0 continuous score
    detail: str = ""                         # Human-readable explanation
    raw: dict = field(default_factory=dict)  # Benchmark-specific data
```

### `Observation`

```python
@dataclass
class Observation:
    task: Task
    trajectory: Trajectory
    feedback: Feedback
```

### `SkillMeta`

```python
@dataclass
class SkillMeta:
    name: str         # Unique skill identifier
    description: str  # What it does and when to trigger
    path: str         # Filesystem path to SKILL.md
```

### `StepResult`

```python
@dataclass
class StepResult:
    mutated: bool  # Whether workspace was changed
    summary: str   # Description of changes
    metadata: dict = field(default_factory=dict)
```

### `CycleRecord`

```python
@dataclass
class CycleRecord:
    cycle: int                   # Cycle number
    score: float                 # Average score this cycle
    mutated: bool                # Whether workspace was changed
    engine_name: str = ""        # Name of the engine used
    summary: str = ""            # What the engine did
    observation_batch: str = ""  # Path to observation JSONL
    metadata: dict = field(default_factory=dict)
```

### `EvolutionResult`

```python
@dataclass
class EvolutionResult:
    cycles_completed: int
    final_score: float
    score_history: list[float] = field(default_factory=list)  # Score per cycle
    converged: bool = False
    details: dict = field(default_factory=dict)
```

---

## Protocol: `agent_evolve.protocol.base_agent`

### `BaseAgent`

```python
class BaseAgent:
    def __init__(self, workspace_dir: str | Path): ...

    def solve(self, task: Task) -> Trajectory:
        """Override: solve a single task and return trajectory."""
        raise NotImplementedError

    def reload_from_fs(self):
        """Re-read prompts, skills, memory from workspace after evolution."""
        ...

    def export_to_fs(se