
Run
Run one autoresearch experiment iteration: load history, edit the target file, commit, and keep or discard based on evaluation.
Overview
Run is an agent skill, most often used in Operate (also Build and Ship), that runs one autoresearch experiment iteration: edit, commit, evaluate, then keep or discard.
Install
npx skills add https://github.com/alirezarezvani/claude-skills --skill run
What is this skill?
- /ar:run command for a single experiment iteration with optional experiment path
- Loads config.cfg, program.md, results.tsv and checks out autoresearch/{domain}/{name} branch
- Strategy escalation across runs 1–5, 6–15, 16–30, and 30+
- One change per iteration: edit target file, commit, evaluate, keep or discard
- Runs setup_experiment.py --list when no experiment is specified
Adoption & trust: 1.4k installs on skills.sh; 17.5k GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You have an autoresearch experiment with messy history and need one disciplined iteration instead of unfocused multi-file edits.
Who is it for?
Builders already using the autoresearch folder layout and git branches who want agent-driven single-step optimization loops.
Skip if: Greenfield features with no baseline metric, or anyone unwilling to use git branches and scripted experiment configs.
When should I use this skill?
Use /ar:run when you want a single experiment iteration: review history, decide on a change, edit the target file, evaluate, then keep or discard.
What do I get? / Deliverables
One committed change on the experiment branch is evaluated against prior results.tsv rows so you keep winners and avoid repeated failures.
- One git commit on the experiment branch
- Updated evaluation outcome recorded for the iteration
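The keep-or-discard outcome can be pictured with a minimal Python sketch. This is not the skill's actual evaluation logic, which lives in run_experiment.py and is not shown on this page. The function name `decide`, the `lower_is_better` flag, and the sample numbers are all hypothetical.

```python
def decide(new_value: float, best_so_far: float, lower_is_better: bool = True) -> str:
    """Return "KEEP" if new_value strictly beats best_so_far, else "DISCARD".

    Hypothetical helper: the metric direction is an assumption here; in a
    real experiment it would come from the experiment's own configuration.
    """
    if lower_is_better:
        improved = new_value < best_so_far
    else:
        improved = new_value > best_so_far
    return "KEEP" if improved else "DISCARD"


# Illustrative values only (for example, API latency in milliseconds).
print(decide(118.0, 120.0))                       # KEEP
print(decide(125.0, 120.0))                       # DISCARD
print(decide(0.91, 0.90, lower_is_better=False))  # KEEP
```

The sketch uses a strict comparison. The skill's rules also treat equal performance with simpler code as an improvement, which a purely numeric check like this one cannot capture.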
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
The canonical shelf is Operate/iterate because the skill closes the loop on measured experiments after the product exists, even though it touches build-time code edits. The Iterate subphase fits disciplined trial-and-error with results.tsv history and git branches rather than one-off prototyping.
Where it fits
Apply one API latency tweak to the declared target file after reading which prior commits were kept.
Run iteration 12 under systematic exploration to vary a single parameter before broader release.
Checkout autoresearch/engineering/api-speed and decide the next change from results.tsv crashes and discards.
How it compares
Use instead of ad-hoc “try this refactor” chat without logged experiment history or keep/discard rules.
Common Questions / FAQ
Who is run for?
Solo developers running structured autoresearch experiments who want the agent to perform one bounded iteration per invocation.
When should I use run?
In Operate/iterate when tuning a tracked experiment; also during Build when the target file is application code and during Ship when validating perf or reliability gains before wider rollout.
Is run safe to install?
It instructs the agent to run git checkout, edit files, and execute Python setup scripts. Review how far you trust the repository, and check the Security Audits panel on this Prism page, before enabling it.
SKILL.md
# /ar:run - Single Experiment Iteration

Run exactly ONE experiment iteration: review history, decide a change, edit, commit, evaluate.

## Usage

```
/ar:run engineering/api-speed    # Run one iteration
/ar:run                          # List experiments, let user pick
```

## What It Does

### Step 1: Resolve experiment

If no experiment specified, run `python {skill_path}/scripts/setup_experiment.py --list` and ask the user to pick.

### Step 2: Load context

```bash
# Read experiment config
cat .autoresearch/{domain}/{name}/config.cfg

# Read strategy and constraints
cat .autoresearch/{domain}/{name}/program.md

# Read experiment history
cat .autoresearch/{domain}/{name}/results.tsv

# Checkout the experiment branch
git checkout autoresearch/{domain}/{name}
```

### Step 3: Decide what to try

Review results.tsv:
- What changes were kept? What pattern do they share?
- What was discarded? Avoid repeating those approaches.
- What crashed? Understand why.
- How many runs so far? (Escalate strategy accordingly)

**Strategy escalation:**
- Runs 1-5: Low-hanging fruit (obvious improvements)
- Runs 6-15: Systematic exploration (vary one parameter)
- Runs 16-30: Structural changes (algorithm swaps)
- Runs 30+: Radical experiments (completely different approaches)

### Step 4: Make ONE change

Edit only the target file specified in config.cfg. Change one thing. Keep it simple.

### Step 5: Commit and evaluate

```bash
git add {target}
git commit -m "experiment: {short description of what changed}"

python {skill_path}/scripts/run_experiment.py \
  --experiment {domain}/{name} --single
```

### Step 6: Report result

Read the script output. Tell the user:
- **KEEP**: "Improvement! {metric}: {value} ({delta} from previous best)"
- **DISCARD**: "No improvement. {metric}: {value} vs best {best}. Reverted."
- **CRASH**: "Evaluation failed: {reason}. Reverted."

### Step 7: Self-improvement check

After every 10th experiment (check results.tsv line count), update the Strategy section of program.md with patterns learned.

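The escalation buckets in Step 3 and the every-10th check in Step 7 both depend on how many runs results.tsv records. The sketch below shows one way to derive both from the file's contents. It is an illustration, not part of the skill's scripts. The function names, the header-row assumption, and the sample column names are hypothetical. It also assumes run 30 belongs to the 16-30 bucket, since the source lists both "16-30" and "30+".

```python
def strategy_for_run(run_number: int) -> str:
    """Map a 1-based run number to the escalation bucket from Step 3."""
    if run_number < 1:
        raise ValueError("run_number must be >= 1")
    if run_number <= 5:
        return "low-hanging fruit"
    if run_number <= 15:
        return "systematic exploration"
    if run_number <= 30:
        # Assumption: run 30 falls in the 16-30 bucket; the boundary
        # between "16-30" and "30+" is ambiguous in the source.
        return "structural changes"
    return "radical experiments"


def count_completed_runs(results_tsv_text: str, has_header: bool = True) -> int:
    """Count non-empty data rows in the text of results.tsv.

    Assumption: one row per experiment, optionally preceded by a header row.
    """
    rows = [line for line in results_tsv_text.splitlines() if line.strip()]
    if has_header and rows:
        rows = rows[1:]
    return len(rows)


def strategy_update_due(completed_runs: int) -> bool:
    """True after every 10th completed experiment, as described in Step 7."""
    return completed_runs > 0 and completed_runs % 10 == 0


# Made-up history: a header plus 10 rows (column names are hypothetical).
sample = "commit\tmetric\tstatus\n" + "".join(
    f"abc{i}\t{100 - i}\tkeep\n" for i in range(10)
)
done = count_completed_runs(sample)
print(done, strategy_for_run(done + 1), strategy_update_due(done))
# prints: 10 systematic exploration True
```

With ten rows recorded, the next run (number 11) falls in the systematic exploration bucket, and the Strategy section of program.md is due for an update.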
## Rules

- ONE change per iteration. Don't change 5 things at once.
- NEVER modify the evaluator (evaluate.py). It's ground truth.
- Simplicity wins. Equal performance with simpler code is an improvement.
- No new dependencies.
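
The first two rules can be checked mechanically before committing. The sketch below is one possible guard, not something the skill ships. The function `check_changed_files` and the example paths are hypothetical. It assumes the list of changed paths is gathered separately, for instance from `git diff --cached --name-only`.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def check_changed_files(changed: list[str], target: str,
                        evaluator_name: str = "evaluate.py") -> list[str]:
    """Return rule violations for a proposed commit (empty list if none).

    Hypothetical helper: `changed` holds repo-relative paths, and `target`
    is the single file named in config.cfg.
    """
    problems = []
    for path in changed:
        if PurePosixPath(path).name == evaluator_name:
            # Rule: the evaluator is ground truth and must stay untouched.
            problems.append(f"evaluator must not be modified: {path}")
        elif path != target:
            # Rule: one change per iteration, confined to the target file.
            problems.append(f"only the target file may change: {path}")
    if target not in changed:
        problems.append(f"target file was not changed: {target}")
    return problems


# Example paths are made up for illustration.
print(check_changed_files(["src/api.py"], "src/api.py"))
print(check_changed_files(["src/api.py", "evaluate.py"], "src/api.py"))
```

An empty list means the staged change touches only the target file. The remaining rules (simplicity, no new dependencies) need human or agent judgment and are not covered by this check.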