
Codex Autoresearch Loop
Hands off a one-sentence measurable goal so Codex autonomously modifies your repo, verifies, commits wins, and reverts failures until you stop or hit a cap.
Overview
Codex Autoresearch Loop is an agent skill used most often in Build (and also in Ship for testing and Operate for iteration). It runs an autonomous modify-verify-keep-or-revert git loop until a measurable codebase goal is reached or the loop is stopped.
Install
npx skills add https://github.com/aradotso/trending-skills --skill codex-autoresearch-loop
What is this skill?
- Autonomous modify→verify→retain-or-revert loop until interrupted or a cap
- Measurable goal in one sentence; Codex confirms plan before unattended iteration
- Successful improvements stack in git; failures revert automatically
- Generalized autoresearch beyond ML training to any software metric (coverage, types, tests)
- Install under `.agents/skills/codex-autoresearch/` via copy or `$skill-installer`
Adoption & trust: 1.1k installs on skills.sh; 31 GitHub stars; 0/3 security scanners passed (skills.sh audits).
What problem does it solve?
You have a clear metric target (tests green, types clean, coverage up) but cannot sit through hundreds of agent edit-verify cycles yourself.
Who is it for?
Solo builders on Codex with a git repo and a verifier you trust (test suite, typecheck, lint score) who want stacked commits from overnight iteration.
Skip if: Subjective UX or architecture choices with no automated verifier, repos without git revert discipline, or teams that forbid unattended file writes.
When should I use this skill?
A user asks to run autoresearch, iterate until tests pass, improve code overnight, set up a modify-verify loop, optimize a metric continuously, or run the codex autoresearch skill.
What do I get? / Deliverables
Codex iterates unattended with wins committed and failures reverted until you interrupt, a cap triggers, or the stated measurable goal is satisfied.
- Stacked git commits from retained improvements
- Reverted failed attempts with clean working tree
- Progress toward the stated measurable goal
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
The skill is installed as agent procedural knowledge in the project and drives Codex through repeated code changes, so its canonical shelf is builder agent tooling during implementation. Autoresearch is a Codex-native autonomous loop (modify→verify→keep/revert), not a one-shot test runner or deploy script.
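The keep-or-revert decision at the heart of that loop can be sketched in a few lines. This is a minimal illustration, not the skill's actual implementation: the names `Attempt`, `improved`, `decide`, and `run_loop` are hypothetical, the metric values are invented, and the sketch simplifies guard handling by reverting on the first guard failure (the SKILL.md describes reworking a change up to two times first). Real git operations are omitted so the logic stays self-contained.

```python
from dataclasses import dataclass
from typing import List, Literal, Tuple

Direction = Literal["lower", "higher"]
Outcome = Literal["keep", "revert"]


@dataclass(frozen=True)
class Attempt:
    """One candidate change, described only by its measured effects (hypothetical type)."""
    metric_after: float  # metric value reported by the verify command
    guard_passed: bool   # True if the guard command still succeeds


def improved(before: float, after: float, direction: Direction) -> bool:
    """Return True when the metric moved strictly in the desired direction."""
    return after < before if direction == "lower" else after > before


def decide(before: float, attempt: Attempt, direction: Direction) -> Outcome:
    """Keep a change only if the metric improved AND the guard still passes."""
    if improved(before, attempt.metric_after, direction) and attempt.guard_passed:
        return "keep"
    return "revert"


def run_loop(
    baseline: float, attempts: List[Attempt], direction: Direction
) -> Tuple[float, List[Outcome]]:
    """Apply attempts in order; kept changes become the new baseline."""
    current = baseline
    log: List[Outcome] = []
    for attempt in attempts:
        outcome = decide(current, attempt, direction)
        if outcome == "keep":
            current = attempt.metric_after  # the win stacks on earlier wins
        log.append(outcome)
    return current, log


if __name__ == "__main__":
    # Invented data: start from 47 `any` occurrences, aiming lower.
    attempts = [
        Attempt(metric_after=45, guard_passed=True),   # improves -> keep
        Attempt(metric_after=46, guard_passed=True),   # worse than 45 -> revert
        Attempt(metric_after=40, guard_passed=False),  # guard broke -> revert
        Attempt(metric_after=41, guard_passed=True),   # improves -> keep
    ]
    final, log = run_loop(47, attempts, "lower")
    print(final, log)  # prints: 41 ['keep', 'revert', 'revert', 'keep']
```

Because each kept change replaces the baseline, a later attempt has to beat the best result so far rather than the original starting point, which is what makes retained improvements stack.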
Where it fits
Run the loop to raise unit test pass rate after a large refactor without manual prompt repetition.
Keep iterating until CI-equivalent tests pass locally before opening a release PR.
Nightly runs to chip away at lint or coverage thresholds you defined as a stop condition.
How it compares
Use it for closed-loop metric chasing with automatic revert, not for a single interactive fix session or a human-only code review pass.
Common Questions / FAQ
Who is codex-autoresearch-loop for?
Indie developers and small teams using Codex who want autonomous improvement loops on objective software metrics without manual babysitting.
When should I use codex-autoresearch-loop?
During Build when tightening agent-driven implementation, during Ship when iterating until tests or typechecks pass, and during Operate when continuously optimizing a tracked metric until you say stop.
Is codex-autoresearch-loop safe to install?
It performs autonomous edits and git operations; review the Security Audits panel on this Prism page and run it on a branch with backups before unattended loops on production-critical trees.
SKILL.md
# Codex Autoresearch

> Skill by [ara.so](https://ara.so), Daily 2026 Skills collection.

Codex Autoresearch is a Codex skill that runs an autonomous modify→verify→keep/revert loop on your codebase. You describe a measurable goal in one sentence; Codex confirms the plan, then iterates unattended (every improvement stacks in git, every failure reverts automatically) until interrupted or a cap is reached.

Inspired by Karpathy's autoresearch concept, generalized beyond ML training to any software metric.

---

## Installation

**Option A: manual copy into your project:**

```bash
git clone https://github.com/leo-lilinxiao/codex-autoresearch.git
cp -r codex-autoresearch your-project/.agents/skills/codex-autoresearch
```

**Option B: Codex skill installer:**

```text
$skill-installer install https://github.com/leo-lilinxiao/codex-autoresearch
```

The skill lives at `.agents/skills/codex-autoresearch/` inside your project. No config file is required before first use.

---

## How to Activate

Open Codex in your project directory and prefix your goal with `$codex-autoresearch`:

```text
$codex-autoresearch I want to get rid of all `any` types in my TypeScript code
```

Codex will:

1. Scan the repo and infer scope, metric, verify command, and guard command.
2. Present a confirmation summary; reply `go` (or correct anything).
3. Run the loop unattended until you interrupt it or the goal is met.

You never write config. Codex infers everything.

---

## Confirmation Flow

Before the loop starts Codex always shows what it found and asks you to confirm. Example exchange:

```
Codex: I found 47 `any` occurrences across src/**/*.ts.

       Confirmed:
       - Target: eliminate `any` types in src/**/*.ts
       - Metric: `any` count (current: 47), direction: lower
       - Verify: grep + tsc --noEmit as guard

       Need to confirm:
       - Run until all gone, or cap at N iterations?

       Reply "go" to start, or tell me what to change.

You:   Go, run overnight.

Codex: Starting. Baseline: 47. Iterating until interrupted.
```

Up to five confirmation rounds are possible. After that, Codex proceeds.

---

## The Loop (internals)

```
PHASE 0: Probe environment (CPU/GPU/RAM/toolchains), check for session resume
PHASE 1: Read context + lessons file from prior run (if any)

LOOP (forever or N times):
  1. Review current state, git history, results log, lessons
  2. Pick ONE hypothesis (apply perspectives, filter by environment)
     -- or N hypotheses if parallel mode is active
  3. Make ONE atomic change
  4. git commit (before verification)
  5. Run verify command → did the target metric improve?
     Run guard command → did anything else break?
  6. Improved → keep (extract lesson)
     Worse → approved rollback strategy (git revert)
     Crashed → fix or skip
  7. Log the result to results log
  8. Health check (disk, git, verify health)
  9. If 3+ discards → REFINE; 5+ → PIVOT; 2 PIVOTs → web search
  10. Repeat. Never stop. Never ask.
```

The loop runs **unbounded** unless you say `Iterations: N` during confirmation.

---

## Dual-Gate Verification

Two commands serve distinct purposes:

| Gate | Purpose | Failure means |
|------|---------|---------------|
| **Verify** | Did the target metric improve? | Change discarded, reverted |
| **Guard** | Did anything else break? | Change reworked (up to 2 attempts), then reverted |

Guard files are **never modified** by the loop. Example veri