Codex Autoresearch

Name: Codex Autoresearch
Author: leo-lilinxiao

leo-lilinxiao/codex-autoresearch

Run Codex in an unattended improve-verify loop for fixes, audits, debugging, and ship-readiness instead of one-off chat turns.

Overview

codex-autoresearch is an agent skill that runs an unattended improve-verify loop in Codex CLI. It applies at any stage of a project where a solo builder needs autonomous iteration toward a verifiable outcome before committing.

Install

npx skills add https://github.com/leo-lilinxiao/codex-autoresearch --skill codex-autoresearch

What is this skill?

Autonomous Modify → Verify → Keep/Discard → Repeat loop for Codex CLI
Request modes: loop, plan, debug, fix, security, ship, and exec
Loads core principles, structured output spec, and runtime hard invariants for execution modes
Session resume, environment awareness, and interaction wizard references for interactive launches
Explicitly excludes ordinary one-shot coding help or casual Q&A
4-step activation flow: classify, load core references, load situational references, execute selected mode

Compatible agents: Codex, any compatible agent

Adoption & trust: 1 install on skills.sh; 1.8k GitHub stars; 1 of 3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).

What problem does it solve?

You need Codex to keep working toward a measurable goal with verify-and-rollback discipline, not stop after one reply.

Who is it for?

Builders who want overnight improve-verify runs, systematic debug/fix cycles, security passes, and ship-readiness checks with explicit verification.

Skip if: Ordinary one-shot coding help, casual Q&A, or tasks without a clear verify signal or user-approved autonomous scope.

When should I use this skill?

Use it when you want Codex to plan or run an unattended improve-verify loop toward a measurable outcome, especially overnight. This includes repeated debugging, fixing, security auditing, and ship-readiness checks, but not ordinary one-shot coding.

What do I get? / Deliverables

Codex runs classified long-running iterations with structured outputs and invariants until verification passes or you resume/control an existing session.

Structured iteration output per references
Kept changes that pass verification
Resumable session state when using resume protocol

Journey fit

Useful at every journey phase - explore requirements and options before committing to a direction.

Where it fits

Build: Backend, data & payments

Run a fix/exec loop until tests pass after a regression you cannot close in one chat turn.

Ship: Code review

Iterate modify-verify until review criteria and structured output spec are satisfied.

Ship: Security

Classify as security mode and load runtime invariants before an autonomous audit pass.

Ship: CI/CD & deploy

Use ship mode to converge on launch checklist items with verifiable completion signals.

Operate: Error tracking

Resume an existing session to debug recurring production failures with keep/discard on each patch.

How it compares

Codex-oriented autonomous loop orchestration—not a single-purpose linter skill or a hosted CI product.

Common Questions / FAQ

Who is codex-autoresearch for?

Solo builders using Codex CLI who want unattended or long-session iteration with explicit verify and keep/discard semantics.

When should I use codex-autoresearch?

During Build for sustained fix/debug loops; during Ship for review, security auditing, and ship-readiness; during Operate when repeated verify-fix cycles track production issues—whenever one-shot chat is not enough.

Is codex-autoresearch safe to install?

Execution modes can change code and run verification; review the Security Audits panel on this Prism page and scope permissions before unattended runs.

SKILL.md

# codex-autoresearch

Autonomous goal-directed iteration. Modify -> Verify -> Keep/Discard -> Repeat.

## When Activated

1. Classify the request as `loop`, `plan`, `debug`, `fix`, `security`, `ship`, or `exec`, and parse any inline config from the prompt.
2. Load `references/core-principles.md` and `references/structured-output-spec.md`. For active execution modes (`loop`, `debug`, `fix`, `security`, `ship`, `exec`), also load `references/runtime-hard-invariants.md`.
3. Load only the additional references the current situation needs:
   - `references/session-resume-protocol.md` for every interactive launch or existing-run control path, before deciding fresh vs resumable
   - `references/environment-awareness.md` before choosing hardware-sensitive work
   - `references/interaction-wizard.md` for every new interactive launch (`loop`, `debug`, `fix`, `security`, `ship`) before execution begins
   - `references/results-logging.md` only when debugging TSV/state semantics or helper behavior directly
4. Load the selected mode workflow reference plus only the detailed cross-cutting protocols that actually apply (`lessons`, `pivot`, `health-check`, `parallel`, `web-search`, `hypothesis-perspectives`).
5. Use the bundled helper scripts when stateful artifacts or runtime control are involved.
   - Resolve them relative to the loaded skill bundle root (`<skill-root>/scripts/...`), not the target repo root. In the common repo-local install this means commands such as `python3 .agents/skills/codex-autoresearch/scripts/autoresearch_init_run.py --repo <primary_repo> --workspace-root <workspace_root> ...`.
   - New-run helpers (`autoresearch_init_run.py` and `autoresearch_runtime_ctl.py launch/create-launch`) require both `--repo <primary_repo>` and `--workspace-root <workspace_root>`.
   - Existing-run control-plane helpers (`autoresearch_resume_check.py`, `autoresearch_resume_prompt.py`, `autoresearch_supervisor_status.py`, `autoresearch_health_check.py`, `autoresearch_runtime_ctl.py status/stop/start`) require `--repo <primary_repo>` and resolve the workspace-owned Results directory from the repo-local pointer plus canonical context.
   - `autoresearch_launch_gate.py --repo <primary_repo>` is the pre-wizard gate: it returns `fresh` for a clean repo with no prior artifacts and otherwise uses the same pointer/context recovery path.
6. Execute the selected workflow exactly as written and produce the required structured output and artifacts.
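
The helper-resolution rule in step 5 can be sketched as a small command builder. This is only an illustration: the script names and the `--repo` / `--workspace-root` flags come from the text above, while `helper_command`, the two sets, and the example paths are hypothetical. `autoresearch_runtime_ctl.py` is left out because the placement of its subcommands is not spelled out here, and any further arguments elided by `...` in the source are not filled in.

```python
from pathlib import PurePosixPath
from typing import List, Optional

# Helpers that start a new run: step 5 says these need both flags.
NEW_RUN_HELPERS = {"autoresearch_init_run.py"}

# Helpers that step 5 describes as taking --repo only.
REPO_ONLY_HELPERS = {
    "autoresearch_resume_check.py",
    "autoresearch_resume_prompt.py",
    "autoresearch_supervisor_status.py",
    "autoresearch_health_check.py",
    "autoresearch_launch_gate.py",
}


def helper_command(
    skill_root: str,
    script: str,
    repo: str,
    workspace_root: Optional[str] = None,
) -> List[str]:
    """Build an argv list for a bundled helper (hypothetical wrapper).

    The script path is resolved under <skill-root>/scripts, never under
    the target repo root.
    """
    if script in NEW_RUN_HELPERS:
        if workspace_root is None:
            raise ValueError(f"{script} requires --workspace-root")
    elif script not in REPO_ONLY_HELPERS:
        raise ValueError(f"unknown helper: {script}")

    script_path = PurePosixPath(skill_root) / "scripts" / script
    argv = ["python3", str(script_path), "--repo", repo]
    if script in NEW_RUN_HELPERS:
        argv += ["--workspace-root", workspace_root]
    return argv
```

For a repo-local install, `helper_command(".agents/skills/codex-autoresearch", "autoresearch_launch_gate.py", "<primary_repo>")` would yield an argv list that could be handed to `subprocess.run`.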

## Core Loop

1. Read the relevant context.
2. Define a mechanical success metric.
3. Establish a baseline.
4. Make one focused change.
5. Verify with a command.
6. Keep or discard the change.
7. Log the result.
8. Repeat.
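
As a rough illustration of the eight steps above, the loop can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the skill's implementation: `improve_verify_loop`, the dictionary state, and the "lower is better" metric are all hypothetical, and the callable `metric` stands in for the verification command that the real loop would run.

```python
from typing import Callable, Dict, Iterable, List, Tuple

State = Dict[str, int]
LogRow = Tuple[int, float, str]


def improve_verify_loop(
    state: State,
    candidates: Iterable[Callable[[State], State]],
    metric: Callable[[State], float],
) -> Tuple[State, List[LogRow]]:
    """Keep a change only if the metric strictly improves.

    Assumption: a lower metric is better (for example, an error count).
    """
    best = metric(state)                    # step 3: establish a baseline
    log: List[LogRow] = [(0, best, "baseline")]
    for i, change in enumerate(candidates, start=1):
        trial = change(dict(state))         # step 4: one focused change, on a copy
        score = metric(trial)               # step 5: verify
        if score < best:                    # step 6: keep
            state, best = trial, score
            log.append((i, score, "keep"))  # step 7: log the result
        else:                               # step 6: discard, state is untouched
            log.append((i, score, "discard"))
    return state, log


if __name__ == "__main__":
    changes = [
        lambda s: {**s, "errors": s["errors"] - 2},  # improves the metric
        lambda s: {**s, "errors": s["errors"] + 1},  # regresses, so it is discarded
        lambda s: {**s, "errors": 0},                # improves again
    ]
    final, rows = improve_verify_loop({"errors": 5}, changes, lambda s: s["errors"])
    for row in rows:
        print(row)
```

Working on a copy of the state is what makes the discard branch trivial: a rejected change never touches the kept state, which mirrors the rollback discipline the loop relies on.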

## Modes

| Mode | Purpose | Primary Reference |
|------|---------|-------------------|
| `loop` | Run the autonomous improvement loop | `references/loop-workflow.md` |
| `plan` | Convert a vague goal into a launch-ready config | `references/plan-workflow.md` |
| `debug` | Hunt bugs with evidence and hypotheses | `references/debug-workflow.md` |
| `fix` | Iteratively reduce errors to zero | `references/fix-workflow.md` |
| `security` | Run a structured security audit | `references/security-workflow.md` |
| `ship` | Gate and execute a ship workflow | `references/ship-workflow.md` |
| `exec` | Non-interactive CI/CD mode with JSON output | `references/exec-workflow.md` |

Use `Mode: <name>` in the prompt to force a specific subworkflow.
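
One way to picture the mode table together with the reference-loading rules from "When Activated" is the sketch below. The reference paths and the set of active execution modes are taken from the text; the functions `forced_mode` and `references_for`, and the regular expression used to spot `Mode: <name>`, are assumptions made for illustration. Situational references (session resume, environment awareness, interaction wizard, results logging) are deliberately omitted.

```python
import re
from typing import List, Optional

MODES = ("loop", "plan", "debug", "fix", "security", "ship", "exec")

# Primary reference per mode, as listed in the table above.
WORKFLOWS = {mode: f"references/{mode}-workflow.md" for mode in MODES}

# Every mode except `plan` is described as an active execution mode.
ACTIVE_MODES = {"loop", "debug", "fix", "security", "ship", "exec"}

CORE_REFERENCES = [
    "references/core-principles.md",
    "references/structured-output-spec.md",
]


def forced_mode(prompt: str) -> Optional[str]:
    """Return the mode named by a `Mode: <name>` directive, if any.

    The exact matching rule is an assumption; only known modes are accepted.
    """
    match = re.search(r"\bMode:\s*([A-Za-z]+)", prompt)
    if match is None:
        return None
    name = match.group(1).lower()
    return name if name in WORKFLOWS else None


def references_for(mode: str) -> List[str]:
    """List the core, invariant, and workflow references for one mode."""
    if mode not in WORKFLOWS:
        raise ValueError(f"unknown mode: {mode}")
    refs = list(CORE_REFERENCES)
    if mode in ACTIVE_MODES:
        refs.append("references/runtime-hard-invariants.md")
    refs.append(WORKFLOWS[mode])
    return refs
```

Under these assumptions, a prompt beginning with `Mode: fix` would resolve to the two core references, the runtime hard invariants, and `references/fix-workflow.md`, while `plan` would skip the invariants.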

## Required Config

For the generic loop, the following fields are needed internally. Codex infers them fro
