
AI Paper Reproduction
Run a README-first, minimal-trustworthy reproduction of an AI paper repo with auditable `repro_outputs/` and conservative patching.
Overview
AI Paper Reproduction is an agent skill most often used in Validate (also Build integrations, Ship testing) that orchestrates README-first, minimal-trustworthy runs of AI paper repositories into an auditable `repro_outpu
Install
npx skills add https://github.com/lllllllama/ai-paper-reproduction-skill --skill ai-paper-reproduction
What is this skill?
- Orchestrates end-to-end reproduction: intake, setup, trusted execution, optional training and gap analysis
- README-first target selection for the smallest documented inference or evaluation run
- Enforces conservative patch rules with recorded assumptions, deviations, and human decision points
- Writes a standardized auditable `repro_outputs/` evidence bundle
- Explicitly excludes literature summaries, silent protocol changes, and scratch research outside the repo
- Standardized `repro_outputs/` reporting bundle
- Multi-stage flow spanning intake, setup, execution, and optional training or gap resolution
Adoption & trust: 9.1k installs on skills.sh; 412 GitHub stars; 0/3 security scanners passed (skills.sh audits).
What problem does it solve?
You have an AI paper repo and need defensible proof that its documented inference or eval path runs, without unbounded agent experimentation or hand-wavy summaries.
Who is it for?
Indie builders, ML engineers, or reviewers validating a paper codebase before citing metrics, demoing results, or building on top of the repo.
Skip if: Pure literature reviews, designing new benchmarks from scratch, repos with no documented reproduction path, or tasks that need broad open-ended research outside repository evidence.
When should I use this skill?
The user wants an end-to-end, minimal-trustworthy reproduction of an AI paper repository with README-grounded target selection and auditable outputs.
What do I get? / Deliverables
You get a conservative, evidence-backed reproduction run with logged deviations and a standardized `repro_outputs/` package suitable for quick human or model audit.
- Populated `repro_outputs/` evidence bundle
- Recorded assumptions, deviations, patches, and human decision points
- Minimal trustworthy execution log for the chosen inference or eval target
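As a rough illustration of what a machine-readable entry in the evidence bundle could look like, the sketch below writes one JSON record into `repro_outputs/`. The skill's real file layout is not shown on this page, so the `ReproRecord` fields, the status values, and the `summary.json` file name are all hypothetical and chosen only for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class ReproRecord:
    """Hypothetical record shape; not the skill's actual schema."""
    target: str                 # e.g. "inference" or "evaluation"
    command: str                # the documented command that was run
    status: str                 # hypothetical values: "success", "partial", "blocked"
    assumptions: list = field(default_factory=list)
    deviations: list = field(default_factory=list)
    patches: list = field(default_factory=list)
    human_decisions: list = field(default_factory=list)


def write_record(record: ReproRecord, out_dir: str = "repro_outputs") -> Path:
    """Serialize the record as JSON with sorted keys so diffs between runs stay stable."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / "summary.json"  # hypothetical file name
    path.write_text(
        json.dumps(asdict(record), indent=2, sort_keys=True),
        encoding="utf-8",
    )
    return path
```

Keeping assumptions, deviations, patches, and human decision points as separate lists means an auditor can check each category without parsing free-form prose.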
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Canonical shelf is Validate because the first commitment is proving the published repository path works before you treat results as product truth. The Prototype subphase fits because the target is the smallest documented inference or evaluation run from the repo’s own scripts and configs, not a full reimplementation.
Where it fits
Select the smallest documented eval script from the README and record whether metrics match the paper within stated tolerance.
Wire a verified inference entrypoint into your own agent tooling only after the orchestrator confirms the upstream command path.
Re-run the trusted execution stage after dependency bumps to catch silent numeric drift before you ship a demo.
Refresh `repro_outputs/` after upstream maintainers change configs so production claims stay tied to reproducible evidence.
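The "within stated tolerance" check and the drift check after dependency bumps can both be sketched as a comparison between reproduced and reported metrics. This is a minimal sketch, not the skill's own logic: the function name `compare_metrics`, the status strings, and the default 1% relative tolerance are assumptions made for illustration.

```python
import math


def compare_metrics(reproduced: dict, reported: dict, rel_tol: float = 0.01) -> dict:
    """Label each reported metric as "match", "mismatch", or "missing".

    rel_tol is a hypothetical default; use the tolerance the paper or
    README actually states.
    """
    report = {}
    for name, paper_value in reported.items():
        if name not in reproduced:
            # The run did not produce this metric at all.
            report[name] = "missing"
        elif math.isclose(reproduced[name], paper_value, rel_tol=rel_tol):
            report[name] = "match"
        else:
            report[name] = "mismatch"
    return report
```

Running the same comparison against a previously stored run instead of the paper's numbers gives a simple regression check for numeric drift.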
How it compares
Use this structured reproduction orchestrator instead of a generic “clone and try random commands” coding session.
Common Questions / FAQ
Who is ai-paper-reproduction for?
It is for solo builders and small teams who must reproduce AI paper repositories with traceable commands, patches, and outputs—not merely summarize the paper.
When should I use ai-paper-reproduction?
Use it in Validate when proving a repo’s smallest inference or eval path; in Build when integrating verified scripts into your stack; and in Ship when you need regression-style re-runs before trusting reported numbers.
Is ai-paper-reproduction safe to install?
Check the Security Audits panel on this Prism page; reproduction flows may execute upstream training or inference code, use network installs, and modify the repo under stated patch rules—review permissions and repo trust first.
SKILL.md
# ai-paper-reproduction

## Use when

- The user wants the agent to reproduce an AI paper repository.
- The target is a code repository with a README, scripts, configs, or documented commands.
- The goal is a minimal trustworthy run, not unlimited experimentation.
- The user needs standardized outputs that another human or model can audit quickly.
- The task spans more than one stage, such as intake plus setup, or setup plus execution plus reporting.

## Do not use when

- The task is a general literature review or paper summary.
- The task is to design a new model, benchmark suite, or training pipeline from scratch.
- The repository is not centered on AI or does not expose a documented reproduction path.
- The user primarily wants a deep code refactor rather than README-first reproduction.
- The user is explicitly asking for only one narrow phase that a sub-skill already covers cleanly.
- The user is explicitly authorizing exploratory branch-only experimentation instead of trusted reproduction.

## Success criteria

- README is treated as the primary source of reproduction intent.
- A minimum trustworthy target is selected and justified.
- Documented inference is preferred over evaluation, and evaluation is preferred over training.
- Any repo edits remain conservative, explicit, and auditable.
- Assumptions, protocol deviations, and human decision points are surfaced rather than hidden.
- `repro_outputs/` is generated with consistent structure and stable machine-readable fields.
- Final user-facing explanation is short and follows the user's language when practical.

## Interaction and usability policy

- Keep the workflow simple enough for a new user to understand quickly.
- Prefer short, concrete plans over exhaustive research.
- Expose commands, assumptions, blockers, and evidence.
- Avoid turning the skill into an opaque automation layer.
- Preserve a low learning cost for both humans and downstream agents.
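
The preference stated in the success criteria (inference over evaluation, evaluation over training) can be sketched as a small selection function. This is an illustrative sketch only: the `Candidate` type, its fields, and `select_target` are hypothetical names, not part of the skill.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Ordered from most to least preferred, following the stated success criteria.
PRIORITY = ("inference", "evaluation", "training")


@dataclass(frozen=True)
class Candidate:
    """A documented command found in the repository (hypothetical shape)."""
    kind: str     # expected to be one of PRIORITY
    command: str  # the command exactly as documented
    source: str   # where it was documented, e.g. "README"


def select_target(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Return the most preferred documented candidate, or None if none qualifies."""
    known = [c for c in candidates if c.kind in PRIORITY]
    if not known:
        return None
    # min() returns the first minimal item, so ties keep documentation order.
    return min(known, key=lambda c: PRIORITY.index(c.kind))
```

Returning `None` instead of guessing keeps the "no documented reproduction path" case explicit, so it can be surfaced as a blocker instead of being silently worked around.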

## Language policy

- Human-readable Markdown outputs should follow the user's language when it is clear.
- If the user's language is unclear, default to concise English.
- Machine-readable fields, filenames, keys, and enum values stay in stable English.
- Paths, package names, CLI commands, config keys, and code identifiers remain unchanged.

See `references/language-policy.md`.

## Reproduction policy

Core priority order:

1. documented inference
2. documented evaluation
3. documented training startup or partial verification
4. full training only when the user explicitly asks later

Rules:

- README-first: use repository files to clarify, not casually override, the README.
- Aim for minimal trustworthy reproduction rather than maximum task coverage.
- Treat smoke tests, startup verification, and early-step checks as valid training evidence when full training is not appropriate.
- In trusted reproduction, a documented training command should first be checked through startup verification or a short monitoring window, then paused for explicit human confirmation before broader training continues.
- In explicitly authorized explore-lane execution, the training record can continue without the trusted-lane confirmation pause, but it must stay isolated from