
Minimal Run And Audit
Run documented inference, evaluation, or smoke commands and normalize evidence into standardized repro_outputs with patch notes.
Install
npx skills add https://github.com/lllllllama/ai-paper-reproduction-skill --skill minimal-run-and-auditWhat is this skill?
- README-first execution only
- Standardized repro_outputs reporting
- Patch notes when repo files change
Adoption & trust: 140k installs on skills.sh; 412 GitHub stars; 1/3 security scanners passed (skills.sh audits).
Recommended Skills
Paper Context Resolverlllllllama/ai-paper-reproduction-skill
Repo Intake And Planlllllllama/ai-paper-reproduction-skill
Env And Assets Bootstraplllllllama/ai-paper-reproduction-skill
Analyze Projectlllllllama/rigorpilot-skills
Ai Research Reproductionlllllllama/rigorpilot-skills
Run Trainlllllllama/rigorpilot-skills
Journey fit
Primary fit
Ship-stage testing verifies the chosen reproduction path and captures auditable artifacts without launching full training. Testing subphase fits short verification runs, smoke tests, and structured reporting rather than greenfield implementation.
Common Questions / FAQ
Is Minimal Run And Audit safe to install?
skills.sh reports 1 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.
SKILL.md
READMESKILL.md - Minimal Run And Audit
# minimal-run-and-audit Use the shared operating principles in `../../references/agent-operating-principles.md`; this skill should make run evidence auditable without turning every command into a rigid protocol. ## When to apply - After a reproduction target and setup plan exist. - When the main skill needs execution evidence and normalized outputs. - When a smoke test, documented inference run, documented evaluation run, or other short non-training verification is appropriate. - When the user already knows what command should be attempted and wants execution plus reporting only. ## When not to apply - During initial repo scanning. - When environment or assets are still undefined enough to make execution meaningless. - When the task is a literature lookup rather than repository execution. - When the user is still deciding which reproduction target should count as the main run. ## Clear boundaries - This skill owns normalized reporting for an attempted command. - It may receive execution evidence from the main skill or a thin helper. - It does not choose the overall target on its own. - It does not perform broad paper analysis. - It does not own training startup, resume, or long-running training state. - It should not normalize risky code edits into acceptable practice. - It must not hide changes that alter evaluation, preprocessing, checkpoints, metrics, or other scientific meaning. ## Input expectations - selected reproduction goal - runnable commands or smoke commands - environment and asset assumptions - optional patch metadata ## Output expectations - execution result summary - standardized `repro_outputs/` files - `SCIENTIFIC_CHANGELOG.md` for changed scientific meaning and evidence status - `COMPARABILITY_REPORT.md` for README/paper/baseline comparability - clear distinction between verified, partial, and blocked states - `PATCHES.md` when repo files changed ## Notes Use `references/reporting-policy.md`, `../../references/research-rigor-principles.md`, `scripts/run_command.py`, and `scripts/write_outputs.py`. #!/usr/bin/env python3 """Compatibility wrapper for trusted verify output bundles.""" from __future__ import annotations import importlib.util from pathlib import Path def load_shared_module(): module_path = Path(__file__).resolve().parents[3] / "shared" / "scripts" / "write_run_bundle.py" spec = importlib.util.spec_from_file_location("write_run_bundle", module_path) if spec is None or spec.loader is None: raise RuntimeError(f"Unable to load shared writer module from {module_path}") module = importlib.util.module_from_spec(spec) spec.loader.exec_module(module) return module def main() -> int: module = load_shared_module() return module.main(default_mode="repro", default_output_dir="repro_outputs") if __name__ == "__main__": raise SystemExit(main()) # Reporting Policy ## Tone Keep reports short, factual, and easy to audit. ## Requirements - separate facts from inferences - mention the documented command explicitly - mention whether the non-training run was full, partial, smoke-only, sanity-only, or blocked - explain the main blocker without burying it - when patches were applied, mention patch state briefly in `SUMMARY.md` and keep the full audit in `PATCHES.md` ## Output priorities 1. clear overall result 2. copyable commands 3. concise process trace 4. stable machine-rea