
Run Train
Run an already-chosen deep-learning training command with conservative startup, short-run, full kickoff, or resume checks while writing command, config, seed, logs, checkpoints, status, and metrics int
Overview
run-train is an agent skill most often used in Build (also Operate, Ship) that executes a pre-selected deep-learning training command conservatively and writes normalized training evidence to train_outputs/.
Install
npx skills add https://github.com/lllllllama/rigorpilot-skills --skill run-train
What is this skill?
- Conservative modes: startup verification, short-run verification, full kickoff, and resume with bounded evidence
- Standardized train_outputs/ for command, config, seed, log, checkpoint, status, and metric reporting
- Explicit out-of-scope: environment setup, inference-only runs, sweeps, and speculative idea implementation
- Shared RigorPilot operating principles keep monitoring repo-specific while evidence stays normalized
- Does not select commands or perform repository intake—executes a documented or selected command only
- Writes evidence to standardized train_outputs/ including command, config, seed, log, checkpoint, status, and metrics
Adoption & trust: 32.3k installs on skills.sh; 412 GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You have a training command from the repo docs but need bounded, auditable execution without your agent improvising sweeps, setup, or speculative code changes.
Who is it for?
Researchers reproducing a paper repo who want verified kickoff or resume with standardized logs and checkpoints before trusting a long job.
Skip if: First-time repo scans, conda/pip environment installs, multi-variant hyperparameter sweeps, inference-only demos, or tasks still missing a chosen training command.
When should I use this skill?
When the training command has already been selected and should be executed conservatively for startup verification, short-run verification, full kickoff, or resume with structured reporting.
What do I get? / Deliverables
A conservative training run completes with structured status, checkpoint, and metric evidence under train_outputs/, ready for resume or safe-debug if something breaks.
- train_outputs/ bundle with training status, checkpoints, and metric evidence
- Conservative run record matching startup, short-run, full, or resume mode
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Training execution is the core build step after a reproduction path is chosen; canonical shelf is backend ML pipelines even though verification touches ship-style gates. Backend subphase fits long-running experiment execution and checkpoint/metric artifacts rather than frontend or docs work.
Where it fits
Smoke a documented finetune command with a short-run verification before committing GPU hours to full training.
Kick off full pretrain or finetune after README command selection with seed, config, and metric files under train_outputs/.
Run startup verification to confirm the training script launches without shape or device errors before a production-scale schedule.
Resume from the last checkpoint with structured status reporting instead of manually guessing resume flags.
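The startup and short-run checks above amount to running the chosen command inside a finite window and keeping its log. Below is a minimal Python sketch of that idea. The function name `run_bounded`, the state labels, and the log location are hypothetical illustrations and are not taken from the skill's own scripts.

```python
import subprocess
import sys
from pathlib import Path


def run_bounded(command, log_path, window_seconds):
    """Run `command` for at most `window_seconds`, sending stdout and stderr to `log_path`.

    Returns a dict with a hypothetical `state` label and the exit code
    (None when the monitoring window elapsed before the process exited).
    """
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("w", encoding="utf-8") as log_file:
        try:
            completed = subprocess.run(
                command,
                stdout=log_file,
                stderr=subprocess.STDOUT,
                timeout=window_seconds,
                check=False,  # a failing run is evidence to record, not an exception
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child once the window closes.
            return {"state": "window_elapsed", "returncode": None}
    # Assumption: any nonzero exit is labeled "blocked" for reporting purposes.
    state = "exited_cleanly" if completed.returncode == 0 else "blocked"
    return {"state": state, "returncode": completed.returncode}


if __name__ == "__main__":
    # Stand-in for a real training command; replace with the selected one.
    demo_command = [sys.executable, "-c", "print('step 1 loss 0.9')"]
    print(run_bounded(demo_command, "train_outputs/demo.log", window_seconds=30))
```

Because `subprocess.run` terminates the child when the timeout expires, this shape suits startup or short-run verification only. A full kickoff would need a launcher that leaves the job running after the monitoring window ends.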
How it compares
Use instead of asking the agent to freestyle `python train.py` without standardized evidence folders or conservative verification modes.
Common Questions / FAQ
Who is run-train for?
Solo builders and indie researchers using agentic coding on deep-learning repositories who need disciplined training execution with checkpoint and metric evidence, not open-ended experimentation.
When should I use run-train?
Use it in Build when kicking off or resuming training after a command is documented; use short-run verification before a long Operate-style job; avoid it during Validate intake or when you only need eval or inference paths.
Is run-train safe to install?
Review the Security Audits panel on this Prism page and treat training runs as shell execution with GPU and filesystem access; scope commands to the repo you trust before running.
Workflow Chain
Requires first: repo intake and plan
Then invoke: safe debug
SKILL.md
# run-train

Use the shared operating principles in `../../references/agent-operating-principles.md`; this skill should keep training evidence bounded while leaving repository-specific monitoring details to the model.

## When to apply

- When the training command has already been selected and should be executed conservatively.
- When the researcher wants startup verification, short-run verification, full training kickoff, or resume handling.
- When the run needs structured training status, checkpoint, and metric reporting.

## When not to apply

- When the main task is environment setup or asset download.
- When the researcher wants inference-only or evaluation-only execution.
- When the task is speculative exploration, multi-variant sweeps, or autonomous idea implementation.
- When the user still needs repository intake or paper gap resolution.

## Clear boundaries

- This skill executes a selected training command and normalizes the resulting evidence.
- It does not choose the overall research goal on its own.
- It does not own exploratory branching or speculative code adaptation.
- It should record partial, blocked, resumed, and kicked-off states clearly.
- It should preserve reproducibility context such as configs, seeds, checkpoints, logs, metrics, and runtime assumptions when available.

## Input expectations

- selected training goal
- runnable training command
- environment and asset assumptions
- run mode such as startup verification, short-run verification, full kickoff, or resume

## Output expectations

- `train_outputs/SUMMARY.md`
- `train_outputs/COMMANDS.md`
- `train_outputs/LOG.md`
- `train_outputs/SCIENTIFIC_CHANGELOG.md`
- `train_outputs/COMPARABILITY_REPORT.md`
- `train_outputs/status.json`

## Notes

Use `references/training-policy.md`, `../../references/deep-learning-experiment-principles.md`, `scripts/run_training.py`, and `scripts/write_outputs.py`.
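The source does not show the schema of `train_outputs/status.json`, so the following is only a sketch of how such a record might be assembled. It reuses `max_steps`, `completed_steps`, and `stop_reason` from the training policy on this page; the keys `mode`, `state`, `command`, and `last_checkpoint`, the function names, and the log line format matched by the regular expressions are all hypothetical. Values that the log does not reveal stay `None` rather than being guessed.

```python
import json
import re
from pathlib import Path

# Hypothetical log format, e.g. "step 120 loss 0.43" and "saved checkpoint: ckpt/step_120.pt".
# Real repositories log differently; adjust the patterns per repo.
STEP_PATTERN = re.compile(r"\bstep[\s:=]+(\d+)", re.IGNORECASE)
CHECKPOINT_PATTERN = re.compile(r"\bsaved checkpoint[\s:=]+(\S+)", re.IGNORECASE)


def extract_hints(log_text):
    """Pull step and checkpoint hints from a log; return None where the log is silent."""
    steps = [int(match.group(1)) for match in STEP_PATTERN.finditer(log_text)]
    checkpoints = CHECKPOINT_PATTERN.findall(log_text)
    return {
        "completed_steps": max(steps) if steps else None,
        "last_checkpoint": checkpoints[-1] if checkpoints else None,
    }


def write_status(output_dir, mode, state, command, log_text,
                 max_steps=None, stop_reason=None):
    """Write an illustrative status.json and return the record that was written."""
    status = {
        "mode": mode,        # e.g. startup verification, short-run verification
        "state": state,      # e.g. blocked, partial, resumed, verified
        "command": command,  # the exact training command that was run
        "max_steps": max_steps,
        "stop_reason": stop_reason,
    }
    status.update(extract_hints(log_text))
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "status.json").write_text(json.dumps(status, indent=2) + "\n", encoding="utf-8")
    return status


if __name__ == "__main__":
    sample_log = "step 10 loss 0.9\nstep 20 loss 0.7\nsaved checkpoint: ckpt/step_20.pt\n"
    record = write_status("train_outputs", "short-run verification", "partial",
                          "python train.py --config cfg.yaml", sample_log, max_steps=20)
    print(record)
```

Keeping extraction separate from writing makes it easy to swap in repository-specific patterns while the output file keeps one shape.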
display_name: Run Train
short_description: Execute a selected training command conservatively and write standardized train_outputs.
default_prompt: Run the selected training command conservatively, capture startup, resume, and status evidence, and write SUMMARY.md COMMANDS.md LOG.md and status.json into train_outputs.

# Training Policy

## Purpose

Use this skill for trusted-lane training execution after a training command has already been selected.

## Requirements

- state whether the run is startup verification, short-run verification, full kickoff, or resume
- record the exact training command
- record dataset and checkpoint assumptions explicitly
- separate blocked, partial, resumed, and verified states
- record `max_steps`, `completed_steps`, `best_metric`, `best_checkpoint`, and `stop_reason` when available
- keep conclusions conservative when the run is short or partial
- treat long-running supervision as a finite monitoring window unless another skill has already planned a broader schedule
- extract useful step, epoch, metric, and checkpoint hints from logs when they appear, but do not invent them when logs are silent
- leave subset design, early-comparison strategy, and experiment planning to the orchestrator or an explore skill when they are needed

## Avoid

- exploratory sweeps
- speculative architecture changes
- implying that startup verification equals full reproduction success

#!/usr/bin/env python3
"""Execute a selected training command and normalize conservative training evidence."""
from __future__ import annotations
import argpa