
Minimal Run And Audit
Execute a selected smoke, inference, eval, or sanity command conservatively and emit auditable SUMMARY.md, COMMANDS.md, LOG.md, and status.json evidence.
Overview
minimal-run-and-audit is an agent skill, most often used in the Ship phase (and also in Validate), that runs conservative smoke, inference, eval, or sanity commands and writes auditable SUMMARY.md, COMMANDS.md, LOG.md, and status.json files.
Install
npx skills add https://github.com/lllllllama/rigorpilot-skills --skill minimal-run-and-audit
What is this skill?
- Executes short non-training commands via bundled Python runner with normalized evidence
- Reporting separates facts from inferences; states full vs partial vs smoke vs blocked
- Output priorities: overall result, copyable commands, concise trace, status.json, PATCHES.md when patched
- Extracts metrics from logs with structured regex normalization
- Explicitly excludes training startup or resume from this skill scope
- 5 output priorities in reporting policy
- 5 run classifications: full, partial, smoke-only, sanity-only, blocked
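As a worked illustration of the metric extraction listed above, the sketch below applies the regex from the bundled runner (shown in SKILL.md) to a short log line. The sample log text is invented for this example, and the final selection step mirrors the runner's preference for names that do not contain `loss`, `lr`, `time`, or `mem`.

```python
import re

# Pattern copied from the bundled runner in SKILL.md: a short name,
# then ":" or "=", then a number (optionally in scientific notation).
METRIC_RE = re.compile(
    r"\b([A-Za-z][A-Za-z0-9_.-]{1,31})\s*[:=]\s*(-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)"
)

# Hypothetical log excerpt, made up for this illustration.
sample_log = "eval done: accuracy=0.91 loss: 0.23 top5 = 0.987"

# A later occurrence of the same name overwrites an earlier one.
metrics = {m.group(1): float(m.group(2)) for m in METRIC_RE.finditer(sample_log)}

# Prefer the last metric whose name does not look like a training or
# resource counter, as the runner's parse_metrics does.
skip_tokens = {"loss", "lr", "time", "mem"}
candidates = [n for n in metrics if not any(t in n.lower() for t in skip_tokens)]
best = candidates[-1] if candidates else None

print(metrics)
print(best)
```

Note that `done:` is not picked up as a metric because no number follows the colon.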
Adoption & trust: 32.3k installs on skills.sh; 412 GitHub stars; 1/3 security scanners passed (skills.sh audits).
What problem does it solve?
You have env and assets ready but no disciplined record of whether the documented non-training command actually ran and what it produced.
Who is it for?
Solo builders reproducing ML repos who need one auditable execution artifact before scaling experiments or declaring reproduce success.
Skip if: you need full training schedules, resume-from-checkpoint training, or exploratory sweeps without a single selected command.
When should I use this skill?
Run the selected smoke, inference, evaluation, or sanity command conservatively, capture execution evidence, and write SUMMARY.md, COMMANDS.md, LOG.md, and status.json.
What do I get? / Deliverables
You get copyable commands, normalized logs, a clear blocked-or-pass overall result, and a status.json suitable for authorizing explore-run or iterating on fixes.
- SUMMARY.md with overall result and patch note
- COMMANDS.md with copyable invocations
- LOG.md process trace
- status.json with stable machine-readable state
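The deliverables above could be written out roughly as follows. This is only a sketch: the helper name `write_evidence`, the file layouts, and the `status.json` keys (`result`, `command`) are hypothetical, since the skill's real templates and schema are not shown on this page.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def write_evidence(out_dir: Path, command: str, result: str, log_text: str) -> List[str]:
    """Write the four evidence files and return their names (hypothetical layout)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "SUMMARY.md").write_text(f"Overall result: {result}\n", encoding="utf-8")
    (out_dir / "COMMANDS.md").write_text(f"    {command}\n", encoding="utf-8")
    (out_dir / "LOG.md").write_text(log_text + "\n", encoding="utf-8")
    # Assumed keys; the real status.json schema may differ.
    state = {"result": result, "command": command}
    (out_dir / "status.json").write_text(json.dumps(state, indent=2), encoding="utf-8")
    return sorted(p.name for p in out_dir.iterdir())


# Demo in a throwaway directory with invented values.
with tempfile.TemporaryDirectory() as tmp:
    written = write_evidence(Path(tmp) / "audit", "python eval.py --quick", "smoke-only", "accuracy=0.91")
    print(written)
```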
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Non-training execution evidence is canonical on Ship because you are verifying documented commands before launch or deeper exploration. Testing subphase covers partial, smoke-only, sanity-only, or blocked runs with explicit command traces—not full training jobs.
Where it fits
Run the README’s shortest eval command to see if the reproduction target is viable before scope expands.
Document partial inference success and metric lines extracted from LOG.md for a release checklist.
Re-run sanity command after a dependency bump and compare new status.json to prior blocked state.
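Comparing a new status.json against a prior one, as in the last scenario, can be sketched with a generic top-level diff. The function name `diff_status` and the demo file contents are assumptions made for illustration; nothing here depends on the skill's actual status.json schema.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def diff_status(old_path: Path, new_path: Path) -> Dict[str, Dict[str, Any]]:
    """Return the top-level keys whose values differ between two JSON files."""
    old = json.loads(old_path.read_text(encoding="utf-8"))
    new = json.loads(new_path.read_text(encoding="utf-8"))
    changed: Dict[str, Dict[str, Any]] = {}
    for key in sorted(set(old) | set(new)):
        if old.get(key) != new.get(key):
            changed[key] = {"old": old.get(key), "new": new.get(key)}
    return changed


# Demo with invented contents standing in for two audit runs.
with tempfile.TemporaryDirectory() as tmp:
    before = Path(tmp) / "status.before.json"
    after = Path(tmp) / "status.after.json"
    before.write_text(json.dumps({"result": "blocked", "command": "python eval.py"}), encoding="utf-8")
    after.write_text(json.dumps({"result": "smoke-only", "command": "python eval.py"}), encoding="utf-8")
    changes = diff_status(before, after)

print(changes)
```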
How it compares
Skill workflow for evidence-backed smoke runs—not a replacement for your experiment tracker or CI matrix by itself.
Common Questions / FAQ
Who is minimal-run-and-audit for?
Agent users and indie researchers running RigorPilot who must prove a README command ran with commands and logs a reviewer can copy.
When should I use minimal-run-and-audit?
Use it after env-and-assets-bootstrap during Validate prototype checks, during Ship testing for inference or eval smoke runs, and before explore-run when you need a trusted baseline run.
Is minimal-run-and-audit safe to install?
Review Security Audits on this page; the skill executes shell commands and may read secrets from env—scope approvals to the documented command only.
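One way to keep execution scoped to a single documented command is to split it into an argument list and run it without a shell, under a timeout. The sketch below follows the `shlex.split` and `subprocess.run` usage visible in the bundled runner, but `run_once`, its 60-second default, and the returned dictionary keys are hypothetical.

```python
import shlex
import subprocess
import sys
from typing import Any, Dict


def run_once(command: str, timeout_s: float = 60.0) -> Dict[str, Any]:
    """Run one command without a shell and return the captured evidence."""
    argv = shlex.split(command, posix=True)  # no shell expansion, pipes, or chaining
    try:
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_s, check=False
        )
    except FileNotFoundError:
        # The executable does not exist; report it instead of raising.
        return {"argv": argv, "returncode": None, "timed_out": False, "log": "executable not found"}
    except subprocess.TimeoutExpired:
        return {"argv": argv, "returncode": None, "timed_out": True, "log": ""}
    log = "\n".join(part for part in (proc.stdout, proc.stderr) if part).strip()
    return {"argv": argv, "returncode": proc.returncode, "timed_out": False, "log": log}


# Demo: run the current Python interpreter with a trivial one-liner.
evidence = run_once(f'{shlex.quote(sys.executable)} -c "print(42)"')
print(evidence["returncode"], evidence["log"])
```

Because the command never passes through a shell, constructs such as `&&` or `$(...)` in the string are handed to the program as literal arguments rather than executed.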
Workflow Chain
Requires first: env and assets bootstrap
Then invoke: explore run
SKILL.md
SKILL.md - Minimal Run And Audit
display_name: Rigor Run
short_description: Rigor Run mode for selected inference, evaluation, smoke, or sanity execution evidence.
default_prompt: Run the selected smoke, inference, evaluation, or sanity command conservatively, capture execution evidence, and write SUMMARY.md COMMANDS.md LOG.md and status.json.

# Reporting Policy

## Tone

Keep reports short, factual, and easy to audit.

## Requirements

- separate facts from inferences
- mention the documented command explicitly
- mention whether the non-training run was full, partial, smoke-only, sanity-only, or blocked
- explain the main blocker without burying it
- when patches were applied, mention patch state briefly in `SUMMARY.md` and keep the full audit in `PATCHES.md`

## Output priorities

1. clear overall result
2. copyable commands
3. concise process trace
4. stable machine-readable state
5. patch evidence when relevant

## Avoid

- long narrative journals
- vague "it should work" language
- hiding unsupported assumptions
- treating training startup or resume as part of this skill

#!/usr/bin/env python3
"""Execute a short non-training command and normalize the evidence."""

from __future__ import annotations

import argparse
import json
import re
import shlex
import subprocess
from pathlib import Path
from typing import Any, Dict, Iterable, List, Optional, Tuple


METRIC_RE = re.compile(
    r"\b([A-Za-z][A-Za-z0-9_.-]{1,31})\s*[:=]\s*(-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)"
)


def combine_logs(parts: Iterable[str]) -> str:
    return "\n".join(part for part in parts if part).strip()


def parse_metrics(text: str) -> Dict[str, Any]:
    observed_metrics: Dict[str, float] = {}
    best_metric: Optional[Dict[str, Any]] = None
    for match in METRIC_RE.finditer(text):
        name = match.group(1)
        value = float(match.group(2))
        observed_metrics[name] = value

    priority_names = [
        name
        for name in observed_metrics
        if not any(token in name.lower() for token in {"loss", "lr", "time", "mem"})
    ]
    if priority_names:
        chosen = priority_names[-1]
        best_metric = {"name": chosen, "value": observed_metrics[chosen]}
    elif observed_metrics:
        chosen = list(observed_metrics)[-1]
        best_metric = {"name": chosen, "value": observed_metrics[chosen]}

    return {
        "observed_metrics": observed_metrics,
        "best_metric": best_metric,
    }


def split_command(command: str) -> List[str]:
    return shlex.split(command, posix=True)


def run_git(repo: Path, args: List[str]) -> subprocess.CompletedProcess[str]:
    return subprocess.run(
        ["git", *args],
        cwd=repo,
        capture_output=True,
        text=True,
        timeout=15,
        check=False,
    )


def git_status_snapshot(repo: Path) -> Tuple[Optional[Dict[str, str]], Dict[str, Any]]:
    probe = run_git(repo, ["rev-parse", "--is-inside-work-tree"])
    if probe.returncode != 0 or probe.stdout.strip() != "true":
        return None, {
            "collection_method": "git-status-diff",
            "available": False,
            "reason": "git-unavailable-or-not-a-worktree",
        }

    result = run_git(repo, ["status", "--porcelain=v1", "--untracked-files=all"])
    if result.returncode != 0:
        return None, {
            "collection_method": "git-status-diff",
            "available": False,
            "reason": "git-status-failed",
            "stderr": result.stderr.strip(),
        }

    snapshot: Dict[str, str] = {}
    for raw_line in result.stdout.splitlines():
        line = raw_line.rstrip()
        if len(line) < 4:
            continue
        status = line[:2]
        path = line[3:]
        if " -> " in path:
            _old, _arrow, path = path.partition(" -> ")
        normalized = path.replace("\\", "/").strip()
        if normalized:
            snapshot[normalized] = status
    return snapshot, {
        "collection_method": "git-status-diff",
        "available": True,
        "status_entries": len(sna