Minimal Run And Audit

Name: Minimal Run And Audit
Author: lllllllama

lllllllama/rigorpilot-skills

176k installs
512 repo stars
Updated July 26, 2026

This is a copy of minimal-run-and-audit by lllllllama; installs and ranking accrue to the original listing.

minimal-run-and-audit is a Rigor Run agent skill that runs conservative smoke, inference, evaluation, or sanity checks on an agent and writes auditable evidence files for developers who need documented proof that a non-t

About

minimal-run-and-audit is the Rigor Run leaf mode from rigorpilot-skills for developers validating agent inference and evaluation pipelines before trusting results. The skill runs a selected smoke, inference, evaluation, or sanity command conservatively, captures execution evidence, and writes four artifacts: SUMMARY.md, COMMANDS.md, LOG.md, and status.json. Reporting policy requires separating facts from inferences, naming the documented command explicitly, and stating whether the run was full, partial, smoke-only, sanity-only, or blocked. Developers reach for minimal-run-and-audit when an agent change needs a reproducible audit trail instead of ad-hoc terminal output, especially after patches or when a blocker must be surfaced clearly in SUMMARY.md without burying the root cause.

Runs selected smoke, inference, evaluation or sanity commands conservatively
Captures execution evidence and writes SUMMARY.md, COMMANDS.md, LOG.md plus status.json
Separates facts from inferences and explicitly names the documented command
Reports whether run was full, partial, smoke-only, sanity-only or blocked
Keeps reports short, factual and easy to audit while avoiding narrative journals
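
The reporting rules above can be pictured as a small record type. The sketch below is illustrative only: `RunReport`, its field names, and its `validate` method are hypothetical and are not taken from rigorpilot-skills. Only the five run-scope labels, the explicit documented command, and the split between facts and inferences come from the description above. The example command string is likewise made up.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from typing import List

# Run-scope labels named in the skill description.
RUN_SCOPES = ("full", "partial", "smoke-only", "sanity-only", "blocked")


@dataclass
class RunReport:
    """Hypothetical report record; the field names are assumptions."""

    documented_command: str
    run_scope: str
    facts: List[str] = field(default_factory=list)
    inferences: List[str] = field(default_factory=list)

    def validate(self) -> None:
        # Reject reports that do not name the command or that use an unknown scope.
        if not self.documented_command.strip():
            raise ValueError("documented_command must be named explicitly")
        if self.run_scope not in RUN_SCOPES:
            raise ValueError(f"unknown run scope: {self.run_scope!r}")

    def to_json(self) -> str:
        self.validate()
        return json.dumps(asdict(self), indent=2)


report = RunReport(
    documented_command="python eval.py --config configs/base.yaml",  # made-up command
    run_scope="smoke-only",
    facts=["Process exited with code 0."],
    inferences=["The evaluation pipeline is probably wired correctly."],
)
print(report.to_json())
```

Keeping observed facts and inferences in separate lists makes it harder for a guess to be read later as a measured result.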

Minimal Run And Audit by the numbers

175,906 all-time installs (skills.sh)
+25,326 installs in the week ending Jul 28, 2026 (Skillselion tracking)
Security screen: MEDIUM risk (skills.sh audit)
Data as of Jul 28, 2026 (Skillselion catalog sync)

npx skills add https://github.com/lllllllama/rigorpilot-skills --skill minimal-run-and-audit

Installs: 176k
Repo stars: 512
Security audit: 1 / 3 scanners passed
Last updated: July 26, 2026
Repository: lllllllama/rigorpilot-skills

How do you audit agent inference runs with evidence files?

Run conservative smoke, inference, evaluation or sanity checks on an agent and produce clean, auditable evidence files.

Who is it for?

Developers running Rigor Pilot agent pipelines who need short, factual, auditable reports after smoke, inference, evaluation, or sanity execution.

Skip if: Developers seeking open-ended exploratory experiments or multi-variant tuning without a fixed conservative run policy.

When should I use this skill?

When a user authorizes a conservative smoke, inference, evaluation, or sanity run and wants auditable execution evidence written to disk.

What you get

SUMMARY.md, COMMANDS.md, LOG.md, and status.json documenting run type, commands, logs, and blocked or partial status.

By the numbers

Writes 4 evidence files: SUMMARY.md, COMMANDS.md, LOG.md, and status.json

Files

SKILL.md

minimal-run-and-audit

Use this as the Rigor Run skill. The installed slug remains minimal-run-and-audit for compatibility.

Use the shared operating principles in ../../references/agent-operating-principles.md; this skill should make run evidence auditable without turning every command into a rigid protocol.

When to apply

After a reproduction target and setup plan exist.
When the main skill needs execution evidence and normalized outputs.
When a smoke test, documented inference run, documented evaluation run, or other short non-training verification is appropriate.
When the user already knows what command should be attempted and wants execution plus reporting only.

When not to apply

During initial repo scanning.
When environment or assets are still undefined enough to make execution meaningless.
When the task is a literature lookup rather than repository execution.
When the user is still deciding which reproduction target should count as the main run.

Clear boundaries

This skill owns normalized reporting for an attempted command.
It may receive execution evidence from the main skill or a thin helper.
It does not choose the overall target on its own.
It does not perform broad paper analysis.
It does not own training startup, resume, or long-running training state.
It should not normalize risky code edits into acceptable practice.
It must not hide changes that alter evaluation, preprocessing, checkpoints, metrics, or other scientific meaning.

Input expectations

selected reproduction goal
runnable commands or smoke commands
environment and asset assumptions
optional patch metadata

Output expectations

execution result summary
standardized repro_outputs/ files
SCIENTIFIC_CHANGELOG.md for changed scientific meaning and evidence status
COMPARABILITY_REPORT.md for README/paper/baseline comparability
clear distinction between verified, partial, and blocked states
PATCHES.md when repo files changed
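
As a rough illustration of the outputs listed above, the sketch below writes the four core evidence files into a `repro_outputs/` directory and adds `PATCHES.md` only when files changed. It is not the project's writer (that lives in the shared `write_run_bundle.py`, whose contents are not shown on this page). The `write_bundle` function and the text layout of each file are assumptions, and the sample payload is hand-written. The payload keys mirror those printed by `scripts/run_command.py`.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def write_bundle(output_dir: Path, command: str, payload: Dict[str, Any]) -> List[str]:
    """Write a hypothetical evidence bundle and return the sorted file names."""
    output_dir.mkdir(parents=True, exist_ok=True)
    files = {
        "SUMMARY.md": (
            f"Status: {payload['status']}\n"
            f"Main blocker: {payload['main_blocker']}\n"
        ),
        "COMMANDS.md": f"Documented command: {command}\n",
        "LOG.md": "\n".join(payload.get("execution_log", [])) + "\n",
        "status.json": json.dumps(payload, indent=2) + "\n",
    }
    # Assumed rule: PATCHES.md is written only when repository files changed.
    if payload.get("changed_files"):
        files["PATCHES.md"] = "\n".join(payload["changed_files"]) + "\n"
    for name, text in files.items():
        (output_dir / name).write_text(text, encoding="utf-8")
    return sorted(files)


with tempfile.TemporaryDirectory() as tmp:
    written = write_bundle(
        Path(tmp) / "repro_outputs",
        "python eval.py --smoke",  # made-up command
        {
            "status": "partial",
            "main_blocker": "Selected command exited with code 1.",
            "execution_log": ["STDERR:\nmissing checkpoint"],
            "changed_files": [],
        },
    )
    print(written)
```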

Notes

Use references/reporting-policy.md, ../../references/research-rigor-principles.md, scripts/run_command.py, and scripts/write_outputs.py.
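
A caller could drive `scripts/run_command.py` roughly as sketched below. The `--repo`, `--command`, and `--timeout` flags are the ones the script defines. The script location, the repository path, the example command, and the `is_verified` helper are placeholders. The sample JSON is hand-written to show the shape of the payload and is not output from a real run.

```python
from __future__ import annotations

import json
import sys
from typing import Any, Dict, List


def build_argv(script: str, repo: str, command: str, timeout: int = 60) -> List[str]:
    """Build the argv for run_command.py from its three documented flags."""
    return [
        sys.executable, script,
        "--repo", repo,
        "--command", command,
        "--timeout", str(timeout),
    ]


def is_verified(payload: Dict[str, Any]) -> bool:
    """Hypothetical helper: only a 'success' status counts as verified."""
    return payload.get("status") == "success"


argv = build_argv(
    "scripts/run_command.py", "/path/to/repo", "python eval.py --smoke", timeout=120
)
print(argv[1:])

# With the script on disk, the payload would be read along these lines:
#   completed = subprocess.run(argv, capture_output=True, text=True, check=False)
#   payload = json.loads(completed.stdout)
sample_stdout = '{"status": "blocked", "main_blocker": "Executable not found"}'
payload = json.loads(sample_stdout)
print(is_verified(payload))
```

Note that the script returns exit code 0 even for a blocked or partial run, so a caller has to read `status` from the JSON, not the wrapper's exit code.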

display_name: Rigor Run
short_description: Rigor Run mode for selected inference, evaluation, smoke, or sanity execution evidence.
default_prompt: Run the selected smoke, inference, evaluation, or sanity command conservatively, capture execution evidence, and write SUMMARY.md COMMANDS.md LOG.md and status.json.

#!/usr/bin/env python3
"""Execute a short non-training command and normalize the evidence."""

from __future__ import annotations

import argparse
import json
import re
import shlex
import subprocess
from pathlib import Path
from typing import Any, Dict, Iterable, List, Optional, Tuple


METRIC_RE = re.compile(
    r"\b([A-Za-z][A-Za-z0-9_.-]{1,31})\s*[:=]\s*(-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)"
)


def combine_logs(parts: Iterable[str]) -> str:
    return "\n".join(part for part in parts if part).strip()


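def parse_metrics(text: str) -> Dict[str, Any]:
    """Extract `name: value` and `name=value` pairs from log text.

    `best_metric` is a positional pick, not a comparison of values: among names
    that do not contain loss, lr, time, or mem, it is the one whose first
    appearance is latest in the text, falling back to any name if none qualify.
    """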
    observed_metrics: Dict[str, float] = {}
    best_metric: Optional[Dict[str, Any]] = None

    for match in METRIC_RE.finditer(text):
        name = match.group(1)
        value = float(match.group(2))
        observed_metrics[name] = value

    priority_names = [
        name for name in observed_metrics
        if not any(token in name.lower() for token in {"loss", "lr", "time", "mem"})
    ]
    if priority_names:
        chosen = priority_names[-1]
        best_metric = {"name": chosen, "value": observed_metrics[chosen]}
    elif observed_metrics:
        chosen = list(observed_metrics)[-1]
        best_metric = {"name": chosen, "value": observed_metrics[chosen]}

    return {
        "observed_metrics": observed_metrics,
        "best_metric": best_metric,
    }


def split_command(command: str) -> List[str]:
    return shlex.split(command, posix=True)


def run_git(repo: Path, args: List[str]) -> subprocess.CompletedProcess[str]:
    return subprocess.run(
        ["git", *args],
        cwd=repo,
        capture_output=True,
        text=True,
        timeout=15,
        check=False,
    )


def git_status_snapshot(repo: Path) -> Tuple[Optional[Dict[str, str]], Dict[str, Any]]:
    probe = run_git(repo, ["rev-parse", "--is-inside-work-tree"])
    if probe.returncode != 0 or probe.stdout.strip() != "true":
        return None, {
            "collection_method": "git-status-diff",
            "available": False,
            "reason": "git-unavailable-or-not-a-worktree",
        }

    result = run_git(repo, ["status", "--porcelain=v1", "--untracked-files=all"])
    if result.returncode != 0:
        return None, {
            "collection_method": "git-status-diff",
            "available": False,
            "reason": "git-status-failed",
            "stderr": result.stderr.strip(),
        }

    snapshot: Dict[str, str] = {}
    for raw_line in result.stdout.splitlines():
        line = raw_line.rstrip()
        if len(line) < 4:
            continue
        status = line[:2]
        path = line[3:]
        if " -> " in path:
            _old, _arrow, path = path.partition(" -> ")
        normalized = path.replace("\\", "/").strip()
        if normalized:
            snapshot[normalized] = status
    return snapshot, {
        "collection_method": "git-status-diff",
        "available": True,
        "status_entries": len(snapshot),
    }


def diff_status_snapshots(
    before: Optional[Dict[str, str]],
    after: Optional[Dict[str, str]],
) -> Dict[str, List[str]]:
    """Classify paths whose porcelain status changed between two snapshots.

    Only paths present in `after` are inspected, so a path that appears in
    `before` but not in `after` is not reported.
    """
    if before is None or after is None:
        return {
            "changed_files": [],
            "new_files": [],
            "deleted_files": [],
            "touched_paths": [],
            "touched_symbols": [],
        }

    changed_files: List[str] = []
    new_files: List[str] = []
    deleted_files: List[str] = []

    for path, status in after.items():
        previous_status = before.get(path)
        if previous_status == status:
            continue
        normalized_status = status.replace(" ", "")
        if "D" in normalized_status:
            deleted_files.append(path)
            continue
        if "?" in normalized_status or "A" in normalized_status:
            new_files.append(path)
            continue
        changed_files.append(path)

    touched_paths = []
    for path in [*changed_files, *new_files, *deleted_files]:
        if path not in touched_paths:
            touched_paths.append(path)
    return {
        "changed_files": changed_files,
        "new_files": new_files,
        "deleted_files": deleted_files,
        "touched_paths": touched_paths,
        "touched_symbols": [],
    }


def execute_command(repo: Path, command: str, timeout: int) -> Dict[str, Any]:
    before_status, before_capture = git_status_snapshot(repo)
    try:
        result = subprocess.run(
            split_command(command),
            cwd=repo,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
        )
        execution = {
            "returncode": result.returncode,
            "timed_out": False,
            "stdout": result.stdout or "",
            "stderr": result.stderr or "",
        }
        after_status, after_capture = git_status_snapshot(repo)
        execution.update(diff_status_snapshots(before_status, after_status))
        execution["evidence_capture"] = {
            **after_capture,
            "before_status_entries": before_capture.get("status_entries"),
        }
        return execution
    except FileNotFoundError as exc:
        return {
            "returncode": None,
            "timed_out": False,
            "launch_error": str(exc),
            "stdout": "",
            "stderr": "",
            "changed_files": [],
            "new_files": [],
            "deleted_files": [],
            "touched_paths": [],
            "touched_symbols": [],
            "evidence_capture": before_capture,
        }
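    except subprocess.TimeoutExpired as exc:
        # TimeoutExpired carries captured output as bytes even when text=True,
        # so decode it before it reaches the string-based log helpers.
        def as_text(data: Any) -> str:
            if isinstance(data, bytes):
                return data.decode("utf-8", errors="replace")
            return data or ""

        after_status, after_capture = git_status_snapshot(repo)
        execution = {
            "returncode": None,
            "timed_out": True,
            "stdout": as_text(exc.stdout),
            "stderr": as_text(exc.stderr),
        }
        execution.update(diff_status_snapshots(before_status, after_status))
        execution["evidence_capture"] = {
            **after_capture,
            "before_status_entries": before_capture.get("status_entries"),
        }
        return execution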


def decide_outcome(command: str, timeout: int, execution: Dict[str, Any], metric_data: Dict[str, Any]) -> Dict[str, Any]:
    combined_text = combine_logs(
        [
            f"STDOUT:\n{execution['stdout'].strip()}" if execution.get("stdout", "").strip() else "",
            f"STDERR:\n{execution['stderr'].strip()}" if execution.get("stderr", "").strip() else "",
        ]
    )

    if execution.get("launch_error"):
        return {
            "status": "blocked",
            "documented_command_status": "blocked",
            "main_blocker": f"Executable not found for command: {execution['launch_error']}",
            "execution_log": [f"Command failed before launch: {execution['launch_error']}"],
            "monitoring_scope": "no_run",
        }

    if execution.get("timed_out"):
        return {
            "status": "partial",
            "documented_command_status": "partial",
            "main_blocker": f"Selected command did not finish within {timeout} seconds.",
            "execution_log": [combined_text or f"Command timed out after {timeout} seconds."],
            "monitoring_scope": f"timeout:{timeout}s",
        }

    if execution.get("returncode") == 0:
        return {
            "status": "success",
            "documented_command_status": "success",
            "main_blocker": "None.",
            "execution_log": [combined_text] if combined_text else [],
            "monitoring_scope": "process_completion",
        }

    return {
        "status": "partial",
        "documented_command_status": "partial",
        "main_blocker": f"Selected command exited with code {execution.get('returncode')}.",
        "execution_log": [combined_text] if combined_text else [f"Command `{command}` exited non-zero."],
        "monitoring_scope": "process_completion",
    }


def main() -> int:
    parser = argparse.ArgumentParser(description="Run a short non-training command and summarize the evidence.")
    parser.add_argument("--repo", required=True, help="Path to the target repository.")
    parser.add_argument("--command", required=True, help="Command to execute.")
    parser.add_argument("--timeout", type=int, default=60, help="Execution timeout in seconds.")
    args = parser.parse_args()

    repo = Path(args.repo).resolve()
    execution = execute_command(repo, args.command, args.timeout)
    metric_data = parse_metrics(combine_logs([execution.get("stdout", ""), execution.get("stderr", "")]))
    outcome = decide_outcome(args.command, args.timeout, execution, metric_data)

    payload = {
        "status": outcome["status"],
        "documented_command_status": outcome["documented_command_status"],
        "main_blocker": outcome["main_blocker"],
        "execution_log": outcome["execution_log"],
        "monitoring_scope": outcome["monitoring_scope"],
        "best_metric": metric_data["best_metric"],
        "observed_metrics": metric_data["observed_metrics"],
        "changed_files": execution.get("changed_files", []),
        "new_files": execution.get("new_files", []),
        "deleted_files": execution.get("deleted_files", []),
        "touched_paths": execution.get("touched_paths", []),
        "touched_symbols": execution.get("touched_symbols", []),
        "evidence_capture": execution.get("evidence_capture", {}),
    }
    print(json.dumps(payload, indent=2, ensure_ascii=False))
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
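
To make the metric-selection behavior of the script above easier to see, here is a condensed, standalone copy of its `METRIC_RE` pattern and `parse_metrics` logic applied to a made-up log line. The `SKIPPED_TOKENS` name and the sample text are introduced only for this illustration.

```python
from __future__ import annotations

import re
from typing import Any, Dict, Optional

# Same pattern as METRIC_RE above: a name, then ':' or '=', then a number.
METRIC_RE = re.compile(
    r"\b([A-Za-z][A-Za-z0-9_.-]{1,31})\s*[:=]\s*(-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)"
)
SKIPPED_TOKENS = ("loss", "lr", "time", "mem")


def parse_metrics(text: str) -> Dict[str, Any]:
    observed: Dict[str, float] = {}
    for match in METRIC_RE.finditer(text):
        # A repeated name keeps its latest value but its original position.
        observed[match.group(1)] = float(match.group(2))
    preferred = [
        name for name in observed
        if not any(token in name.lower() for token in SKIPPED_TOKENS)
    ]
    best: Optional[Dict[str, Any]] = None
    if preferred:
        best = {"name": preferred[-1], "value": observed[preferred[-1]]}
    elif observed:
        last = list(observed)[-1]
        best = {"name": last, "value": observed[last]}
    return {"observed_metrics": observed, "best_metric": best}


sample_log = "epoch=3 loss: 0.4210 lr=1e-4\nval_acc = 0.8734 time: 12.5\n"
result = parse_metrics(sample_log)
print(result["best_metric"])  # val_acc is the last name that is not loss/lr/time/mem

fallback = parse_metrics("loss=0.5")
print(fallback["best_metric"])  # only a skipped name exists, so it is used anyway
```

Because the pick is positional, `best_metric` reflects where a metric appears in the output, not whether its value is the highest or lowest observed.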

#!/usr/bin/env python3
"""Compatibility wrapper for trusted verify output bundles."""

from __future__ import annotations

import importlib.util
from pathlib import Path


def load_shared_module():
    module_path = Path(__file__).resolve().parents[3] / "shared" / "scripts" / "write_run_bundle.py"
    spec = importlib.util.spec_from_file_location("write_run_bundle", module_path)
    if spec is None or spec.loader is None:
        raise RuntimeError(f"Unable to load shared writer module from {module_path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def main() -> int:
    module = load_shared_module()
    return module.main(default_mode="repro", default_output_dir="repro_outputs")


if __name__ == "__main__":
    raise SystemExit(main())
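
The wrapper above loads its shared writer by file path instead of by package import. The standalone sketch below shows the same `importlib.util` pattern against a temporary stand-in module. The stand-in's `main` body is invented for the demonstration and says nothing about what the real `write_run_bundle.py` does. `load_module_from_path` is a hypothetical name.

```python
from __future__ import annotations

import importlib.util
import tempfile
from pathlib import Path
from types import ModuleType


def load_module_from_path(name: str, module_path: Path) -> ModuleType:
    """Load a Python source file as a module without touching sys.path."""
    spec = importlib.util.spec_from_file_location(name, module_path)
    if spec is None or spec.loader is None:
        raise RuntimeError(f"Unable to load module from {module_path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for shared/scripts/write_run_bundle.py; its real contents are unknown here.
    stand_in = Path(tmp) / "write_run_bundle.py"
    stand_in.write_text(
        "def main(default_mode, default_output_dir):\n"
        "    print(f'mode={default_mode} dir={default_output_dir}')\n"
        "    return 0\n",
        encoding="utf-8",
    )
    module = load_module_from_path("write_run_bundle", stand_in)
    exit_code = module.main(default_mode="repro", default_output_dir="repro_outputs")
```

Loading by path lets several skills share one writer script without requiring it to be installed as a package, at the cost of depending on a fixed directory layout (`parents[3]` in the wrapper).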

How it compares

Pick minimal-run-and-audit over exploratory run skills when the goal is auditable verification evidence, not branching experiment comparison.

FAQ

What files does minimal-run-and-audit produce?

minimal-run-and-audit writes SUMMARY.md, COMMANDS.md, LOG.md, and status.json after a conservative smoke, inference, evaluation, or sanity run. SUMMARY.md states run type, blockers, and patch state; COMMANDS.md lists the documented command.

When should minimal-run-and-audit run instead of explore-run?

minimal-run-and-audit fits fixed conservative verification with auditable evidence. explore-run fits authorized isolated experiments with CHANGESET.md and TOP_RUNS.md when exploratory execution is explicitly approved.

Is Minimal Run And Audit safe to install?

skills.sh reports 1 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.
