Minimal Run And Audit

Name: Minimal Run And Audit
Author: lllllllama

lllllllama/ai-paper-reproduction-skill

140k installs
512 repo stars
Updated July 26, 2026

Minimal-run-and-audit is a rigor skill that captures standardized execution evidence from paper reproductions.

About

Minimal-run-and-audit is a rigor skill that executes documented smoke tests and evaluation commands, then captures the evidence in standardized formats. It normalizes repro_outputs/ files, documents scientific changes in SCIENTIFIC_CHANGELOG.md, and writes a COMPARABILITY_REPORT.md covering README/paper/baseline comparability. Use it when you run reproduction commands and need auditable, comparable evidence.

Executes smoke tests and documented inference/evaluation commands with evidence capture
Normalizes repro_outputs/ files and generates SCIENTIFIC_CHANGELOG for methodology changes
Provides standardized COMPARABILITY_REPORT for paper-baseline alignment

Minimal Run And Audit by the numbers

139,885 all-time installs (skills.sh)
+30 installs in the week ending Jul 28, 2026 (Skillselion tracking)
Ranked #7 of 2,066 Data Science & ML skills by installs in the Skillselion catalog
Security screen: MEDIUM risk (skills.sh audit)
Data as of Jul 28, 2026 (Skillselion catalog sync)

At a glance

minimal-run-and-audit capabilities & compatibility

Capabilities: command execution · evidence normalization · conflict documentation

npx skills add https://github.com/lllllllama/ai-paper-reproduction-skill --skill minimal-run-and-audit

Installs: 140k
Repo stars: 512
Security audit: 1 / 3 scanners passed
Last updated: July 26, 2026
Repository: lllllllama/ai-paper-reproduction-skill

What it does

Lets ML researchers capture auditable execution evidence for paper reproductions and report it in a standardized format.

Who is it for?

ML researchers needing auditable, comparable reproduction evidence

Skip if: you are still scoping the repository, bootstrapping conda environments, running full training jobs, or looking up primary paper details.

When should I use this skill?

When you are running documented smoke tests or evaluation commands and need normalized output reporting.

What you get

Normalized repro_outputs/ files, SCIENTIFIC_CHANGELOG, and COMPARABILITY_REPORT for baseline alignment

repro_outputs/ evidence files
execution patch notes

Files

SKILL.md (Markdown)

minimal-run-and-audit

Use this as the Rigor Run skill. The installed slug remains minimal-run-and-audit for compatibility.

Use the shared operating principles in ../../references/agent-operating-principles.md; this skill should make run evidence auditable without turning every command into a rigid protocol.

When to apply

After a reproduction target and setup plan exist.
When the main skill needs execution evidence and normalized outputs.
When a smoke test, documented inference run, documented evaluation run, or other short non-training verification is appropriate.
When the user already knows what command should be attempted and wants execution plus reporting only.

When not to apply

During initial repo scanning.
When the environment or assets are still too undefined for execution to be meaningful.
When the task is a literature lookup rather than repository execution.
When the user is still deciding which reproduction target should count as the main run.

Clear boundaries

This skill owns normalized reporting for an attempted command.
It may receive execution evidence from the main skill or a thin helper.
It does not choose the overall target on its own.
It does not perform broad paper analysis.
It does not own training startup, resume, or long-running training state.
It should not normalize risky code edits into acceptable practice.
It must not hide changes that alter evaluation, preprocessing, checkpoints, metrics, or other scientific meaning.

Input expectations

selected reproduction goal
runnable commands or smoke commands
environment and asset assumptions
optional patch metadata

Output expectations

execution result summary
standardized repro_outputs/ files
SCIENTIFIC_CHANGELOG.md for changed scientific meaning and evidence status
COMPARABILITY_REPORT.md for README/paper/baseline comparability
clear distinction between verified, partial, and blocked states
PATCHES.md when repo files changed
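
The PATCHES.md item depends on knowing whether the run touched repository files. A minimal sketch of that check, assuming two `git status --porcelain` snapshots (path mapped to its two-character status) taken before and after the command, which is the approach scripts/run_command.py takes. The names `touched_paths` and `needs_patch_notes` are hypothetical helpers for illustration, not part of the skill.

```python
from typing import Dict, List


def touched_paths(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Return paths whose porcelain status differs between the two snapshots."""
    return [path for path, status in after.items() if before.get(path) != status]


def needs_patch_notes(before: Dict[str, str], after: Dict[str, str]) -> bool:
    """Hypothetical rule: patch notes are needed when any path was touched."""
    return bool(touched_paths(before, after))


# Assumed example snapshots; README.md was already modified before the run.
before = {"README.md": " M"}
after = {"README.md": " M", "eval.py": " M", "repro_outputs/LOG.md": "??"}

print(touched_paths(before, after))      # ['eval.py', 'repro_outputs/LOG.md']
print(needs_patch_notes(before, after))  # True
```

Like the script, this sketch only inspects paths present in the later snapshot, so a file that was dirty before the run and clean afterwards is not reported.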

Notes

Use references/reporting-policy.md, ../../references/research-rigor-principles.md, scripts/run_command.py, and scripts/write_outputs.py.

display_name: Rigor Run
short_description: Rigor Run mode for selected inference, evaluation, smoke, or sanity execution evidence.
default_prompt: Run the selected smoke, inference, evaluation, or sanity command conservatively, capture execution evidence, and write SUMMARY.md COMMANDS.md LOG.md and status.json.

#!/usr/bin/env python3
"""Execute a short non-training command and normalize the evidence."""

from __future__ import annotations

import argparse
import json
import re
import shlex
import subprocess
from pathlib import Path
from typing import Any, Dict, Iterable, List, Optional, Tuple


METRIC_RE = re.compile(
    r"\b([A-Za-z][A-Za-z0-9_.-]{1,31})\s*[:=]\s*(-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)"
)


def combine_logs(parts: Iterable[str]) -> str:
    return "\n".join(part for part in parts if part).strip()


def parse_metrics(text: str) -> Dict[str, Any]:
    observed_metrics: Dict[str, float] = {}
    best_metric: Optional[Dict[str, Any]] = None

    for match in METRIC_RE.finditer(text):
        name = match.group(1)
        value = float(match.group(2))
        observed_metrics[name] = value

    priority_names = [
        name for name in observed_metrics
        if not any(token in name.lower() for token in {"loss", "lr", "time", "mem"})
    ]
    if priority_names:
        chosen = priority_names[-1]
        best_metric = {"name": chosen, "value": observed_metrics[chosen]}
    elif observed_metrics:
        chosen = list(observed_metrics)[-1]
        best_metric = {"name": chosen, "value": observed_metrics[chosen]}

    return {
        "observed_metrics": observed_metrics,
        "best_metric": best_metric,
    }


def split_command(command: str) -> List[str]:
    return shlex.split(command, posix=True)


def run_git(repo: Path, args: List[str]) -> subprocess.CompletedProcess[str]:
    return subprocess.run(
        ["git", *args],
        cwd=repo,
        capture_output=True,
        text=True,
        timeout=15,
        check=False,
    )


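def git_status_snapshot(repo: Path) -> Tuple[Optional[Dict[str, str]], Dict[str, Any]]:
    try:
        probe = run_git(repo, ["rev-parse", "--is-inside-work-tree"])
    except (OSError, subprocess.TimeoutExpired):
        # git is not installed, the repo path is unusable, or the probe hung;
        # report evidence capture as unavailable instead of crashing.
        probe = None
    if probe is None or probe.returncode != 0 or probe.stdout.strip() != "true":
        return None, {
            "collection_method": "git-status-diff",
            "available": False,
            "reason": "git-unavailable-or-not-a-worktree",
        }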

    result = run_git(repo, ["status", "--porcelain=v1", "--untracked-files=all"])
    if result.returncode != 0:
        return None, {
            "collection_method": "git-status-diff",
            "available": False,
            "reason": "git-status-failed",
            "stderr": result.stderr.strip(),
        }

    snapshot: Dict[str, str] = {}
    for raw_line in result.stdout.splitlines():
        line = raw_line.rstrip()
        if len(line) < 4:
            continue
        status = line[:2]
        path = line[3:]
        if " -> " in path:
            _old, _arrow, path = path.partition(" -> ")
        normalized = path.replace("\\", "/").strip()
        if normalized:
            snapshot[normalized] = status
    return snapshot, {
        "collection_method": "git-status-diff",
        "available": True,
        "status_entries": len(snapshot),
    }


def diff_status_snapshots(
    before: Optional[Dict[str, str]],
    after: Optional[Dict[str, str]],
) -> Dict[str, List[str]]:
    if before is None or after is None:
        return {
            "changed_files": [],
            "new_files": [],
            "deleted_files": [],
            "touched_paths": [],
            "touched_symbols": [],
        }

    changed_files: List[str] = []
    new_files: List[str] = []
    deleted_files: List[str] = []

    for path, status in after.items():
        previous_status = before.get(path)
        if previous_status == status:
            continue
        normalized_status = status.replace(" ", "")
        if "D" in normalized_status:
            deleted_files.append(path)
            continue
        if "?" in normalized_status or "A" in normalized_status:
            new_files.append(path)
            continue
        changed_files.append(path)

    touched_paths = []
    for path in [*changed_files, *new_files, *deleted_files]:
        if path not in touched_paths:
            touched_paths.append(path)
    return {
        "changed_files": changed_files,
        "new_files": new_files,
        "deleted_files": deleted_files,
        "touched_paths": touched_paths,
        "touched_symbols": [],
    }


def execute_command(repo: Path, command: str, timeout: int) -> Dict[str, Any]:
    before_status, before_capture = git_status_snapshot(repo)
    try:
        result = subprocess.run(
            split_command(command),
            cwd=repo,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
        )
        execution = {
            "returncode": result.returncode,
            "timed_out": False,
            "stdout": result.stdout or "",
            "stderr": result.stderr or "",
        }
        after_status, after_capture = git_status_snapshot(repo)
        execution.update(diff_status_snapshots(before_status, after_status))
        execution["evidence_capture"] = {
            **after_capture,
            "before_status_entries": before_capture.get("status_entries"),
        }
        return execution
    except FileNotFoundError as exc:
        return {
            "returncode": None,
            "timed_out": False,
            "launch_error": str(exc),
            "stdout": "",
            "stderr": "",
            "changed_files": [],
            "new_files": [],
            "deleted_files": [],
            "touched_paths": [],
            "touched_symbols": [],
            "evidence_capture": before_capture,
        }
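    except subprocess.TimeoutExpired as exc:
        # The output attached to TimeoutExpired can be bytes on some platforms
        # even when text=True was requested, so decode it before it reaches the
        # str-based log helpers.
        def as_text(value: Any) -> str:
            if isinstance(value, bytes):
                return value.decode("utf-8", errors="replace")
            return value or ""

        after_status, after_capture = git_status_snapshot(repo)
        execution = {
            "returncode": None,
            "timed_out": True,
            "stdout": as_text(exc.stdout),
            "stderr": as_text(exc.stderr),
        }
        execution.update(diff_status_snapshots(before_status, after_status))
        execution["evidence_capture"] = {
            **after_capture,
            "before_status_entries": before_capture.get("status_entries"),
        }
        return execution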


def decide_outcome(command: str, timeout: int, execution: Dict[str, Any], metric_data: Dict[str, Any]) -> Dict[str, Any]:
    combined_text = combine_logs(
        [
            f"STDOUT:\n{execution['stdout'].strip()}" if execution.get("stdout", "").strip() else "",
            f"STDERR:\n{execution['stderr'].strip()}" if execution.get("stderr", "").strip() else "",
        ]
    )

    if execution.get("launch_error"):
        return {
            "status": "blocked",
            "documented_command_status": "blocked",
            "main_blocker": f"Executable not found for command: {execution['launch_error']}",
            "execution_log": [f"Command failed before launch: {execution['launch_error']}"],
            "monitoring_scope": "no_run",
        }

    if execution.get("timed_out"):
        return {
            "status": "partial",
            "documented_command_status": "partial",
            "main_blocker": f"Selected command did not finish within {timeout} seconds.",
            "execution_log": [combined_text or f"Command timed out after {timeout} seconds."],
            "monitoring_scope": f"timeout:{timeout}s",
        }

    if execution.get("returncode") == 0:
        return {
            "status": "success",
            "documented_command_status": "success",
            "main_blocker": "None.",
            "execution_log": [combined_text] if combined_text else [],
            "monitoring_scope": "process_completion",
        }

    return {
        "status": "partial",
        "documented_command_status": "partial",
        "main_blocker": f"Selected command exited with code {execution.get('returncode')}.",
        "execution_log": [combined_text] if combined_text else [f"Command `{command}` exited non-zero."],
        "monitoring_scope": "process_completion",
    }


def main() -> int:
    parser = argparse.ArgumentParser(description="Run a short non-training command and summarize the evidence.")
    parser.add_argument("--repo", required=True, help="Path to the target repository.")
    parser.add_argument("--command", required=True, help="Command to execute.")
    parser.add_argument("--timeout", type=int, default=60, help="Execution timeout in seconds.")
    args = parser.parse_args()

    repo = Path(args.repo).resolve()
    execution = execute_command(repo, args.command, args.timeout)
    metric_data = parse_metrics(combine_logs([execution.get("stdout", ""), execution.get("stderr", "")]))
    outcome = decide_outcome(args.command, args.timeout, execution, metric_data)

    payload = {
        "status": outcome["status"],
        "documented_command_status": outcome["documented_command_status"],
        "main_blocker": outcome["main_blocker"],
        "execution_log": outcome["execution_log"],
        "monitoring_scope": outcome["monitoring_scope"],
        "best_metric": metric_data["best_metric"],
        "observed_metrics": metric_data["observed_metrics"],
        "changed_files": execution.get("changed_files", []),
        "new_files": execution.get("new_files", []),
        "deleted_files": execution.get("deleted_files", []),
        "touched_paths": execution.get("touched_paths", []),
        "touched_symbols": execution.get("touched_symbols", []),
        "evidence_capture": execution.get("evidence_capture", {}),
    }
    print(json.dumps(payload, indent=2, ensure_ascii=False))
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
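
To make the metric selection in `parse_metrics` concrete, here is a small worked example. It reuses the script's `METRIC_RE` pattern and exclusion tokens; the function name `pick_best_metric` and the sample log text are assumptions made up for illustration.

```python
import re

# Same pattern as METRIC_RE in the script above.
METRIC_RE = re.compile(
    r"\b([A-Za-z][A-Za-z0-9_.-]{1,31})\s*[:=]\s*(-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)"
)
# Names containing any of these tokens are only used as a fallback.
SKIP_TOKENS = ("loss", "lr", "time", "mem")


def pick_best_metric(log_text):
    """Condensed version of parse_metrics: later values overwrite earlier ones."""
    observed = {m.group(1): float(m.group(2)) for m in METRIC_RE.finditer(log_text)}
    preferred = [n for n in observed if not any(t in n.lower() for t in SKIP_TOKENS)]
    candidates = preferred or list(observed)
    return observed, (candidates[-1] if candidates else None)


# Hypothetical log output from an evaluation command.
sample_log = "epoch: 3 loss=0.35 top1_acc: 0.912\nepoch: 4 loss=0.31 top1_acc: 0.927"
observed, best = pick_best_metric(sample_log)
print(observed)  # {'epoch': 4.0, 'loss': 0.31, 'top1_acc': 0.927}
print(best)      # top1_acc
```

Note that "best" here means the last-seen metric name that is not excluded, with its most recent value. It is not the maximum, and a counter such as `epoch` is treated as a candidate like any other name, so the reported value is worth checking against the log.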

#!/usr/bin/env python3
"""Compatibility wrapper for trusted verify output bundles."""

from __future__ import annotations

import importlib.util
from pathlib import Path


def load_shared_module():
    module_path = Path(__file__).resolve().parents[3] / "shared" / "scripts" / "write_run_bundle.py"
    spec = importlib.util.spec_from_file_location("write_run_bundle", module_path)
    if spec is None or spec.loader is None:
        raise RuntimeError(f"Unable to load shared writer module from {module_path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def main() -> int:
    module = load_shared_module()
    return module.main(default_mode="repro", default_output_dir="repro_outputs")


if __name__ == "__main__":
    raise SystemExit(main())
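
The wrapper above locates a shared writer relative to its own path and loads it by file location. The following sketch shows both steps in isolation. The directory layout and the stand-in `write_run_bundle.py` body are assumptions for illustration only; the real shared module is not shown on this page.

```python
import importlib.util
import tempfile
from pathlib import Path, PurePosixPath

# Assumed layout: <root>/skills/minimal-run-and-audit/scripts/write_outputs.py
wrapper = PurePosixPath("/repo/skills/minimal-run-and-audit/scripts/write_outputs.py")
print(wrapper.parents[3])  # /repo


def load_module_from_path(name: str, module_path: Path):
    """Import a module from an explicit file path, as the wrapper does."""
    spec = importlib.util.spec_from_file_location(name, module_path)
    if spec is None or spec.loader is None:
        raise RuntimeError(f"Unable to load module from {module_path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the shared writer; the signature mirrors the wrapper's call.
    shared = Path(tmp) / "write_run_bundle.py"
    shared.write_text(
        "def main(default_mode, default_output_dir):\n    return 0\n",
        encoding="utf-8",
    )
    module = load_module_from_path("write_run_bundle", shared)
    exit_code = module.main(default_mode="repro", default_output_dir="repro_outputs")

print(exit_code)  # 0
```

Loading by path keeps the wrapper independent of `sys.path`, at the cost of breaking if the skill directory is moved relative to the shared scripts.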

Related skills

Microsoft Foundry: Deploy, evaluate, and continuously improve Microsoft Foundry agents from a single agent interface. (478k installs, 1.3k stars)

Ai Research Reproduction: Orchestrate trustworthy, auditable reproduction of deep learning repositories directly from their READMEs. (164k installs, 507 stars)

Run Train: Safely execute selected deep learning training commands with standardized evidence capture. (164k installs, 507 stars)

Explore Run: Safely run isolated exploratory experiments with clear recording and conservative selection before committing changes. (164k installs, 507 stars)

Paper Context Resolver: Fetch precise reproduction-critical details like dataset splits, preprocessing steps, or evaluation protocols from the original academic paper when the repo README leav… (141k installs, 507 stars)

Repo Intake And Plan: Scan unfamiliar AI research repositories and receive a minimal, trustworthy reproduction target before investing significant time. (140k installs, 507 stars)

Forks & variants (3)

Minimal Run And Audit has 3 known copies in the catalog totaling 176k installs. They canonicalize to this original listing.

lllllllama - 176k installs
lllllllama - 29 installs
lllllllama - 11 installs

FAQ

What does minimal-run-and-audit write after execution?

minimal-run-and-audit writes standardized repro_outputs/ files containing normalized execution evidence from the smoke test or inference command, plus patch notes when repository files were changed to enable the run.

What should minimal-run-and-audit not be used for?

minimal-run-and-audit should not be used for training execution, initial repo intake, environment setup, paper lookup, or end-to-end orchestration on its own. It captures evidence from a selected minimal run only.

Is Minimal Run And Audit safe to install?

skills.sh reports 1 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.
