Repo Intake And Plan

Name: Repo Intake And Plan
Author: lllllllama

lllllllama/ai-paper-reproduction-skill

140k installs
512 repo stars
Updated July 26, 2026

Repo-intake-and-plan is a helper skill that scans deep learning repos and returns a minimum trustworthy reproduction plan.

About

Repo-intake-and-plan is a helper skill that scans a deep learning repository, summarizes its structure, extracts the commands documented in the README, and classifies them as inference, evaluation, or training candidates. From that map it returns a minimum trustworthy reproduction plan. Use it at the beginning of reproduction work to understand which commands are available and which are safest to run.

Fast README-first scan of deep learning repositories extracting structure and documented commands
Classifies inference, evaluation, and training candidates conservatively
Returns minimum trustworthy reproduction plan with risk assessment

Repo Intake And Plan by the numbers

139,956 all-time installs (skills.sh)
+22 installs in the week ending Aug 2, 2026 (Skillselion tracking)
Ranked #6 of 2,065 Data Science & ML skills by installs in the Skillselion catalog
Security screen: LOW risk (skills.sh audit)
Data as of Aug 3, 2026 (Skillselion catalog sync)

At a glance

repo-intake-and-plan capabilities & compatibility

Capabilities: repo scanning · command extraction · target classification

npx skills add https://github.com/lllllllama/ai-paper-reproduction-skill --skill repo-intake-and-plan

Add your badge

Show developers this skill is listed on Skillselion. Paste this into your README.

[![Listed on Skillselion](https://skillselion.com/badge/skills/lllllllama/ai-paper-reproduction-skill/repo-intake-and-plan.svg)](https://skillselion.com/skills/lllllllama/ai-paper-reproduction-skill/repo-intake-and-plan)

Installs	140k
repo stars	★ 512
Security audit	3 / 3 scanners passed
Last updated	July 26, 2026
Repository	lllllllama/ai-paper-reproduction-skill ↗

What it does

Scans a repository so that ML researchers can understand its structure and extract a minimum trustworthy reproduction plan.

Who is it for?

ML researchers planning a paper reproduction who want to inspect the repository before running any code.

Skip if: you already know the exact command to run, need paper-level scientific detail, or want a full training reproduction without scoping first.

When should I use this skill?

When you are starting a repo reproduction and need to map the structure, extract documented commands, and classify targets.

What you get

Repository structure summary, documented commands, candidate classification, and minimum reproduction plan

reproduction target recommendation
classified execution paths

By the numbers

Checks 8 primary manifest files including README, requirements.txt, pyproject.toml, and Dockerfile
Inspects configs/ and config/ directories for high-signal reproduction clues
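
The directory inspection described above could be sketched as follows. This is a minimal illustration, not the skill's own code: the helper name `list_config_files` and the set of file extensions are assumptions, and the scan script listed under Files only records whether these directories exist.

```python
from pathlib import Path
from typing import Dict, List, Sequence

# Assumption: these extensions are typical for ML experiment configs.
CONFIG_SUFFIXES = {".yaml", ".yml", ".json", ".toml"}


def list_config_files(
    root: Path, dirs: Sequence[str] = ("configs", "config")
) -> Dict[str, List[str]]:
    """Return config-like files found under each high-signal directory."""
    found: Dict[str, List[str]] = {}
    for name in dirs:
        directory = root / name
        if not directory.is_dir():
            continue
        # Paths are reported relative to the repo root, with forward slashes.
        found[name] = sorted(
            path.relative_to(root).as_posix()
            for path in directory.rglob("*")
            if path.is_file() and path.suffix.lower() in CONFIG_SUFFIXES
        )
    return found
```

Calling `list_config_files(Path("path/to/repo"))` returns a dictionary keyed by directory name. Directories that are missing are simply omitted, which keeps the result conservative.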

Files

SKILL.md (Markdown)

repo-intake-and-plan

Use this as the Rigor Intake helper. The installed slug remains repo-intake-and-plan for compatibility.

When to apply

At the beginning of README-first reproduction work.
When the main skill needs a fast map of repo structure and documented commands.
When inference, evaluation, and training candidates must be classified conservatively.
When the user explicitly wants to inspect the repo first and not run anything yet.

When not to apply

When execution has already started and the task is now about running commands or writing outputs.
When the target is not a repository-backed reproduction task.
When the user only wants paper interpretation without repo inspection.
When the user already has a selected documented command and only needs setup or execution.

Clear boundaries

This skill scans and plans.
This skill is helper-tier and should usually be orchestrator-invoked.
It does not install environments.
It does not prepare large assets.
It does not execute substantive reproduction commands.
It does not decide high-risk patching.

Input expectations

Target repository path.
Access to README and common project files if present.
Optional user hints about desired priority, such as inference-first.

Output expectations

concise repo structure summary
documented command inventory
inferred candidate categories: inference, evaluation, training, other
minimum trustworthy reproduction recommendation
notable ambiguity or risk list
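
The five outputs listed above could be carried in a single structured record. The sketch below is one possible shape, assuming a plain dataclass. The class name `IntakeReport`, its field names, and the sample values are hypothetical and are not defined by the skill.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class IntakeReport:
    """Hypothetical container mirroring the five expected outputs."""

    structure_summary: str
    documented_commands: List[str] = field(default_factory=list)
    # Keys follow the four categories: inference, evaluation, training, other.
    candidates: Dict[str, List[str]] = field(default_factory=dict)
    recommended_target: Optional[str] = None
    risks: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, ensure_ascii=False)


# Made-up values, for illustration only.
report = IntakeReport(
    structure_summary="README.md, requirements.txt, configs/",
    documented_commands=["python demo.py --image cat.png"],
    candidates={"inference": ["python demo.py --image cat.png"]},
    recommended_target="python demo.py --image cat.png",
    risks=["Checkpoint download link not verified."],
)
print(report.to_json())
```

Leaving `recommended_target` as `None` when nothing qualifies makes the "no trustworthy target found" case explicit rather than implied by an empty string.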

Notes

Use references/repo-scan-rules.md and helper scripts under scripts/.

display_name: Rigor Intake
short_description: Rigor Intake helper for scanning a repo and recommending the smallest trustworthy reproduction target.
default_prompt: Scan this repository, read the README and common project files, extract documented commands, classify inference evaluation and training paths, and recommend the smallest trustworthy reproduction target.

#!/usr/bin/env python3
"""Extract shell-like commands from README content and classify them."""

from __future__ import annotations

import argparse
import json
import re
from pathlib import Path
from typing import Dict, List, Optional


CODE_BLOCK_RE = re.compile(r"```(?P<lang>[^\n`]*)\n(?P<body>.*?)```", re.DOTALL | re.IGNORECASE)
INLINE_CMD_RE = re.compile(r"^\s*(?:\$|>|PS> )\s*(.+)$")
HEADING_RE = re.compile(r"^(?P<marks>#{1,6})\s+(?P<title>.+?)\s*$")
COMMAND_PREFIXES = (
    "python ",
    "python3 ",
    "pip ",
    "pip3 ",
    "conda ",
    "bash ",
    "sh ",
    "chmod ",
    "export ",
    "set ",
    "CUDA_VISIBLE_DEVICES=",
    "./",
    "accelerate ",
    "torchrun ",
    "deepspeed ",
    "make ",
    "docker ",
)


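def collect_headings(readme_text: str) -> List[Dict[str, object]]:
    """Collect Markdown headings and their character offsets, ignoring fenced code."""
    headings: List[Dict[str, object]] = []
    # Spans of fenced blocks, so shell comments such as "# install deps" inside
    # a code block are not mistaken for headings.
    fenced = [match.span() for match in CODE_BLOCK_RE.finditer(readme_text)]
    offset = 0
    for line in readme_text.splitlines(keepends=True):
        matched = HEADING_RE.match(line.strip())
        inside_code = any(start <= offset < end for start, end in fenced)
        if matched and not inside_code:
            headings.append(
                {
                    "offset": offset,
                    "level": len(matched.group("marks")),
                    "title": matched.group("title").strip(),
                }
            )
        offset += len(line)
    return headings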


def nearest_heading(headings: List[Dict[str, object]], offset: int) -> Optional[str]:
    current: Optional[str] = None
    for heading in headings:
        if int(heading["offset"]) > offset:
            break
        current = str(heading["title"])
    return current


def infer_section_category(section: Optional[str]) -> Optional[str]:
    if not section:
        return None
    lowered = section.lower()
    if any(word in lowered for word in ["inference", "usage", "demo", "example", "text-to-image", "image-to-image", "transcribe"]):
        return "inference"
    if any(word in lowered for word in ["evaluation", "evaluate", "benchmark", "metrics", "validation"]):
        return "evaluation"
    if any(word in lowered for word in ["training", "train", "finetune", "fine-tune", "pretrain"]):
        return "training"
    return None


def infer_section_kind(section: Optional[str]) -> Optional[str]:
    if not section:
        return None
    lowered = section.lower()
    if any(word in lowered for word in ["install", "installation", "setup", "environment", "requirements"]):
        return "setup"
    if any(word in lowered for word in ["download", "checkpoint", "weights", "dataset", "data preparation"]):
        return "asset"
    if any(word in lowered for word in ["usage", "demo", "example", "inference", "evaluation", "training", "text-to-image", "image-to-image"]):
        return "run"
    return None


def classify(command: str, section: Optional[str] = None) -> str:
    section_category = infer_section_category(section)
    if section_category:
        return section_category

    lowered = command.lower()
    if any(
        word in lowered
        for word in [
            "infer",
            "inference",
            "predict",
            "generate",
            "sample",
            "demo",
            "txt2img",
            "img2img",
            "transcribe",
            "whisper ",
            "amg.py",
        ]
    ):
        return "inference"
    if any(word in lowered for word in ["eval", "evaluate", "validation", "validate", "benchmark", "score"]):
        return "evaluation"
    if any(word in lowered for word in ["train", "training", "finetune", "fine-tune", "pretrain", "pre-train"]):
        return "training"
    return "other"


def command_kind(command: str, section: Optional[str] = None) -> str:
    section_kind = infer_section_kind(section)
    if section_kind:
        return section_kind

    lowered = command.lower().strip()
    setup_prefixes = (
        "pip install",
        "pip3 install",
        "conda install",
        "conda env create",
        "conda create",
        "conda activate",
        "python -m pip install",
        "git clone",
        "cd ",
    )
    asset_prefixes = ("wget ", "curl ", "mkdir ", "tar ", "unzip ", "7z ", "aria2c ")
    if lowered.startswith(setup_prefixes):
        return "setup"
    if lowered.startswith(asset_prefixes):
        return "asset"
    if "--help" in lowered or " -h" in lowered:
        return "smoke"
    return "run"


def looks_like_command(line: str) -> bool:
    candidate = re.sub(r"^(?:\$|PS> )\s*", "", line.strip())
    if not candidate or candidate.startswith("#"):
        return False
    if candidate.startswith(("python", "pip", "conda", "bash", "sh", "make", "docker")):
        return True
    if candidate.startswith(COMMAND_PREFIXES):
        return True
    if re.search(r"\s--[A-Za-z0-9_-]+", candidate):
        return True
    if re.search(r"\b(?:python|pip|conda|torchrun|deepspeed|accelerate|bash|sh)\b", candidate):
        return True
    if re.search(r"[\\/].+\.(?:py|sh|bat)", candidate):
        return True
    if candidate.startswith(("cd ", "ls ", "mkdir ", "wget ", "curl ", "git ")):
        return True
    return False


def clean_lines(block: str) -> List[str]:
    commands: List[str] = []
    for raw_line in block.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if not looks_like_command(line):
            continue
        line = re.sub(r"^(?:\$|PS> )\s*", "", line)
        commands.append(line)
    return commands


def extract_commands(readme_text: str) -> Dict[str, object]:
    commands: List[Dict[str, str]] = []
    warnings: List[str] = []
    seen = set()
    headings = collect_headings(readme_text)

    for match in CODE_BLOCK_RE.finditer(readme_text):
        lang = (match.group("lang") or "").strip().lower()
        if lang and lang not in {"bash", "shell", "sh", "zsh", "powershell", "cmd"}:
            continue

        section = nearest_heading(headings, match.start())
        lines = clean_lines(match.group("body"))
        if not lines:
            continue

        for line in lines:
            if line not in seen:
                commands.append(
                    {
                        "command": line,
                        "category": classify(line, section),
                        "kind": command_kind(line, section),
                        "section": section,
                        "source": "code_block",
                    }
                )
                seen.add(line)

    running_offset = 0
    for line in readme_text.splitlines(keepends=True):
        matched = INLINE_CMD_RE.match(line)
        if not matched:
            running_offset += len(line)
            continue
        command = matched.group(1).strip()
        if not looks_like_command(command):
            running_offset += len(line)
            continue
        section = nearest_heading(headings, running_offset)
        if command and command not in seen:
            commands.append(
                {
                    "command": command,
                    "category": classify(command, section),
                    "kind": command_kind(command, section),
                    "section": section,
                    "source": "inline",
                }
            )
            seen.add(command)
        running_offset += len(line)

    if not commands:
        warnings.append("No shell-like commands were extracted from the README.")

    counts: Dict[str, int] = {}
    for item in commands:
        category = item["category"]
        counts[category] = counts.get(category, 0) + 1

    return {
        "commands": commands,
        "counts": counts,
        "warnings": warnings,
    }


def main() -> int:
    parser = argparse.ArgumentParser(description="Extract shell-like commands from a README.")
    parser.add_argument("--readme", required=True, help="Path to the README file.")
    parser.add_argument("--json", action="store_true", help="Emit JSON output.")
    args = parser.parse_args()

    readme_path = Path(args.readme)
    text = readme_path.read_text(encoding="utf-8", errors="replace")
    data = extract_commands(text)

    if args.json:
        print(json.dumps(data, indent=2, ensure_ascii=False))
    else:
        for item in data["commands"]:
            print(f"[{item['category']}] {item['command']}")
        if data["warnings"]:
            print("Warnings:")
            for warning in data["warnings"]:
                print(f"- {warning}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
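
One consequence of `classify` above is that the enclosing README heading takes precedence over keywords in the command itself. The condensed sketch below restates that precedence so it can be checked in isolation. The keyword tables are shortened for brevity (an assumption, not the script's full lists), and `first_match` is a hypothetical helper.

```python
from typing import Dict, Optional, Tuple

# Shortened keyword tables; the script above uses longer lists.
SECTION_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "inference": ("inference", "usage", "demo"),
    "evaluation": ("evaluation", "benchmark"),
    "training": ("training", "train", "finetune"),
}
COMMAND_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "inference": ("infer", "predict", "demo"),
    "evaluation": ("eval", "benchmark"),
    "training": ("train", "finetune"),
}


def first_match(text: str, table: Dict[str, Tuple[str, ...]]) -> Optional[str]:
    """Return the first category (in dict order) with a keyword found in text."""
    lowered = text.lower()
    for category, words in table.items():
        if any(word in lowered for word in words):
            return category
    return None


def classify(command: str, section: Optional[str] = None) -> str:
    # The heading is consulted first; command keywords are only a fallback.
    if section:
        by_section = first_match(section, SECTION_KEYWORDS)
        if by_section:
            return by_section
    return first_match(command, COMMAND_KEYWORDS) or "other"


print(classify("python train.py --config configs/base.yaml"))           # training
print(classify("python train.py --config configs/base.yaml", "Usage"))  # inference
print(classify("pip install -r requirements.txt", "Installation"))      # other
```

Because of this ordering, a training command documented under a "Usage" heading is reported as inference, so the `section` field in the output is worth checking before trusting a category.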

#!/usr/bin/env python3
"""Scan a repository for README-first reproduction signals."""

from __future__ import annotations

import argparse
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List, Optional


KEY_FILES = [
    "README.md",
    "README",
    "requirements.txt",
    "environment.yml",
    "environment.yaml",
    "pyproject.toml",
    "setup.py",
    "setup.cfg",
    "Dockerfile",
]

SIGNAL_DIRS = [
    "configs",
    "config",
    "scripts",
    "tools",
    "examples",
    "notebooks",
    "checkpoints",
]


def first_existing(root: Path, names: List[str]) -> Optional[Path]:
    for name in names:
        candidate = root / name
        if candidate.exists():
            return candidate
    return None


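def scan_repo(root: Path) -> Dict[str, object]:
    if not root.exists():
        raise FileNotFoundError(f"Repository path does not exist: {root}")
    if not root.is_dir():
        # iterdir() below would otherwise fail with a less helpful error.
        raise NotADirectoryError(f"Repository path is not a directory: {root}")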

    top_level = sorted(item.name for item in root.iterdir())
    detected_files = [name for name in KEY_FILES if (root / name).exists()]
    detected_dirs = [name for name in SIGNAL_DIRS if (root / name).exists()]
    readme = first_existing(root, ["README.md", "README"])

    warnings: List[str] = []
    if readme is None:
        warnings.append("No README file was found at the repository root.")
    if not detected_files:
        warnings.append("No common environment or packaging files were detected.")

    return {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "repo_path": str(root.resolve()),
        "readme_path": str(readme.resolve()) if readme else None,
        "detected_files": detected_files,
        "detected_dirs": detected_dirs,
        "structure": {
            "top_level": top_level,
            "top_level_file_count": sum(1 for item in root.iterdir() if item.is_file()),
            "top_level_dir_count": sum(1 for item in root.iterdir() if item.is_dir()),
        },
        "warnings": warnings,
    }


def main() -> int:
    parser = argparse.ArgumentParser(description="Scan a repository for key reproduction signals.")
    parser.add_argument("--repo", required=True, help="Path to the target repository.")
    parser.add_argument("--json", action="store_true", help="Emit JSON instead of a human summary.")
    args = parser.parse_args()

    data = scan_repo(Path(args.repo))
    if args.json:
        print(json.dumps(data, indent=2, ensure_ascii=False))
    else:
        print(f"Repository: {data['repo_path']}")
        print(f"README: {data['readme_path'] or 'not found'}")
        print("Detected files:", ", ".join(data["detected_files"]) or "none")
        print("Detected dirs:", ", ".join(data["detected_dirs"]) or "none")
        if data["warnings"]:
            print("Warnings:")
            for item in data["warnings"]:
                print(f"- {item}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
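
The two scripts above emit a scan summary and a classified command list, but a caller still has to turn those into a recommendation. The following is a minimal sketch of one conservative selection rule. It assumes an inference-first preference order and assumes that only entries whose `kind` is `run` are eligible. The function name `recommend_target`, the rule, and the sample entries are illustrative and are not taken from the skill.

```python
from typing import Dict, List, Optional, Sequence

Command = Dict[str, Optional[str]]

# Assumption: cheaper targets come first, in line with the "inference-first" hint.
DEFAULT_PRIORITY = ("inference", "evaluation", "training")


def recommend_target(
    commands: List[Command],
    priority: Sequence[str] = DEFAULT_PRIORITY,
) -> Optional[Command]:
    """Return the first documented "run" command in the highest-priority category."""
    for category in priority:
        for item in commands:
            if item.get("category") == category and item.get("kind") == "run":
                return item
    return None


# Made-up entries in the same shape as the "commands" list emitted above.
sample = [
    {"command": "pip install -r requirements.txt", "category": "other", "kind": "setup"},
    {"command": "python train.py --config configs/base.yaml", "category": "training", "kind": "run"},
    {"command": "python demo.py --image cat.png", "category": "inference", "kind": "run"},
]

choice = recommend_target(sample)
print(choice["command"] if choice else "no runnable documented command found")
```

Returning `None` instead of falling back to a setup or asset command keeps the recommendation limited to commands the README actually documents as runnable.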

Related skills

Microsoft Foundry: Deploy, evaluate, and continuously improve Microsoft Foundry agents from a single agent interface. (490k · 1.3k)

Ai Research Reproduction: Orchestrate trustworthy, auditable reproduction of deep learning repositories directly from their READMEs. (176k · 511)

Run Train: Safely execute selected deep learning training commands with standardized evidence capture. (176k · 511)

Explore Run: Safely run isolated exploratory experiments with clear recording and conservative selection before committing changes. (176k · 511)

Paper Context Resolver: Fetch precise reproduction-critical details like dataset splits, preprocessing steps, or evaluation protocols from the original academic paper when the repo README leav… (141k · 511)

Env And Assets Bootstrap: Create a reproducible, conservative conda environment plus required checkpoints, datasets and caches before attempting to run any AI research paper reproduction. (140k · 511)

Forks & variants (3)

Repo Intake And Plan has 3 known copies in the catalog totaling 185k installs. They canonicalize to this original listing.

lllllllama - 185k installs
lllllllama - 29 installs
lllllllama - 11 installs

FAQ

Which files does repo-intake-and-plan check first?

repo-intake-and-plan checks README.md, requirements.txt, environment.yml, pyproject.toml, setup.py, setup.cfg, and Dockerfile first, then inspects configs/ and config/ directories for command and configuration clues.

What does repo-intake-and-plan output?

repo-intake-and-plan outputs a recommended smallest trustworthy reproduction target plus classified inference, evaluation, and training paths extracted from documented repository commands and configuration files.

Is Repo Intake And Plan safe to install?

skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.

Data Science & ML · research · automation

