
Repo Intake And Plan
Scan a deep-learning research repo’s README and configs, extract documented commands, classify inference/eval/train paths, and recommend the smallest trustworthy reproduction target.
Overview
repo-intake-and-plan is an agent skill most often used in Validate (also Idea, Build) that scans a research repo and recommends the smallest trustworthy reproduction target from documented commands.
Install
npx skills add https://github.com/lllllllama/rigorpilot-skills --skill repo-intake-and-plan
What is this skill?
- Primary file scan order: README, requirements, environment.yml, pyproject.toml, setup.py, Dockerfile
- High-signal dirs: configs, scripts, tools, examples, notebooks, checkpoints
- Six-tier extraction priority from explicit README commands through training entrypoints
- Command classification: inference, evaluation, training, other with inferred tags flagged
- Conservative behavior: prefer README evidence, record ambiguity instead of overcommitting
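The scan order above can be sketched with `pathlib`. This is a minimal illustration, not the skill's own code: `scan_repo` is a hypothetical helper, it assumes only top-level names matter, and the file and directory lists are copied from the SKILL.md scan rules.

```python
from pathlib import Path
from typing import Dict, List

# Names copied from the SKILL.md "Primary files" and "High-signal directories" rules.
PRIMARY_FILES = [
    "README.md", "README", "requirements.txt", "environment.yml",
    "environment.yaml", "pyproject.toml", "setup.py", "setup.cfg", "Dockerfile",
]
HIGH_SIGNAL_DIRS = [
    "configs", "config", "scripts", "tools", "examples", "notebooks", "checkpoints",
]


def scan_repo(root: Path) -> Dict[str, List[str]]:
    """Hypothetical helper: report which primary files and dirs exist, in scan order."""
    files = [name for name in PRIMARY_FILES if (root / name).is_file()]
    dirs = [name for name in HIGH_SIGNAL_DIRS if (root / name).is_dir()]
    return {"primary_files": files, "high_signal_dirs": dirs}
```

Calling `scan_repo(Path("."))` from the repo root returns the matching names in the order the rules list them, which keeps the README first in line.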
Adoption & trust: 32.3k installs on skills.sh; 412 GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You cloned a deep-learning repo but cannot tell which README command is the safest minimal path to reproduce results.
Who is it for?
Indie researchers onboarding to paper codebases who need a command inventory and a scoped target before spending GPU time.
Skip if you need to execute training, install CUDA drivers, or resolve missing paper details without user direction.
When should I use this skill?
Scan this repository, read the README and common project files, extract documented commands, classify inference, evaluation, and training paths, and recommend the smallest trustworthy reproduction target.
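The "extract documented commands" step can be sketched as follows. This is a minimal illustration, not the skill's bundled extractor: `extract_commands` and its short prefix list are hypothetical, and it only inspects fenced code blocks, stripping an optional `$ ` prompt.

```python
import re
from typing import List

# Build the fence marker at runtime so this snippet can itself sit inside a Markdown fence.
FENCE = "`" * 3
# Fence line with an optional language tag, then a lazily matched body up to the closing fence.
FENCE_RE = re.compile(FENCE + r"[^\n`]*\n(.*?)" + FENCE, re.DOTALL)
# Hypothetical, deliberately short subset of command prefixes.
PREFIXES = ("python ", "python3 ", "pip ", "bash ", "torchrun ", "accelerate ")


def extract_commands(readme_text: str) -> List[str]:
    """Return shell-like lines found inside fenced code blocks, in document order."""
    commands: List[str] = []
    for body in FENCE_RE.findall(readme_text):
        for raw in body.splitlines():
            line = raw.strip()
            # Drop a leading "$ " shell-prompt marker if present.
            if line.startswith("$ "):
                line = line[2:].strip()
            if line.startswith(PREFIXES):
                commands.append(line)
    return commands
```

Commands continued across lines with a trailing backslash are not joined in this sketch; a fuller extractor would need to handle them.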
What do I get? / Deliverables
You receive classified inference, evaluation, and training entrypoints plus a conservative recommendation for the next execution skill such as run-train.
- Classified command inventory (inference, evaluation, training, other)
- Recommended smallest trustworthy reproduction target with ambiguity notes
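One way to turn the classified inventory into a single recommendation is sketched below. The `CommandRecord` type, the cost ordering, and `smallest_trustworthy_target` are hypothetical; the sketch assumes inference is cheaper than evaluation, which is cheaper than training, and ranks documented evidence above cheapness.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical cost order: cheaper categories make smaller reproduction targets.
CATEGORY_COST = {"inference": 0, "evaluation": 1, "training": 2}


@dataclass
class CommandRecord:
    command: str
    category: str            # "inference", "evaluation", "training", or "other"
    inferred: bool = False   # True when the category was guessed rather than documented
    notes: List[str] = field(default_factory=list)  # ambiguity notes


def smallest_trustworthy_target(records: List[CommandRecord]) -> Optional[CommandRecord]:
    """Prefer documented commands, then the cheapest category; ignore "other"."""
    candidates = [r for r in records if r.category in CATEGORY_COST]
    if not candidates:
        return None
    # Tuple key: documented (False) sorts before inferred (True), then by cost.
    return min(candidates, key=lambda r: (r.inferred, CATEGORY_COST[r.category]))
```

With this ordering, a documented evaluation command beats an inferred demo script, mirroring the rule to prefer README evidence over filename guesses.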
Journey fit
Spans multiple journey phases: a primary shelf plus the alternate fits below.
Intake and scoping belong in Validate, before you commit GPU time or implementation depth in Build. The Scope subphase fits because the skill narrows the whole repo down to one minimal, verified entrypoint.
Where it fits
Compare whether a repo supports quick inference demos versus full metric reproduction before picking a project to fork.
Map documented eval versus train commands to decide the minimum viable reproduction step.
Choose an examples/ or scripts/ entrypoint labeled inference for a one-hour sanity check.
Hand a classified training path to run-train after intake confirms README evidence.
How it compares
Use it instead of an ad-hoc repo grep that guesses the role of files like train.py without README-first evidence tiers.
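The README-first idea can be sketched as a two-tier classifier: a README section heading decides the category when it matches, and only otherwise does the command text decide, in which case the result is flagged as inferred. `classify`, `match_category`, and the keyword lists (adapted from the SKILL.md classification guidance) are hypothetical.

```python
from typing import Optional, Tuple

# Keyword lists adapted from the SKILL.md classification guidance.
KEYWORDS = {
    "inference": ("demo", "predict", "generate", "sample", "infer"),
    "evaluation": ("eval", "validate", "benchmark", "score"),
    "training": ("train", "finetune", "pretrain"),
}


def match_category(text: str) -> Optional[str]:
    """Return the first category whose keyword appears as a substring of text."""
    lowered = text.lower()
    for category, words in KEYWORDS.items():
        if any(word in lowered for word in words):
            return category
    return None


def classify(command: str, readme_section: Optional[str]) -> Tuple[str, bool]:
    """Return (category, inferred); a matching README heading outranks the command text."""
    if readme_section:
        category = match_category(readme_section)
        if category:
            return category, False  # documented evidence
    category = match_category(command)
    if category:
        return category, True  # guessed from the command or filename only
    return "other", True
```

Substring matching is crude (it cannot tell `train` in a flag from a training script), which is exactly why guesses from this tier stay marked as inferred.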
Common Questions / FAQ
Who is repo-intake-and-plan for?
Solo and indie builders adopting third-party ML research repos who want agent-guided intake before reproduction or fine-tuning work.
When should I use repo-intake-and-plan?
Use it in Validate to scope a reproduction; in Idea when comparing demo versus training paths; and at the start of Build, before backend training with run-train, while intake is still unclear.
Is repo-intake-and-plan safe to install?
Consult the Security Audits panel on this page. Intake is read-heavy, but a misconfigured agent may read paths that hold secrets, so point the tool at a repo root you trust.
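A minimal sketch of enforcing "point the tool at a repo root you trust", assuming the agent routes its file reads through a check like this; `is_inside_root` is hypothetical and not part of the skill.

```python
from pathlib import Path


def is_inside_root(root: Path, candidate: Path) -> bool:
    """True if candidate resolves (following symlinks and "..") to a path inside root."""
    root_resolved = root.resolve()
    # Relative candidates are taken relative to root; absolute ones replace it.
    target = (root_resolved / candidate).resolve()
    try:
        target.relative_to(root_resolved)
    except ValueError:
        return False
    return True
```

Resolving both paths before comparing means `../` segments and symlinks pointing outside the repo are rejected rather than silently followed.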
Workflow Chain
Then invoke: run-train
SKILL.md - Repo Intake And Plan
display_name: Rigor Intake
short_description: Rigor Intake helper for scanning a repo and recommending the smallest trustworthy reproduction target.
default_prompt: Scan this repository, read the README and common project files, extract documented commands, classify inference evaluation and training paths, and recommend the smallest trustworthy reproduction target.

# Repo Scan Rules

## Primary files

Always check these first when present:

- `README.md`
- `README`
- `requirements.txt`
- `environment.yml`
- `environment.yaml`
- `pyproject.toml`
- `setup.py`
- `setup.cfg`
- `Dockerfile`

## High-signal directories

Inspect for command or configuration clues:

- `configs/`
- `config/`
- `scripts/`
- `tools/`
- `examples/`
- `notebooks/`
- `checkpoints/`

## Extraction priorities

1. explicit README commands
2. setup instructions
3. documented inference or demo entrypoints
4. documented evaluation entrypoints
5. documented training entrypoints
6. config references and asset path hints

## Classification guidance

- `inference`: demo, predict, generate, sample, infer, test-time forward use
- `evaluation`: eval, validate, benchmark, score, reproduce metrics
- `training`: train, finetune, pretrain, launch long-running experiments
- `other`: install, download, preprocess, export, convert, utility

## Conservative behavior

- prefer explicit README evidence over filename guesses
- mark guessed classifications as inferred
- record ambiguity instead of overcommitting

````python
#!/usr/bin/env python3
"""Extract shell-like commands from README content and classify them."""

from __future__ import annotations

import argparse
import json
import re
from pathlib import Path
from typing import Dict, List, Optional

CODE_BLOCK_RE = re.compile(r"```(?P<lang>[^\n`]*)\n(?P<body>.*?)```", re.DOTALL | re.IGNORECASE)
INLINE_CMD_RE = re.compile(r"^\s*(?:\$|>|PS> )\s*(.+)$")
HEADING_RE = re.compile(r"^(?P<marks>#{1,6})\s+(?P<title>.+?)\s*$")
COMMAND_PREFIXES = (
    "python ", "python3 ", "pip ", "pip3 ", "conda ",
    "bash ", "sh ", "chmod ", "export ", "set ",
    "CUDA_VISIBLE_DEVICES=", "./", "accelerate ", "torchrun ",
    "deepspeed ", "make ", "docker ",
)


def collect_headings(readme_text: str) -> List[Dict[str, object]]:
    headings: List[Dict[str, object]] = []
    offset = 0
    for line in readme_text.splitlines(keepends=True):
        matched = HEADING_RE.match(line.strip())
        if matched:
            headings.append(
                {
                    "offset": offset,
                    "level": len(matched.group("marks")),
                    "title": matched.group("title").strip(),
                }
            )
        offset += len(line)
    return headings


def nearest_heading(headings: List[Dict[str, object]], offset: int) -> Optional[str]:
    current: Optional[str] = None
    for heading in headings:
        if int(heading["offset"]) > offset:
            break
        current = str(heading["title"])
    return current


def infer_section_category(section: Optional[str]) -> Optional[str]:
    if not section:
        return None
    lowered = section.lower()
    if any(word in lowered for word in ["inference", "usage", "demo", "example", "text-to-image", "image-to-image", "transcribe"]):
        return "inference"
    if any(word in lowered for word in ["evaluation", "evaluate", "benchmark", "metrics", "validation"]):
        return "evaluation"
    if any(word in lowered for word in ["training", "train", "finetune", "fine-tune", "pretrain"]):
        return "training"
    return None


def infer_section_kind(section: Optional[str]) -> Optional[str]:
    if not section:
        return None
    lowered = section.lower()
    if any(word in lowered for word in ["install", "installation", "setup", "environment", "requirements"]):
        return "setup"
    if any(word in lowered for word in ["download", "checkpoint", "weights", "dataset", "data preparation"]):
        retur
````