
Env And Assets Bootstrap
Bootstrap a conservative conda-first environment and document checkpoint, dataset, and cache assumptions before reproducing a research README target.
Overview
Env-and-assets-bootstrap is an agent skill most often used in Validate (also Ship) that prepares conda-first environments and conservative checkpoint, dataset, and cache assumptions before a README-documented reproductio
Install
npx skills add https://github.com/lllllllama/rigorpilot-skills --skill env-and-assets-bootstrap
What is this skill?
- Evidence order: README links, configs, code constants, then careful filename inference
- Tracks checkpoints, tokenizers, datasets, caches, and output dirs with present/missing/downloaded status
- Prefers conda/Anaconda for DL research repos per documented README trust order
- Never labels an asset canonical without repo or paper support
- Records requested asset, source URL, target path, and conservative download policy
- 4-step environment trust order (README, env files, package metadata, import inference)
- 5 common asset groups (checkpoints, tokenizers, datasets, caches, outputs)
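The evidence order in the list above can be read as a simple ranking. The following is a minimal sketch of that idea, not the skill's actual code: `AssetCandidate`, `EVIDENCE_ORDER`, and `pick_best` are hypothetical names introduced for illustration, and the URL and paths are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional

# Evidence sources, strongest first: README, configs, code constants, filename inference.
# The string labels are assumptions made for this sketch.
EVIDENCE_ORDER = ["readme", "config", "code_constant", "filename_inference"]


@dataclass
class AssetCandidate:
    """One possible location for an asset and the evidence pointing to it (hypothetical)."""
    asset: str
    location: str
    evidence: str


def pick_best(candidates: List[AssetCandidate]) -> Optional[AssetCandidate]:
    """Return the candidate backed by the strongest evidence, or None if none qualifies."""
    ranked = [c for c in candidates if c.evidence in EVIDENCE_ORDER]
    if not ranked:
        return None
    return min(ranked, key=lambda c: EVIDENCE_ORDER.index(c.evidence))


if __name__ == "__main__":
    found = [
        AssetCandidate("checkpoint", "weights/model_final.pth", "filename_inference"),
        AssetCandidate("checkpoint", "https://example.com/model.pth", "readme"),
    ]
    best = pick_best(found)
    print(best.location if best else "no supported candidate")
```

Candidates with an unrecognized evidence label are dropped rather than guessed at, which mirrors the rule that an asset is never labeled canonical without repo or paper support.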
Adoption & trust: 32.3k installs on skills.sh; 412 GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You are about to reproduce a research repo but dependencies and asset paths are ambiguous, so a blind run will fail or use the wrong weights.
Who is it for?
Solo builders reproducing ML or DL repos who want conda-first setup and explicit asset provenance before any execution skill runs.
Skip if: you only need a one-line pip install with no checkpoints, or the spec is already frozen and assets are fully cached with no audit need.
When should I use this skill?
Prepare a conservative conda-first environment plus checkpoint, dataset, and cache assumptions for this README-documented reproduction target before any run.
What do I get? / Deliverables
You get a documented, conservative env and asset ledger aligned to README evidence so the next Rigor Run can execute with copyable commands and honest blocker notes.
- Asset status table with source, path, and present/missing/downloaded state
- Documented environment assumptions aligned to README order of evidence
- Inputs ready for downstream SUMMARY.md and status.json from run skills
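As an illustration of the asset status table listed above, here is a minimal sketch of a ledger record and a plain-text renderer. `AssetRecord` and `render_ledger` are hypothetical names, the column layout is an assumption, and the status values follow the reporting list in SKILL.md (present, missing, downloaded, skipped, unknown).

```python
from dataclasses import dataclass
from typing import List

# Status values taken from the SKILL.md reporting section.
STATUSES = ("present", "missing", "downloaded", "skipped", "unknown")


@dataclass
class AssetRecord:
    """One ledger row: what was requested, where it comes from, where it should live."""
    asset: str
    source: str
    target_path: str
    status: str = "unknown"

    def __post_init__(self) -> None:
        # Reject statuses outside the documented set instead of silently accepting them.
        if self.status not in STATUSES:
            raise ValueError(f"Unsupported status: {self.status!r}")


def render_ledger(records: List[AssetRecord]) -> str:
    """Render the records as a pipe-separated plain-text table (layout is an assumption)."""
    lines = ["asset | source | target path | status"]
    for record in records:
        lines.append(
            f"{record.asset} | {record.source} | {record.target_path} | {record.status}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    rows = [
        AssetRecord("checkpoint", "README link", "checkpoints/model.pth", "missing"),
        AssetRecord("tokenizer", "config default", "assets/tokenizer.json", "present"),
    ]
    print(render_ledger(rows))
```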
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Reproduction work belongs on the validate shelf because you are proving the published artifact runs before you treat results as trusted. Prototype is where you stand up env files, asset paths, and documented sources—not full product build.
Where it fits
Map README checkpoint links and conda steps before claiming the baseline build is ready.
Ensure eval caches and tokenizer files are logged before a smoke inference command.
Re-audit asset paths after upstream repo changes a default dataset URL.
How it compares
Use instead of letting the agent guess download URLs or mix unofficial checkpoints with documented sources.
Common Questions / FAQ
Who is env-and-assets-bootstrap for?
Indie ML builders and agent users running RigorPilot who must isolate environments and prove which assets exist before smoke or eval commands.
When should I use env-and-assets-bootstrap?
In validate when prototyping a paper reproduction; in ship when testing needs documented caches; anytime a README-documented target needs conda-first bootstrap before minimal-run-and-audit.
Is env-and-assets-bootstrap safe to install?
Review the Security Audits panel on this Prism page and treat network downloads and local paths as sensitive before approving agent execution.
Workflow Chain
Then invoke: minimal run and audit
SKILL.md
SKILL.md - Env And Assets Bootstrap
display_name: Rigor Setup
short_description: Rigor Setup mode for conservative environment and asset assumptions before a reproduction run.
default_prompt: Prepare a conservative conda-first environment plus checkpoint, dataset, and cache assumptions for this README-documented reproduction target before any run.

# Assets Policy

## Goal

Prepare checkpoints, datasets, and caches conservatively and transparently.

## Order of evidence

1. README links and paths
2. config files and default arguments
3. code-level constants or path joins
4. careful inference from filenames

## Behavior

- prefer documented asset sources
- preserve source URLs or identifiers when recording downloads
- avoid mirroring unofficial files unless the project explicitly points to them
- never claim an asset is the canonical one unless the repository or primary paper source supports it

## Common asset groups

- model checkpoints
- tokenizer files
- dataset archives or prepared splits
- cache directories
- output directories

## Reporting

Record:

- requested asset
- source
- target local path
- status: present, missing, downloaded, skipped, unknown

# Environment Policy

## Default preference

Prefer conda or Anaconda-style setup for deep learning research repositories because it is common in research code and helps isolate conflicting dependencies.

## Order of trust

1. README environment instructions
2. repository environment files
3. package metadata files
4. conservative inference from imports or script names

## OS guidance

- Linux is the default reference environment.
- Support Windows and macOS where practical.
- If a repository is clearly Linux-only, record that rather than pretending otherwise.
- When virtualenv activation is needed, emit platform-specific commands instead of a fake one-size-fits-all activation step.
- Prefer Python entrypoints over shell-only helpers when the same setup logic should run on Windows, macOS, and Linux.
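
The OS guidance above asks for platform-specific activation commands rather than a single generic step. A minimal sketch of what that could look like follows. It is not the repository's own helper: `sketch_activation_commands`, the dictionary keys, and the default `.venv` directory are assumptions made for illustration. The activation script paths are the ones Python's standard `venv` module creates.

```python
from typing import Dict, List, Union

Entry = Dict[str, Union[str, List[str]]]


def sketch_activation_commands(env_dir: str = ".venv") -> List[Entry]:
    """Return one activation command per platform family (hypothetical helper)."""
    return [
        {
            "platforms": ["Linux", "macOS"],
            "command": f"source {env_dir}/bin/activate",
        },
        {
            "platforms": ["Windows (cmd.exe)"],
            "command": f"{env_dir}\\Scripts\\activate.bat",
        },
        {
            "platforms": ["Windows (PowerShell)"],
            "command": f"{env_dir}\\Scripts\\Activate.ps1",
        },
    ]


if __name__ == "__main__":
    print("Activate the virtualenv with one of:")
    for entry in sketch_activation_commands():
        platforms = ", ".join(entry["platforms"])
        print(f"  [{platforms}] {entry['command']}")
```

Listing every variant and letting the reader pick avoids guessing which shell is in use, which is hard to detect reliably from inside a Python process.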
## Dependency handling

- prefer existing `environment.yml`
- otherwise translate README requirements into a simple conda-plus-pip setup
- avoid aggressive upgrades unless needed for a verified fix
- record version uncertainty explicitly

## Out of scope by default

- container orchestration
- cluster schedulers
- custom CUDA builds unless clearly required

```python
#!/usr/bin/env python3
"""Bootstrap a conservative research environment on Windows, macOS, or Linux."""

from __future__ import annotations

import argparse
import shutil
import subprocess
import sys
from pathlib import Path
from typing import Iterable, List, Optional

from plan_setup import ENV_FILES, find_first, parse_env_name, venv_activation_commands

CONDA_ENV_FILES = {"environment.yml", "environment.yaml", "conda.yml"}


def format_command(command: Iterable[str]) -> str:
    return " ".join(str(part) for part in command)


def run_command(command: List[str], *, cwd: Path, dry_run: bool) -> None:
    print(f"+ {format_command(command)}")
    if dry_run:
        return
    subprocess.run(command, cwd=cwd, check=True)


def choose_manager(preferred: str) -> Optional[str]:
    if preferred != "auto":
        if shutil.which(preferred):
            return preferred
        raise FileNotFoundError(f"Requested manager `{preferred}` was not found on PATH.")
    for candidate in ["conda", "mamba"]:
        if shutil.which(candidate):
            return candidate
    return None


def venv_python(env_dir: Path) -> Path:
    if sys.platform.startswith("win"):
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def print_activation_instructions(env_name: Optional[str], using_conda: bool) -> None:
    if using_conda:
        target = env_name or "<env-name>"
        print(f"Activate with: conda activate {target}")
        return
    print("Activate the virtualenv with one of:")
    for item in venv_activation_commands():
        platforms = ", ".join(item.get("platforms", []))
        print(f" [{platforms}] {item['comman
```
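
The dependency-handling rules (prefer an existing `environment.yml`, otherwise fall back to a simple conda-plus-pip setup) can be combined with the dry-run pattern into a planner that only builds command lists and never executes them. This is a sketch under stated assumptions, not a reconstruction of the truncated script: `plan_commands` is a hypothetical function, and the `requirements.txt` fallback is an assumed stand-in for "README requirements". The conda subcommands used (`conda env create`, `conda create`, `conda run`) are standard.

```python
import tempfile
from pathlib import Path
from typing import List

# Same file names the script above treats as conda environment files.
CONDA_ENV_FILES = ("environment.yml", "environment.yaml", "conda.yml")


def plan_commands(repo_dir: Path, env_name: str) -> List[List[str]]:
    """Build, but do not run, the commands for a conservative conda-first setup."""
    # Preferred path: reuse an environment file that the repository already ships.
    for name in CONDA_ENV_FILES:
        env_file = repo_dir / name
        if env_file.is_file():
            return [["conda", "env", "create", "-n", env_name, "-f", str(env_file)]]

    # Fallback: plain conda env with an unpinned Python, then pip inside that env.
    # No version is pinned here; any version uncertainty should be recorded separately.
    commands = [["conda", "create", "-y", "-n", env_name, "python"]]
    requirements = repo_dir / "requirements.txt"  # assumed file name
    if requirements.is_file():
        commands.append(
            ["conda", "run", "-n", env_name, "python", "-m", "pip",
             "install", "-r", str(requirements)]
        )
    return commands


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        (repo / "requirements.txt").write_text("numpy\n")
        for command in plan_commands(repo, "repro-env"):
            # Dry run: print each planned command in copyable form.
            print("+ " + " ".join(command))
```

Returning the plan as data keeps the commands copyable and lets a caller review them before anything touches the network or the local environment.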