
Setup
Bootstrap a new autoresearch experiment with domain, target file, eval command, metric, and direction so agents can run optimize-and-measure loops.
Overview
Setup is an agent skill most often used in Build (also Ship) that interactively creates autoresearch experiment configuration for target files and eval metrics.
Install
npx skills add https://github.com/alirezarezvani/claude-skills --skill setupWhat is this skill?
- /ar:setup slash command with interactive or one-line CLI arguments
- Collects domain, name, target file, eval command, metric, and optimize direction
- Optional evaluator and scope parameters with --list and --list-evaluators
- Verifies target file exists before saving experiment config
- Delegates to setup_experiment.py under the skill scripts path
- 7 core setup parameters: domain, name, target, eval, metric, direction, optional evaluator
Adoption & trust: 1.4k installs on skills.sh; 17.5k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You want an agent to optimize a file automatically but lack a consistent experiment record linking the target, eval command, and success metric.
Who is it for?
Indie builders running autoresearch-style loops on engineering benchmarks, marketing copy, or prompts with repeatable eval commands.
Skip if: One-off edits without measurement, or production deploy pipelines that do not use the autoresearch experiment layout.
When should I use this skill?
Set up a new autoresearch experiment interactively or via /ar:setup with domain, target file, eval command, metric, and direction.
What do I get? / Deliverables
A new autoresearch experiment is written via setup_experiment.py so optimize runs can measure the chosen metric in the right direction.
- Persisted autoresearch experiment configuration
- Listed experiments or evaluators when using --list flags
Recommended Skills
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Experiment configuration is core build work: you wire the artifact the agent will edit to how success is measured. Agent-tooling subphase covers slash commands, eval harnesses, and experiment scaffolding—not the product UI itself.
Where it fits
Register api-speed experiment targeting src/api.py with pytest bench.py and p50_ms minimized.
Define evaluate.py and ctr_score before an agent runs headline variants under a fixed eval contract.
Scope a custom-domain experiment so the optimizer only touches prompts, not unrelated repo files.
How it compares
Experiment bootstrap for agent optimization, not a generic project scaffolder like create-react-app.
Common Questions / FAQ
Who is setup for?
Developers and solo builders using the autoresearch skill pack who need to define what file to optimize and how to score each iteration.
When should I use setup?
In build agent-tooling when first wiring an optimize loop, and in ship testing when defining bench commands (e.g. pytest bench.py) before automated iteration.
Is setup safe to install?
The skill invokes local Python setup scripts and may touch the filesystem; review the Security Audits panel on this Prism page and inspect scripts before running.
SKILL.md
READMESKILL.md - Setup
# /ar:setup — Create New Experiment Set up a new autoresearch experiment with all required configuration. ## Usage ``` /ar:setup # Interactive mode /ar:setup engineering api-speed src/api.py "pytest bench.py" p50_ms lower /ar:setup --list # Show existing experiments /ar:setup --list-evaluators # Show available evaluators ``` ## What It Does ### If arguments provided Pass them directly to the setup script: ```bash python {skill_path}/scripts/setup_experiment.py \ --domain {domain} --name {name} \ --target {target} --eval "{eval_cmd}" \ --metric {metric} --direction {direction} \ [--evaluator {evaluator}] [--scope {scope}] ``` ### If no arguments (interactive mode) Collect each parameter one at a time: 1. **Domain** — Ask: "What domain? (engineering, marketing, content, prompts, custom)" 2. **Name** — Ask: "Experiment name? (e.g., api-speed, blog-titles)" 3. **Target file** — Ask: "Which file to optimize?" Verify it exists. 4. **Eval command** — Ask: "How to measure it? (e.g., pytest bench.py, python evaluate.py)" 5. **Metric** — Ask: "What metric does the eval output? (e.g., p50_ms, ctr_score)" 6. **Direction** — Ask: "Is lower or higher better?" 7. **Evaluator** (optional) — Show built-in evaluators. Ask: "Use a built-in evaluator, or your own?" 8. **Scope** — Ask: "Store in project (.autoresearch/) or user (~/.autoresearch/)?" Then run `setup_experiment.py` with the collected parameters. ### Listing ```bash # Show existing experiments python {skill_path}/scripts/setup_experiment.py --list # Show available evaluators python {skill_path}/scripts/setup_experiment.py --list-evaluators ``` ## Built-in Evaluators | Name | Metric | Use Case | |------|--------|----------| | `benchmark_speed` | `p50_ms` (lower) | Function/API execution time | | `benchmark_size` | `size_bytes` (lower) | File, bundle, Docker image size | | `test_pass_rate` | `pass_rate` (higher) | Test suite pass percentage | | `build_speed` | `build_seconds` (lower) | Build/compile/Docker build time | | `memory_usage` | `peak_mb` (lower) | Peak memory during execution | | `llm_judge_content` | `ctr_score` (higher) | Headlines, titles, descriptions | | `llm_judge_prompt` | `quality_score` (higher) | System prompts, agent instructions | | `llm_judge_copy` | `engagement_score` (higher) | Social posts, ad copy, emails | ## After Setup Report to the user: - Experiment path and branch name - Whether the eval command worked and the baseline metric - Suggest: "Run `/ar:run {domain}/{name}` to start iterating, or `/ar:loop {domain}/{name}` for autonomous mode."