Explore Run

Name: Explore Run
Author: lllllllama

lllllllama/rigorpilot-skills

176k installs
512 repo stars
Updated July 26, 2026

explore-run is a Claude Code skill for planning, ranking, and executing bounded exploratory deep learning runs with evidence-based candidate selection.

About

An exploratory execution skill for deep learning research. Use it when you have explicit authorization for exploration and want to run small-subset validation, sweeps, or quick trials with fair-comparison caveats.

Plans and ranks exploratory runs with cost, success rate, and expected gain
Executes small-subset validation, short-cycle trials, or batch sweeps
Labels results as bounded evidence with no-overclaim summaries

Explore Run by the numbers

175,906 all-time installs (skills.sh)
+25,326 installs in the week ending Jul 28, 2026 (Skillselion tracking)
Ranked #4 of 2,066 Data Science & ML skills by installs in the Skillselion catalog
Security screen: MEDIUM risk (skills.sh audit)
Data as of Jul 28, 2026 (Skillselion catalog sync)

At a glance

explore-run capabilities & compatibility

Capabilities: variant planning · candidate ranking · experiment execution · evidence collection
Use cases: research · testing

npx skills add https://github.com/lllllllama/rigorpilot-skills --skill explore-run

Add your badge

Show developers this skill is listed on Skillselion. Paste this into your README.

[![Listed on Skillselion](https://skillselion.com/badge/skills/lllllllama/rigorpilot-skills/explore-run.svg)](https://skillselion.com/skills/lllllllama/rigorpilot-skills/explore-run)


What it does

Plan and execute bounded exploratory deep learning runs with candidate ranking.

Who is it for?

Small-subset validation, short-cycle training probes, batch sweeps, idle-GPU search

Skip if: trusted training execution, conservative verification, repository setup, implicit experimentation

When should I use this skill?

Use it when the researcher explicitly authorizes exploratory runs for small-subset validation, short-cycle probes, batch sweeps, or quick transfer-learning trials.

What you get

An explore_outputs/ bundle in which TOP_RUNS.md ranks candidates by cost, success rate, and expected gain before execution, and by real execution evidence once runs have been executed.

explore_outputs/TOP_RUNS.md
explore_outputs/SCIENTIFIC_CHANGELOG.md
explore_outputs/COMPARABILITY_REPORT.md

By the numbers

Three ranking factors: cost, success_rate, expected_gain; pre- and post-execution scoring

Files

SKILL.md · Markdown · GitHub ↗

explore-run

Use this as the Rigor Improve / Rigor Explore run leaf skill. The installed slug remains explore-run for compatibility.

Use the shared operating principles in ../../references/agent-operating-principles.md; this skill should guide candidate run planning while preserving model judgment about the active repo.

When to apply

When the researcher explicitly authorizes exploratory runs.
When the task is a small-subset validation, short-cycle training probe, batch sweep, idle-GPU search, or quick transfer-learning trial.
When the output should rank candidate runs rather than certify trusted success.

When not to apply

When the user wants trusted training execution or conservative verification.
When there is no explicit exploratory authorization.
When the task is repository setup, intake, or debugging.

Clear boundaries

This skill owns exploratory execution planning and summary only.
Use ai-research-explore instead when the task spans both current_research coordination and exploratory code changes.
It may hand off actual command execution to minimal-run-and-audit or run-train.
It should keep experiment state isolated from the trusted baseline.
It should prefer small-subset and short-cycle checks before heavier exploratory runs.
It should label run results as bounded evidence and explain when a comparison is not directly fair.

Ranking Semantics

Pre-execution candidate selection uses three factors: cost, success_rate, and expected_gain.
Default weights should stay conservative unless the researcher explicitly provides selection_weights.
Budget pruning still applies after scoring through max_variants and max_short_cycle_runs.
If runs are executed later, downstream ranking should switch to real execution evidence, not stay purely heuristic.
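
The pre-execution ranking described above can be sketched as a weighted average followed by a budget cut. This is a minimal illustration, not the skill's implementation: the default weights are the ones that appear in scripts/plan_variants.py, while the candidate names, their per-factor scores, and the `composite_score` and `select` helpers are hypothetical. The "cost" score is treated as cost efficiency, so higher means cheaper.

```python
from typing import Dict, List, Tuple

# Default weights as they appear in scripts/plan_variants.py.
DEFAULT_WEIGHTS: Dict[str, float] = {"cost": 0.25, "success_rate": 0.35, "expected_gain": 0.40}


def composite_score(scores: Dict[str, float], weights: Dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted average of per-factor scores, each assumed to lie in [0, 1]."""
    total_weight = sum(max(0.0, w) for w in weights.values())
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    weighted = sum(max(0.0, weights[name]) * scores[name] for name in weights)
    return weighted / total_weight


def select(
    candidates: Dict[str, Dict[str, float]],
    max_variants: int,
    weights: Dict[str, float] = DEFAULT_WEIGHTS,
) -> List[Tuple[str, float]]:
    """Rank candidates by composite score, then apply the max_variants budget."""
    ranked = sorted(
        ((name, round(composite_score(s, weights), 4)) for name, s in candidates.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )
    # A budget of 0 is treated as "no limit", mirroring plan_variants.py.
    return ranked[:max_variants] if max_variants > 0 else ranked


# Hypothetical candidates: a cheap, safe probe and a costlier, more ambitious run.
CANDIDATES = {
    "cheap_probe": {"cost": 0.9, "success_rate": 0.8, "expected_gain": 0.2},
    "ambitious": {"cost": 0.3, "success_rate": 0.5, "expected_gain": 0.9},
}

if __name__ == "__main__":
    print(select(CANDIDATES, max_variants=1))
    cost_heavy = {"cost": 0.6, "success_rate": 0.3, "expected_gain": 0.1}
    print(select(CANDIDATES, max_variants=1, weights=cost_heavy))
```

Under the default weights the more ambitious candidate wins narrowly; shifting weight toward cost flips the order, which is why non-default selection_weights should come from the researcher rather than be assumed.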

Variant Spec Hints

Use variant_axes to define the candidate dimension grid.
Use subset_sizes and short_run_steps to express exploratory run scale.
Use selection_weights to rebalance cost, success_rate, and expected_gain.
Use primary_metric and metric_goal so downstream ranking can order executed candidates consistently.
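
As a rough illustration of these hints, the sketch below builds a spec and serializes it to JSON. The field names are the ones read by scripts/plan_variants.py; every value (axis names, sizes, step counts, the metric name, the base command) is a made-up placeholder rather than a recommended setting.

```python
import json
import math

# Hypothetical exploration spec; only the key names come from the skill.
spec = {
    "current_research": "baseline-v1",        # placeholder label
    "base_command": "python train.py",        # placeholder command
    "variant_axes": {
        "lr": [1e-4, 3e-4],
        "optimizer": ["adamw", "sgd"],
    },
    "subset_sizes": [1000, 5000],
    "short_run_steps": [200, 1000],
    "selection_weights": {"cost": 0.4, "success_rate": 0.4, "expected_gain": 0.2},
    "primary_metric": "val_accuracy",         # placeholder metric name
    "metric_goal": "maximize",
    "max_variants": 6,
    "max_short_cycle_runs": 4,
}

spec_json = json.dumps(spec, indent=2)

# Size of the full grid before pruning: every axis combination is crossed
# with every subset size and every short-run step count.
axis_combinations = math.prod(len(options) for options in spec["variant_axes"].values())
raw_variant_count = axis_combinations * len(spec["subset_sizes"]) * len(spec["short_run_steps"])

if __name__ == "__main__":
    print(spec_json)
    print("raw variants before pruning:", raw_variant_count)
```

Saved as a JSON file, a spec like this can be passed to scripts/plan_variants.py through --spec-json, with --output-json naming where the pruned matrix is written.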

Output expectations

explore_outputs/CHANGESET.md
explore_outputs/SCIENTIFIC_CHANGELOG.md
explore_outputs/COMPARABILITY_REPORT.md
explore_outputs/TOP_RUNS.md
explore_outputs/status.json
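
One way to check that a finished run produced the full bundle is sketched below. The file names are taken from the list above; the `missing_outputs` helper is hypothetical and not part of the skill.

```python
from pathlib import Path
from typing import List

# File names from the skill's output expectations.
EXPECTED_OUTPUTS = [
    "CHANGESET.md",
    "SCIENTIFIC_CHANGELOG.md",
    "COMPARABILITY_REPORT.md",
    "TOP_RUNS.md",
    "status.json",
]


def missing_outputs(output_dir: Path) -> List[str]:
    """Return the expected bundle files that are absent from output_dir."""
    return [name for name in EXPECTED_OUTPUTS if not (output_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_outputs(Path("explore_outputs"))
    print("bundle complete" if not missing else f"missing: {missing}")
```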

Notes

Use references/execution-policy.md, ../../references/explore-variant-spec.md, ../../references/deep-learning-experiment-principles.md, scripts/plan_variants.py, and scripts/write_outputs.py.

display_name: Rigor Improve / Rigor Explore
short_description: Rigor Improve / Rigor Explore run leaf mode for isolated exploratory experiment runs.
default_prompt: Plan isolated exploratory runs conservatively, prefer small-subset or short-cycle checks first, and write CHANGESET.md TOP_RUNS.md and status.json into explore_outputs.

#!/usr/bin/env python3
"""Generate a budget-aware exploratory variant matrix for isolated runs."""

from __future__ import annotations

import argparse
import itertools
import json
from pathlib import Path
from typing import Any, Dict, List, Sequence

DEFAULT_SELECTION_WEIGHTS = {
    "cost": 0.25,
    "success_rate": 0.35,
    "expected_gain": 0.40,
}


def load_spec(path: Path) -> Dict[str, Any]:
    return json.loads(path.read_text(encoding="utf-8-sig"))


def current_research_value(spec: Dict[str, Any]) -> str:
    return str(spec.get("current_research") or spec.get("baseline_ref") or "unknown")


def normalize_metric_goal(value: Any) -> str:
    text = str(value or "maximize").strip().lower()
    if text in {"min", "minimize", "lower", "lower_is_better"}:
        return "minimize"
    return "maximize"


def safe_float(value: Any) -> float:
    if value is None:
        return 0.0
    if isinstance(value, (int, float)):
        return float(value)
    try:
        return float(str(value))
    except ValueError:
        return 0.0


def clamp_score(value: float) -> float:
    return max(0.0, min(1.0, value))


def can_float(value: Any) -> bool:
    try:
        float(str(value))
        return True
    except (TypeError, ValueError):
        return False


def unique_preserving_order(values: Sequence[Any]) -> List[Any]:
    ordered: List[Any] = []
    for item in values:
        if item not in ordered:
            ordered.append(item)
    return ordered


def rank_lookup(values: Sequence[Any]) -> Dict[Any, int]:
    ordered_values = unique_preserving_order(values)
    if not ordered_values:
        return {}

    has_none = any(item is None for item in ordered_values)
    non_none = [item for item in ordered_values if item is not None]
    if non_none and all(can_float(item) for item in non_none):
        ordered_non_none = sorted(non_none, key=lambda item: (safe_float(item), str(item)))
    else:
        ordered_non_none = non_none

    ordered = ([None] if has_none else []) + ordered_non_none
    return {item: index for index, item in enumerate(ordered)}


def normalized_lookup_score(value: Any, lookup: Dict[Any, int]) -> float:
    if not lookup:
        return 0.0
    index = lookup.get(value, 0)
    max_index = max(lookup.values(), default=0)
    if max_index <= 0:
        return 0.0
    return index / max_index


def normalize_weights(spec: Dict[str, Any]) -> Dict[str, float]:
    raw = dict(DEFAULT_SELECTION_WEIGHTS)
    # Treat a missing or null selection_weights entry as "no overrides".
    raw.update(spec.get("selection_weights") or {})
    # Only the three known factors count toward the total, so stray keys
    # cannot leave the normalized weights summing to less than 1.
    cleaned = {key: max(0.0, safe_float(raw.get(key))) for key in DEFAULT_SELECTION_WEIGHTS}
    total = sum(cleaned.values())
    if total <= 0:
        return dict(DEFAULT_SELECTION_WEIGHTS)
    return {key: value / total for key, value in cleaned.items()}


def axis_aggressiveness_score(axis_values: Dict[str, Any], axes: Dict[str, Sequence[Any]]) -> float:
    if not axis_values:
        return 0.0
    scores: List[float] = []
    for key, value in axis_values.items():
        options = list(axes.get(key, []))
        if not options:
            scores.append(0.0)
            continue
        lookup = {option: index for index, option in enumerate(options)}
        max_index = max(len(options) - 1, 1)
        scores.append(lookup.get(value, 0) / max_index)
    return sum(scores) / len(scores)


def annotate_variant_scores(
    raw_variants: List[Dict[str, Any]],
    spec: Dict[str, Any],
    subset_lookup: Dict[Any, int],
    step_lookup: Dict[Any, int],
) -> List[Dict[str, Any]]:
    axes = spec.get("variant_axes", {})
    weights = normalize_weights(spec)
    annotated: List[Dict[str, Any]] = []

    for item in raw_variants:
        subset_scale = normalized_lookup_score(item.get("subset_size"), subset_lookup)
        step_scale = normalized_lookup_score(item.get("short_run_steps"), step_lookup)
        axis_scale = axis_aggressiveness_score(item.get("axes", {}), axes)

        raw_cost = 0.50 * step_scale + 0.35 * subset_scale + 0.15 * axis_scale
        cost_efficiency_score = clamp_score(1.0 - raw_cost)
        predicted_success_score = clamp_score(1.0 - (0.45 * axis_scale + 0.35 * step_scale + 0.20 * subset_scale))
        predicted_gain_score = clamp_score(0.50 * axis_scale + 0.30 * step_scale + 0.20 * subset_scale)
        total_score = (
            weights["cost"] * cost_efficiency_score
            + weights["success_rate"] * predicted_success_score
            + weights["expected_gain"] * predicted_gain_score
        )

        annotated_item = dict(item)
        annotated_item.update(
            {
                "cost_score": round(raw_cost, 4),
                "cost_efficiency_score": round(cost_efficiency_score, 4),
                "predicted_success_score": round(predicted_success_score, 4),
                "predicted_gain_score": round(predicted_gain_score, 4),
                "total_score": round(total_score, 4),
                "estimated_runtime_units": round(1.0 + 3.0 * step_scale + 2.0 * subset_scale + axis_scale, 4),
                "feasibility_annotations": [],
            }
        )
        annotated.append(annotated_item)
    return annotated


def build_raw_variants(spec: Dict[str, Any]) -> List[Dict[str, Any]]:
    axes = spec.get("variant_axes", {})
    keys = sorted(axes)
    values = [axes[key] for key in keys]
    subset_sizes = spec.get("subset_sizes", [None])
    short_run_steps = spec.get("short_run_steps", [None])
    current_research = current_research_value(spec)

    subset_rank = rank_lookup(subset_sizes)
    step_rank = rank_lookup(short_run_steps)

    variants: List[Dict[str, Any]] = []
    index = 1
    for combo in itertools.product(*values):
        axis_values = dict(zip(keys, combo))
        axis_position_penalty = sum(axes[key].index(axis_values[key]) for key in keys)
        for subset_size in subset_sizes:
            for step_limit in short_run_steps:
                subset_position = subset_rank.get(subset_size, 0)
                step_position = step_rank.get(step_limit, 0)
                variants.append(
                    {
                        "id": f"variant-{index:03d}",
                        "axes": axis_values,
                        "subset_size": subset_size,
                        "short_run_steps": step_limit,
                        "current_research": current_research,
                        "baseline_ref": spec.get("baseline_ref", current_research),
                        "base_command": spec.get("base_command"),
                        "axis_position_penalty": axis_position_penalty,
                        "subset_rank": subset_position,
                        "step_rank": step_position,
                    }
                )
                index += 1

    return annotate_variant_scores(variants, spec, subset_rank, step_rank)


def prune_variants(raw_variants: List[Dict[str, Any]], spec: Dict[str, Any]) -> List[Dict[str, Any]]:
    max_variants = int(spec.get("max_variants") or 0)
    max_short_cycle_runs = int(spec.get("max_short_cycle_runs") or 0)

    ordered = sorted(
        raw_variants,
        key=lambda item: (
            -item.get("total_score", 0.0),
            -item.get("predicted_gain_score", 0.0),
            -item.get("predicted_success_score", 0.0),
            -item.get("cost_efficiency_score", 0.0),
            item.get("cost_score", 0.0),
            item.get("id", ""),
        ),
    )

    selected: List[Dict[str, Any]] = []
    short_cycle_count = 0
    for item in ordered:
        is_short_cycle = item.get("short_run_steps") is not None
        if max_short_cycle_runs > 0 and is_short_cycle and short_cycle_count >= max_short_cycle_runs:
            continue
        selected.append(item)
        if is_short_cycle:
            short_cycle_count += 1
        if max_variants > 0 and len(selected) >= max_variants:
            break
    return selected


def build_variants(spec: Dict[str, Any]) -> Dict[str, Any]:
    current_research = current_research_value(spec)
    raw_variants = build_raw_variants(spec)
    variants = prune_variants(raw_variants, spec)
    raw_variant_count = len(raw_variants)
    variant_count = len(variants)

    return {
        "schema_version": "1.0",
        "current_research": current_research,
        "baseline_ref": spec.get("baseline_ref", current_research),
        "base_command": spec.get("base_command"),
        "raw_variant_count": raw_variant_count,
        "variant_count": variant_count,
        "pruned_variant_count": raw_variant_count - variant_count,
        "variant_budget": {
            "max_variants": int(spec.get("max_variants") or 0),
            "max_short_cycle_runs": int(spec.get("max_short_cycle_runs") or 0),
        },
        "selection_policy": {
            "factors": ["cost", "success_rate", "expected_gain"],
            "weights": normalize_weights(spec),
            "scores": {
                "cost_score": "Lower is cheaper; derived from steps, subset size, and axis aggressiveness.",
                "cost_efficiency_score": "Higher is cheaper after inverting cost_score.",
                "predicted_success_score": "Higher means the candidate is more likely to run cleanly.",
                "predicted_gain_score": "Higher means the candidate is more likely to produce a measurable improvement.",
                "total_score": "Weighted composite used for pre-execution candidate ranking.",
            },
        },
        "metric_policy": {
            "primary_metric": spec.get("primary_metric"),
            "metric_goal": normalize_metric_goal(spec.get("metric_goal")),
        },
        "variants": variants,
    }


def main() -> int:
    parser = argparse.ArgumentParser(description="Build a budget-aware exploratory variant matrix.")
    parser.add_argument("--spec-json", required=True, help="Path to the exploration spec JSON file.")
    parser.add_argument("--output-json", help="Optional output path for the generated matrix.")
    parser.add_argument("--json", action="store_true", help="Emit the matrix to stdout.")
    args = parser.parse_args()

    payload = build_variants(load_spec(Path(args.spec_json).resolve()))
    if args.output_json:
        Path(args.output_json).write_text(json.dumps(payload, indent=2, ensure_ascii=False), encoding="utf-8")
    if args.json or not args.output_json:
        print(json.dumps(payload, indent=2, ensure_ascii=False))
    return 0


if __name__ == "__main__":
    raise SystemExit(main())

#!/usr/bin/env python3
"""Compatibility wrapper for exploratory run output bundles."""

from __future__ import annotations

import importlib.util
from pathlib import Path


def load_shared_module():
    module_path = Path(__file__).resolve().parents[3] / "shared" / "scripts" / "write_explore_bundle.py"
    spec = importlib.util.spec_from_file_location("write_explore_bundle", module_path)
    if spec is None or spec.loader is None:
        raise RuntimeError(f"Unable to load shared writer module from {module_path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def main() -> int:
    module = load_shared_module()
    return module.main(default_mode="run", default_output_dir="explore_outputs")


if __name__ == "__main__":
    raise SystemExit(main())

Related skills

Microsoft Foundry: Deploy, evaluate, and continuously improve Microsoft Foundry agents from a single agent interface. (478k · 1.3k)

Ai Research Reproduction: Orchestrate trustworthy, auditable reproduction of deep learning repositories directly from their READMEs. (164k · 507)

Run Train: Safely execute selected deep learning training commands with standardized evidence capture. (164k · 507)

Paper Context Resolver: Fetch precise reproduction-critical details like dataset splits, preprocessing steps, or evaluation protocols from the original academic paper when the repo README leav… (141k · 507)

Repo Intake And Plan: Scan unfamiliar AI research repositories and receive a minimal, trustworthy reproduction target before investing significant time. (140k · 507)

Env And Assets Bootstrap: Create a reproducible, conservative conda environment plus required checkpoints, datasets and caches before attempting to run any AI research paper reproduction. (140k · 507)

Forks & variants (3)

Explore Run has 3 known copies in the catalog totaling 431 installs. They canonicalize to this original listing.

lllllllama - 393 installs
lllllllama - 29 installs
lllllllama - 9 installs

How it compares

Pick explore-run over minimal-run-and-audit when comparing multiple agent variants on an experiment branch rather than producing a single auditable verification report.

FAQ

How are candidates ranked?

Pre-execution: by cost, success_rate, expected_gain. Post-execution: by real command status, observed metrics, and artifacts.
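
The post-execution half of that answer might look like the sketch below. It is an assumption-laden illustration: the shared writer that actually produces TOP_RUNS.md is not shown on this page, so the record fields (`id`, `status`, `metric`), the `"success"` status value, and the `rank_executed` helper are all hypothetical. Only the metric_goal values ("maximize" or "minimize") come from the skill.

```python
from typing import Any, Dict, List, Tuple


def rank_executed(runs: List[Dict[str, Any]], metric_goal: str = "maximize") -> List[Dict[str, Any]]:
    """Order executed runs: successful runs with a metric first, best metric first."""
    # Negate the metric when maximizing so an ascending sort puts the best run first.
    sign = 1.0 if metric_goal == "minimize" else -1.0

    def sort_key(run: Dict[str, Any]) -> Tuple[int, float, str]:
        metric = run.get("metric")
        usable = run.get("status") == "success" and isinstance(metric, (int, float))
        return (0 if usable else 1, sign * metric if usable else 0.0, str(run.get("id", "")))

    return sorted(runs, key=sort_key)


# Hypothetical execution records.
RUNS = [
    {"id": "variant-001", "status": "success", "metric": 0.71},
    {"id": "variant-002", "status": "failed", "metric": None},
    {"id": "variant-003", "status": "success", "metric": 0.78},
]

if __name__ == "__main__":
    for run in rank_executed(RUNS, metric_goal="maximize"):
        print(run["id"], run["status"], run["metric"])
```

Failed or metric-less runs sink to the bottom instead of being dropped, so the ranking still records what was tried.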

Are results claimed as verified success?

No. Results are labeled as bounded evidence with explanations of when comparisons are not directly fair.

Is Explore Run safe to install?

skills.sh reports 2 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.

Data Science & ML · research
