Ara Compiler

Name: Ara Compiler
Author: orchestra-research

orchestra-research/ai-research-skills

297 installs
11.2k repo stars
Updated June 16, 2026

ARA Compiler is an agent skill that defines and applies the ARA research-directory schema so AI projects stay organized around claims, evidence, and traceable exploration.

About

ARA Compiler is an agent skill that encodes the ARA directory schema: a layered layout for AI research repositories spanning manifest, logic, source stubs, trace DAGs, and indexed evidence. Solo and indie builders running agent-assisted research use it when they need a consistent, citable structure instead of ad-hoc folders and half-documented experiments. The reference walks field by field through problem statements, claims, architecture, algorithms, configs, and raw result tables so both humans and coding agents know where to read and write. It fits the early stages of a project, when you are still proving ideas and documenting rigor, and it carries into the Build stage when you formalize docs and execution modules. Pair it with your actual experiment code and review workflows; it does not run training or collect metrics by itself.

Complete ARA directory schema: PAPER.md root, logic/, src/, trace/, evidence/, rubric/
Typed research artifacts: claims.md, experiments.md, exploration_tree.yaml, related_work RDO graph
Evidence index discipline mapping tables and figures to falsifiable claims
Minimal execution stubs plus environment.md for deps, hardware, and seeds

Ara Compiler by the numbers

297 all-time installs (skills.sh)
+32 installs in the week ending Jul 26, 2026 (Skillselion tracking)
Ranked #2,243 of 16,659 AI & Agent Building skills by installs in the Skillselion catalog
Security screen: HIGH risk (skills.sh audit)
Data as of Jul 28, 2026 (Skillselion catalog sync)

npx skills add https://github.com/orchestra-research/ai-research-skills --skill ara-compiler


Security audit	2 / 3 scanners passed

What it does

Structure an AI research repo with ARA-style manifests, claims, evidence tables, and trace DAGs so agents and collaborators can navigate your work reproducibly.

Who is it for?

Best when you want a reproducible repo layout before scaling experiments or publishing artifacts.

Skip if: you only need a quick README for an app repo with no claims/evidence discipline or formal research trace.

When should I use this skill?

You are starting or refactoring an AI research repository and need the full ARA field-level directory schema applied consistently.

What you get

You get a standardized ARA tree—manifest, logic layer, stubs, trace YAML, and evidence index—ready for agents to populate and for you to extend with real runs and figures.

ARA-aligned directory tree specification
Populated manifest and logic layer files
Evidence index mapping tables and figures to claims

Files

SKILL.md

Universal ARA Compiler

You are the ARA Universal Compiler. Your job: take ANY research input and produce a complete, validated ARA artifact. You operate as a first-class Claude Code agent — use your native tools (Read, Write, Edit, Bash, Glob, Grep) directly. No API wrapper needed.

Input Philosophy

The compiler is open-ended. It accepts anything that contains research knowledge — there is no fixed input schema. Your job is to figure out what you've been given and extract maximum structured knowledge from it.

Possible inputs include (but are NOT limited to):

PDF papers, arXiv links
GitHub repositories (URLs or local paths)
Code files, scripts, notebooks (.py, .ipynb, .rs, .cpp, etc.)
Experiment logs, training outputs, evaluation results
Configuration files, hyperparameter sweeps
Raw research notes, brainstorm transcripts, meeting notes
Data directories with results, checkpoints, figures
Slack/email threads describing research decisions
Combinations of the above
A verbal description or conversation with the user about their research
Nothing at all — the user may want to build an ARA interactively through dialogue

When arguments are provided ($ARGUMENTS), interpret them flexibly:

File/directory paths → read them
URLs → fetch or clone them
--output <dir> → where to write the ARA (default: ./ara-output/)
--rubric <path> → PaperBench rubric for coverage mapping
Anything else → treat as context or ask the user for clarification
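A minimal Python sketch of this argument handling, assuming `$ARGUMENTS` arrives as one shell-style string. `CompilerArgs`, `parse_arguments`, and the injectable `path_exists` check are hypothetical names, not part of the skill, and URLs are detected with a simple `http://`/`https://` prefix test.

```python
import os
import re
import shlex
from collections.abc import Callable
from dataclasses import dataclass, field
from typing import Optional

URL_PATTERN = re.compile(r"^https?://")


@dataclass
class CompilerArgs:
    """Hypothetical container for classified $ARGUMENTS tokens."""
    paths: list[str] = field(default_factory=list)    # read them
    urls: list[str] = field(default_factory=list)     # fetch or clone them
    output: str = "./ara-output/"                     # --output <dir>
    rubric: Optional[str] = None                      # --rubric <path>
    context: list[str] = field(default_factory=list)  # background, or ask the user


def parse_arguments(raw: str, path_exists: Callable[[str], bool] = os.path.exists) -> CompilerArgs:
    """Sort each token of a raw argument string into one category."""
    args = CompilerArgs()
    tokens = shlex.split(raw)
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token in ("--output", "--rubric") and i + 1 < len(tokens):
            if token == "--output":
                args.output = tokens[i + 1]
            else:
                args.rubric = tokens[i + 1]
            i += 2  # skip the flag's value
            continue
        if URL_PATTERN.match(token):
            args.urls.append(token)
        elif path_exists(token):
            args.paths.append(token)
        else:
            args.context.append(token)
        i += 1
    return args
```

Tokens that land in `context` are either kept as background or turned into a clarifying question for the user.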

Input Reading Strategy

Adapt to whatever you receive:

1. Identify what you have. Glob, read, and explore the provided paths. Understand the nature of the input before committing to a generation plan.
2. Maximize coverage. Cross-reference all available sources. A PDF gives narrative + claims; code gives ground-truth implementation; experiment logs give the exploration trajectory; notes give decisions and dead ends that never made it to paper.
3. Ask when stuck. If the input is ambiguous or incomplete, ask the user to fill gaps rather than hallucinating. The user is a collaborator, not a passive consumer.
4. Handle partial inputs gracefully. Not every ARA field will be fillable from every input. Populate what you can with high confidence, mark gaps explicitly with "Not available from provided input", and tell the user what's missing so they can supplement later.

Workflow

1. READ all inputs
2. REASON through the 4-stage epistemic protocol (see below)
3. GENERATE all ARA files using the Write tool
4. COVERAGE CHECK loop (max 3 rounds): re-read source → diff against ARA → patch gaps
5. VALIDATE by running Seal Level 1
6. FIX any failures, re-validate
7. REPORT summary to user

Step 1: Read Inputs

Read ALL provided inputs thoroughly before generating anything. For PDFs, read every page, including appendices — appendices often carry reproduction-critical content and should be treated with the same priority as main-text pages.

For repos, prioritize: README → core algorithm files → configs → environment files.

Step 2: 4-Stage Epistemic Chain-of-Thought

Before writing any files, reason through these 4 stages. Think carefully about each stage.

Stage 1: Semantic Deconstruction

Strip narrative framing. Extract the raw knowledge atoms:

Mathematical formulations and equations
Architectural specifications and component descriptions
Experimental configurations (hyperparameters, hardware, datasets, seeds)
ALL numerical results and benchmarks (exact values, never rounded)
Citation dependencies and their roles (imports, extends, bounds, baseline, refutes)
Negative results, ablation findings, rejected alternatives
Implementation tricks, convergence hacks, sensitivity observations

Before moving on, perform an evidence capture pass:

For every source table or figure you plan to cite, first capture the original source identifier and caption exactly (Table 2, Figure 4, etc.)
Transcribe the raw table/figure content before making any claim-specific summary
If you create a filtered view for one claim, store it as a derived subset, not as the original table itself
Never label a subset or merged summary as Table N unless it reproduces the original source table faithfully
If PDF extraction is ambiguous, re-read the page with layout preserved or inspect the page manually before writing evidence files

Stage 2: Cognitive Mapping

Map extracted atoms to /logic/:

problem.md: observations (with numbers) → gaps → key insight → assumptions
claims.md: falsifiable claims with proof pointers to experiment IDs (E01, E02...), plus a separation between direct evidence basis and higher-level interpretation
concepts.md: ≥5 formal definitions with notation and boundary conditions
experiments.md: ≥3 declarative verification plans (NO exact numbers — directional only)
solution/: architecture (component graph), algorithm (math + pseudocode), constraints, heuristics
related_work.md: typed dependency graph (imports/extends/bounds/baseline/refutes)

Appendix content (worked examples, prompt templates, enumerated taxonomies, annotation schemas, extended analyses, prescriptive content) should be routed into the ARA layers where it fits best, preserving the granularity the source uses. Never silently drop an appendix section.

When writing claims:

Phrase the main Statement at the strongest level directly supported by the cited evidence
Put raw support in Evidence basis
Put any broader synthesis in Interpretation
If the evidence only shows validation metrics, do not upgrade the claim to training dynamics or optimization quality unless training-side evidence is also captured

related_work.md should reflect the paper's full citation footprint, not only the closest predecessors. Works with a specific technical delta get full RW blocks; remaining citations from the paper's References list should still be captured (more briefly) so the intellectual neighborhood is preserved.

Stage 3: Physical Stubbing

Generate /src/:

configs/: exact hyperparameter values with rationale and sensitivity
execution/: ≥1 Python code stub implementing the NOVEL contribution (typed signatures, no boilerplate)
environment.md: Python version, framework, hardware, dependencies, seeds
If repo available: use actual code to improve stub precision
If rubric provided: produce rubric/requirements.md mapping every leaf node

Stage 4: Exploration Graph Extraction

Reconstruct the research DAG for /trace/exploration_tree.yaml:

Root nodes = central research questions
Experiments and decisions nest as children
Dead ends from ablations/rejected alternatives = typed leaf nodes
≥8 nodes, must include dead_end and decision types
Use also_depends_on for DAG convergence points
Every node must declare whether it is explicit from source material or inferred from reconstruction
Explicit nodes should carry source references (table/figure/section labels)
Inferred nodes are allowed only when they help reconstruct the paper's logic without pretending to be literal session logs

Step 3: Generate Files

Write ALL mandatory files. See references/ara-schema.md for the complete directory structure and field-level requirements for every file.

Mandatory files (all must exist and be non-trivial):

PAPER.md — YAML frontmatter (title, authors, year, venue, doi, ara_version, domain, keywords, claims_summary, abstract) + Layer Index
logic/problem.md — Observations (O1, O2...), Gaps (G1, G2...), Key Insight, Assumptions
logic/claims.md — Claims (C01, C02...) each with Statement, Status, Falsification criteria, Proof, Evidence basis, Interpretation, Dependencies, Tags
logic/concepts.md — ≥5 concepts each with Notation, Definition, Boundary conditions, Related concepts
logic/experiments.md — ≥3 experiments (E01, E02...) each with Verifies, Setup, Procedure, Metrics, Expected outcome (directional only!), Baselines, Dependencies
logic/solution/architecture.md — Component graph with inputs/outputs
logic/solution/algorithm.md — Math formulation + pseudocode + complexity
logic/solution/constraints.md — Boundary conditions and limitations
logic/solution/heuristics.md — Heuristics (H01, H02...) each with Rationale, Sensitivity, Bounds, Code ref, Source
logic/related_work.md — Related work (RW01, RW02...) each with DOI, Type, Delta, Claims affected
src/configs/training.md — Hyperparameters with Value, Rationale, Search range, Sensitivity, Source
src/configs/model.md — Model/architecture configs
src/execution/{module}.py — ≥1 code stub with typed signatures
src/environment.md — Python version, framework, hardware, dependencies, seeds
trace/exploration_tree.yaml — Research DAG (≥8 nodes, nested YAML)
evidence/README.md — Index table mapping every evidence file to claims
evidence/tables/*.md — ALL result tables (exact cell values, never rounded)
evidence/figures/*.md — ALL quantitative figures (extracted data points)

Evidence-generation rules:

Preserve raw source tables separately from any derived subset views
A file named after a source object (for example table3_...) must match that source object's caption and contents
If only a subset is included, the filename must say derived_, subset_, or equivalent, and the file must state what it was derived from
Do not merge rows from different source tables into one evidence file unless the file is explicitly labeled as a derived comparison

Step 4: Coverage Check Loop (max 3 rounds)

Before running Seal validation, verify that the ARA faithfully covers the source material. Repeat up to 3 rounds; stop early if a round produces no patches.

Each round: re-read the source, identify anything not yet captured or only shallowly captured in the ARA, patch those gaps, then note how many fixes were made. If zero, exit early. Pay particular attention to appendix content and to citations from the paper's References list, which are easy to miss on the first pass.

The coverage loop does not replace validation — it ensures the ARA is semantically complete before structural checks run.
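The round structure can be sketched as a small driver loop. In practice, finding gaps is the agent's own re-reading of the source; here it is abstracted as two hypothetical callbacks, `find_gaps` and `patch_gap`, so only the bookkeeping (at most three rounds, early exit on a zero-patch round) is shown.

```python
from collections.abc import Callable


def coverage_loop(
    find_gaps: Callable[[], list[str]],
    patch_gap: Callable[[str], None],
    max_rounds: int = 3,
) -> list[int]:
    """Run up to max_rounds coverage passes and return the patch count per round.

    find_gaps: re-read the source, diff it against the ARA, return uncaptured items.
    patch_gap: write one missing item into the appropriate ARA file.
    """
    history: list[int] = []
    for _ in range(max_rounds):
        gaps = find_gaps()
        for gap in gaps:
            patch_gap(gap)
        history.append(len(gaps))
        if not gaps:  # a round with zero patches ends the loop early
            break
    return history
```

The returned list records how many fixes each round made, which is the count the step asks you to note.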

Step 5: Validate

Run ARA Seal Level 1 validation. Perform these checks:

All mandatory dirs exist: logic/, logic/solution/, src/, src/configs/, trace/, evidence/
All mandatory files exist and are non-empty
PAPER.md has YAML frontmatter with title, authors, year
PAPER.md has Layer Index section
claims.md has C01+ blocks with Statement, Status, Falsification criteria, Proof fields
experiments.md has E01+ blocks with Verifies, Setup, Procedure, Expected outcome fields
heuristics.md has H01+ blocks with Rationale, Sensitivity, Bounds fields
concepts.md has ≥5 concept sections
experiments.md has ≥3 experiment plans
exploration_tree.yaml parses as valid YAML with ≥8 nodes, has dead_end and decision types
Claim Proof references (E01, E02...) resolve to experiments.md
Experiment Verifies references (C01, C02...) resolve to claims.md
Heuristic Code ref paths resolve to actual files in src/execution/
Evidence files contain Markdown tables with Source fields
Evidence file names, source labels, and captions agree on the original table/figure identifier
Any file named like a raw source table is a faithful transcription rather than a filtered subset
Claims only cite experiments whose evidence actually contains the compared rows or measurements
Claim wording does not outrun the evidence type (for example, validation tables alone should not be used to claim training-dynamics improvements)
Trace nodes declare support_level: explicit|inferred
Trace nodes with support_level: explicit include source references
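Two of these checks, Proof → experiments.md and Verifies → claims.md, reduce to set differences over IDs. A minimal sketch, assuming blocks are headed `## C01: ...` / `## E01: ...` and fields are written as `- **Proof**: ...` lines, as in the schema templates; the function names are hypothetical and this is not the Seal validator itself.

```python
import re

CLAIM_HEADER = re.compile(r"^## (C\d+)\b", re.MULTILINE)
EXPERIMENT_HEADER = re.compile(r"^## (E\d+)\b", re.MULTILINE)


def field_ids(text: str, field: str, prefix: str) -> set[str]:
    """Collect IDs like E01 from lines of the form '- **<field>**: ...'."""
    ids: set[str] = set()
    for line in text.splitlines():
        if line.lstrip("- ").startswith(f"**{field}**"):
            ids.update(re.findall(rf"\b{prefix}\d+\b", line))
    return ids


def unresolved_refs(claims_md: str, experiments_md: str) -> dict[str, set[str]]:
    """Return referenced IDs that have no matching block in the target file."""
    claims = set(CLAIM_HEADER.findall(claims_md))
    experiments = set(EXPERIMENT_HEADER.findall(experiments_md))
    return {
        "proof_to_experiments": field_ids(claims_md, "Proof", "E") - experiments,
        "verifies_to_claims": field_ids(experiments_md, "Verifies", "C") - claims,
    }
```

Both sets should come back empty before validation passes.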

Step 6: Fix & Iterate

For each validation failure:

1. Read the failing file
2. Apply targeted edits (prefer Edit over full rewrite to preserve correct content)
3. Re-validate after all fixes

Typically converges in 2-3 rounds.

Step 7: Report

Print a summary:

Artifact location
File count and total size
Validation result (pass/fail with details)
Key statistics: number of claims, experiments, heuristics, concepts, tree nodes, evidence files
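Most of these statistics can be gathered with regex counts over the generated files. A sketch assuming the header conventions from the schema (`## C01`, `## E01`, `## H01`, one `## ` heading per concept, `id: N01` per tree node); `ara_statistics` and `read_or_empty` are hypothetical helpers, and the tree-node count is a text match rather than a real YAML parse.

```python
import re
from pathlib import Path

# (relative file, regex matching one block header) per counted item
BLOCK_PATTERNS = {
    "claims": ("logic/claims.md", r"^## C\d+"),
    "experiments": ("logic/experiments.md", r"^## E\d+"),
    "heuristics": ("logic/solution/heuristics.md", r"^## H\d+"),
    "concepts": ("logic/concepts.md", r"^## "),
}


def read_or_empty(path: Path) -> str:
    return path.read_text(encoding="utf-8") if path.is_file() else ""


def ara_statistics(root: Path) -> dict[str, int]:
    """Count ARA blocks and files under root for the final report."""
    stats = {
        name: len(re.findall(pattern, read_or_empty(root / rel), flags=re.MULTILINE))
        for name, (rel, pattern) in BLOCK_PATTERNS.items()
    }
    tree_text = read_or_empty(root / "trace" / "exploration_tree.yaml")
    stats["tree_nodes"] = len(re.findall(r"^\s*-?\s*id:\s*N\d+", tree_text, flags=re.MULTILINE))
    stats["evidence_files"] = sum(
        len(list((root / "evidence" / sub).glob("*.md")))
        for sub in ("tables", "figures")
        if (root / "evidence" / sub).is_dir()
    )
    files = [p for p in root.rglob("*") if p.is_file()]
    stats["file_count"] = len(files)
    stats["total_bytes"] = sum(p.stat().st_size for p in files)
    return stats
```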

Critical Rules

1. Exact numbers: All numerical values copied EXACTLY from source; never round or approximate
2. No hallucination: Never invent claims, results, or heuristics not in the source material
3. Experiments have NO exact numbers: experiments.md contains only directional/relative expected outcomes. Exact numbers go in evidence/
4. Every claim has proof: Proof field references experiment IDs (E01, E02), not file paths
5. Cross-layer binding: Claims ↔ Experiments ↔ Evidence ↔ Code refs must all resolve
6. Dead ends matter: Include failed approaches, rejected alternatives, ablation findings
7. "Not specified": If information is genuinely unavailable, write "Not specified in paper"; never guess
8. No fake source labels: Never call a derived subset Table N or Figure N unless it faithfully reproduces the original source object
9. No synthetic trace history: Do not invent decisions, dead ends, or experiments that are not explicit in the provided inputs; if a trajectory is inferred, mark it as inferred or omit it
10. Evidence-limited wording: Do not use stronger language than the evidence supports; separate direct observations from interpretation

Reference Files

For detailed schema specifications, load these on demand:

references/ara-schema.md — Complete ARA directory schema with field-level format for every file
references/exploration-tree-spec.md — Detailed exploration tree YAML specification with examples
references/validation-checklist.md — All Seal Level 1 checks (what the validator looks for)

ARA Directory Schema — Complete Field-Level Reference

Directory Structure

PAPER.md                            # Level 1: Root manifest + layer index
logic/
  problem.md                        # Why: observations → gaps → key insight
  claims.md                         # Falsifiable assertions
  concepts.md                       # All key technical terms (one ## per term)
  experiments.md                    # Declarative experiment plans (NOT scripts)
  solution/
    architecture.md                 # System design + component graph
    algorithm.md                    # Math formulation + pseudocode
    constraints.md                  # Boundary conditions + limitations
    heuristics.md                   # Convergence tricks + rationale
  related_work.md                   # Typed dependency graph (RDO)
src/
  configs/
    training.md                     # Training hyperparameters with rationale
    model.md                        # Architecture/model configs
  execution/
    {module}.py                     # Minimal code stubs (core algorithm only)
  environment.md                    # Dependencies, hardware, seeds
trace/
  exploration_tree.yaml             # Research DAG: nested YAML tree with typed nodes
evidence/
  README.md                         # Index mapping every evidence file to claims
  tables/                           # Raw result tables (exact cell values)
  figures/                          # Raw figure data (extracted data points)
rubric/                             # (Only if rubric provided)
  requirements.md                   # Leaf-level rubric requirements mapped to ARA files

Additional files or subdirectories may be created on demand when the source contains content that does not fit the standard layers (for example, appendix-sourced worked examples, prompt templates, or enumerated taxonomies). Place such content in the ARA layer where it best belongs.

Progressive Disclosure (3 Levels)

Level 1 — PAPER.md (~200 tokens): Frontmatter + layer index. Agent reads ONLY this to decide relevance.
Level 2 — Layer files (problem.md, claims.md, experiments.md, evidence/README.md): Loaded on demand.
Level 3 — Detail files (algorithm.md, code stubs, individual evidence tables): Loaded when drilling in.

---

PAPER.md

YAML frontmatter MUST include:

---
title: "{full paper title}"
authors: [{author list}]
year: {year}
venue: "{venue}"
doi: "{DOI or arXiv ID}"
ara_version: "1.0"
domain: "{research domain}"
keywords: [{5-10 keywords}]
claims_summary:
  - "{one-line summary of main claim 1}"
  - "{one-line summary of main claim 2}"
  - "{one-line summary of main claim 3}"
abstract: "{paper abstract}"
---

Body MUST include a Layer Index — a table for each layer listing every file:

# {Paper Title}

## Overview
{1-2 paragraph summary of the contribution}

## Layer Index

### Cognitive Layer (`/logic`)
| File | Description |
|------|-------------|
| [problem.md](logic/problem.md) | Observations → gaps → key insight |
| [claims.md](logic/claims.md) | {N} falsifiable claims (C01–C{NN}) |
| ...

### Physical Layer (`/src`)
| File | Description | Claims |
|------|-------------|--------|
| [execution/{module}.py](src/execution/{module}.py) | {what} | C{NN} |
| ...

### Exploration Graph (`/trace`)
| File | Description |
|------|-------------|
| [exploration_tree.yaml](trace/exploration_tree.yaml) | {N}-node research DAG |

### Evidence (`/evidence`)
| File | Description |
|------|-------------|
| [README.md](evidence/README.md) | Full index of {N} tables + {N} figures |

---

Evidence Naming and Fidelity

The evidence layer has two different object types:

1. Raw source evidence

Faithful transcription of one source table or figure
Must preserve the original source identifier and caption
Example: evidence/tables/table3_imagenet_validation.md

2. Derived subset evidence

Filtered or recomposed view created for a specific claim
Must NOT masquerade as the original source object
Filename should include derived_, subset_, or equivalent
Must declare which raw source object it came from
Example: evidence/tables/derived_from_table3_residual_depth_slice.md

Rule: if a filename includes a source label such as table3 or figure4, it should faithfully represent that exact source object rather than a curated subset.
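A sketch of how this naming rule could be checked per evidence file, assuming the `Extraction type` and `Source` values from the evidence templates have already been read out of the file. `naming_problems` is a hypothetical helper; the `tableN_` / `figureN_` prefix convention follows the example filenames above.

```python
import re

SOURCE_LABEL = re.compile(r"^(table|figure)(\d+)_", re.IGNORECASE)
DERIVED_PREFIXES = ("derived_", "subset_")


def naming_problems(filename: str, extraction_type: str, source_line: str) -> list[str]:
    """Check one evidence file's name against its Extraction type and Source field."""
    problems: list[str] = []
    name = filename.rsplit("/", 1)[-1]
    derived_name = name.startswith(DERIVED_PREFIXES)
    if extraction_type == "derived_subset" and not derived_name:
        problems.append("derived subset must use a derived_/subset_ filename")
    if derived_name and "Derived from" not in source_line:
        problems.append("derived file must declare its parent source")
    label = SOURCE_LABEL.match(name)
    if label:
        kind, number = label.group(1).capitalize(), label.group(2)
        if extraction_type == "derived_subset":
            problems.append(f"{name} carries a raw source label but is a derived subset")
        if not re.search(rf"\b{kind} {number}\b", source_line):
            problems.append(f"Source field does not cite {kind} {number}")
    return problems
```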

---

logic/problem.md

# Problem Specification

## Observations

### O{N}: {title}
- **Statement**: {precise empirical fact with numbers}
- **Evidence**: {source — figure, table, measurement, citation}
- **Implication**: {what this means for the problem}

## Gaps

### G{N}: {title}
- **Statement**: {what's missing or broken}
- **Caused by**: {which observations, e.g., O1, O2}
- **Existing attempts**: {what's been tried}
- **Why they fail**: {specific failure mode}

## Key Insight
- **Insight**: {the creative leap, stated precisely}
- **Derived from**: {which observations}
- **Enables**: {what solution approach this unlocks}

## Assumptions
- A1: {assumption}
- A2: {assumption}

---

logic/claims.md

Each claim MUST have ALL fields:

## C{NN}: {Short title}
- **Statement**: {Precise, falsifiable assertion}
- **Status**: {hypothesis|supported|refuted}
- **Falsification criteria**: {What would disprove this}
- **Proof**: [{experiment IDs: E01, E02}]
- **Evidence basis**: {What the cited evidence directly shows}
- **Interpretation**: {Optional broader reading that should not be confused with the raw evidence}
- **Dependencies**: {other claim IDs, if any}
- **Tags**: {comma-separated keywords}

Proof MUST reference experiment IDs from experiments.md. Each experiment cited in Proof should in turn be backed by evidence files whose rows or measurements actually match the claim being asserted. Statement should stay at the strongest level directly supported by the cited evidence; use Interpretation for broader synthesis.

---

logic/concepts.md

≥5 concepts. One section per concept:

## {Term Name}
- **Notation**: {LaTeX or symbolic notation}
- **Definition**: {Formal definition}
- **Boundary conditions**: {When does this concept apply/not apply}
- **Related concepts**: {other concept names}

---

logic/experiments.md

≥3 experiments. Declarative plans, NOT scripts. NO exact numerical results.

## E{NN}: {Short title}
- **Verifies**: {claim IDs, e.g., C01, C02}
- **Setup**:
  - Model: {model name and size}
  - Hardware: {GPU type, count, memory}
  - Dataset: {dataset name, size, source}
  - System: {system configuration}
- **Procedure**:
  1. {Step 1}
  2. {Step 2}
- **Metrics**: {what to measure, with units}
- **Expected outcome**:
  - {directional/relative ONLY, e.g., "A outperforms B on metric X"}
  - NEVER exact numbers (those go in evidence/)
- **Baselines**: {methods to compare against}
- **Dependencies**: {other experiment IDs, or "none"}

---

logic/solution/architecture.md

Component graph. For each component: name, purpose, inputs, outputs, interactions, key design choices.

logic/solution/algorithm.md

Mathematical formulation (LaTeX)
Pseudocode
Step-by-step explanation
Complexity analysis

logic/solution/constraints.md

Boundary conditions
Assumptions
Known limitations

logic/solution/heuristics.md

Each heuristic MUST have ALL fields:

## H{NN}: {Short description}
- **Rationale**: {Why this trick is needed}
- **Sensitivity**: {low|medium|high}
- **Bounds**: {acceptable range or limits}
- **Code ref**: [{path to src/execution/ file}]
- **Source**: {Section/table in the paper}

---

logic/related_work.md

## RW{NN}: {Author et al., Year}
- **DOI**: {DOI or arXiv ID}
- **Type**: {imports|bounds|baseline|extends|refutes}
- **Delta**:
  - What changed: {specific technical delta}
  - Why: {motivation}
- **Claims affected**: {claim IDs}
- **Adopted elements**: {what was kept}

Works with a specific technical delta get full RW blocks as above. Additional citations from the paper that do not have a technical delta (background, historical, infrastructure, or inline-comparison references) should still be captured more briefly so the ARA preserves the paper's full citation footprint.

---

src/configs/training.md

## {Parameter name}
- **Value**: {exact value}
- **Rationale**: {why this value}
- **Search range**: {if mentioned}
- **Sensitivity**: {low|medium|high}
- **Source**: {section/table}

src/configs/model.md

Same format as training.md for model/architecture configs.

src/execution/{module}.py

Typed function signatures (input/output types, tensor shapes)
Docstrings explaining what each function does
Implementation logic for the NOVEL contribution
NO scaffolding (no argparse, logging, distributed wrappers)
Import only standard libraries + torch/numpy
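For illustration only, a stub in this style might look like the following. The function is a hypothetical placeholder (temperature-scaled softmax) standing in for a paper's novel contribution, and it uses plain Python floats so it runs without dependencies; a real stub would follow the source paper's equations and would typically use torch or numpy tensors.

```python
"""Hypothetical src/execution/temperature_softmax.py stub.

Placeholder for a paper's novel contribution: typed signature, shape-aware
docstring, core logic only, no argparse/logging/distributed scaffolding.
"""
import math
from collections.abc import Sequence


def temperature_softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Return softmax(logits / temperature).

    Args:
        logits: unnormalized scores, shape (num_classes,).
        temperature: must be > 0; larger values flatten the distribution.

    Returns:
        Probabilities, shape (num_classes,), summing to 1.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```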

src/environment.md

# Environment
- **Python**: {version}
- **Framework**: {PyTorch version, etc.}
- **Hardware**: {GPU type, count, memory}
- **Key dependencies**: {list with versions}
- **Random seeds**: {if specified}

---

evidence/tables/{file}.md

Raw source-table transcription:

# Table {N} - {Caption or short description}

**Source**: Table {N} in {paper/report title}
**Caption**: {verbatim or near-verbatim caption}
**Extraction type**: raw_table

| ... | ... |
| --- | --- |
| ... | ... |

Derived subset:

# Derived subset - {Short description}

**Source**: Derived from Table {N} in {paper/report title}
**Caption**: {what part of the source table this subset preserves}
**Extraction type**: derived_subset
**Derived from**: `table{N}_{raw_file_name}.md`

| ... | ... |
| --- | --- |
| ... | ... |

Rules:

Raw source-table files should reproduce the original row set relevant to that table, not a claim-specific slice
If you drop rows, rename the file as a derived subset and declare the parent source
Do not combine rows from multiple source tables while retaining a single original table number in the filename
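A sketch of producing a derived subset from an already-transcribed raw table without touching cell values (cells stay strings, so nothing is rounded). `make_derived_subset` and its parameters are hypothetical; the output follows the derived-subset template above.

```python
from collections.abc import Callable


def make_derived_subset(
    raw_file: str,
    table_number: int,
    paper_title: str,
    preserved: str,
    header: list[str],
    rows: list[list[str]],
    keep: Callable[[list[str]], bool],
) -> str:
    """Render a derived-subset evidence file from rows of a raw table transcription.

    Cells are passed through as strings, so values are never re-formatted or rounded.
    """
    kept = [row for row in rows if keep(row)]
    lines = [
        f"# Derived subset - {preserved}",
        "",
        f"**Source**: Derived from Table {table_number} in {paper_title}",
        f"**Caption**: {preserved}",
        "**Extraction type**: derived_subset",
        f"**Derived from**: `{raw_file}`",
        "",
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines.extend("| " + " | ".join(row) + " |" for row in kept)
    return "\n".join(lines) + "\n"
```

The caller is still responsible for saving the result under a `derived_` or `subset_` filename, never under the raw `table{N}_` name.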

---

trace/exploration_tree.yaml

Each node should distinguish direct source support from reconstruction:

tree:
  - id: N01
    type: question
    support_level: explicit | inferred
    source_refs: ["Table 2", "§4.1"]   # recommended for explicit nodes
    title: "{...}"
    description: "{...}"

Rules:

support_level: explicit means the node is directly grounded in the provided source material
support_level: inferred means the node is a reconstruction of the paper's logic, not a literal session record
Explicit nodes should include source_refs
Inferred nodes must not be presented as if they were directly observed historical events

---

evidence/README.md

# Evidence Index

## Tables
| File | Source | Claims | Description |
|------|--------|--------|-------------|
| [tables/{name}.md](tables/{name}.md) | Table N, §X.Y | C01, C02 | {one sentence} |

## Figures
| File | Source | Claims | Description |
|------|--------|--------|-------------|
| [figures/{name}.md](figures/{name}.md) | Figure N, §X.Y | C03 | {one sentence} |

evidence/tables/{name}.md

ALL result tables, exact cell values:

# Table N: {Title}
- **Source**: Table N, Section X.Y
- **Caption**: "{caption}"

| Column1 | Column2 | ... |
|---------|---------|-----|
| exact   | values  | ... |

evidence/figures/{name}.md

ALL quantitative figures (not diagrams). Extract data points:

# Figure N: {Title}
- **Source**: Figure N, Section X.Y
- **Caption**: "{caption}"
- **Axes**: X = {label, units}, Y = {label, units}

| X | Y (Series A) | Y (Series B) | ... |
|---|-------------|-------------|-----|
| v | v           | v           | ... |

Mark approximate readings with "≈".

---

Appendix-sourced content

Appendix sections commonly carry worked examples, prompt templates, enumerated taxonomies, annotation schemas, extended analyses, and prescriptive content. Route each into the ARA layer where it best fits, preserving the granularity the source uses (for example, keep per-entry descriptive fields for taxonomies rather than collapsing to names + frequencies). The existing layer conventions above apply; create additional files only when no existing file is a natural home.

---

rubric/requirements.md (Only if rubric provided)

# Rubric Requirements — {paper_id}

**Source**: PaperBench expert-authored reproduction rubric
**Total leaf requirements**: {N}

## {Category Group}

### R{NN}: {Short title}
- **Rubric ID**: {uuid}
- **Category**: {task_category} / {finegrained_task_category}
- **Weight**: {weight}
- **Requirement**: {verbatim from rubric}
- **ARA coverage**: {path to most specific ARA file, or "Not covered"}
- **Key detail**: {exact value from paper, or "Not specified in paper"}

Exploration Tree YAML Specification

The exploration tree is the "git log" for research — a structured, traversable record of every successful branch, failed attempt, and design decision that shaped the final result.

Format

# Exploration Tree — {paper_id}
# Research DAG: nested tree with cross-edges (also_depends_on) forming a DAG.
# Node types: question | experiment | dead_end | decision | pivot

tree:
  - id: N01
    type: question
    support_level: explicit
    source_refs: ["§1", "Table 2"]
    title: "{Central research question}"
    description: "{What question is being investigated}"
    children:

      - id: N02
        type: experiment
        support_level: explicit
        source_refs: ["Figure 4", "Table 2"]
        title: "{What was tried}"
        result: "{What was observed}"
        evidence: [C01, "Figure 3", "§2.2"]
        children:

          - id: N04
            type: decision
            support_level: inferred
            title: "{What was decided}"
            choice: "{The chosen approach}"
            alternatives:
              - "{Alternative 1}"
              - "{Alternative 2}"
            evidence: "{What informed this decision}"
            children:
              # ... deeper nesting

      - id: N03
        type: dead_end
        support_level: inferred
        title: "{What was tried and failed}"
        hypothesis: "{What was expected}"
        failure_mode: "{Why it failed}"
        lesson: "{What was learned; what it led to}"
        # dead_end nodes have NO children — they are leaf nodes

  # For DAG edges (node with multiple parents):
  - id: N10
    type: experiment
    support_level: explicit
    source_refs: ["Table 5"]
    title: "{Convergent experiment}"
    also_depends_on: [N07, N08]  # additional parents beyond nesting
    result: "{What was observed}"
    evidence: [C05]

Node Types

question

The root driver. What is being investigated?

Required fields: description
Children: experiments, decisions, other questions

experiment

An attempt to answer a question or validate a decision.

Required fields: result
Optional fields: evidence (list of claim IDs, figure/table refs, section refs)
Children: decisions, dead_ends, more experiments

dead_end

A failed approach. THE MOST VALUABLE NODE TYPE for downstream agents.

Required fields: hypothesis, failure_mode, lesson
NO children — always a leaf node
Dead ends save agents from rediscovering known failures

decision

A design choice with documented alternatives.

Required fields: choice, alternatives
Optional fields: evidence
Children: experiments that test the decision, further decisions

pivot

A change in research direction.

Required fields: from, to, trigger
Children: the new research direction

Rules

1. Nested YAML: Children appear inline under the parent node's children list
2. Valid DAG: No cycles. All also_depends_on IDs must exist in the tree
3. Minimum 8 nodes: Cover the paper's key research trajectory
4. Must include dead_end nodes: At least 1 from ablations or rejected alternatives
5. Must include decision nodes: At least 1 documenting a design choice
6. Every node has: id (N01, N02...), type, title
7. Every node has `support_level`: explicit or inferred
8. Explicit nodes should have `source_refs`: table/figure/section references from the input material
9. `also_depends_on`: Only for DAG convergence (node has multiple parents beyond nesting)
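A sketch of checking rules 2 through 6, plus the leaf-only requirement for dead_end nodes, on a parsed tree (lists of dicts, as a YAML loader would return). Cycle detection uses Kahn's algorithm over parent → child edges taken from both nesting and `also_depends_on`; any node that never reaches in-degree zero lies on or downstream of a cycle. The helper names are hypothetical.

```python
from collections.abc import Iterator
from typing import Optional


def walk(nodes: list[dict], parent: Optional[str] = None) -> Iterator[tuple[dict, Optional[str]]]:
    """Yield (node, nesting_parent_id) for every node in the nested tree."""
    for node in nodes or []:
        yield node, parent
        yield from walk(node.get("children") or [], node.get("id"))


def tree_problems(tree: list[dict]) -> list[str]:
    problems: list[str] = []
    types: dict[str, str] = {}
    parents: dict[str, set[str]] = {}
    for node, parent in walk(tree):
        nid = node.get("id")
        if not (nid and node.get("type") and node.get("title")):
            problems.append(f"node {nid!r} is missing id, type, or title")
            continue
        if nid in types:
            problems.append(f"duplicate id {nid}")
        types[nid] = node["type"]
        edges = parents.setdefault(nid, set())
        if parent:
            edges.add(parent)
        edges.update(node.get("also_depends_on") or [])  # extra DAG parents
        if node["type"] == "dead_end" and node.get("children"):
            problems.append(f"{nid}: dead_end nodes must be leaves")

    if len(types) < 8:
        problems.append(f"only {len(types)} nodes; at least 8 required")
    for required in ("dead_end", "decision"):
        if required not in types.values():
            problems.append(f"no {required} node")

    known = set(types)
    for nid in sorted(known):
        for missing in sorted(parents[nid] - known):
            problems.append(f"{nid}: parent {missing} not found in tree")

    # Kahn's algorithm over parent -> child edges between known nodes.
    indegree = {nid: len(parents[nid] & known) for nid in known}
    children: dict[str, list[str]] = {nid: [] for nid in known}
    for nid in known:
        for p in parents[nid] & known:
            children[p].append(nid)
    ready = [nid for nid, degree in indegree.items() if degree == 0]
    while ready:
        current = ready.pop()
        for child in children[current]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    stuck = sorted(nid for nid, degree in indegree.items() if degree > 0)
    if stuck:
        problems.append("cycle involving: " + ", ".join(stuck))
    return problems
```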

Extraction Strategy

When building from a PDF:

Central questions → root nodes
"We tried X" / "We evaluated Y" → experiment nodes
"We considered X but chose Y because..." → decision nodes with alternatives
Ablation results showing X hurts → dead_end nodes
"We initially pursued X but found..." → pivot nodes
"This approach fails because..." → dead_end nodes

Support-level guidance:

Mark a node explicit only if the paper directly reports it
Mark a node inferred if you are reconstructing a plausible research decision from the narrative structure
Prefer omission over fabricating a highly specific inferred node
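When working from a PDF, a crude keyword pass can pre-sort sentences into candidate node types before anyone reads them closely. This is a hypothetical heuristic, not something the skill ships. The cue phrases are lifted from the PDF list above, they will miss most real phrasings, and every suggestion still needs the support-level judgment described here before it becomes a node.

```python
from __future__ import annotations

import re

# Ordered (pattern, node_type) pairs; the first match wins.
# Hypothetical cue phrases adapted from the extraction list; far from exhaustive.
CUES: list[tuple[re.Pattern[str], str]] = [
    (re.compile(r"\bwe initially (?:pursued|tried)\b", re.IGNORECASE), "pivot"),
    (re.compile(r"\bwe considered\b.*\bbut chose\b", re.IGNORECASE), "decision"),
    (re.compile(r"\bfails because\b|\bhurts\b", re.IGNORECASE), "dead_end"),
    (re.compile(r"\bwe (?:tried|evaluated)\b", re.IGNORECASE), "experiment"),
]


def suggest_node_type(sentence: str) -> str | None:
    """Return a candidate node type for one sentence, or None if no cue matches."""
    for pattern, node_type in CUES:
        if pattern.search(sentence):
            return node_type
    return None
```

Returning `None` for unmatched sentences keeps the pass conservative, which matches the guidance to prefer omission over inventing specific inferred nodes.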

When building from experiment logs:

Each experiment run → experiment node
Failed runs → dead_end nodes with actual error messages as failure_mode
Parameter sweeps → decision nodes with sweep results informing the choice
Direction changes → pivot nodes with the triggering observation
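For log-driven builds, the per-run part of this mapping is straightforward to automate. The `Run` record and its fields (`name`, `status`, `error`, `metrics`) are assumptions about your logging format, as is the `run:` prefix used in `source_refs`. The sketch covers individual runs only, not sweeps or pivots. It leaves `hypothesis` and `lesson` as TODO placeholders because a person or agent who understands the failure should write those, rather than having them guessed.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Run:
    """Hypothetical run record; adapt the fields to whatever your tracker logs."""
    name: str
    status: str  # assumed values: "ok" or "failed"
    error: str = ""
    metrics: dict[str, float] = field(default_factory=dict)


def run_to_node(run: Run, node_id: str) -> dict[str, Any]:
    """Map one logged run to an exploration-tree node."""
    base = {
        "id": node_id,
        "title": run.name,
        "support_level": "explicit",  # the run log itself is the source
        "source_refs": [f"run:{run.name}"],  # hypothetical reference format
    }
    if run.status == "failed":
        # Failed run -> dead_end leaf, keeping the actual error text as failure_mode.
        return {
            **base,
            "type": "dead_end",
            "hypothesis": "TODO: what this run was expected to show",
            "failure_mode": run.error,
            "lesson": "TODO: what this failure ruled out or led to",
        }
    # Successful run -> experiment node whose result summarizes the logged metrics.
    summary = ", ".join(f"{k}={v}" for k, v in sorted(run.metrics.items()))
    return {**base, "type": "experiment", "result": summary or "completed"}
```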

ARA Seal Level 1 — Validation Checklist

These are all the checks the Seal validator runs. Fix ALL failures before reporting success.

1. Directory Existence

All must exist as directories:

logic/
logic/solution/
src/
src/configs/
trace/
evidence/

2. Mandatory File Existence (non-empty)

All must exist and contain more than 10 bytes:

PAPER.md
logic/problem.md
logic/claims.md
logic/concepts.md
logic/experiments.md
logic/solution/architecture.md
logic/solution/algorithm.md
logic/solution/constraints.md
logic/solution/heuristics.md
logic/related_work.md
src/configs/training.md
src/configs/model.md
src/environment.md
trace/exploration_tree.yaml
evidence/README.md
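The two existence checks reduce to a few `pathlib` calls. In this sketch, `check_layout` is a hypothetical name, and "more than 10 bytes" is read as a strict `st_size > 10` on each file.

```python
from __future__ import annotations

from pathlib import Path

REQUIRED_DIRS = ["logic", "logic/solution", "src", "src/configs", "trace", "evidence"]
REQUIRED_FILES = [
    "PAPER.md", "logic/problem.md", "logic/claims.md", "logic/concepts.md",
    "logic/experiments.md", "logic/solution/architecture.md",
    "logic/solution/algorithm.md", "logic/solution/constraints.md",
    "logic/solution/heuristics.md", "logic/related_work.md",
    "src/configs/training.md", "src/configs/model.md", "src/environment.md",
    "trace/exploration_tree.yaml", "evidence/README.md",
]
MIN_BYTES = 10  # files must be strictly larger than this


def check_layout(root: Path) -> list[str]:
    """List missing directories, missing files, and files that are too small."""
    problems = [f"missing directory: {d}" for d in REQUIRED_DIRS if not (root / d).is_dir()]
    for rel in REQUIRED_FILES:
        path = root / rel
        if not path.is_file():
            problems.append(f"missing file: {rel}")
        elif path.stat().st_size <= MIN_BYTES:
            problems.append(f"too small (<= {MIN_BYTES} bytes): {rel}")
    return problems
```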

3. PAPER.md Checks

Starts with --- (YAML frontmatter)
Frontmatter is valid YAML mapping
Contains keys: title, authors, year
Body contains "Layer Index" section
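A rough stand-in for the PAPER.md checks follows. It is an approximation built on assumptions. It locates the frontmatter by its `---` delimiters and reads top-level keys line by line instead of parsing YAML, so it cannot confirm that the frontmatter is a valid YAML mapping. It also treats any occurrence of the text "Layer Index" in the body as the section. `check_paper_md` is a hypothetical name.

```python
from __future__ import annotations

import re

REQUIRED_KEYS = ("title", "authors", "year")


def check_paper_md(text: str) -> list[str]:
    """Approximate the PAPER.md checks with line-based parsing (not a real YAML parser)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ["PAPER.md must start with --- (YAML frontmatter)"]
    try:
        end = next(i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---")
    except StopIteration:
        return ["frontmatter is not closed with ---"]
    # Collect top-level keys only: re.match anchors at column 0, so indented lines are skipped.
    keys = set()
    for line in lines[1:end]:
        match = re.match(r"([A-Za-z_][\w-]*):", line)
        if match:
            keys.add(match.group(1))
    problems = [f"frontmatter missing key: {k}" for k in REQUIRED_KEYS if k not in keys]
    body = "\n".join(lines[end + 1:])
    if "Layer Index" not in body:
        problems.append("body has no 'Layer Index' section")
    return problems
```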

4. Field-Level Checks (regex patterns)

logic/claims.md

Has ## C\d+ blocks (at least one claim)
Contains **Statement**
Contains **Status**
Contains **Falsification criteria**
Contains **Proof**
Contains **Evidence basis**
Contains **Interpretation**

logic/problem.md

Has ### O\d+ blocks (observations)
Has ### G\d+ blocks (gaps)
Has Key Insight section (## Key Insight or **Insight**)

logic/experiments.md

Has ## E\d+ blocks (at least 3)
Contains **Verifies**
Contains **Setup**
Contains **Procedure**
Contains **Expected outcome** or **Expected results**

logic/solution/heuristics.md

Has ## H\d+ blocks
Contains **Rationale**
Contains **Sensitivity**
Contains **Bounds**

logic/related_work.md

Has ## RW\d+ blocks
Contains **Type**
Contains **Delta**
Coverage should extend beyond the closest predecessors to reflect the paper's full citation footprint

logic/concepts.md

Has ## sections (at least 5)
Contains **Definition**
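The field-level checks are regex counts per file, so they fit a table-driven sketch. The patterns below are transcribed from the checklist, with `**Expected outcome**` and `**Expected results**` merged into one alternation and the Key Insight rule written as `## Key Insight` or `**Insight**`. The Seal validator's actual regexes may differ, and `RULES` and `check_fields` are hypothetical names.

```python
from __future__ import annotations

import re

# Per-file (regex, minimum match count) rules transcribed from the checklist.
RULES: dict[str, list[tuple[str, int]]] = {
    "logic/claims.md": [(r"^## C\d+", 1)] + [
        (rf"\*\*{name}\*\*", 1)
        for name in ("Statement", "Status", "Falsification criteria",
                     "Proof", "Evidence basis", "Interpretation")
    ],
    "logic/problem.md": [
        (r"^### O\d+", 1),
        (r"^### G\d+", 1),
        (r"^## Key Insight|\*\*Insight\*\*", 1),
    ],
    "logic/experiments.md": [
        (r"^## E\d+", 3),
        (r"\*\*Verifies\*\*", 1),
        (r"\*\*Setup\*\*", 1),
        (r"\*\*Procedure\*\*", 1),
        (r"\*\*Expected (?:outcome|results)\*\*", 1),
    ],
    "logic/solution/heuristics.md": [(r"^## H\d+", 1)] + [
        (rf"\*\*{name}\*\*", 1) for name in ("Rationale", "Sensitivity", "Bounds")
    ],
    "logic/related_work.md": [(r"^## RW\d+", 1), (r"\*\*Type\*\*", 1), (r"\*\*Delta\*\*", 1)],
    "logic/concepts.md": [(r"^## ", 5), (r"\*\*Definition\*\*", 1)],
}


def check_fields(rel_path: str, text: str) -> list[str]:
    """Return one message per rule that matches fewer times than required."""
    problems = []
    for pattern, minimum in RULES.get(rel_path, []):
        found = len(re.findall(pattern, text, flags=re.MULTILINE))
        if found < minimum:
            problems.append(f"{rel_path}: need >= {minimum} match(es) of {pattern!r}, found {found}")
    return problems
```

Keeping the rules as data means a new field requirement is a one-line change rather than new control flow.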

5. Count Checks

logic/concepts.md: ≥5 concept sections (## headers)
logic/experiments.md: ≥3 experiment blocks (## E\d+)
src/execution/: ≥1 .py file
evidence/tables/ or evidence/figures/: ≥1 .md file
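The count checks combine heading counts with file counts. This sketch (`count_checks` is a hypothetical name) treats a missing file or directory as a count of zero rather than an error. It only counts files directly inside each directory, not files in subdirectories.

```python
from __future__ import annotations

import re
from pathlib import Path


def count_checks(root: Path) -> list[str]:
    """Approximate the Seal Level 1 count checks for the ARA directory at `root`."""

    def heading_count(rel: str, pattern: str) -> int:
        path = root / rel
        text = path.read_text(encoding="utf-8") if path.is_file() else ""
        return len(re.findall(pattern, text, flags=re.MULTILINE))

    def file_count(rel: str, suffix: str) -> int:
        folder = root / rel
        if not folder.is_dir():
            return 0
        return sum(1 for p in folder.glob(f"*{suffix}") if p.is_file())

    counts = {
        "concept sections in logic/concepts.md": (heading_count("logic/concepts.md", r"^## "), 5),
        "experiment blocks in logic/experiments.md": (heading_count("logic/experiments.md", r"^## E\d+"), 3),
        ".py files in src/execution/": (file_count("src/execution", ".py"), 1),
        ".md files in evidence/tables/ or evidence/figures/": (
            file_count("evidence/tables", ".md") + file_count("evidence/figures", ".md"), 1),
    }
    return [f"need >= {need} {label}, found {have}"
            for label, (have, need) in counts.items() if have < need]
```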

5b. Appendix Coverage

When the source has appendices, every appendix section should be traceable to at least one ARA file, with the granularity of the source preserved.

6. Evidence Quality

For each file in evidence/tables/*.md and evidence/figures/*.md:

Must contain a Markdown table (|...|...| pattern)
Must contain **Source** field
If the filename includes table{N} or figure{N}, the **Source** field must reference the same identifier
If the file is a derived subset, it must say so explicitly via **Extraction type**: derived_subset or equivalent
Raw source-table files should not silently omit rows while still presenting themselves as the original table
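Most of the evidence-quality rules can be checked per file. The derived-subset and omitted-rows rules require comparison against the source by a human or agent, so they are left out here. In this hypothetical `check_evidence_file`, a Markdown table is any line that starts with `|` and contains at least two more `|` characters. `**Source**` is read from the first line that contains it, and a filename such as `table3.md` requires that line to mention "Table 3" (case-insensitive).

```python
from __future__ import annotations

import re

TABLE_ROW = re.compile(r"^\|.*\|.*\|", re.MULTILINE)
SOURCE_FIELD = re.compile(r"\*\*Source\*\*:?\s*(.+)")
NUMBERED_NAME = re.compile(r"(table|figure)(\d+)", re.IGNORECASE)


def check_evidence_file(filename: str, text: str) -> list[str]:
    """Check one evidence/tables/*.md or evidence/figures/*.md file."""
    problems = []
    if not TABLE_ROW.search(text):
        problems.append("no Markdown table")
    source = SOURCE_FIELD.search(text)
    if source is None:
        problems.append("missing **Source** field")
        return problems
    numbered = NUMBERED_NAME.search(filename)
    if numbered:
        kind, num = numbered.group(1).lower(), numbered.group(2)
        # table3.md must cite "Table 3" (or "Table3"); \b stops "Table 31" from matching.
        if not re.search(rf"\b{kind}\s*{num}\b", source.group(1), re.IGNORECASE):
            problems.append(f"**Source** does not reference {kind.title()} {num}")
    return problems
```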

7. evidence/README.md

Must contain a Markdown table (file index)
Numbered tables and figures from the source (main text and appendices) should be reflected in the index

8. Exploration Tree (YAML)

Parses as valid YAML
Has top-level tree key
≥8 nodes total (counted recursively through children)
All node types in {question, decision, experiment, dead_end, pivot}
At least 1 dead_end node exists
At least 1 decision node exists
Every node has id and type fields
Every node has support_level in {explicit, inferred}
Type-specific required fields:
question: description
experiment: result
dead_end: hypothesis, failure_mode, lesson
decision: choice, alternatives
pivot: from, to, trigger
All also_depends_on references resolve to existing node IDs
Nodes with support_level: explicit should include source_refs
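The tree checks can also be sketched over the parsed YAML. This version assumes `exploration_tree.yaml` has already been loaded into Python dicts and lists; `walk` and `check_tree` are hypothetical names. It covers the count, type, required-field, and support-level rules, and it treats the `source_refs` recommendation for explicit nodes as a hard failure. `also_depends_on` resolution is not included.

```python
from __future__ import annotations

from typing import Any, Iterator

# Type-specific required fields, as listed in the checklist.
REQUIRED_FIELDS: dict[str, tuple[str, ...]] = {
    "question": ("description",),
    "experiment": ("result",),
    "dead_end": ("hypothesis", "failure_mode", "lesson"),
    "decision": ("choice", "alternatives"),
    "pivot": ("from", "to", "trigger"),
}


def walk(nodes: list[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Yield every node, recursing through nested `children` lists."""
    for node in nodes:
        yield node
        yield from walk(node.get("children") or [])


def check_tree(doc: dict[str, Any]) -> list[str]:
    """Check an already-parsed exploration_tree.yaml document."""
    if "tree" not in doc:
        return ["missing top-level 'tree' key"]
    nodes = list(walk(doc["tree"]))
    problems: list[str] = []
    if len(nodes) < 8:
        problems.append(f"only {len(nodes)} nodes; need at least 8")
    present = {n.get("type") for n in nodes}
    problems += [f"no {t} node" for t in ("dead_end", "decision") if t not in present]
    for n in nodes:
        label = n.get("id", "<missing id>")
        if "id" not in n:
            problems.append(f"{label}: missing id")
        node_type = n.get("type")
        if node_type not in REQUIRED_FIELDS:
            problems.append(f"{label}: invalid type {node_type!r}")
        else:
            problems += [f"{label}: {node_type} missing {f}"
                         for f in REQUIRED_FIELDS[node_type] if f not in n]
        level = n.get("support_level")
        if level not in ("explicit", "inferred"):
            problems.append(f"{label}: support_level must be explicit or inferred")
        elif level == "explicit" and not n.get("source_refs"):
            problems.append(f"{label}: explicit node without source_refs")
        if node_type == "dead_end" and n.get("children"):
            problems.append(f"{label}: dead_end nodes must be leaves")
    return problems
```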

9. Cross-Layer Binding

Claim Proof → Experiment Resolution

Every E\d+ in a claim's **Proof**: [...] must exist in experiments.md
Proof-linked experiments should have evidence files whose labels and row contents actually match the compared systems or measurements
Claim wording should be auditable against Evidence basis; broader language should be isolated to Interpretation

Experiment Verifies → Claim Resolution

Every C\d+ in an experiment's **Verifies** must exist in claims.md
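The two ID-resolution bindings can be sketched with regexes over the raw Markdown. This rests on two assumptions. First, claim and experiment blocks are declared as `## C01`-style and `## E01`-style headings. Second, each `**Proof**` or `**Verifies**` field keeps its references on the same line as the field name. `check_bindings` and its helpers are hypothetical. Confirming that linked evidence actually matches the compared systems still needs a reader.

```python
from __future__ import annotations

import re


def block_ids(text: str, prefix: str) -> set[str]:
    """IDs declared as '## C01' or '## E01' style headings."""
    return set(re.findall(rf"^## ({prefix}\d+)\b", text, flags=re.MULTILINE))


def field_refs(text: str, field_name: str, prefix: str) -> set[str]:
    """IDs cited on lines containing a bold field, e.g. '**Proof**: [E01, E02]'."""
    refs: set[str] = set()
    for line in re.findall(rf"\*\*{field_name}\*\*[^\n]*", text):
        refs.update(re.findall(rf"\b{prefix}\d+\b", line))
    return refs


def check_bindings(claims_md: str, experiments_md: str) -> list[str]:
    """Flag Proof refs to missing experiments and Verifies refs to missing claims."""
    claims = block_ids(claims_md, "C")
    experiments = block_ids(experiments_md, "E")
    problems = [f"Proof cites unknown experiment {e}"
                for e in sorted(field_refs(claims_md, "Proof", "E") - experiments)]
    problems += [f"Verifies cites unknown claim {c}"
                 for c in sorted(field_refs(experiments_md, "Verifies", "C") - claims)]
    return problems
```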

Heuristic Code Ref → File Resolution

Every src/... path in **Code ref**: [...] must be an existing file

Architecture Components → Code Stubs (fuzzy)

Significant words from ## headings in architecture.md should appear somewhere in src/execution/ code
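The architecture binding is described only as fuzzy, so any implementation has to decide what counts as a significant word. This sketch assumes "significant" means an alphabetic token of four or more letters outside a tiny stopword list. A heading passes if any of its significant words appears, case-insensitively and as a substring, anywhere in the code string. The caller is assumed to concatenate the contents of the `src/execution/` files into that string. Both thresholds and the names `significant_words` and `unmatched_headings` are guesses, not the validator's rules.

```python
from __future__ import annotations

import re

STOPWORDS = {"with", "from", "into", "that", "this"}  # hypothetical, deliberately small


def significant_words(heading: str) -> set[str]:
    """Alphabetic tokens of 4+ letters, lowercased, minus stopwords (an assumed definition)."""
    return {w for w in re.findall(r"[a-z]{4,}", heading.lower()) if w not in STOPWORDS}


def unmatched_headings(architecture_md: str, code: str) -> list[str]:
    """Return ## headings none of whose significant words occur in the code."""
    code_lower = code.lower()
    missing = []
    for heading in re.findall(r"^## (.+)$", architecture_md, flags=re.MULTILINE):
        words = significant_words(heading)
        if words and not any(w in code_lower for w in words):
            missing.append(heading.strip())
    return missing
```

Substring matching lets `## Sparse Encoder` match a class named `SparseEncoder`. The tradeoff is occasional false positives when a short heading word happens to appear inside an unrelated identifier.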

Tree Evidence → Claims (YAML)

Any C\d+ in a tree node's evidence field must exist in claims.md

Trace Hygiene

Do not add dead_end, decision, or experiment nodes that are unsupported by the provided source material
If a node is reconstructed from partial evidence rather than stated explicitly, it should be marked as inferred or excluded from Seal Level 1 outputs

How it compares

Use as a structural template for research repos, not as a one-shot code generator or generic project scaffold.

FAQ

Who is ara-compiler for?

Solo and small-team AI researchers and agent developers who document hypotheses, experiments, and evidence in a repository that agents can navigate.

When should I use ara-compiler?

During the Idea stage when framing a study, the Validate stage when locking experiment plans, and the Build stage when aligning docs, stubs, and evidence folders before you ship analysis or write-ups.

Is ara-compiler safe to install?

It is primarily schema and documentation guidance. Review the Security Audits panel on this page, and review any bundled stubs as you would any other code before executing them.
