
Paper2code
Turn a research paper into a faithful codebase with explicit tags for specified, official-code, and unspecified choices instead of silent guesses.
Overview
Paper2Code is an agent skill, used mostly in the Build phase (and also in Validate scope), that turns research papers into code by resolving ambiguity explicitly instead of guessing silently.
Install
npx skills add https://github.com/prathamlearnstocode/paper2code --skill paper2code
What is this skill?
- Documented decision tree for ambiguity: explicit paper text, partial specification, official GitHub code, reimplementations, field-standard defaults, or a documented stub
- Partial specification protocol for underspecified optimizers and hyperparameters: never guess silently
- Marks provenance with [SPECIFIED], [FROM_OFFICIAL_CODE], and [UNSPECIFIED] in code and REPRODUCTION_NOTES.md
- Stub-first path when no standard exists, with docstrings listing implementation options
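The decision tree in the list above can be sketched as a small resolver that checks sources in priority order and returns a provenance tag plus an action. This is a minimal sketch, not code from the skill: the `Evidence` record, its field names, and the `resolve` function are hypothetical, and tagging a pure stub as `[UNSPECIFIED]` is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Evidence:
    """Hypothetical record of what is known about one implementation detail."""
    paper_text: Optional[str] = None            # explicit statement in the paper
    partially_specified: bool = False           # mentioned, but values or placement missing
    official_code_ref: Optional[str] = None     # file and line in the authors' repo
    reimplementation_ref: Optional[str] = None  # well-known third-party reimplementation
    field_standard: Optional[str] = None        # common default in the field
    alternatives: List[str] = field(default_factory=list)


def resolve(detail: str, ev: Evidence) -> Tuple[str, str]:
    """Walk the ambiguity decision tree; return (provenance_tag, action)."""
    if ev.partially_specified:
        # The paper says something, but not enough: apply the partial protocol.
        return "[PARTIALLY_SPECIFIED]", "Follow the partial specification protocol"
    if ev.paper_text:
        return "[SPECIFIED]", f"Use the paper's statement: {ev.paper_text}"
    if ev.official_code_ref:
        # Official code is ground truth, but it is not the paper itself.
        return "[FROM_OFFICIAL_CODE]", f"Follow official code at {ev.official_code_ref}"
    if ev.reimplementation_ref:
        # Reference it in the notes; do not copy its choices blindly.
        return "[UNSPECIFIED]", f"Cite {ev.reimplementation_ref} in REPRODUCTION_NOTES.md"
    if ev.field_standard:
        alts = ", ".join(ev.alternatives) or "none listed"
        return "[UNSPECIFIED]", f"Use {ev.field_standard}; alternatives: {alts}"
    # No source at all. Tagging the stub [UNSPECIFIED] is this sketch's assumption.
    return "[UNSPECIFIED]", f"Write a stub for {detail} with a docstring listing options"
```

For example, `resolve("adam_betas", Evidence(paper_text="We use Adam", partially_specified=True))` returns the `[PARTIALLY_SPECIFIED]` tag, matching the "We use Adam" case in the guardrail.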
Adoption & trust: 541 installs on skills.sh; 1.4k GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You want to implement a paper, but the write-up omits optimizer settings, tensor shapes, or data-processing steps, and ad-hoc agents fill the gaps with wrong defaults.
Who is it for?
Solo ML builders reproducing papers, coursework, or baselines who need auditable choices when specs are partial or contradictory.
Skip if: Quick demos that only need a high-level blog summary, or production features with no tie to a published method you intend to reproduce faithfully.
When should I use this skill?
When implementing or reproducing a research paper whose methods, hyperparameters, or architecture details may be vague or incomplete.
What do I get? / Deliverables
You get a structured codebase with labeled provenance, reproduction notes, and stubs where the literature truly leaves gaps.
- Implementation modules with provenance tags
- REPRODUCTION_NOTES.md for unspecified choices
- Stub files with option docstrings when no standard exists
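A stub deliverable might look like the sketch below. It is hypothetical: the function name `aggregate_features`, the pooling options, and raising `NotImplementedError` are illustrative choices, not output copied from the skill.

```python
def aggregate_features(features):
    """Combine per-token features into one vector per example.

    [UNSPECIFIED] The paper does not say how features are aggregated,
    and no official code or field-standard choice was found.

    Options:
      1. Mean pooling over the token axis.
      2. Max pooling over the token axis.
      3. Use the first token's feature (CLS-style pooling).

    Record the chosen option and its rationale in REPRODUCTION_NOTES.md.
    """
    # Fail loudly until someone makes (and documents) an explicit choice.
    raise NotImplementedError(
        "aggregate_features is a stub: pick an option from the docstring "
        "and log the choice in REPRODUCTION_NOTES.md"
    )
```

Raising `NotImplementedError` makes the gap fail loudly when the code runs instead of silently falling back to a default.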
Journey fit
Spans multiple journey phases: a primary shelf plus the alternate fits below.
The canonical shelf is Build backend because the deliverable is implementation code and stubs for the models, training loops, and experiment utilities described in papers. Backend fits ML reproduction work (training scripts, model modules, and data pipelines) rather than marketing or launch surfaces.
Where it fits
Walk the ambiguity decision tree to see whether the paper is implementable before promising a demo deadline.
Generate model and training files with [FROM_OFFICIAL_CODE] tags when the authors’ GitHub exists.
Use REPRODUCTION_NOTES.md to design smoke tests only on behaviors the paper or official code actually specifies.
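Designing smoke tests this way depends on telling specified behaviors apart from guessed ones. A minimal sketch, assuming the provenance tags appear in `#` comments in the style of the SKILL.md examples; `collect_tags` and `split_for_testing` are hypothetical helpers, not part of the skill, and the bullet format for REPRODUCTION_NOTES.md is an assumption.

```python
import re
from typing import Dict, List, Tuple

# Matches a "# ... [TAG] note" comment; the tag names come from the SKILL.md examples.
TAG_RE = re.compile(
    r"#.*\[(SPECIFIED|PARTIALLY_SPECIFIED|FROM_OFFICIAL_CODE|UNSPECIFIED)\]\s*(.*)"
)
# Only behaviors backed by the paper or official code anchor smoke tests.
TESTABLE = ("SPECIFIED", "FROM_OFFICIAL_CODE")


def collect_tags(source: str) -> Dict[str, List[str]]:
    """Map each provenance tag to the comment text that follows it."""
    found: Dict[str, List[str]] = {}
    for line in source.splitlines():
        match = TAG_RE.search(line)
        if match:
            found.setdefault(match.group(1), []).append(match.group(2).strip())
    return found


def split_for_testing(tags: Dict[str, List[str]]) -> Tuple[List[str], List[str]]:
    """Return (testable behaviors, REPRODUCTION_NOTES.md bullet lines)."""
    testable = [note for tag in TESTABLE for note in tags.get(tag, [])]
    notes = [
        f"- [{tag}] {note}"
        for tag, items in tags.items()
        if tag not in TESTABLE
        for note in items
    ]
    return testable, notes
```

A comment tagged `[PARTIALLY_SPECIFIED]` lands in the notes list rather than the testable list, so smoke tests only pin down behaviors the paper or official code actually specifies.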
How it compares
Use it instead of one-shot “implement this PDF” prompts that hide assumptions: this skill enforces a documented decision tree and a REPRODUCTION_NOTES.md file.
Common Questions / FAQ
Who is paper2code for?
Indie researchers and developers implementing ML or systems papers who need transparent handling of missing hyperparameters and ambiguous methods text.
When should I use paper2code?
During Validate scope when scoping what a paper actually specifies, and during Build backend when generating modules, training scripts, and stubs with [SPECIFIED] versus [UNSPECIFIED] labels.
Is paper2code safe to install?
It may pull logic from cited official repos; review the Security Audits panel on this Prism page and verify third-party code licenses before merging into your product.
SKILL.md
# Guardrail: Handling Badly Written Papers

## Reality check

Many papers are vague, inconsistent, or incomplete. This is not a criticism: writing a paper is hard, page limits are strict, and reviewers rarely check hyperparameter tables. But it means you will regularly encounter papers where the text alone is insufficient to produce a correct implementation. This file tells you what to do when that happens. The answer is never "guess silently."

---

## Decision tree for resolving ambiguity

```
Is the detail stated explicitly in the paper?
├── YES → Use it. Cite the section. Done.
├── PARTIALLY → Follow the partial specification protocol (below)
└── NO → Is there official code on GitHub?
    ├── YES → Use it as ground truth. Cite the file and line number.
    │         Mark as [FROM_OFFICIAL_CODE], not [SPECIFIED].
    └── NO → Is there a well-known reimplementation?
        ├── YES → Reference it in REPRODUCTION_NOTES.md.
        │         DO NOT blindly copy; they made their own choices.
        │         Mark as [UNSPECIFIED] and note the external reference.
        └── NO → Is there a "standard" choice in the field?
            ├── YES → Use it. Mark as [UNSPECIFIED] with alternatives.
            └── NO → Write a stub with a detailed docstring.
                      Explain what should go here and what the options are.
```

---

## Partial specification protocol

When the paper partially specifies something:

### "We use Adam"

The paper names the optimizer but not all of its parameters.

```python
# §4.1: "We use the Adam optimizer"
# [PARTIALLY_SPECIFIED] Optimizer stated as Adam, but beta and epsilon values not specified
# Using: β₁=0.9, β₂=0.999, ε=1e-8 (PyTorch defaults)
# Alternatives: β₁=0.9, β₂=0.98, ε=1e-9 (Transformer default from Vaswani et al.)
optimizer = torch.optim.Adam(model.parameters(), lr=config.lr, betas=(0.9, 0.999), eps=1e-8)
```

### "We use dropout for regularization"

The paper mentions dropout but not the rate or placement.
```python
# §3.3: "We use dropout for regularization"
# [PARTIALLY_SPECIFIED] Dropout mentioned but rate not specified
# Using: 0.1 (common default for transformer models)
# Placement: after attention and FFN (not specified; this is our choice)
self.dropout = nn.Dropout(0.1)
```

### "Similar to Transformer [Vaswani et al., 2017]"

The paper references another work for a component.

```python
# §3.1: "We use an architecture similar to the Transformer [Vaswani et al., 2017]"
# [PARTIALLY_SPECIFIED] "Similar" implies differences exist but none are specified
# We implement the standard Transformer from Vaswani et al. unless other sections
# describe modifications. Readers should check whether the authors intended differences.
```

---

## When the paper contradicts itself

### Figure vs text

Figures are often created early in the paper-writing process and may show an earlier version of the architecture or method. Text is usually more up-to-date.

**Rule: Trust the text. Flag the figure discrepancy.**

```python
# §3.2: "We apply layer normalization before each sub-layer" (Pre-LN)
# NOTE: Figure 2 shows post-norm placement, inconsistent with this text.
# We implement pre-norm as stated in the text. If reproduction fails,
# try post-norm (change this to: x = self.norm(x + sublayer(x)))
x = x + sublayer(self.norm(x))
```

### Equation vs prose

Equations are peer-reviewed more carefully and are more precise.

**Rule: Trust the equation. Flag the prose discrepancy.**

### Different numbers in different sections

The paper might say "learning rate of 1e-4" in one section and "learning rate of 3e-4" in another.

**Rule: If one is in the appendix/hyperparameter table, trust that. If both are in prose, flag both and use the one from the main experimental section (usually the one paired with the best results).**

---

## When the paper is genuinely incomplete

Some papers are missing critical details that make reproduction impossible without additional information. Here