
Graduated Implementation
Ramp coding ambition in controlled increments with explicit advancement gates—evidence on low stakes, unrehearsed explanation on high stakes—so agents do not outrun your understanding.
Overview
graduated-implementation is a journey-wide agent skill that enforces evidence- or explanation-based advancement gates before each wider implementation increment—usable whenever a solo builder needs to throttle agent ambi
Install
npx skills add https://github.com/athola/claude-night-market --skill graduated-implementation
What is this skill?
- Advancement gate targets ~85% calibration between blind trust and endless drilling
- Stakes matrix: GREEN/YELLOW use Evidence gate; RED/CRITICAL use Explanation gate with novel, unrehearsed human questions
- Integrates stakes tier from leyline:risk-classification via IMBUE_STAKES env or `.imbue/stakes` file with RED path heuri
- Requires recorded tradeoff entries before widening the next rung
- Explicit rejection of cheap-to-fake progress signals (streaks, unchecked yes)
- Gate calibration target ~85% band between lax and over-strict advancement
- 4 stakes tiers in gate table (GREEN/YELLOW vs RED/CRITICAL gate types)
- 2 gate types: Evidence (low stakes) and Explanation (high stakes)
Adoption & trust: 1 install on skills.sh; 304 GitHub stars; trending (+100% hot-view momentum).
What problem does it solve?
Your agent keeps widening changes after shallow wins, so confidence outruns competence and high-stakes diffs ship without real understanding.
Who is it for?
Builders using Imbue/Leyline-style risk tiers who want agent workflows that scale scope only after tests, tradeoffs, and genuine comprehension checks.
Skip if: Throwaway spikes with no tests, or teams that want maximum agent autonomy without human gate checks on production paths.
When should I use this skill?
Before widening agent implementation scope after an increment, or when hooks read IMBUE_STAKES / risk classification for ramp decisions.
What do I get? / Deliverables
Each wider rung requires a minted gate token—green tests plus tradeoffs on low stakes, or an unrehearsed human explanation on RED/CRITICAL paths—before the next increment proceeds.
- Recorded tradeoff entry
- Minted advancement gate token (evidence or explanation)
- Constrained next increment scope
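As a rough illustration of how the deliverables above relate, the following minimal sketch picks the gate type from the stakes tier and decides whether a token may be minted. The names `IncrementRecord`, `gate_for`, and `may_mint_token` are hypothetical and are not part of the skill; only the tier names and the two gate types come from the description.

```python
from dataclasses import dataclass

# Tier names and gate types follow the skill description.
LOW_STAKES = {"GREEN", "YELLOW"}
HIGH_STAKES = {"RED", "CRITICAL"}


@dataclass
class IncrementRecord:
    """Hypothetical record of what was demonstrated for the prior increment."""
    tests_green: bool = False
    tradeoff_recorded: bool = False
    # True only if the human answered an unrehearsed question unaided.
    novel_question_explained: bool = False


def gate_for(stakes: str) -> str:
    """Return the gate type ("evidence" or "explanation") for a stakes tier."""
    tier = stakes.strip().upper()
    if tier in LOW_STAKES:
        return "evidence"
    if tier in HIGH_STAKES:
        return "explanation"
    raise ValueError(f"unknown stakes tier: {stakes!r}")


def may_mint_token(stakes: str, record: IncrementRecord) -> bool:
    """Decide whether the prior increment earns an advancement gate token."""
    # Both gate types require a recorded tradeoff entry.
    if not record.tradeoff_recorded:
        return False
    if gate_for(stakes) == "evidence":
        return record.tests_green
    return record.novel_question_explained
```

In this sketch, green tests alone never satisfy a RED or CRITICAL tier, which mirrors the point that high-stakes paths need an explanation rather than evidence.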
Journey fit
Useful at every journey phase - explore requirements and options before committing to a direction.
Where it fits
Hold a prototype expansion until the prior slice’s tradeoffs are recorded and tests stay green on YELLOW stakes.
Block the agent’s wider refactor until the explanation gate passes on a novel question about the last diff under RED classification.
Require evidence gate artifacts before merging a follow-on agent batch that touches the same module.
Throttle production-touching fixes so each increment earns a ramp token instead of chaining unchecked agent edits.
How it compares
A process gate skill—not a code generator; complements risk-classification rather than replacing test runners or linters.
Common Questions / FAQ
Who is graduated-implementation for?
graduated-implementation is for solo and indie builders orchestrating agent coding sessions who need formal gates before each larger implementation step, especially under explicit stakes tiers.
When should I use graduated-implementation?
Use it journey-wide before widening agent scope: in Validate when scoping prototypes, in Build when stacking increments, in Ship during review-heavy changes, and in Operate when touching production paths—always after classifying stakes.
Is graduated-implementation safe to install?
Review the Security Audits panel on this Prism page; the skill influences process hooks and environment reads (IMBUE_STAKES) rather than network calls, but gates only work if you enforce them in your agent setup.
Workflow Chain
Requires first: leyline risk classification
SKILL.md - Graduated Implementation
# Advancement Gate

The gate decides whether the agent may ramp the next increment's ambition a notch. It is the load-bearing part of graduated implementation: too lax and ambition outruns understanding (blind trust); too strict and the work never advances (over-drilling). The design target is the ~85% band.

## The gate by stakes

The stakes tier (from `leyline:risk-classification`) selects which check must pass before the rung widens.

| Stakes | Gate | What mints the ramp token |
|--------|------|---------------------------|
| Low (GREEN/YELLOW) | Evidence | Prior increment has green tests and a recorded tradeoff entry. |
| High (RED/CRITICAL) | Explanation | The human explains the prior diff unaided, on a novel question, and records a tradeoff entry. |

The high-stakes gate uses an *unrehearsed* question about the actual change, not a recap the agent fed the human. This is the sight-reading principle from graded music exams and the reason medicine rejected "see one, do one, teach one": confidence outran competence when the test was a rehearsal. A signal that is cheap to fake (completion, a streak, a yes) will be faked; the gate has to cost what understanding costs.

The tier comes from `leyline:risk-classification`. The hook reads it from the `IMBUE_STAKES` environment variable or a one-line `.imbue/stakes` file (values `GREEN`, `YELLOW`, `RED`, `CRITICAL`); when neither is set it falls back to a path heuristic (a high-stakes path is treated as RED). The rung scales with the tier: GREEN and YELLOW keep the full rung, RED halves it, and CRITICAL quarters it, so the riskier the change the sooner a demonstration is forced.

## Why the producer cannot self-certify

The agent that wrote the increment may not be the one that grades readiness to ramp.
This is the four-eyes principle, and it is the universal anti-gaming device across every apprenticeship domain studied: the guild masterpiece judged by other masters, the visiting examiner, the medical milestone observed by a supervisor. A producer grading its own readiness is the automation-bias trap in miniature. See `imbue:proof-of-work` module `independent-verification` for the high-stakes verification rule this builds on.

## The three failure modes the gate guards

1. **Advancing too fast (under-mastery).** A large increment passes because the signal was cheap. Guard: the rung only widens by one notch per recorded demonstration, and high-stakes paths get a halved rung so they force a demonstration sooner.
2. **Never advancing (over-drill, boredom).** The human clears every slice trivially but the rung never grows, or a single stumble ratchets it down and traps the work (the spaced-repetition "ease hell" that gets decks abandoned). Guard: ramp faster when the human is clearly above the band; never ratchet the rung down on one miss alone.
3. **Gaming the metric (Goodhart).** Optimizing the signal instead of the skill: padding tests, memorizing the recap, clicking through. Guard: the high-stakes check is a novel question, the producer is not the certifier, and the demonstration is recorded where it can be audited later.

## How the hook operationalizes the gate

`guard_scope_ramp.py` (PreToolUse on Write, Edit, MultiEdit) holds each increment to the current rung:

- The rung starts bounded (`RUNG_START`, ~40 added lines) and widens by `RAMP_FACTOR` per ramp token, capped at `RUNG_CAP`.
- A ramp token is `IMBUE_RAMP_OK=1` or a `.imbue/ramp-ok` file, created only after a demonstration is recorded and consumed on use, so one demonstration buys one notch.
- High-stakes paths (auth, migration, payment, infra, crypto) get a halved effective rung.
- Shadow mode (default) warns; `VOW_SHADOW_MODE=0` blocks an over-rung increment.
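The rung arithmetic in the bullets above can be sketched as follows. This is not the source of `guard_scope_ramp.py`, only an illustration under stated assumptions: the values of `RAMP_FACTOR` and `RUNG_CAP`, the GREEN default for paths that match no hint, and every function name are hypothetical. The ~40-line start, the tier scaling, the path hints, and the token file name are taken from the text.

```python
import os
from pathlib import Path

RUNG_START = 40    # from the text: ~40 added lines
RAMP_FACTOR = 1.5  # assumed value; the text does not state it
RUNG_CAP = 400     # assumed value; the text does not state it

# From the text: GREEN/YELLOW keep the full rung, RED halves, CRITICAL quarters.
TIER_SCALE = {"GREEN": 1.0, "YELLOW": 1.0, "RED": 0.5, "CRITICAL": 0.25}
HIGH_STAKES_HINTS = ("auth", "migration", "payment", "infra", "crypto")


def resolve_stakes(path, env=None, root="."):
    """Env var first, then the one-line stakes file, then a path heuristic."""
    env = os.environ if env is None else env
    value = env.get("IMBUE_STAKES", "").strip().upper()
    if value in TIER_SCALE:
        return value
    stakes_file = Path(root) / ".imbue" / "stakes"
    if stakes_file.is_file():
        value = stakes_file.read_text().strip().upper()
        if value in TIER_SCALE:
            return value
    if any(hint in path.lower() for hint in HIGH_STAKES_HINTS):
        return "RED"  # a high-stakes path is treated as RED
    return "GREEN"    # assumption: default when nothing else matches


def base_rung(tokens_spent):
    """Widen the rung by RAMP_FACTOR once per spent token, capped at RUNG_CAP."""
    rung = float(RUNG_START)
    for _ in range(tokens_spent):
        rung = min(rung * RAMP_FACTOR, RUNG_CAP)
    return int(rung)


def effective_rung(tokens_spent, stakes):
    """Scale the base rung down for riskier tiers (never below one line)."""
    return max(1, int(base_rung(tokens_spent) * TIER_SCALE[stakes]))


def check_increment(added_lines, tokens_spent, stakes, shadow=True):
    """Return "allow", or "warn" in shadow mode / "block" when enforcing."""
    if added_lines <= effective_rung(tokens_spent, stakes):
        return "allow"
    return "warn" if shadow else "block"


def consume_ramp_token(root="."):
    """Delete the token file if present, so one demonstration buys one notch."""
    token = Path(root) / ".imbue" / "ramp-ok"
    if token.is_file():
        token.unlink()
        return True
    return False
```

With these assumed constants, a 30-line edit fits the starting rung on a GREEN path but exceeds the halved rung of 20 lines on a RED path, so it would draw a warning in shadow mode and a block when enforcement is on.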
The hook never blocks on its own state error and never crashes the agent. The hook enforces the *bound and the ramp*. It does not measure understanding; it requires that a demonstration was recorded before ambitio