Graduated Implementation

Name: Graduated Implementation
Author: athola

athola/claude-night-market

Ramp coding ambition in controlled increments with explicit advancement gates—evidence on low stakes, unrehearsed explanation on high stakes—so agents do not outrun your understanding.

Overview

graduated-implementation is a journey-wide agent skill that enforces evidence- or explanation-based advancement gates before each wider implementation increment—usable whenever a solo builder needs to throttle agent ambi

Install

npx skills add https://github.com/athola/claude-night-market --skill graduated-implementation

What is this skill?

Advancement gate targets ~85% calibration between blind trust and endless drilling
Stakes matrix: GREEN/YELLOW use Evidence gate; RED/CRITICAL use Explanation gate with novel, unrehearsed human questions
Integrates stakes tier from leyline:risk-classification via IMBUE_STAKES env or `.imbue/stakes` file with RED path heuri
Requires recorded tradeoff entries before widening the next rung
Explicit rejection of cheap-to-fake progress signals (streaks, unchecked yes)
Gate calibration target ~85% band between lax and over-strict advancement
4 stakes tiers in gate table (GREEN/YELLOW vs RED/CRITICAL gate types)
2 gate types: Evidence (low stakes) and Explanation (high stakes)

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 1 install on skills.sh; 304 GitHub stars; trending (+100% hot-view momentum).

What problem does it solve?

Your agent keeps widening changes after shallow wins, so confidence outruns competence and high-stakes diffs ship without real understanding.

Who is it for?

Builders using Imbue/Leyline-style risk tiers who want agent workflows that scale scope only after tests, tradeoffs, and genuine comprehension checks.

Skip if: Throwaway spikes with no tests, or teams that want maximum agent autonomy without human gate checks on production paths.

When should I use this skill?

Before widening agent implementation scope after an increment, or when hooks read IMBUE_STAKES / risk classification for ramp decisions.

What do I get? / Deliverables

Each wider rung requires a minted gate token—green tests plus tradeoffs on low stakes, or an unrehearsed human explanation on RED/CRITICAL paths—before the next increment proceeds.

Recorded tradeoff entry
Minted advancement gate token (evidence or explanation)
Constrained next increment scope

Journey fit

Useful at every journey phase - explore requirements and options before committing to a direction.

Where it fits

Validate (Scope & plan)
Example use: Hold a prototype expansion until the prior slice’s tradeoffs are recorded and tests stay green on YELLOW stakes.

Build (Project management & tracking)
Example use: Block the agent’s wider refactor until the explanation gate passes on a novel question about the last diff under RED classification.

Ship (Code review)
Example use: Require evidence gate artifacts before merging a follow-on agent batch that touches the same module.

Operate (Iteration & experiments)
Example use: Throttle production-touching fixes so each increment earns a ramp token instead of chaining unchecked agent edits.

How it compares

A process gate skill—not a code generator; complements risk-classification rather than replacing test runners or linters.

Common Questions / FAQ

Who is graduated-implementation for?

graduated-implementation is for solo and indie builders orchestrating agent coding sessions who need formal gates before each larger implementation step, especially under explicit stakes tiers.

When should I use graduated-implementation?

Use it journey-wide before widening agent scope: in Validate when scoping prototypes, in Build when stacking increments, in Ship during review-heavy changes, and in Operate when touching production paths—always after classifying stakes.

Is graduated-implementation safe to install?

Review the Security Audits panel on this Prism page; the skill influences process hooks and environment reads (IMBUE_STAKES) rather than network calls, but gates only work if you enforce them in your agent setup.

Workflow Chain

Requires first: leyline risk classification

SKILL.md

# Advancement Gate

The gate decides whether the agent may ramp the next increment's
ambition a notch. It is the load-bearing part of graduated
implementation: too lax and ambition outruns understanding (blind
trust); too strict and the work never advances (over-drilling). The
design target is the ~85% band.

## The gate by stakes

The stakes tier (from `leyline:risk-classification`) selects which
check must pass before the rung widens.

| Stakes | Gate | What mints the ramp token |
|--------|------|---------------------------|
| Low (GREEN/YELLOW) | Evidence | Prior increment has green tests and a recorded tradeoff entry. |
| High (RED/CRITICAL) | Explanation | The human explains the prior diff unaided, on a novel question, and records a tradeoff entry. |

The high-stakes gate uses an *unrehearsed* question about the actual
change, not a recap the agent fed the human. This is the
sight-reading principle from graded music exams and the reason
medicine rejected "see one, do one, teach one": confidence outran
competence when the test was a rehearsal. A signal that is cheap to
fake (completion, a streak, a yes) will be faked; the gate has to
cost what understanding costs.
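
As a rough illustration, the gate table can be read as a small lookup plus a minting check. This is a minimal sketch, not the skill's actual code: `Gate`, `gate_for`, and `may_mint_token` are hypothetical names, and the boolean inputs stand in for however the tests, tradeoff entry, and explanation are really recorded.

```python
from enum import Enum


class Gate(Enum):
    EVIDENCE = "evidence"
    EXPLANATION = "explanation"


# Mirrors the two rows of the gate table: low stakes use Evidence,
# high stakes use Explanation.
GATE_BY_TIER = {
    "GREEN": Gate.EVIDENCE,
    "YELLOW": Gate.EVIDENCE,
    "RED": Gate.EXPLANATION,
    "CRITICAL": Gate.EXPLANATION,
}


def gate_for(tier: str) -> Gate:
    """Return the gate a stakes tier selects (hypothetical helper)."""
    return GATE_BY_TIER[tier.strip().upper()]


def may_mint_token(tier, *, tests_green, tradeoff_recorded, explained_unaided=False):
    """Apply the table's conditions for minting a ramp token."""
    if not tradeoff_recorded:
        return False  # both gates require a recorded tradeoff entry
    if gate_for(tier) is Gate.EVIDENCE:
        return tests_green  # low stakes: green tests are the evidence
    return explained_unaided  # high stakes: unaided answer to a novel question
```

Under this reading, green tests alone never mint a token on a RED path: `may_mint_token("RED", tests_green=True, tradeoff_recorded=True)` returns `False` until the human has explained the diff unaided.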

The tier comes from `leyline:risk-classification`. The hook reads it
from the `IMBUE_STAKES` environment variable or a one-line
`.imbue/stakes` file (values `GREEN`, `YELLOW`, `RED`, `CRITICAL`);
when neither is set it falls back to a path heuristic (a high-stakes
path is treated as RED). The rung scales with the tier: GREEN and
YELLOW keep the full rung, RED halves it, and CRITICAL quarters it, so
the riskier the change the sooner a demonstration is forced.
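
The lookup order and rung scaling described above might look roughly like the following. It is a sketch under stated assumptions: `resolve_tier` and `effective_rung` are hypothetical names, the path heuristic is assumed to be a simple substring match on the high-stakes keywords this document lists, and the `GREEN` default for unmatched paths is a guess rather than documented behavior.

```python
import os
from pathlib import Path

VALID_TIERS = ("GREEN", "YELLOW", "RED", "CRITICAL")
# Keywords taken from the document's high-stakes path list; matching them
# as substrings is an assumption about how the heuristic works.
HIGH_STAKES_HINTS = ("auth", "migration", "payment", "infra", "crypto")
# GREEN and YELLOW keep the full rung, RED halves it, CRITICAL quarters it.
RUNG_DIVISOR = {"GREEN": 1, "YELLOW": 1, "RED": 2, "CRITICAL": 4}


def resolve_tier(target_path, env=None, stakes_file=".imbue/stakes"):
    """Resolve the tier: env var first, then the stakes file, then the path."""
    env = os.environ if env is None else env
    value = env.get("IMBUE_STAKES", "").strip().upper()
    if value in VALID_TIERS:
        return value
    try:
        value = Path(stakes_file).read_text().strip().upper()
    except OSError:
        value = ""  # a missing or unreadable file falls through to the heuristic
    if value in VALID_TIERS:
        return value
    lowered = str(target_path).lower()
    if any(hint in lowered for hint in HIGH_STAKES_HINTS):
        return "RED"  # a high-stakes path is treated as RED
    return "GREEN"  # assumption: default when nothing matches


def effective_rung(base_rung, tier):
    """Scale the rung by tier, never dropping below one line."""
    return max(1, base_rung // RUNG_DIVISOR[tier])
```

With a base rung of 40 lines, this gives 40 for GREEN or YELLOW, 20 for RED, and 10 for CRITICAL.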

## Why the producer cannot self-certify

The agent that wrote the increment must not be the one that grades
readiness to ramp. This is the four-eyes principle, and it is the
universal anti-gaming device across every apprenticeship domain
studied: the guild masterpiece judged by other masters, the
visiting examiner, the medical milestone observed by a supervisor.
A producer grading its own readiness is the automation-bias trap in
miniature. See `imbue:proof-of-work` module
`independent-verification` for the high-stakes verification rule
this builds on.
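
One way to make the rule checkable is to record who produced an increment and who certified it, then refuse a demonstration where the two match. The sketch below is illustrative only: the `Demonstration` record and its fields are hypothetical and are not taken from `imbue:proof-of-work`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Demonstration:
    """Hypothetical audit record for one readiness check."""
    increment_id: str
    producer: str   # who wrote the increment
    certifier: str  # who graded readiness to ramp


def is_independent(demo: Demonstration) -> bool:
    """Four-eyes check: the certifier must differ from the producer."""
    return demo.producer.strip().lower() != demo.certifier.strip().lower()
```

A record such as `Demonstration("inc-7", producer="agent", certifier="agent")` would fail the check, while the same increment certified by a human reviewer would pass.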

## The three failure modes the gate guards

1. **Advancing too fast (under-mastery).** A large increment passes
   because the signal was cheap. Guard: the rung only widens by one
   notch per recorded demonstration, and high-stakes paths get a
   halved rung so they force a demonstration sooner.
2. **Never advancing (over-drill, boredom).** The human clears
   every slice trivially but the rung never grows, or a single
   stumble ratchets it down and traps the work (the spaced-
   repetition "ease hell" that gets decks abandoned). Guard: ramp
   faster when the human is clearly above the band; never ratchet
   the rung down on one miss alone.
3. **Gaming the metric (Goodhart).** Optimizing the signal instead
   of the skill: padding tests, memorizing the recap, clicking
   through. Guard: the high-stakes check is a novel question, the
   producer is not the certifier, and the demonstration is recorded
   where it can be audited later.

## How the hook operationalizes the gate

`guard_scope_ramp.py` (PreToolUse on Write, Edit, MultiEdit) holds
each increment to the current rung:

- The rung starts bounded (`RUNG_START`, ~40 added lines) and
  widens by `RAMP_FACTOR` per ramp token, capped at `RUNG_CAP`.
- A ramp token is `IMBUE_RAMP_OK=1` or a `.imbue/ramp-ok` file,
  created only after a demonstration is recorded and consumed on
  use, so one demonstration buys one notch.
- High-stakes paths (auth, migration, payment, infra, crypto) get a
  halved effective rung.
- Shadow mode (default) warns; `VOW_SHADOW_MODE=0` blocks an
  over-rung increment. The hook never blocks on its own state error
  and never crashes the agent.
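
The bullets above can be approximated in a few lines. This is a sketch, not the contents of `guard_scope_ramp.py`: the values of `RAMP_FACTOR` and `RUNG_CAP` are invented for illustration (the document only gives `RUNG_START` as about 40 added lines), widening is assumed to be multiplicative, and the environment is passed as a plain dict because the document does not say how the real hook consumes an environment token.

```python
from pathlib import Path

RUNG_START = 40    # from the document: about 40 added lines
RAMP_FACTOR = 1.5  # assumed value
RUNG_CAP = 400     # assumed value


def consume_token(env, token_file):
    """Return True if a ramp token was present, consuming it on use."""
    if env.get("IMBUE_RAMP_OK") == "1":
        env.pop("IMBUE_RAMP_OK")
        return True
    token_path = Path(token_file)
    if token_path.exists():
        token_path.unlink()  # one demonstration buys one notch
        return True
    return False


def next_rung(current, env, token_file=".imbue/ramp-ok"):
    """Widen the rung by one notch if a token is available, up to the cap."""
    if consume_token(env, token_file):
        return min(RUNG_CAP, int(current * RAMP_FACTOR))
    return current


def check_increment(added_lines, rung, env):
    """Return 'allow', 'warn' (shadow mode, the default), or 'block'."""
    if added_lines <= rung:
        return "allow"
    return "block" if env.get("VOW_SHADOW_MODE") == "0" else "warn"
```

Because the token is consumed, calling `next_rung` twice with a single token widens the rung only once; the second call returns the rung unchanged.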

The hook enforces the *bound and the ramp*. It does not measure
understanding; it requires that a demonstration was recorded before
ambitio
