
Karpathy Guardrails
Load four behavioral rules so Cavekit agents think before coding, avoid scope creep, and make surgical, goal-driven edits on every task.
Overview
Karpathy Guardrails is an agent skill that applies across the whole journey and keeps Cavekit agents disciplined. Use it whenever a solo builder needs to stop over-engineering and unverified assumptions before code is committed.
Install
npx skills add https://github.com/juliusbrussee/cavekit --skill karpathy-guardrails
What is this skill?
- Four principles: think before coding, simplicity first, surgical changes, goal-driven execution
- Reviewer enforces guardrails as Pass-1 filter before code-quality review
- Think-before-coding checklist: one-sentence goal, assumption list, verifiable success mapping
- Refusing to produce code is allowed when scope is unknown—flag NEEDS_CONTEXT instead of guessing
- Integrates with the revision skill to sharpen vague acceptance criteria (automated-trace subsection)
Adoption & trust: 4 installs on skills.sh; 1k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
Your coding agent adds speculative features, guesses requirements, and expands scope because nothing forces verifiable acceptance criteria upfront.
Who is it for?
Indie developers running Cavekit multi-agent loops who want consistent anti-scope-creep behavior across planning, implementation, and review.
Skip if: you have fully approved specs with frozen scope and only need a single-shot formatter with no behavioral gate. The skill is still useful for review, but lighter-touch prompts may suffice.
When should I use this skill?
Trigger phrases: guardrails, karpathy, scope creep, over-engineering, stop adding features, surgical fix; load at start of every Cavekit task.
What do I get? / Deliverables
Every task starts with stated goals, flagged assumptions, minimum necessary changes, and reviewer Pass-1 enforcement—with revision skill handoff when criteria need sharpening.
- Documented goal sentence, assumption list, verifiable success mapping
- Pass-1 compliant change set or explicit NEEDS_CONTEXT flags
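The deliverables above (goal sentence, assumption list, success mapping) can be pictured as a small record plus a preflight check. This is only a sketch: the names `TaskBrief`, `Assumption`, and `preflight`, and the `REVISION:` prefix, are hypothetical and not part of Cavekit. Only the `NEEDS_CONTEXT` flag comes from the skill text.

```python
from dataclasses import dataclass, field


@dataclass
class Assumption:
    text: str
    load_bearing: bool  # would the plan change if this turned out to be false?
    verified: bool


@dataclass
class TaskBrief:
    goal: str  # the one-sentence goal
    assumptions: list[Assumption] = field(default_factory=list)
    # acceptance criterion -> concrete test, check, or observable behaviour
    success_map: dict[str, str] = field(default_factory=dict)


def preflight(brief: TaskBrief) -> list[str]:
    """Return blocking flags; an empty list means coding may start."""
    flags = []
    if not brief.goal.strip():
        flags.append("NEEDS_CONTEXT: goal is not stated")
    for a in brief.assumptions:
        if a.load_bearing and not a.verified:
            flags.append(f"NEEDS_CONTEXT: unverified load-bearing assumption: {a.text}")
    for criterion, check in brief.success_map.items():
        if not check.strip():
            # Hypothetical prefix: marks a criterion to hand to the revision skill.
            flags.append(f"REVISION: no verifiable check for: {criterion}")
    return flags


# Illustrative task; the assumption, criteria, and test path are made up.
brief = TaskBrief(
    goal="Fix the login redirect loop.",
    assumptions=[Assumption("Session cookie survives the redirect", True, False)],
    success_map={"AC1: no redirect loop": "pytest tests/test_login.py", "AC2: feels fast": ""},
)
for flag in preflight(brief):
    print(flag)
```

Returning a list rather than raising keeps every open question visible at once, so the agent can ask them in a single round.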
Journey fit
Useful at every journey phase - explore requirements and options before committing to a direction.
Where it fits
Planner maps each acceptance criterion to a verifiable test before approving a prototype milestone.
Task-builder lists load-bearing assumptions and refuses to code until NEEDS_CONTEXT questions are answered.
Agent limits a UI fix to the reported bug with no new component library “just in case.”
Reviewer Pass-1 rejects a PR that introduces speculative abstraction layers outside acceptance criteria.
Hotfix task stays surgical—one observable behavior change with a mapped regression check.
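The scope scenarios above come down to one question: does every changed file belong to the task? A minimal sketch of that check follows, assuming a task declares the files it owns as glob patterns. The function name `out_of_scope` and the example paths are hypothetical; the skill does not prescribe a format for owned files.

```python
from fnmatch import fnmatchcase


def out_of_scope(changed_files: list[str], owned_patterns: list[str]) -> list[str]:
    """Return changed files that match none of the task's owned-file patterns."""
    return [
        path
        for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in owned_patterns)
    ]


# Illustrative diff: a UI fix that also touched an unrelated helper.
changed = ["src/ui/login_button.tsx", "src/lib/format.ts"]
owned = ["src/ui/*"]
print(out_of_scope(changed, owned))  # prints ['src/lib/format.ts']
```

A non-empty result is not automatically a violation, since an edit outside the owned files can be justified when a requirement forces it. It is the list a reviewer would ask the author to justify or remove.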
How it compares
Behavioral contract for agents, not a linter—pairs with the revision skill instead of ad-hoc “keep it simple” chat reminders.
Common Questions / FAQ
Who is karpathy-guardrails for?
Solo and small-team builders using Cavekit task-builder, reviewer, planner, and inspector agents who want Karpathy-style discipline on every task.
When should I use karpathy-guardrails?
At the start of any Cavekit task: in Validate, when scoping acceptance criteria; in Build, before implementation; in Ship, during reviewer Pass-1; and in Operate, when iterating on fixes. It is especially relevant when you mention guardrails, scope creep, over-engineering, or a surgical fix.
Is karpathy-guardrails safe to install?
It is policy text with no network or shell requirements by itself; confirm combined Cavekit skills follow your policies and review the Security Audits panel on this page before enabling the full pipeline.
Workflow Chain
Then invoke: revision
SKILL.md
# Karpathy Guardrails

Four rules. Load them into context at the start of every task. The reviewer enforces them as a Pass-1 filter before it looks at code quality.

## 1. Think Before Coding

Before the first edit, write down:

- **What am I actually building?** One sentence. If you cannot state it, stop.
- **What am I assuming?** List every assumption. If any is load-bearing and unverified, flag `NEEDS_CONTEXT` and ask; do not guess.
- **What does success look like?** Map each acceptance criterion to a concrete test, check, or observable behaviour. If a criterion is not verifiable, propose a sharpening via the `revision` skill (automated-trace subsection), not a vague attempt.

Refusing to produce code is allowed. A task with unknown scope is a spec bug, not a coding task.

## 2. Simplicity First

The correct amount of code is the minimum that meets the acceptance criteria.

- No speculative features. No abstraction layer "in case we need it."
- No new dependencies unless the task requires one and no existing dep fits.
- No "while I'm in here" refactors. Surface them as separate kits.
- Duplication is not always wrong. Three similar lines usually beat a premature abstraction with two configuration knobs.

If the diff is larger than the acceptance criteria seem to demand, explain why in the commit body. If you cannot, trim the diff.

## 3. Surgical Changes

Every line in the diff must trace back to an acceptance criterion. Touching code outside the task's owned files is justified only when a requirement forces it.

Examples of violations:

- Fixing a formatter warning in an unrelated file.
- Renaming a helper "to match new convention."
- Reordering imports, docstrings, whitespace.
- Tightening a type signature the task did not ask about.

If you see a real bug in adjacent code, log it to `.cavekit/history/backprop-log.md` as a candidate kit item and keep it out of this task's diff.

## 4. Goal-Driven Execution

Transform vague tasks into verifiable success criteria before execution.

- A task that cannot be verified is not a task; escalate it.
- The verification plan must be concrete: exact commands, exact assertions, exact files to inspect. "Make sure it works" is not a plan.
- After implementation, run the verification plan. Report the output.

## Role-specific enforcement

- **task-builder**: must produce, alongside code, a Verification Report listing each AC, the verification step, and the observed result.
- **reviewer**: must refuse to advance to Pass 2 (code quality) if Pass 1 finds any of: undeclared assumptions, diff lines unjustified by an AC, out-of-scope edits, or unreachable verification steps.
- **planner**: must reject kits that contain un-testable ACs. They are spec bugs and block planning.
- **inspector**: must flag completed tasks whose verification logs are missing or hand-waved.

## When you are tempted to break a rule

You are probably over-confident about a shortcut that will cost more than the delay of asking. Stop and note the tension in the commit body or in the build log so the reviewer can judge.
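The Verification Report that the task-builder must produce, and the inspector's check for missing logs, could be sketched as follows. The SKILL.md text does not define a report format, so `VerificationEntry`, `missing_evidence`, and the sample commands are assumptions made for illustration. The check only catches blank fields; judging whether a filled-in result is hand-waved still needs a reader.

```python
from dataclasses import dataclass


@dataclass
class VerificationEntry:
    criterion: str  # the acceptance criterion (AC) being verified
    step: str       # exact command, assertion, or file to inspect
    observed: str   # output actually observed after running the step


def missing_evidence(report: list[VerificationEntry]) -> list[str]:
    """Return the criteria whose verification step or observed result is blank."""
    return [
        entry.criterion
        for entry in report
        if not entry.step.strip() or not entry.observed.strip()
    ]


# Illustrative report; the commands and outputs are hypothetical.
report = [
    VerificationEntry("AC1: no redirect loop", "pytest tests/test_login.py -q", "1 passed"),
    VerificationEntry("AC2: error banner shown", "", "looked fine in the browser"),
]
for criterion in missing_evidence(report):
    print("missing verification evidence:", criterion)
```

Keeping the step and the observed output as separate fields mirrors the rule that the plan is written before execution and the output is reported after it.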