
juliusbrussee/cavekit
26 skills · 8.5k installs · 26k stars · GitHub
Install
npx skills add https://github.com/juliusbrussee/cavekit

Skills in this repo

1. Caveman
Caveman is an agent skill that encodes SPEC.md and spec-referencing prose into a terse fragment grammar plus symbolic operators, cutting token use roughly seventy-five percent compared with full prose while keeping technical literals untouched. Solo builders shipping with Claude Code or similar agents install it when long specs bloat context during /spec, /build, and /check loops. It applies only to specification surfaces (invariant lines, backprop notes, and spec-adjacent explanations), not application code, error strings, commit messages, or PR bodies. The skill documents explicit preserve-verbatim rules for paths like src/auth/mw.go, env vars, SQL, JSON, YAML, and quoted strings so compression never corrupts machine-readable detail. When you need stakeholder-readable marketing copy or narrative PR descriptions, use normal prose instead.
1.7k installs

2. Build
Cavekit build is an agent skill for solo builders who already have SPEC.md and want deterministic implementation instead of ad-hoc coding. You invoke it when you say build, implement the spec, build §T.3, build --next, or run the build. Main Claude stays single-threaded: parse task selection, enter native plan mode, cite invariants and interfaces, list files and tests, then execute per task while updating §T status cells in SPEC.md. Each task runs a verification command; success marks the row done and failure triggers the backprop skill so recurring mistakes become new §V invariants before retry. It expects FORMAT.md once for structure and refuses to guess when the spec is absent, which keeps agent runs aligned with an approved plan rather than drifting in chat.
1.6k installs

3. Check
check is a Cavekit diagnostic skill for solo builders who treat SPEC.md as the contract between intent and implementation. It loads the spec, optionally narrows to invariants (§V), interfaces (§I), or tasks (§T), greps and reads the repo, and emits a structured drift report with severity-style classifications and addresses.
Nothing is auto-fixed: the user chooses whether to update the spec or invoke build skills afterward. That makes it ideal during active Build work when tasks flip to done, during Ship review before release, and during Operate iteration when production changes might have outpaced the document. Phrasing like “check drift”, “audit the spec”, or “does the code still match §V” should trigger it. It is intermediate complexity because you need a maintained SPEC.md and comfort reading evidence lines, but the workflow itself is a single read-only pass.
1.6k installs

4. Backprop
Backprop is the Cavekit skill that encodes the non-obvious difference between fixing a bug once and making recurrence structurally impossible in spec-driven development. When a test fails at build verification, a user reports a bug, a post-mortem lands, or `/check` flags a VIOLATE with a known root cause, the agent traces to file and line, decides whether §V, §I, or §T must change, drafts §B and invariant rows, and mandates a new failing test before the spec edit sticks. Solo builders running Cavekit stacks use it to keep SPEC.md honest instead of accumulating tribal knowledge in closed PRs. It pairs naturally with the spec mutator for writes and with build for re-verification. The workflow is procedural and opinionated: skipping §B is never allowed; other sections are case-by-case. Intermediate complexity because you must read FORMAT.md-shaped specs and think in invariant classes, not single-line patches.
1.6k installs

5. Spec
Spec is the Cavekit agent skill that owns every write to SPEC.md at the repository root. Solo builders invoke it when they need a spec from scratch, need to distill behavior from an existing codebase, need to amend §G through §B sections, or need to route a bug: prefix into backprop. The skill always defers to FORMAT.md for pipe-table shapes and terse caveman prose.
NEW mode compresses a user idea into goal, constraints, external interfaces, numbered invariants, and an ordered §T task table with fresh ids, then shows the entire file for approval before build. DISTILL mode is for brownfield repos missing documentation. BACKPROP and AMEND branch to targeted edits without letting other skills silently fork the spec. It sits at the center of the Cavekit SDD loop: spec defines truth, build executes §T, backprop tightens §V when reality disagrees. Intermediate complexity because you must understand section semantics, not just paste markdown.
1.6k installs

6. Design System
The design-system skill teaches solo and indie builders how to author and maintain a DESIGN.md document in the nine-section Google Stitch format so AI coding agents share one visual contract. It positions DESIGN.md as the visual equivalent of behavioral kits: it states what the product looks like (palette, typography, spacing, components, and responsive behavior) without prescribing implementation steps that belong in plans or CLAUDE.md. Without this layer, colors and spacing drift across components and sessions; with it, UI-building agents read the same tokens and patterns every time. Coverage includes section structure, token naming, quality bars, how design specs plug into kits and Hunt-phase tasks, revision workflows, and importing design collections. Use it when you are starting a new interface, standardizing an existing app, or onboarding agents to a consistent design language before feature work scales.
17 installs

7. Context Architecture
context-architecture teaches solo builders how to lay out Cavekit-style project knowledge so AI agents read the smallest subgraph that still answers their task. It centers on progressive disclosure: index files act as DAG hubs, edges point to refs (what is), kits (what must be), plans (how), and impl (what was done), and agents enter at the root CLAUDE.md then follow only relevant branches.
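
The progressive-disclosure idea described for context-architecture can be pictured as a filtered graph walk. The following is only a rough sketch, not the skill's actual layout: the `INDEX` mapping, the file paths in it, and the `files_to_read` helper are all hypothetical names invented for illustration. It shows an agent entering at the root CLAUDE.md and following only the edge kinds (refs, kits, plans, impl) relevant to its task.

```python
from collections import deque

# Hypothetical index: each hub file lists (edge_kind, target) pairs.
# The paths are made up for this sketch; a real project defines its own.
INDEX = {
    "CLAUDE.md": [
        ("refs", "refs/index.md"),
        ("kits", "kits/index.md"),
        ("plans", "plans/index.md"),
        ("impl", "impl/index.md"),
    ],
    "kits/index.md": [("kits", "kits/auth.md"), ("kits", "kits/billing.md")],
    "plans/index.md": [("plans", "plans/auth-plan.md")],
}


def files_to_read(index, root, wanted_kinds):
    """Breadth-first walk from root, following only edges whose kind is wanted."""
    seen = [root]
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for kind, target in index.get(node, []):
            if kind in wanted_kinds and target not in seen:
                seen.append(target)
                queue.append(target)
    return seen


if __name__ == "__main__":
    # A task that only needs requirements never opens refs, plans, or impl.
    print(files_to_read(INDEX, "CLAUDE.md", {"kits"}))
    # → ['CLAUDE.md', 'kits/index.md', 'kits/auth.md', 'kits/billing.md']
```

In this toy version the set of wanted edge kinds is what keeps the subgraph small; an agent asking only about kits reads four files instead of the whole tree.
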
The skill explains tier dependencies, backward compatibility for older layouts, and how human-auditable docs stay synchronized with agent workflows. Use it when context folders grow unwieldy, agents repeatedly over-read markdown, or you are adopting Cavekit tiers without guessing directory names. It pairs naturally with Cavekit methodology and planning skills when you need structure before more implementation work.
16 installs

8. Convergence Monitoring
Convergence monitoring is an agent skill for solo builders who run repeated AI edit-test-fix loops on real codebases. It teaches you to read convergence signals (especially how much each iteration still changes, whether improvements are cosmetic only, and whether automated tests are trending toward stable pass rates) so you can stop before you chase a perfect zero-diff or burn budget on a ceiling. The skill also helps diagnose non-convergence when the agent rewrites the same areas, regresses behavior, or spins without measurable forward progress. It is aimed at indie developers shipping with Claude Code, Cursor, Codex, and similar tools on multi-file features, refactors, or long-running remediation tasks where iteration count alone is a poor stop rule. Use it whenever you wonder if the agent is actually converging, when to stop iterating, or whether you are hitting diminishing returns during ship-time verification and later production iteration.
16 installs

9. Documentation Inversion
Documentation inversion is a journey-wide agent skill that reframes how solo builders document products: instead of maintaining a separate wiki that drifts from the code, you embed hierarchical, cross-referenced guidance in the repository for machines to traverse. The flow runs from source to CLAUDE.md at module boundaries, then to skills and plugins that agents invoke when they need context, so documentation stays current because it lives beside the code agents already read.
It assumes agents need explicit entry points and followable references rather than long narrative manuals. Indie developers shipping with Claude Code and similar stacks use it when “docs for agents,” living documentation, or machine-readable docs are the goal: during initial repo setup in Build, but also whenever you onboard a new agent session in Ship review or Operate iterate. This is agent-first documentation: built for programmatic navigation, not passive human browsing.
16 installs

10. Methodology
methodology is the master Cavekit skill for solo builders who want AI agents to work from explicit specifications instead of improvising from chat. It states the core rule: go through a Cavekit stage (define kits before implementation) whether you are greenfielding or modernizing legacy code. Kits stay human-legible and structured so agents load only what they need while engineers audit requirements above the code layer. The skill explains the Hunt lifecycle phases, when Cavekit beats ad-hoc prompting, and how the build pipeline mirrors a scientific method from observation to validated delivery. Invoke it at project start, before large refactors, or whenever agents are coding without an approved kit tree. It routes to specialized Cavekit skills such as context-architecture and planning artifacts rather than replacing them.
16 installs

11. Ui Craft
UI craft is an agent skill for the Build phase that tells solo builders how to implement interfaces that feel exceptional, not just functional. It consolidates design engineering philosophy, accessibility standards, animation timing, spatial layout, typography, color systems, and component-level polish into one implementation-oriented guide. If a DESIGN.md exists at the project root, its tokens and specifications override the skill’s defaults; otherwise the skill supplies sensible baselines.
It pairs with a design-system-oriented skill that focuses on writing specs; ui-craft covers execution when you ask to build UI, create components, polish a landing page, or make the frontend beautiful. The emphasis is on micro-decisions users feel unconsciously: easing, shadow weight, spacing rhythm, and accessible patterns. Use it during frontend build work in Claude Code, Cursor, or Codex when narrative design docs exist but you still need concrete styling and interaction guidance.
16 installs

12. Brownfield Adoption
Brownfield-adoption guides solo builders who already ship production code and want Cavekit without a big-bang rewrite. The skill walks a structured brownfield path: when layering kits makes sense, how to use the existing implementation as reference material, and how to reverse-engineer durable kits from what the system actually does today. Bootstrap prompt design and validation steps help kits reflect real behavior instead of wishful documentation. It also contrasts deliberate rewrite scenarios so you do not adopt brownfield on a codebase you intend to replace entirely. Once kits exist, future changes can follow the Cavekit lifecycle while development continues. Ideal for indie teams onboarding AI agents to a mature repo, improving traceability on a critical monolith, or piloting Cavekit on a bounded subdomain before wider rollout.
15 installs

13. Cavekit Writing
Cavekit-writing teaches solo and indie builders how to author Cavekit-quality kits that AI coding agents can follow without baking in a single stack choice. The skill centers on a durable split: kits state what must be true and how to verify it, while plans and code remain framework-specific derivatives. You learn to write hierarchical, cross-linked requirements with concrete acceptance criteria, use provided templates, and apply greenfield versus rewrite patterns when the product shape is still moving. Compaction and gap analysis keep large kit sets readable as the repo evolves.
The methodology is aimed at builders who want portable specs that survive framework migrations and give agents a single behavioral source of truth. Use it when you are formalizing product behavior for agent-driven implementation, audits, or cross-framework evaluation, not when you only need a one-off implementation recipe in one language.
15 installs

14. Peer Review
Peer Review is a Cavekit methodology skill for solo builders who want cross-model quality gates beyond unit tests. It defines six structured review modes, from diff-focused critique to coverage audits, and explains how to wire a second agent via MCP or CLI so the reviewer is instructed to challenge assumptions and hunt omissions. The core principle rejects hollow “looks good” reviews in favor of adversarial scrutiny of design, threads, and delegated work. It documents iteration loops where the builder addresses findings and the peer re-runs until issues converge or a deciding vote breaks ties. Codex Loop Mode integrates Ralph Loop patterns with Codex as the dedicated reviewer when you already use that toolchain. Invoke on triggers like peer review, second opinion on code, cross-model review, or codex peer reviewer, not when you only need lint autofix. Works across Build (design challenge), Ship (pre-merge review), and Operate (post-incident scrutiny) whenever agent output needs independent eyes.
15 installs

15. Revision
Revision is a Cavekit methodology skill for solo builders who generate software through kits, specs, and automated hunt loops. When something breaks in the built artifact, the instinct is to patch src/; this skill forces upstream tracing (which prompt, plan, or kit gap allowed the defect) and updates that source so the next loop run inherits the fix. It documents both batch commit-sweep revision and the six-step backpropagation protocol that can fire automatically on test failure or on demand via trace commands.
That matters for one-person teams because unmaintained one-off patches compound into endless rework; kit-level fixes scale with the agent. Use it during Ship when tests fail, during Operate when bugs surface in production, and during Build when you realize the spec pipeline missed a constraint. It pairs with Cavekit prompt-pipeline and planning skills rather than replacing a debugger or linter.
15 installs

16. Speculative Pipeline
Speculative Pipeline is a Cavekit execution strategy for indie builders running multi-stage agent pipelines where each stage traditionally waits for the prior one to finish completely. Instead, the leader runs immediately and downstream followers start after a delay, consuming whatever partial upstream artifacts exist (often around eighty percent complete), then refine through convergence loops as upstream output solidifies. The skill argues that waiting for perfect handoffs wastes hours and that self-correcting follower iterations recover from incomplete first passes. It pairs timing configuration with explicit re-read behavior so errors from speculative starts diminish over loops. Use when you have chained prompts or agents (research → spec → implementation) and need faster pipeline completion without abandoning quality gates. Less suited to single-step tasks or pipelines where downstream work is unsafe without fully signed-off inputs. Think of it as orchestration policy for agent PM, not a code generator.
15 installs

17. Impl Tracking
impl-tracking is a Cavekit skill for solo builders who run many agent sessions on the same codebase and keep losing context between chats. It defines implementation tracking documents as durable memory: what was built, what remains, what failed, and which dead ends were explored. The skill emphasizes that failures are often more valuable than successes because they stop tomorrow’s agent from repeating the same mistake.
Coverage includes a full document template, dead-end prevention, cross-iteration continuity, spec compaction when context gets long, and an inter-session feedback protocol so humans and agents share one source of truth. Use it alongside validation-first when you need gates plus a paper trail of attempts. It suits multi-day agent implementations, refactors, and spec-driven hunts, not one-shot questions. Beginners benefit from the template; advanced users customize sections for test health and known issues.
14 installs

18. Prompt Pipeline
Prompt-pipeline teaches solo builders how to engineer the numbered markdown prompts that power Cavekit's Hunt lifecycle. Instead of one giant mega-prompt, you split work so each file drives one phase (reading from refs, specs, or plans and writing to the next artifact layer), keeping agents focused and auditable. Greenfield projects can start with a tight three-prompt chain from reference materials to specs, plans, then source and tests; rewrites expand to six through nine prompts when legacy code and narrower phase gates matter. The skill emphasizes systemic instructions, delegated content, templates, and time guards so SDD stays repeatable when you are the only PM and tech lead. Use it in Validate when scoping how agents will execute a prototype path, and in Build when standing up or refactoring Hunt automation before heavy Implement work. Approved pipeline structure is the handoff into implementation prompts and revision when gaps appear.
14 installs

19. Validation First
Validation-first is a Cavekit methodology skill for solo builders running spec-driven development with AI agents. It states that every requirement must ship with acceptance criteria an agent can verify automatically, because non-deterministic outputs otherwise look done when they are not.
The skill documents a pipeline of six ordered validation gates, phase gates between Hunt phases, a merge protocol, completion signals, and patterns for writing testable criteria at the spec and plan layer, not only at the end of a sprint. Use it when you are designing specs, breaking plans into tasks, or closing an implementation pass and need measurable proof instead of vibes. It pairs naturally with implementation tracking and planning skills in the same ecosystem. Intermediate-to-advanced builders who already write structured specs get the most value; it is less useful if you only want a one-off lint fix with no written requirements.
14 installs

20. Peer Review Loop
peer-review-loop is an agent skill for solo and indie builders who already use Cavekit kits and want automated rigor beyond a single model polishing its own output. It wires a Ralph Loop where Claude constructs from cavekit specifications and Codex (preferably through codex-review.sh CLI delegation) reviews each tranche adversarially, with MCP-based Codex as a legacy fallback when CLI is unavailable. The workflow covers setup, how to structure iterations, when the loop has converged, and what “done” means when a different training distribution challenges assumptions about security, spec fidelity, and edge cases. It is most valuable during active implementation and pre-release hardening, when drift from the kit or “good enough for one model” endings would be costly. Builders should enter with a cavekit or spec package and tolerance for multiple review cycles; it is not a quick linter pass. Successful runs produce implementation progress that has survived cross-examination, with documented review findings feeding the next Ralph iteration until completion criteria are met.
10 installs

21. Caveman Internal
Caveman Internal is a journey-wide agent skill that applies a token-compression protocol to machine-to-machine prose inside Cavekit-style loops.
Solo builders running long agent chains hit context limits when every handoff, artifact summary, and status block stays verbose. This skill targets summaries under .cavekit/artifacts/, bundles in .cavekit/context-bundles/, non-user review notes, state.md notes fields, loop-log lines, and stop-hook dashboard bodies. It deliberately never compresses source code, git messages, kits, actionable error text, security warnings, or structured severity tables; doing so is documented as a bug with a verbose fallback path. Three intensities (lite, full, ultra) scale with budget pressure so you trade readability for tokens only when needed. It is not the public /caveman command users invoke to shorten chat answers; it is infrastructure for the next agent reader in the pipeline.
5 installs

22. Complexity Detection
complexity-detection is a journey-wide Cavekit skill that classifies work as quick, standard, or thorough using a fixed five-axis rubric. Each axis scores from zero to four on files touched, change type, judgment required, cross-component blast radius, and novelty, then sums to a twenty-point scale that downstream commands use to size budgets, pick models, and set review depth. Solo builders running /ck:sketch, /ck:map, or /ck:make get consistent depth defaults instead of guessing whether a chore deserves the same rigor as an architectural change. The ck:complexity agent can invoke the same rules with a lightweight model. Invoke it whenever a task feels ambiguous in effort or risk so the rest of the pipeline does not over- or under-invest tokens and verification.
5 installs

23. Autonomous Loop
Autonomous Loop explains how Cavekit turns a single /ck:make invocation into a bounded, multi-iteration agent run. Solo builders using Claude Code with Cavekit need this mental model before they rely on unattended execution or wonder why a session keeps spinning.
The skill walks the architecture from the Stop event through stop-hook.sh into routeDecision and related directives, including completion sentinels, locks, budget enforcement, and the iteration ceiling. It is reference material, not a one-shot task: read it when implementing commands that participate in the loop or when diagnosing stuck loops. The content bridges building new automation and operating flaky sessions. Intermediate complexity assumes you already run Cavekit and can read shell and Node routing code paths.
4 installs

24. Capability Discovery
Capability Discovery is a Cavekit agent skill that stops pipelines from inventing external tools. Before drafting kits that mention GitHub Actions, Supabase, Codex, or Wrangler, it runs a deterministic discover pass and records what is really available. Solo builders using Claude Code with Cavekit get a machine-readable capabilities.json under .cavekit/, covering shell CLIs, MCP server entries, installed plugins, Codex presence, and optional Graphify artifacts. The skill is intentionally narrow: it does not install missing tools; it grounds automation in facts so downstream kit generation and build sites bind to credentials and binaries you actually have. Trigger it when someone asks what is installed, whether you can use X, or during tools-only init, especially at project kickoff and again when the toolchain changes.
4 installs

25. Graphify Integration
Graphify Integration is Cavekit’s optional bridge to a symbol-level knowledge graph stored as NetworkX node-link JSON. Solo builders shipping with agent pipelines install graphifyy, run graphify build, and let architect, researcher, reviewer, and task-builder agents answer “who imports or calls this?” without spamming the codebase with grep. Nothing in Cavekit hard-depends on the graph: a missing graph.json returns no-ops and search falls back. That graceful degradation makes it safe for indie repos that only build the graph before large refactors or security-sensitive edits.
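
To make the "who imports or calls this?" query and the missing-file fallback concrete, here is a minimal sketch using only the Python standard library. It is not Graphify's own code or API: the helper names `load_links` and `who_references`, the sample symbol names, and the `relation` attribute are assumptions for illustration. The only thing taken from the node-link format itself is that edges are dictionaries with `source` and `target` keys, stored under `links` (or `edges` in some exports).

```python
import json
from pathlib import Path


def load_links(graph_path):
    """Return the edge list from a node-link JSON file, or [] if the file is missing."""
    path = Path(graph_path)
    if not path.exists():
        return []  # graceful no-op: the caller can fall back to text search
    data = json.loads(path.read_text(encoding="utf-8"))
    # Node-link exports keep edges under "links" or, in some versions, "edges".
    return data.get("links") or data.get("edges") or []


def who_references(links, symbol):
    """List the sources of every edge pointing at `symbol`, sorted for stable output."""
    return sorted({link["source"] for link in links if link.get("target") == symbol})


# Hypothetical sample edges; real graphs would come from load_links("graph.json").
SAMPLE_LINKS = [
    {"source": "api.login", "target": "auth.verify_token", "relation": "CALLS"},
    {"source": "api.refresh", "target": "auth.verify_token", "relation": "CALLS"},
    {"source": "auth.verify_token", "target": "crypto.hmac", "relation": "IMPORTS"},
]

if __name__ == "__main__":
    print(who_references(SAMPLE_LINKS, "auth.verify_token"))
    # → ['api.login', 'api.refresh']
    print(who_references(load_links("does-not-exist.json"), "auth.verify_token"))
    # → []
```

Returning an empty list for a missing file keeps the calling agent's logic simple: an empty answer means "no graph evidence", and it can then decide to fall back to grep.
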
Confidence tiers on edges help agents weight IMPORTS versus INFERRED links. Use when blast-radius analysis, dependency tracing, or community-partitioned graph slices beat text search for your monorepo or service boundary.
4 installs

26. Karpathy Guardrails
Karpathy Guardrails is Cavekit’s mandatory behavioral lens for every agent that writes or judges code. Solo builders invoke it at task start so task-builders, reviewers, planners, and inspectors internalize four rules: articulate what you are building and every assumption, ship the minimum code that meets acceptance criteria, change only what the task requires, and tie execution to observable success tests. The reviewer treats these as Pass-1: scope and discipline failures fail before style nits. Unknown scope becomes a spec bug with explicit NEEDS_CONTEXT rather than silent guesses. Over-engineering, speculative abstractions, and unfocused feature sprawl are anti-patterns the skill is designed to block. Because triggers span guardrails, scope creep, and surgical fix, it belongs on the journey-wide shelf even though Prism files it under Build planning first.
4 installs