
Cavekit Validation First
Define testable acceptance criteria and a six-gate validation pipeline so agent-generated work in Cavekit/SDD can be proven in CI instead of guessed.
Install
npx skills add https://github.com/juliusbrussee/caveman-code --skill cavekit-validation-first
What is this skill?
- Core rule: if an agent cannot automatically validate a requirement, treat it as unmet
- Six-gate ordered validation pipeline from cheap checks through heavier CI-style gates
- Phase gates, merge protocol, and completion signals for spec-driven (SDD) workflows
- Acceptance-criteria patterns for specs, plans, implementation, and iterations
- Applies at spec, plan, task, implementation, and iteration levels with measurable gate progress
Adoption & trust: 20 installs on skills.sh; 390 GitHub stars.
Journey fit
Primary fit
Validation-first design belongs on the Validate shelf because verifiable acceptance criteria are defined before implementation commits the team to uncheckable output. Scope is where requirements and pass/fail signals are nailed down; this skill turns vague kit requirements into criteria an agent can run against.
SKILL.md
# Validation-First Design

## Core Principle: If an Agent Cannot Validate It, It Will Not Be Met

Every spec requirement must include testable acceptance criteria that an agent can automatically verify. This is not optional; it is the foundation that makes SDD work.

**Why?** AI agents are non-deterministic. Without automated validation, there is no way to know whether an agent's output is correct. Validation gates turn "the agent generated some code" into "the agent generated code that provably meets the specification."

The validation-first rule applies at every level:

- **Spec requirements** must have testable acceptance criteria
- **Plans** must define which gates verify each task
- **Implementation** must pass all applicable gates before being considered complete
- **Iterations** must show measurable progress through gates

---

## The Validation Gate Sequence

Every implementation must pass through six ordered checkpoints. Each successive gate is more expensive to run, so catching failures early saves significant time.

### Gate 1: Compilation Check

**What:** The project compiles/transpiles without errors.

```bash
# Generic pattern: substitute your project's build command
{BUILD_COMMAND}
```

**Why it matters:** If the code does not build, nothing else can be validated. This is the cheapest possible check.

**What it catches:**

- Syntax errors
- Missing imports/dependencies
- Type errors (in typed languages)
- Configuration errors

**Acceptance criteria pattern:**

```markdown
- [ ] `{BUILD_COMMAND}` completes with exit code 0
- [ ] No warnings related to {domain} (warnings in other domains are acceptable)
```

### Gate 2: Isolated Unit Verification

**What:** Unit tests pass on all changed files.

```bash
# Generic pattern
{TEST_COMMAND}
# Or targeted at changed files
{TEST_COMMAND} --filter {changed-files}
```

**Why it matters:** Unit tests verify individual functions and modules in isolation. They are fast, deterministic, and catch logic errors.
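The ordering rule from "The Validation Gate Sequence" (cheapest gate first, stop at the first failure) can be sketched as a small fail-fast runner. This is only an illustration, not part of Cavekit: the `run_gates` function, the `name::command` argument format, and the `true` stand-in commands are all assumptions made for the example.

```bash
#!/usr/bin/env bash
# Hypothetical fail-fast gate runner (a sketch, not Cavekit's own tooling).
# Each argument is "name::command"; gates run in the order given and the
# runner stops at the first gate whose command exits non-zero.

run_gates() {
  local gate name cmd
  for gate in "$@"; do
    name="${gate%%::*}"   # text before the first "::"
    cmd="${gate#*::}"     # text after the first "::"
    echo "running: ${name}"
    if ! bash -c "$cmd"; then
      echo "FAILED: ${name}"
      return 1            # later, more expensive gates are never started
    fi
  done
  echo "all gates passed"
}

# Stand-in commands so the sketch runs anywhere; replace `true` with your
# project's real {BUILD_COMMAND}, {TEST_COMMAND}, and so on.
run_gates "compilation::true" "unit::true" "integration::true"
```

Because the runner returns on the first non-zero exit code, a broken build never pays for the unit or integration runs, which is the cost argument the gate ordering relies on.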
**What it catches:**

- Incorrect function behavior
- Edge cases not handled
- Regression from changes to existing code
- Contract violations (wrong return types, missing fields)

**Acceptance criteria pattern:**

```markdown
- [ ] All existing unit tests pass
- [ ] New unit tests cover all acceptance criteria for R{N}
- [ ] No test relies on external services or network access
```

### Gate 3: Cross-Component Integration

**What:** End-to-end and integration tests verify that components work together.

```bash
# Generic pattern
{TEST_COMMAND} --e2e
# Or with a specific test runner
{E2E_TEST_COMMAND}
```

**Why it matters:** Unit tests verify components in isolation. Integration tests verify they work together. Many bugs only appear at integration boundaries.

**What it catches:**

- API contract mismatches between components
- Data flow errors across module boundaries
- Authentication/authorization integration issues
- Database query errors with real (or realistic) data

**Acceptance criteria pattern:**

```markdown
- [ ] User can complete {workflow} end-to-end
- [ ] API endpoint returns correct response for {scenario}
- [ ] Error propagation works correctly from {source} to {destination}
```

### Gate 4: Resource and Speed Benchmarks

**What:** Performance benchmarks pass defined thresholds.

```bash
# Generic pattern
{BENCHMARK_COMMAND}
# Or specific checks
{TEST_COMMAND} --performance
```

**Why it matters:** Functional correctness is necessary but not sufficient. Performance regression can make a feature unusable even if it produces correct out