
Validation First
Define testable acceptance criteria and a six-gate pipeline so agent output can be automatically verified before you treat work as done.
Install
npx skills add https://github.com/juliusbrussee/cavekit --skill validation-first
What is this skill?
- Core rule: if an agent cannot validate a requirement, it will not reliably be met
- Six-gate validation sequence—fail fast on cheaper checks before expensive runs
- Phase gates between Hunt phases plus merge protocol and completion signals
- Acceptance-criteria design patterns for spec requirements, plans, and iterations
- Applies at spec, plan, implementation, and iteration levels in spec-driven workflows
Adoption & trust: 14 installs on skills.sh; 1k GitHub stars; 3/3 security scanners passed (skills.sh audits).
Journey fit
Automated gates are the canonical ship concern—proving implementation meets spec before release—while the same rules originate earlier in specs and plans. Ordered validation checkpoints (cheap gates first) map directly to testing and quality assurance in the solo-builder ship phase.
Common Questions / FAQ
Is Validation First safe to install?
skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.
SKILL.md
# Validation-First Design

## Core Principle: If an Agent Cannot Validate It, It Will Not Be Met

Every spec requirement must include testable acceptance criteria that an agent can automatically verify. This is not optional: it is the foundation that makes SDD work.

**Why?** AI agents are non-deterministic. Without automated validation, there is no way to know whether an agent's output is correct. Validation gates turn "the agent generated some code" into "the agent generated code that provably meets the specification."

The validation-first rule applies at every level:

- **Spec requirements** must have testable acceptance criteria
- **Plans** must define which gates verify each task
- **Implementation** must pass all applicable gates before being considered complete
- **Iterations** must show measurable progress through gates

---

## The Validation Gate Sequence

Every implementation must pass through six ordered checkpoints. Each successive gate is more expensive to run, so catching failures early saves significant time.

### Gate 1: Compilation Check

**What:** The project compiles/transpiles without errors.

```bash
# Generic pattern: substitute your project's build command
{BUILD_COMMAND}
```

**Why it matters:** If the code does not build, nothing else can be validated. This is the cheapest possible check.

**What it catches:**

- Syntax errors
- Missing imports/dependencies
- Type errors (in typed languages)
- Configuration errors

**Acceptance criteria pattern:**

```markdown
- [ ] `{BUILD_COMMAND}` completes with exit code 0
- [ ] No warnings related to {domain} (warnings in other domains are acceptable)
```

### Gate 2: Isolated Unit Verification

**What:** Unit tests pass on all changed files.

```bash
# Generic pattern
{TEST_COMMAND}
# Or targeted at changed files
{TEST_COMMAND} --filter {changed-files}
```

**Why it matters:** Unit tests verify individual functions and modules in isolation. They are fast, deterministic, and catch logic errors.

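
The fail-fast ordering described in the gate sequence can be sketched as a small runner, assuming each gate can be expressed as a single shell command. The `run_gates` function and its messages are hypothetical illustrations and are not part of the skill; it only shows how a failing cheap gate prevents the more expensive gates from running.

```bash
#!/usr/bin/env bash
# Hypothetical fail-fast gate runner (an illustration, not part of the skill).
# Each argument is one gate's command line, passed cheapest first.
run_gates() {
  local index=1
  local gate_command
  for gate_command in "$@"; do
    echo "Gate ${index}: ${gate_command}"
    # Run the gate in a child shell so its failure cannot abort the runner.
    if ! bash -c "${gate_command}"; then
      echo "Gate ${index} failed; skipping later gates"
      return 1
    fi
    index=$((index + 1))
  done
  echo "All gates passed"
}
```

For example, `run_gates "make build" "make test"` would run the tests only if the build exits with code 0. Both command strings are placeholders standing in for your project's own `{BUILD_COMMAND}` and `{TEST_COMMAND}`.
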

**What it catches:**

- Incorrect function behavior
- Edge cases not handled
- Regression from changes to existing code
- Contract violations (wrong return types, missing fields)

**Acceptance criteria pattern:**

```markdown
- [ ] All existing unit tests pass
- [ ] New unit tests cover all acceptance criteria for R{N}
- [ ] No test relies on external services or network access
```

### Gate 3: Cross-Component Integration

**What:** End-to-end and integration tests verify that components work together.

```bash
# Generic pattern
{TEST_COMMAND} --e2e
# Or with a specific test runner
{E2E_TEST_COMMAND}
```

**Why it matters:** Unit tests verify components in isolation. Integration tests verify they work together. Many bugs only appear at integration boundaries.

**What it catches:**

- API contract mismatches between components
- Data flow errors across module boundaries
- Authentication/authorization integration issues
- Database query errors with real (or realistic) data

**Acceptance criteria pattern:**

```markdown
- [ ] User can complete {workflow} end-to-end
- [ ] API endpoint returns correct response for {scenario}
- [ ] Error propagation works correctly from {source} to {destination}
```

### Gate 4: Resource and Speed Benchmarks

**What:** Performance benchmarks pass defined thresholds.

```bash
# Generic pattern
{BENCHMARK_COMMAND}
# Or specific checks
{TEST_COMMAND} --performance
```

**Why it matters:** Functional correctness is necessary but not sufficient. Performance regression can make a feature unusable even if it p