
Test Review
Review agent or plugin test suites for content-assertion depth so skills with JSON, YAML, and behavioral contracts are verified beyond keyword smoke tests.
Overview
Test-review is an agent skill most often used in the Ship phase (also in Build for docs modules, and in Operate when iterating on skill quality). It scores content assertion depth and flags shallow or brittle skill tests.
Install
npx skills add https://github.com/athola/claude-night-market --skill test-review
What is this skill?
- Content Depth scored 1–5 from existence-only checks through cross-plugin validation
- Flags gaps when skills ship L1-only tests despite JSON/YAML blocks or version-gated features
- Documents content assertion anti-patterns: prose style, exact wording, brittle string matches
- Ties to Leyline content-assertion levels and scenario quality assessment extensions
- Explicit triggers for anti-pattern, decision-framework, and forbidden-behavior coverage
- Anti-pattern table with five documented assertion mistakes and better approaches
Adoption & trust: 1 install on skills.sh; 304 GitHub stars; 3/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).
What problem does it solve?
Your skill tests only grep keywords while SKILL.md encodes schemas, version gates, and behavioral rules that regress silently on edit.
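To make the difference concrete, here is a minimal sketch contrasting an L1 keyword check with an L2 parse check. The skill content, the `json_blocks` helper, and the expected keys are all hypothetical and are not taken from the skill itself; a real test would read the actual SKILL.md from disk.

```python
import json
import re

FENCE = "`" * 3  # built programmatically so the sample nests cleanly in markdown

# Hypothetical skill content, standing in for a real SKILL.md.
SKILL_MD = "\n".join([
    "## Output format",
    "",
    FENCE + "json",
    '{"score": 3, "level": "L2"}',
    FENCE,
])


def json_blocks(content: str) -> list[str]:
    """Return the bodies of all fenced json blocks in the markdown text."""
    pattern = re.compile(FENCE + r"json\n(.*?)\n" + FENCE, re.DOTALL)
    return pattern.findall(content)


def test_l1_keyword_presence() -> None:
    # L1: passes whenever the word appears, even if the example is malformed.
    assert "score" in SKILL_MD


def test_l2_embedded_json_parses() -> None:
    # L2: every embedded example must parse and expose the expected keys.
    blocks = json_blocks(SKILL_MD)
    assert blocks, "expected at least one json block"
    for body in blocks:
        data = json.loads(body)
        assert {"score", "level"} <= set(data)
```

If an edit drops the closing brace from the embedded example, the L1 test keeps passing while the L2 test fails with a `json.JSONDecodeError`, which is the kind of silent regression described above.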
Who is it for?
Skill authors and plugin maintainers reviewing Leyline-style or modular SKILL.md test packs before publishing updates.
Skip if: you need application UI E2E testing or production incident response. This skill targets documentation and skill content test quality only.
When should I use this skill?
During test review when evaluating content assertion quality for skills with JSON/YAML blocks, version-gated features, behavioral guidance, or forbidden behaviors.
What do I get? / Deliverables
You get a structured content-depth score, gap flags, and anti-pattern corrections so test suites assert semantics appropriate to L2–L3+ expectations.
- Content Depth score (1–5) with level justification
- List of content test gaps and anti-pattern fixes
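As a rough illustration of what such a deliverable could look like as data, the sketch below maps the test levels a suite reaches to a 1–5 score and collects gap flags. The `ContentReview` type, the `review` function, and the rule that the score equals the deepest level reached are assumptions made for this example, not the skill's documented implementation.

```python
from dataclasses import dataclass, field

# Documented scale: 1 = none, 2 = L1, 3 = L2, 4 = L3, 5 = L3+.
LEVEL_SCORES = {"L1": 2, "L2": 3, "L3": 4, "L3+": 5}


@dataclass
class ContentReview:
    """Hypothetical container for a content-depth review result."""
    score: int
    gaps: list[str] = field(default_factory=list)


def review(levels_tested: set[str], has_code_blocks: bool,
           has_version_gates: bool) -> ContentReview:
    # Assumption: the score is the deepest level any test reaches.
    score = max((LEVEL_SCORES[level] for level in levels_tested), default=1)
    gaps: list[str] = []
    if levels_tested == {"L1"} and has_code_blocks:
        gaps.append("L1-only tests but skill has JSON/YAML blocks: add L2 parsing tests")
    if has_version_gates and score < LEVEL_SCORES["L3"]:
        gaps.append("version-gated features without cross-reference validation: add L3 tests")
    return ContentReview(score, gaps)
```

For example, `review({"L1"}, has_code_blocks=True, has_version_gates=False)` yields a score of 2 with one gap flag, while a suite with no content tests at all scores 1.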
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Canonical shelf is Ship because the skill governs quality gates on tests before release, extending scenario review with explicit content-depth scoring. Testing subphase matches assertion-level rubrics (L1–L3+), anti-pattern tables, and gap flags for version-gated and cross-plugin docs.
Where it fits
While drafting SKILL.md modules with YAML examples, plan L2 parse assertions before the first publish.
Gate a skill release when all tests are still L1 but the skill ships JSON schemas and version-gated features.
Pair content-depth scoring with broader scenario review before tagging a marketplace version.
After a doc rewrite, re-score tests to ensure anti-pattern and forbidden-behavior assertions still hold.
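The last scenario depends on assertions that survive rewording. A small sketch follows, using made-up document text and hypothetical helper names, of how an exact-wording assertion breaks on a rewrite while a concept assertion and a required-section check keep holding:

```python
import re

# Hypothetical before/after text for one section of a skill document.
ORIGINAL = "## Version Gates\nRequires version 2.1 or later.\n"
REWRITTEN = "## Version Gates\nOnly available from Version 2.1 onward.\n"


def exact_wording_ok(content: str) -> bool:
    # Anti-pattern: breaks on any edit to the sentence.
    return "Requires version 2.1 or later." in content


def concept_ok(content: str) -> bool:
    # Better: assert the concept is still present, case-insensitively.
    return "version" in content.lower()


def has_section(content: str, title: str) -> bool:
    # Better than line counts: check that a required heading exists.
    pattern = rf"^#+\s+{re.escape(title)}\s*$"
    return re.search(pattern, content, re.MULTILINE) is not None
```

Here `exact_wording_ok(REWRITTEN)` is false even though nothing behavioral changed, whereas `concept_ok` and `has_section(..., "Version Gates")` hold for both versions.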
How it compares
Use as a content-assertion rubric during test review instead of treating keyword presence as sufficient coverage for agent skills.
Common Questions / FAQ
Who is test-review for?
Developers maintaining agent skills, plugin docs, and automated content tests who need depth scoring beyond file existence and keyword checks.
When should I use test-review?
In Ship before merging skill releases; in Build while authoring modules with JSON/YAML examples; in Operate when iterating test suites after doc edits.
Is test-review safe to install?
Check the Security Audits panel on this Prism page; the skill is review guidance and should not require network if used as documented.
SKILL.md - Test Review
# Content Assertion Quality

Scoring criteria for evaluating content assertion tests during test review. Extends the scenario quality assessment with a Content Depth dimension.

Reference: `leyline:testing-quality-standards/modules/content-assertion-levels.md`

## Content Depth Scoring

Rate content assertion depth on a 1-5 scale:

| Score | Level | Description |
|---|---|---|
| 1 | None | Tests only file existence or line count |
| 2 | L1 | Keyword presence checks (`assert "section" in content`) |
| 3 | L2 | Parses embedded examples, validates schema structure |
| 4 | L3 | Cross-references, anti-patterns, decision framework contracts |
| 5 | L3+ | Cross-plugin validation (version refs checked against other plugins' docs) |

## When to Flag Missing Content Assertions

During test review, flag as a content test gap when:

- A skill has tests but all are L1 (keyword-only) and the skill contains JSON or YAML code blocks
- A skill has version-gated features but no cross-reference validation
- A skill defines behavioral guidance (decision trees, strategies) but no anti-pattern or completeness tests
- A module documents forbidden behaviors but no test asserts their absence

## Content Assertion Anti-Patterns

Avoid these when reviewing content tests:

| Anti-Pattern | Problem | Better Approach |
|---|---|---|
| Testing prose style | Brittle to rewording, overlaps with scribe:slop-detector | Test behavioral semantics |
| Asserting exact wording | Breaks on any edit | Assert concepts (`"version" in content.lower()`) |
| Checking line counts | Not behavioral | Check required sections exist |
| Testing formatting | Not what Claude interprets | Test parseable structure |
| Duplicating slop detection | Already handled by scribe | Focus on correctness, not style |

## Review Checklist Addition

Add this item to the existing Test Quality Checklist when reviewing a plugin that has execution markdown:

```markdown
- [ ] Content assertion depth matches content complexity (L1 for simple skills,
  L2+ for code examples, L3 for behavioral guidance)
```

## Remediation Guidance

When content tests are missing or insufficient:

1. **No content tests at all**: Generate L1 scaffolding using `sanctum:test-updates/modules/generation/content-test-templates.md`
2. **L1 only, has code blocks**: Upgrade to L2 (add JSON/YAML parsing tests)
3. **L2 only, has version gates**: Upgrade to L3 (add cross-reference validation)
4. **L2 only, has behavioral guidance**: Upgrade to L3 (add anti-pattern and completeness tests)

---
parent_skill: pensive:test-review
name: coverage-analysis
description: Coverage measurement and gap identification
category: testing
tags: [coverage, testing, gap-analysis]
load_priority: 2
estimated_tokens: 350
---

# Coverage Analysis

Measure test coverage and identify gaps.

## Coverage Tools by Language

### Rust

```bash
# Using tarpaulin
cargo install cargo-tarpaulin
cargo tarpaulin --out Html --output-dir coverage/

# Using llvm-cov
cargo install cargo-llvm-cov
cargo llvm-cov --html
```

### Python

```bash
# Using pytest-cov
pytest --cov=src --cov-report=html --cov-report=term-missing

# Using coverage.py
coverage run -m pytest
coverage html
coverage report --show-missing
```

### JavaScript/TypeScript

```bash
# Jest
npm test -- --coverage --coverageReporters=html text

# Vitest
vitest --coverage

# Cypress (code coverage plugin)
cypress run --env coverage=true
```

### Go

```bash
# Built-in coverage
go test -cover ./...
go test -coverprofile=coverage.out ./...
go tool cover -html=coverage.out

# Detailed coverage
go test -covermode=count -coverprofile=coverage.out ./...
```

## Coverage Thresholds

| Level | Coverage | Use Case |
|-------|----------|----------|
| Minimum | 60% | Legacy code, initial cleanup |
| Standard | 80% | Normal development |
| High | 90% | Critical systems, libraries |
| Detailed | 95%+ | Safety-critical, financial |

## Gap Identification

### Find impacted test files

```bash
# Tests affected by changes