Test Review

Name: Test Review
Author: athola

athola/claude-night-market

Review agent or plugin test suites for content-assertion depth so skills with JSON, YAML, and behavioral contracts are verified beyond keyword smoke tests.

Overview

test-review is an agent skill that scores content-assertion depth and flags shallow or brittle skill tests. It is most often used in the Ship phase, and also fits Build (docs modules) and Operate (iterating on skill quality).

Install

npx skills add https://github.com/athola/claude-night-market --skill test-review

What is this skill?

Content Depth scored 1–5 from existence-only checks through cross-plugin validation
Flags gaps when skills ship L1-only tests despite JSON/YAML blocks or version-gated features
Documents content assertion anti-patterns: prose style, exact wording, brittle string matches
Ties to Leyline content-assertion levels and scenario quality assessment extensions
Explicit triggers for anti-pattern, decision-framework, and forbidden-behavior coverage
Anti-pattern table with five documented assertion mistakes and better approaches

Compatible agents: Claude Code, Cursor, Codex, Windsurf

Adoption & trust: 1 install on skills.sh; 304 GitHub stars; 3/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).

What problem does it solve?

Your skill tests only grep keywords while SKILL.md encodes schemas, version gates, and behavioral rules that regress silently on edit.

Who is it for?

Skill authors and plugin maintainers reviewing Leyline-style or modular SKILL.md test packs before publishing updates.

Skip if: you need application UI E2E testing or production incident response. This skill targets documentation and skill content test quality only.

When should I use this skill?

During test review when evaluating content assertion quality for skills with JSON/YAML blocks, version-gated features, behavioral guidance, or forbidden behaviors.

What do I get? / Deliverables

You get a structured content-depth score, gap flags, and anti-pattern corrections so test suites assert semantics appropriate to L2–L3+ expectations.

Content Depth score (1–5) with level justification
List of content test gaps and anti-pattern fixes

Journey fit

Spans multiple journey phases - primary shelf plus alternate fits below.

Primary fit

Canonical shelf is Ship because the skill governs quality gates on tests before release, extending scenario review with explicit content-depth scoring. Testing subphase matches assertion-level rubrics (L1–L3+), anti-pattern tables, and gap flags for version-gated and cross-plugin docs.

Also useful

Build: Docs & content

Operate: Iteration & experiments

Where it fits

Example use

Build: Docs & content

While drafting SKILL.md modules with YAML examples, plan L2 parse assertions before the first publish.

Example use

Ship: Testing & QA

Gate a skill release when all tests are still L1 but the skill ships JSON schemas and version-gated features.

Example use

Ship: Code review

Pair content-depth scoring with broader scenario review before tagging a marketplace version.

Example use

Operate: Iteration & experiments

After a doc rewrite, re-score tests to ensure anti-pattern and forbidden-behavior assertions still hold.

How it compares

Use as a content-assertion rubric during test review instead of treating keyword presence as sufficient coverage for agent skills.

Common Questions / FAQ

Who is test-review for?

Developers maintaining agent skills, plugin docs, and automated content tests who need depth scoring beyond file existence and keyword checks.

When should I use test-review?

In Ship before merging skill releases; in Build while authoring modules with JSON/YAML examples; in Operate when iterating test suites after doc edits.

Is test-review safe to install?

Check the Security Audits panel on this Prism page. The skill is review guidance and should not require network access if used as documented.

SKILL.md

# Content Assertion Quality

Scoring criteria for evaluating content assertion tests during test review. Extends the scenario quality assessment with a Content Depth dimension.

Reference: `leyline:testing-quality-standards/modules/content-assertion-levels.md`

## Content Depth Scoring

Rate content assertion depth on a 1-5 scale:

| Score | Level | Description |
|---|---|---|
| 1 | None | Tests only file existence or line count |
| 2 | L1 | Keyword presence checks (`assert "section" in content`) |
| 3 | L2 | Parses embedded examples, validates schema structure |
| 4 | L3 | Cross-references, anti-patterns, decision framework contracts |
| 5 | L3+ | Cross-plugin validation (version refs checked against other plugins' docs) |
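To make the gap between scores 2 and 3 concrete, here is a minimal Python sketch. It assumes the skill embeds its examples in fenced blocks tagged `json`; the helper names (`extract_json_blocks`, `l1_keyword_check`, `l2_parse_check`) and the sample content are hypothetical and not part of test-review itself.

```python
import json
import re

# Build the fence marker indirectly so this example can itself live inside a fenced block.
FENCE = "`" * 3
# Assumption: embedded examples are fenced blocks tagged "json".
JSON_BLOCK_RE = re.compile(FENCE + r"json[ \t]*\n(.*?)\n" + FENCE, re.DOTALL)


def extract_json_blocks(markdown: str) -> list[str]:
    """Return the raw body of every fenced json block in the markdown."""
    return JSON_BLOCK_RE.findall(markdown)


def l1_keyword_check(markdown: str, keyword: str) -> bool:
    """L1 (score 2): the keyword appears somewhere in the content."""
    return keyword.lower() in markdown.lower()


def l2_parse_check(markdown: str, required_keys: set[str]) -> bool:
    """L2 (score 3): every embedded JSON example parses and has the required keys."""
    blocks = extract_json_blocks(markdown)
    if not blocks:
        return False
    for raw in blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            return False
        if not isinstance(data, dict) or not required_keys.issubset(data.keys()):
            return False
    return True


# Hypothetical skill content whose JSON example has a trailing comma (invalid JSON).
BROKEN = f'# Config\n\n{FENCE}json\n{{"name": "demo", "version": "1.0",}}\n{FENCE}\n'
FIXED = BROKEN.replace('"1.0",}', '"1.0"}')

print(l1_keyword_check(BROKEN, "version"))          # True: the keyword test still passes
print(l2_parse_check(BROKEN, {"name", "version"}))  # False: the parse test catches the regression
print(l2_parse_check(FIXED, {"name", "version"}))   # True
```

The keyword check cannot tell the broken example from the fixed one, which is why L1-only suites let schema regressions through.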

## When to Flag Missing Content Assertions

During test review, flag as a content test gap when:

- A skill has tests but all are L1 (keyword-only) and the skill contains JSON or YAML code blocks
- A skill has version-gated features but no cross-reference validation
- A skill defines behavioral guidance (decision trees, strategies) but no anti-pattern or completeness tests
- A module documents forbidden behaviors but no test asserts their absence
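The four flag conditions above can be sketched as a small rule function. This is an illustration only: `SkillTestProfile` and its field names are hypothetical, and a real review would derive these facts from the skill files and test suite rather than set them by hand.

```python
from dataclasses import dataclass


@dataclass
class SkillTestProfile:
    """Hypothetical summary of one skill and its content tests."""
    highest_test_level: int  # 0 = no content tests, 1 = L1, 2 = L2, 3 = L3
    has_json_or_yaml_blocks: bool = False
    has_version_gates: bool = False
    has_cross_reference_tests: bool = False
    has_behavioral_guidance: bool = False
    has_antipattern_or_completeness_tests: bool = False
    documents_forbidden_behaviors: bool = False
    asserts_forbidden_absent: bool = False


def content_test_gaps(profile: SkillTestProfile) -> list[str]:
    """Return one message per flag condition that the profile triggers."""
    gaps = []
    if profile.highest_test_level == 1 and profile.has_json_or_yaml_blocks:
        gaps.append("L1-only tests but skill contains JSON/YAML code blocks")
    if profile.has_version_gates and not profile.has_cross_reference_tests:
        gaps.append("version-gated features without cross-reference validation")
    if profile.has_behavioral_guidance and not profile.has_antipattern_or_completeness_tests:
        gaps.append("behavioral guidance without anti-pattern or completeness tests")
    if profile.documents_forbidden_behaviors and not profile.asserts_forbidden_absent:
        gaps.append("forbidden behaviors documented but never asserted absent")
    return gaps


# Example: keyword-only tests on a skill with code blocks and version gates.
profile = SkillTestProfile(
    highest_test_level=1,
    has_json_or_yaml_blocks=True,
    has_version_gates=True,
)
for gap in content_test_gaps(profile):
    print("GAP:", gap)
```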

## Content Assertion Anti-Patterns

Avoid these when reviewing content tests:

| Anti-Pattern | Problem | Better Approach |
|---|---|---|
| Testing prose style | Brittle to rewording, overlaps with scribe:slop-detector | Test behavioral semantics |
| Asserting exact wording | Breaks on any edit | Assert concepts (`"version" in content.lower()`) |
| Checking line counts | Not behavioral | Check required sections exist |
| Testing formatting | Not what Claude interprets | Test parseable structure |
| Duplicating slop detection | Already handled by scribe | Focus on correctness, not style |
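A short Python sketch of the "exact wording" and "line counts" rows, contrasting the brittle form with the suggested alternative. The sample content and function names are hypothetical; only the `"version" in content.lower()` idiom comes from the table.

```python
ORIGINAL = "## Versioning\nFeatures marked *Requires 2.1+* are version-gated.\n"
# The same guidance after a harmless rewording.
REWORDED = "## Versioning\nAnything tagged *Requires 2.1+* is gated by release.\n"


def asserts_exact_wording(content: str) -> bool:
    """Anti-pattern: passes only while the sentence is unchanged character for character."""
    return "Features marked *Requires 2.1+* are version-gated." in content


def asserts_concept(content: str) -> bool:
    """Better: checks that the concept is still covered."""
    return "version" in content.lower()


def missing_sections(content: str, required: list[str]) -> list[str]:
    """Better than counting lines: report required headings that are absent."""
    headings = {
        line.lstrip("#").strip().lower()
        for line in content.splitlines()
        if line.startswith("#")
    }
    return [name for name in required if name.lower() not in headings]


print(asserts_exact_wording(REWORDED))  # False: breaks on a harmless edit
print(asserts_concept(REWORDED))        # True: survives the rewording
print(missing_sections(REWORDED, ["Versioning", "Anti-Patterns"]))  # ['Anti-Patterns']
```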

## Review Checklist Addition

Add this item to the existing Test Quality Checklist when reviewing a plugin that has execution markdown:

```markdown
- [ ] Content assertion depth matches content complexity
      (L1 for simple skills, L2+ for code examples, L3 for behavioral guidance)
```

## Remediation Guidance

When content tests are missing or insufficient:

1. **No content tests at all**: Generate L1 scaffolding using `sanctum:test-updates/modules/generation/content-test-templates.md`
2. **L1 only, has code blocks**: Upgrade to L2 (add JSON/YAML parsing tests)
3. **L2 only, has version gates**: Upgrade to L3 (add cross-reference validation)
4. **L2 only, has behavioral guidance**: Upgrade to L3 (add anti-pattern and completeness tests)
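For step 3, one possible shape of a cross-reference validation test is sketched below in Python. It assumes cross-references are written as backticked `plugin:path` strings, as they are in this document, and that a `known` index of valid targets is available; both the regex and the index are illustrative assumptions, not an API provided by leyline or sanctum.

```python
import re

# Assumption: cross-references look like `plugin:path` inside single backticks.
REF_RE = re.compile(r"`([a-z][a-z0-9-]*):([A-Za-z0-9_./-]+)`")


def find_cross_references(content: str) -> list[tuple[str, str]]:
    """Return (plugin, path) pairs for every backticked plugin:path reference."""
    return REF_RE.findall(content)


def unresolved_references(content: str, known: dict[str, set[str]]) -> list[str]:
    """Return references whose plugin or path is missing from the known index."""
    return [
        f"{plugin}:{path}"
        for plugin, path in find_cross_references(content)
        if path not in known.get(plugin, set())
    ]


# Hypothetical skill text and a hand-written index of valid targets.
CONTENT = "See `leyline:testing-quality-standards` and `scribe:slop-detector`.\n"
KNOWN = {"leyline": {"testing-quality-standards"}}

print(unresolved_references(CONTENT, KNOWN))  # ['scribe:slop-detector']
```

In a real suite the index would more likely be built by scanning the referenced plugins' directories, so that a renamed or removed target fails the test.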


---
parent_skill: pensive:test-review
name: coverage-analysis
description: Coverage measurement and gap identification
category: testing
tags: [coverage, testing, gap-analysis]
load_priority: 2
estimated_tokens: 350
---

# Coverage Analysis

Measure test coverage and identify gaps.

## Coverage Tools by Language

### Rust
```bash
# Using tarpaulin
cargo install cargo-tarpaulin
cargo tarpaulin --out Html --output-dir coverage/

# Using llvm-cov
cargo install cargo-llvm-cov
cargo llvm-cov --html
```

### Python
```bash
# Using pytest-cov
pytest --cov=src --cov-report=html --cov-report=term-missing

# Using coverage.py
coverage run -m pytest
coverage html
coverage report --show-missing
```

### JavaScript/TypeScript
```bash
# Jest
npm test -- --coverage --coverageReporters=html text

# Vitest
vitest --coverage

# Cypress (code coverage plugin)
cypress run --env coverage=true
```

### Go
```bash
# Built-in coverage
go test -cover ./...
go test -coverprofile=coverage.out ./...
go tool cover -html=coverage.out

# Detailed coverage
go test -covermode=count -coverprofile=coverage.out ./...
```

## Coverage Thresholds

| Level | Coverage | Use Case |
|-------|----------|----------|
| Minimum | 60% | Legacy code, initial cleanup |
| Standard | 80% | Normal development |
| High | 90% | Critical systems, libraries |
| Detailed | 95%+ | Safety-critical, financial |
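As a sketch, the threshold table can be turned into a lookup that a review script might apply to a measured percentage. The function name and the `below-minimum` label are assumptions added for illustration; the cut-offs are the ones in the table.

```python
# (floor, level) pairs taken from the table, highest first.
LEVELS = [
    (95.0, "detailed"),
    (90.0, "high"),
    (80.0, "standard"),
    (60.0, "minimum"),
]


def coverage_level(percent: float) -> str:
    """Map a coverage percentage to the highest threshold level it satisfies."""
    if not 0.0 <= percent <= 100.0:
        raise ValueError(f"coverage must be between 0 and 100, got {percent}")
    for floor, name in LEVELS:
        if percent >= floor:
            return name
    return "below-minimum"  # hypothetical label for anything under 60%


print(coverage_level(83.4))  # standard
print(coverage_level(59.9))  # below-minimum
```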

## Gap Identification

### Find impacted test files
```bash
# Tests affected by changes
```
