Hooks Eval

Name: Hooks Eval
Author: athola

Repository: athola/claude-night-market

Score Claude Code hook scripts against a security-first 100-point rubric before you enable them in your agent loop.

Overview

hooks-eval is an agent skill that scores Claude Code hooks against a 100-point, security-first rubric with explicit quality gates. It is most often used in Ship (security) and also fits Build (agent tooling) and Ship (review).

Install

npx skills add https://github.com/athola/claude-night-market --skill hooks-eval

What is this skill?

100-point MCDA scoring rubric with documented vector normalization and stakeholder-weight methodology
Security analysis block (30 points) with Critical −15, High −8, Medium −4, and Low −1 per finding
Performance analysis block (25 points) as a dedicated weighted criterion alongside security
Security checklist rows for dynamic eval with user input, command injection, unvalidated paths, and embedded secrets
Aligns with the night-market skills-eval multi-metric evaluation methodology and sensitivity analysis guidance

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 1 install on skills.sh; 304 GitHub stars; 3/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).

What problem does it solve?

You wrote a Claude Code hook but have no consistent way to know if it is safe and performant enough to turn on in real sessions.

Who is it for?

Solo and indie builders who add Claude Code hooks and want MCDA-weighted security and performance review before enabling automation.

Skip if: your team has no Claude Code hook lifecycle, or you only need generic app pen-testing unrelated to agent hook scripts.

When should I use this skill?

Use when evaluating Claude Code hook scripts for security vulnerabilities, performance risk, and overall quality gates before enabling or merging them.

What do I get? / Deliverables

You get a normalized score, categorized security deductions, and a clear pass/fail against quality gates so you can fix hooks or ship them with justified confidence.

Weighted hook score out of 100 with security and performance breakdown
Checklist-mapped findings with severity and point deductions
Pass/fail recommendation against documented quality gates


Journey fit

Spans multiple journey phases: a primary shelf plus the alternate fits below.

Primary fit

Ship: Security

Hook evaluation is a pre-ship quality gate: the rubric penalizes injection, secrets, and unsafe paths, which are the same risks you must clear before automation runs in production sessions. The canonical shelf is security because 30 of 100 points come from a vulnerability checklist with explicit Critical/High/Medium/Low deductions, not from feature completeness.

Also useful

Build: Agent skills & templates
Ship: Code review

Where it fits

Build: Agent skills & templates
Example use: After drafting a PostToolUse formatter hook, run the rubric to catch unvalidated paths before you commit settings.json.

Ship: Security
Example use: Block enabling a hook that shells out with user-controlled strings until Critical command-injection deductions are cleared.

Ship: Code review
Example use: Attach a scored hook evaluation summary to a PR so reviewers see the 30-point security section outcomes, not just the diff.

Operate: Iteration & experiments
Example use: Re-run evaluation when you upgrade Claude Code or change hook timeouts after latency regressions in the 25-point performance block.

How it compares

Use a structured hook rubric instead of one-off chat reviews that skip weighted security penalties and repeatable scoring.

Common Questions / FAQ

Who is hooks-eval for?

It is for solo builders and small teams shipping Claude Code agent hooks who need a documented scoring rubric, rather than a popularity ranking, before hooks touch real tool-use traffic.

When should I use hooks-eval?

Use it in Build (agent tooling) while authoring hooks, in Ship (security) before merging hook changes, in Ship (review) on PRs, and in Operate (iterate) after you change paths, dependencies, or hook events. Run it whenever a hook reads user input or runs shell commands.

Is hooks-eval safe to install?

Treat it as evaluation criteria your agent follows locally; review the Security Audits panel on this Prism page for the ingested package risk signals before enabling it in automated workflows.

SKILL.md


# Hook Evaluation Criteria

Detailed scoring rubric and quality gates for hook evaluation.

## Mathematical Foundation

This evaluation framework follows Multi-Criteria Decision Analysis (MCDA) best practices:

- **Normalization**: Vector normalization for scale invariance ([full methodology](../../skills-eval/modules/multi-metric-evaluation-methodology.md))
- **Weighting**: Security-first weights with stakeholder validation
- **Aggregation**: Weighted sum with penalty-based security scoring
- **Validation**: Sensitivity analysis on non-security weights

**Documentation**: See [Multi-Metric Evaluation Methodology](../../skills-eval/modules/multi-metric-evaluation-methodology.md) for complete mathematical foundation.
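
As a rough illustration of the normalization step, the sketch below applies the textbook MCDA form of vector normalization, where each raw value is divided by the Euclidean norm of its criterion column. This is an assumption about what "vector normalization" means here; the linked methodology document may define it differently, and the name `vector_normalize` is hypothetical.

```python
import math


def vector_normalize(values: list[float]) -> list[float]:
    """Divide each value by the Euclidean norm of the whole column.

    Assumption: textbook MCDA vector normalization, not necessarily the
    exact formula used by the linked methodology document.
    """
    norm = math.sqrt(sum(v * v for v in values))
    if norm == 0:
        # An all-zero column carries no information; return zeros.
        return [0.0 for _ in values]
    return [v / norm for v in values]


# Execution times for three hypothetical hooks, in milliseconds.
latencies_ms = [30.0, 40.0, 120.0]
normalized = vector_normalize(latencies_ms)

# The same measurements expressed in seconds normalize to the same values,
# which is the scale invariance the rubric refers to.
normalized_from_seconds = vector_normalize([v / 1000 for v in latencies_ms])
```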

## Scoring System (100 points total)

### Security Analysis (30 points)

**Vulnerability Detection:**
- Critical vulnerabilities: -15 points each
- High-risk issues: -8 points each
- Medium-risk issues: -4 points each
- Low-risk issues: -1 point each
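
The per-severity deductions can be sketched as a small scoring function. The function name and the input shape (a list of severity strings) are hypothetical, and flooring the result at 0 is an assumption: the rubric does not say whether the security block can go negative.

```python
from collections import Counter

SECURITY_MAX = 30
DEDUCTIONS = {"critical": 15, "high": 8, "medium": 4, "low": 1}


def security_score(findings: list[str]) -> int:
    """Return the security score for a list of finding severities.

    Assumption: the score is floored at 0, which the rubric does not state.
    """
    counts = Counter(severity.lower() for severity in findings)
    unknown = set(counts) - set(DEDUCTIONS)
    if unknown:
        raise ValueError(f"unknown severities: {sorted(unknown)}")
    lost = sum(DEDUCTIONS[severity] * n for severity, n in counts.items())
    return max(0, SECURITY_MAX - lost)
```

Under these assumptions, one High and one Medium finding leave 18 of 30 points, and two Critical findings exhaust the whole block.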

**Security Checklist:**

| Check | Severity | Points Lost |
|-------|----------|-------------|
| Dynamic code evaluation with user input | Critical | -15 |
| Command injection vulnerability | Critical | -15 |
| Unvalidated file path access | High | -8 |
| Secrets/credentials in code | High | -8 |
| Missing input validation | Medium | -4 |
| Overly permissive patterns | Medium | -4 |
| No rate limiting | Low | -1 |
| Verbose error messages exposing internals | Low | -1 |
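
To make two of the checklist rows concrete, here is a minimal sketch of patterns that would likely avoid the "Unvalidated file path access" and "Command injection vulnerability" findings. Both helper names are hypothetical, and the child process is only a stand-in for whatever tool a real hook would call.

```python
import subprocess
import sys
from pathlib import Path


def resolve_inside(root: Path, user_path: str) -> Path:
    """Resolve user_path under root, rejecting anything that escapes it."""
    root = root.resolve()
    candidate = (root / user_path).resolve()
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"path escapes {root}: {user_path!r}")
    return candidate


def echo_argument(value: str) -> str:
    """Pass value to a child process as a single argv entry, with no shell.

    Because the command is a list and shell=True is not used, shell
    metacharacters in value are never interpreted.
    """
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", value],
        capture_output=True,
        text=True,
        check=True,
        timeout=10,
    )
    return result.stdout.rstrip("\n")
```

A string such as `"x; echo pwned"` comes back unchanged from `echo_argument`, whereas interpolating it into a shell command line would run a second command.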

### Performance Analysis (25 points)

| Metric | Max Points | Criteria |
|--------|------------|----------|
| Execution time efficiency | 10 | PreToolUse <100ms, PostToolUse <200ms |
| Memory usage optimization | 8 | <50MB for simple hooks, <100MB for complex |
| I/O operation efficiency | 4 | Minimal file/network operations |
| Resource cleanup | 3 | Proper cleanup of handles, connections |

**Performance Thresholds:**

```yaml
# Values are quoted because a bare leading ">" starts a YAML block scalar.
pre_tool_use:
  excellent: "<50ms"
  good: "<100ms"
  acceptable: "<200ms"
  poor: ">200ms"

post_tool_use:
  excellent: "<100ms"
  good: "<200ms"
  acceptable: "<500ms"
  poor: ">500ms"

memory:
  excellent: "<25MB"
  good: "<50MB"
  acceptable: "<100MB"
  poor: ">100MB"
```
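
The latency thresholds above can be sketched as a lookup. The function name is hypothetical, and sending a value that sits exactly on a boundary to the worse level is an assumption, since the thresholds use strict comparisons and do not cover equality.

```python
# Upper bounds in milliseconds for excellent, good, and acceptable.
THRESHOLDS_MS = {
    "pre_tool_use": (50, 100, 200),
    "post_tool_use": (100, 200, 500),
}
LEVELS = ("excellent", "good", "acceptable")


def rate_latency(event: str, elapsed_ms: float) -> str:
    """Map a measured latency to a rubric level.

    Assumption: a value exactly on a boundary falls into the worse level.
    """
    for level, limit in zip(LEVELS, THRESHOLDS_MS[event]):
        if elapsed_ms < limit:
            return level
    return "poor"
```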

### Compliance Analysis (20 points)

| Aspect | Max Points | Requirements |
|--------|------------|--------------|
| Structure compliance | 8 | Valid JSON/Python, correct schema |
| Documentation completeness | 6 | Purpose, parameters, return values documented |
| Error handling | 4 | All exceptions caught, meaningful messages |
| Best practices | 2 | Follows hook authoring guidelines |

**Structure Requirements:**

- JSON hooks: Valid JSON schema with required fields
- Python hooks: Type hints, async/await patterns
- Matcher patterns: Valid regex, appropriate scope
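
A matcher check along these lines could be sketched with the standard `re` module. The function name is hypothetical, and treating catch-all patterns as out of scope is an assumption; the rubric only asks for "appropriate scope".

```python
import re
from typing import Optional


def matcher_problem(pattern: str) -> Optional[str]:
    """Return a description of what is wrong with a matcher, or None."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return f"invalid regex: {exc}"
    # Assumption: patterns that match every name count as too broad.
    if pattern in ("", ".*", ".+"):
        return "matcher is too broad: it matches every name"
    return None
```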

### Reliability Analysis (15 points)

| Aspect | Max Points | Requirements |
|--------|------------|--------------|
| Error handling robustness | 6 | Graceful handling of all error conditions |
| Timeout management | 4 | Appropriate timeouts configured |
| Idempotency | 3 | Safe to retry without side effects |
| Graceful degradation | 2 | Falls back safely on failure |

**Reliability Checklist:**

- [ ] Hook returns valid response on all code paths
- [ ] Exceptions are caught and handled
- [ ] Timeout is configured appropriately
- [ ] Hook can be called multiple times safely
- [ ] Failure doesn't break agent operation
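
One way to satisfy the first, second, and last checklist items is a wrapper that returns a response on every code path. This is a minimal sketch: `run_hook`, the `FALLBACK` dictionary, and the response shape are hypothetical and are not Claude Code's actual hook contract.

```python
import json
from typing import Callable

# Hypothetical response shape, used only for this illustration.
FALLBACK = {"ok": False, "reason": "hook failed; no action taken"}


def run_hook(handler: Callable[[dict], dict], raw_input: str) -> dict:
    """Call handler and return a dict on every code path."""
    try:
        payload = json.loads(raw_input)
        if not isinstance(payload, dict):
            raise ValueError("payload must be a JSON object")
        result = handler(payload)
        if not isinstance(result, dict):
            raise TypeError("handler must return a dict")
        return result
    except Exception as exc:
        # Broad on purpose: a hook failure must not propagate to the agent.
        # Only the exception type is reported, to avoid exposing internals.
        return {**FALLBACK, "error": type(exc).__name__}
```

Reporting only the exception type keeps the fallback from tripping the "Verbose error messages exposing internals" row of the security checklist.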

### Maintainability (10 points)

| Aspect | Max Points | Requirements |
|--------|------------|--------------|
| Code structure | 4 | Clear, modular, single responsibility |
| Documentation clarity | 3 | Purpose and behavior well explained |
| Modularity | 2 | Reusable components, no duplication |
| Test coverage | 1 | Tests exist for key functionality |
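
The five category maxima add up to the 100-point total. The sketch below assumes the "weighted sum" is realized by plain addition because the weights are already baked into each category's maximum; the function name and input shape are hypothetical.

```python
CATEGORY_MAX = {
    "security": 30,
    "performance": 25,
    "compliance": 20,
    "reliability": 15,
    "maintainability": 10,
}


def total_score(scores: dict[str, float]) -> float:
    """Sum category scores after checking each against its maximum."""
    if set(scores) != set(CATEGORY_MAX):
        raise ValueError("scores must cover exactly the five rubric categories")
    for name, value in scores.items():
        if not 0 <= value <= CATEGORY_MAX[name]:
            raise ValueError(f"{name} score {value} outside 0..{CATEGORY_MAX[name]}")
    return sum(scores.values())


example = {
    "security": 18,
    "performance": 21,
    "compliance": 17,
    "reliability": 13,
    "maintainability": 8,
}
```

With these hypothetical category scores, `total_score(example)` gives 77.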

## Quality Levels

| Score | Level | Description |
|-------|-------|-------------|
| 91-100 | Excellent | Production-ready, follows all best practices |
| 76-90 | Good | Minor improvements suggested |
| 51-75 | Acceptable | Some issues requiring attention |
| 26-50 | Poor | Significant issues need
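
The bands listed so far can be sketched as a lookup from total score to level. The table is cut off after the 26-50 row, so the sketch returns `None` below 26 rather than guessing a label; the function name is hypothetical, and placing fractional scores in the band of the floor they reach is an assumption.

```python
from typing import Optional

# Lower bound of each band that appears in the table above.
LEVEL_FLOORS = [(91, "Excellent"), (76, "Good"), (51, "Acceptable"), (26, "Poor")]


def quality_level(score: float) -> Optional[str]:
    """Return the level name, or None below 26 where the table is cut off."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for floor, name in LEVEL_FLOORS:
        if score >= floor:
            return name
    return None
```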


