Healthcare Eval Harness

Name: Healthcare Eval Harness
Author: affaan-m

affaan-m/everything-claude-code

Gate EMR/EHR and clinical-app deployments with automated patient-safety checks before anything reaches production.

Overview

Healthcare Eval Harness is an agent skill used mainly in the Ship phase, with secondary uses in Ship security and Operate monitoring. It runs five patient-safety test categories and blocks healthcare deployments when any CRITICAL suite fails.

Install

npx skills add https://github.com/affaan-m/everything-claude-code --skill healthcare-eval-harness

What is this skill?

Five ordered test categories: CDSS accuracy, PHI exposure, data integrity, clinical workflow, and integration compliance
Three CRITICAL gates (CDSS, PHI, data integrity) require a 100% pass rate; any single failure blocks deployment
Two HIGH gates (clinical workflow, integration) require a 95%+ pass rate before promotion
Maps categories to test path patterns (Jest by default; adaptable to Vitest, pytest, PHPUnit)
Explicit triggers: CDSS edits, patient-data schema changes, auth/RBAC changes, and post-merge clinical modules

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 3.3k installs on skills.sh; 210k GitHub stars; 3/3 security scanners passed (skills.sh audits).

What problem does it solve?

You are about to deploy healthcare software but lack a repeatable, strict gate that catches CDSS mistakes, PHI leaks, and broken clinical data paths before users are exposed.

Who is it for?

Indie or small-team builders shipping EMR/EHR, CDSS, or patient-facing health apps who need CI blocks tied to clinical risk, not generic lint scores.

Skip if: Non-clinical SaaS with no PHI or CDSS logic, or teams that only want informal manual QA without automated deployment blocks.

When should I use this skill?

Before EMR/EHR deployments; after CDSS, patient-data schema, or auth changes; during healthcare CI setup; after clinical-module merge conflicts.

What do I get? / Deliverables

Your pipeline runs categorized safety suites with documented pass thresholds and refuses promotion when CRITICAL checks fail, giving you a defendable pre-release record.

Ordered safety test run with per-category pass/fail
Deployment block decision when any CRITICAL suite fails
Documented threshold report for HIGH vs CRITICAL gates

Journey fit

Spans multiple journey phases - primary shelf plus alternate fits below.

Primary fit

The skill is a pre-deploy verification harness wired into CI/CD. Its canonical home is Ship, where you prove the build is safe to release. It runs ordered automated test suites against pass thresholds, like a release QA gate rather than a one-off code style review.

Also useful

Ship / Security
Operate / Monitoring & observability

Where it fits

Ship / Testing & QA: Run the full five-category harness in CI after merging CDSS dose-validation changes.

Ship / Security: Execute PHI exposure suites after tightening auth or access-control middleware.

Operate / Monitoring & observability: Re-run CRITICAL categories after hotfixing a production clinical workflow regression.

Build / Backend, data & payments: Add integration-compliance tests while wiring HL7/FHIR or hospital interface adapters.

How it compares

Use as a domain-specific release gate instead of treating generic unit-test green as sufficient for regulated health software.

Common Questions / FAQ

Who is healthcare-eval-harness for?

Solo builders and small teams deploying healthcare applications who must verify CDSS behavior, PHI handling, and clinical data integrity under real CI/CD constraints.

When should I use healthcare-eval-harness?

Before EMR/EHR deployments; after CDSS, schema, or auth changes; when configuring healthcare CI; and during Operate when re-validating clinical modules after production-impacting fixes.

Is healthcare-eval-harness safe to install?

Review the Security Audits panel on this Prism page and inspect the skill source in your repo before granting CI or filesystem access in regulated environments.

SKILL.md

# Healthcare Eval Harness — Patient Safety Verification

Automated verification system for healthcare application deployments. A single CRITICAL failure blocks deployment. Patient safety is non-negotiable.

> **Note:** Examples use Jest as the reference test runner. Adapt commands for your framework (Vitest, pytest, PHPUnit, etc.) — the test categories and pass thresholds are framework-agnostic.

## When to Use

- Before any deployment of EMR/EHR applications
- After modifying CDSS logic (drug interactions, dose validation, scoring)
- After changing database schemas that touch patient data
- After modifying authentication or access control
- During CI/CD pipeline configuration for healthcare apps
- After resolving merge conflicts in clinical modules
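
The triggers above can be checked mechanically in CI. The following is a minimal sketch, assuming a hypothetical repository layout (`src/cdss/`, `src/clinical/`, `src/auth/`, `db/migrations/`). The function name `needs_harness` and every path pattern are illustrative and not part of the skill.

```bash
#!/usr/bin/env bash
# Sketch: decide whether a set of changed files should trigger the harness.
# All path patterns are hypothetical; adjust them to your repository layout.

needs_harness() {
  # Reads changed file paths on stdin, one per line.
  # Returns 0 (run the harness) if any path matches a trigger, 1 otherwise.
  local path
  while IFS= read -r path || [ -n "$path" ]; do
    case "$path" in
      src/cdss/*|src/clinical/*) return 0 ;;  # CDSS logic or clinical modules
      db/migrations/*)           return 0 ;;  # schema changes that may touch patient data
      src/auth/*)                return 0 ;;  # authentication or access control
    esac
  done
  return 1
}
```

One possible use is `git diff --name-only origin/main...HEAD | needs_harness && echo "run harness"`, where the branch name is again an assumption.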

## How It Works

The eval harness runs five test categories in order. The first three (CDSS Accuracy, PHI Exposure, Data Integrity) are CRITICAL gates requiring 100% pass rate — a single failure blocks deployment. The remaining two (Clinical Workflow, Integration) are HIGH gates requiring 95%+ pass rate.

Each category maps to a Jest test path pattern. The CI pipeline runs CRITICAL gates with `--bail` (stop on first failure) and enforces coverage thresholds with `--coverage --coverageThreshold`.
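
The gate logic described above can be expressed as a small decision function. This is a minimal sketch, assuming the passed and total test counts have already been collected for each category. `category_level` and `gate_action` are hypothetical helper names, the short category keys are illustrative, and the WARN outcome for HIGH gates follows the Pass/Fail Matrix.

```bash
#!/usr/bin/env bash
# Sketch of the gate logic; helper names are hypothetical, not part of the skill.

# Map a category key to its gate level.
category_level() {
  case "$1" in
    cdss|phi|data-integrity) echo CRITICAL ;;
    clinical|integration)    echo HIGH ;;
    *)                       echo UNKNOWN; return 1 ;;
  esac
}

# gate_action LEVEL PASSED TOTAL -> prints PASS, WARN, or BLOCK.
gate_action() {
  local level=$1 passed=$2 total=$3
  # An empty suite proves nothing, so treat it as a blocking failure.
  if [ "$total" -le 0 ]; then echo BLOCK; return; fi
  case "$level" in
    CRITICAL)  # 100% required: any failed test blocks deployment
      if [ "$passed" -eq "$total" ]; then echo PASS; else echo BLOCK; fi ;;
    HIGH)      # 95%+ required; integer math avoids rounding in the comparison
      if [ $((passed * 100)) -ge $((total * 95)) ]; then echo PASS; else echo WARN; fi ;;
    *) echo BLOCK ;;  # unknown levels fail closed
  esac
}
```

For example, `gate_action "$(category_level clinical)" 96 100` prints `PASS`, while `gate_action CRITICAL 99 100` prints `BLOCK`. Failing closed on unknown levels is a design choice of this sketch, not something the skill specifies.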

### Eval Categories

**1. CDSS Accuracy (CRITICAL — 100% required)**

Tests all clinical decision support logic: drug interaction pairs (both directions), dose validation rules, clinical scoring vs published specs, no false negatives, no silent failures.

```bash
npx jest --testPathPattern='tests/cdss' --bail --ci --coverage
```
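
Checking pairs in both directions means that if an interaction is declared for A with B, it must also be declared for B with A. The following is a minimal sketch of that symmetry check, assuming a hypothetical whitespace-separated listing with one `drugA drugB` pair per line. Real CDSS data will live in your own tables or fixtures, and `missing_reverse_pairs` is an illustrative name.

```bash
#!/usr/bin/env bash
# Sketch: list interaction pairs whose reverse direction is missing.
# Input format (hypothetical): one "drugA drugB" pair per line on stdin.

missing_reverse_pairs() {
  awk '
    NF >= 2 { seen[$1 " " $2] = 1 }   # record every declared pair
    END {
      for (pair in seen) {
        split(pair, d, " ")
        rev = d[2] " " d[1]
        if (!(rev in seen))           # reverse direction was never declared
          print rev
      }
    }
  ' | sort
}
```

An empty result means every declared pair has its mirror. Any printed line names a direction a test suite would need to cover.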

**2. PHI Exposure (CRITICAL — 100% required)**

Tests for protected health information leaks: API error responses, console output, URL parameters, browser storage, cross-facility isolation, unauthenticated access, service role key absence.

```bash
npx jest --testPathPattern='tests/security/phi' --bail --ci
```
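
One of the leak channels listed above, console output, can be approximated by scanning captured logs for PHI-shaped strings. This is a minimal sketch, assuming two illustrative patterns: a US SSN shape and a hypothetical `MRN` label followed by six or more digits. It is a coarse complement to the Jest suite, not a replacement, and `scan_for_phi` is an illustrative name.

```bash
#!/usr/bin/env bash
# Sketch: flag PHI-shaped strings in captured log output.
# The patterns are illustrative assumptions; tune them to your own identifiers.

scan_for_phi() {
  # Reads log text on stdin and prints matching lines with line numbers.
  # Returns 1 if anything PHI-shaped was found, 0 if the input looks clean.
  if grep -nE '[0-9]{3}-[0-9]{2}-[0-9]{4}|MRN[ :=]*[0-9]{6,}'; then
    return 1
  fi
  return 0
}
```

A CI step could pipe application logs through it, for example `scan_for_phi < app.log || exit 1`, where `app.log` is a placeholder file name.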

**3. Data Integrity (CRITICAL — 100% required)**

Tests clinical data safety: locked encounters, audit trail entries, cascade delete protection, concurrent edit handling, no orphaned records.

```bash
npx jest --testPathPattern='tests/data-integrity' --bail --ci
```

**4. Clinical Workflow (HIGH — 95%+ required)**

Tests end-to-end flows: encounter lifecycle, template rendering, medication sets, drug/diagnosis search, prescription PDF, red flag alerts.

```bash
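# Run the clinical suite; "|| true" lets the pass rate be computed even when tests fail.
tmp_json=$(mktemp)
npx jest --testPathPattern='tests/clinical' --ci --json --outputFile="$tmp_json" || true
total=$(jq '.numTotalTests // 0' "$tmp_json")
passed=$(jq '.numPassedTests // 0' "$tmp_json")
rm -f "$tmp_json"
# An empty or missing result counts as a failure, not a pass.
if [ "${total:-0}" -eq 0 ]; then
  echo "No clinical tests found" >&2
  exit 1
fi
rate=$(echo "scale=2; $passed * 100 / $total" | bc)
echo "Clinical pass rate: ${rate}% ($passed/$total)"
# HIGH gate: warn below 95% (see the Pass/Fail Matrix for the on-failure action).
if [ $((passed * 100)) -lt $((total * 95)) ]; then
  echo "WARN: clinical pass rate below 95%" >&2
fi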
```

**5. Integration Compliance (HIGH — 95%+ required)**

Tests external systems: HL7 message parsing (v2.x), FHIR validation, lab result mapping, malformed message handling.

```bash
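# Run the integration suite; "|| true" lets the pass rate be computed even when tests fail.
tmp_json=$(mktemp)
npx jest --testPathPattern='tests/integration' --ci --json --outputFile="$tmp_json" || true
total=$(jq '.numTotalTests // 0' "$tmp_json")
passed=$(jq '.numPassedTests // 0' "$tmp_json")
rm -f "$tmp_json"
# An empty or missing result counts as a failure, not a pass.
if [ "${total:-0}" -eq 0 ]; then
  echo "No integration tests found" >&2
  exit 1
fi
rate=$(echo "scale=2; $passed * 100 / $total" | bc)
echo "Integration pass rate: ${rate}% ($passed/$total)"
# HIGH gate: warn below 95% (see the Pass/Fail Matrix for the on-failure action).
if [ $((passed * 100)) -lt $((total * 95)) ]; then
  echo "WARN: integration pass rate below 95%" >&2
fi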
```
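
Malformed message handling starts with rejecting input that lacks a usable header. The following is a minimal sketch, assuming pipe-delimited HL7 v2.x in which the first segment is `MSH` and MSH-9 carries the message type. `hl7_message_type` is a hypothetical helper, and real adapters would normally rely on a proper HL7 parser.

```bash
#!/usr/bin/env bash
# Sketch: extract the message type (MSH-9) from an HL7 v2.x message,
# rejecting input whose first segment is not a usable MSH header.

hl7_message_type() {
  # Reads one message on stdin; segments may be separated by CR or LF.
  # Because MSH-1 is the field separator itself, MSH-9 is the 9th
  # pipe-delimited token of the header line.
  tr '\r' '\n' | awk -F'|' '
    NR == 1 {
      if ($1 != "MSH" || NF < 9 || $9 == "") exit 1   # malformed header
      print $9
      exit 0
    }
    END { if (NR == 0) exit 1 }                       # empty input
  '
}
```

With a synthetic header such as `MSH|^~\&|AppA|FacA|AppB|FacB|20240101||ADT^A01|MSG1|P|2.5`, the function prints `ADT^A01`. It returns a non-zero status for a message that starts with any other segment or for empty input.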

### Pass/Fail Matrix

| Category | Threshold | On Failure |
|----------|-----------|------------|
| CDSS Accuracy | 100% | **BLOCK deployment** |
| PHI Exposure | 100% | **BLOCK deployment** |
| Data Integrity | 100% | **BLOCK deployment** |
| Clinical Workflow | 95%+ | WARN, allow with
