Expect

Name: Expect
Author: millionco

millionco/expect·MIT

Prove UI and layout changes in a real browser with MCP-driven checks before you mark React or HTML work done.

Overview

Expect is an agent skill most often used in Ship (also Build frontend) that verifies TSX, JSX, CSS, and HTML changes in a real browser via expect MCP tools before you claim the work is finished.

Install

npx skills add https://github.com/millionco/expect --skill expect

What is this skill?

Requires expect MCP tools (open, playwright, screenshot)—not ad-hoc Playwright or Chrome MCP unless the user insists
Blocks completion claims without browser evidence
Strongly prefers a subagent or background shell so fixes and multi-step browser runs stay parallel
Reuses live tabs via browser_tabs list or screenshot before opening a new session
Compounds checks through the playwright tool for repeatable interactions
Skill metadata version 2.4.0
Documents open, playwright, and screenshot MCP tools as the default browser path

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 1.7k installs on skills.sh; 3.5k GitHub stars; 2/3 security scanners passed (skills.sh audits).

What problem does it solve?

You shipped or edited UI code but have no browser-backed proof it actually renders and behaves correctly.

Who is it for?

Solo builders iterating React or static UI who already use expect MCP and want a hard verify-before-done habit.

Skip if: Pure backend or API-only changes with no UI surface, or teams that refuse MCP browser tooling and only want unit tests.

When should I use this skill?

Use when editing .tsx/.jsx/.css/.html, React components, pages, routes, forms, styles, or layouts, or when asked to test, verify, validate, QA, find bugs, check for issues, or fix expect-cli failures.

What do I get? / Deliverables

You get MCP-driven navigation, Playwright runs, and screenshots that justify a completion claim—or clear failures to fix next.

Browser screenshots
Playwright-backed interaction logs
Evidence-backed pass/fail assessment before completion claims

Recommended Skills

Agent Browservercel-labs/open-agents

agent-browser is a Vercel Open Agents skill that wraps a CLI for programmatic browser control—ideal when solo builders n…404k installs·5.6k stars

Tddmattpocock/skills

TDD is an agent skill that coaches test-driven development using the red-green-refactor loop for solo and indie builders…214k installs·121k stars

Use My Browserxixu-me/skills

Use My Browser skill forces agents to classify tasks as static-capable or browser-required before choosing tools—staying…198k installs·61 stars

Test Driven Developmentobra/superpowers

Test-Driven Development is an agent skill from obra/superpowers that forces a test-first implementation ritual: write a …118k installs·221k stars

Verification Before Completionobra/superpowers

Verification Before Completion is an agent skill from the Superpowers lineage that blocks premature success claims durin…100k installs·221k stars

Webapp Testinganthropics/skills

webapp-testing is an agent skill for solo builders who need to prove that a local web application actually works—not jus…90.9k installs·148k stars

Journey fit

Spans multiple journey phases - primary shelf plus alternate fits below.

Primary fit

Canonical shelf is Ship because the skill’s gate is browser evidence and QA-style verification, not authoring components. Testing subphase matches verify, validate, QA, bug-hunting, and expect-cli failure fixes called out in the skill triggers.

Also useful

BuildUI/UX & frontend

Where it fits

Example use

BuildUI/UX & frontend

After refactoring a form layout in TSX, run expect screenshots before opening a PR.

Example use

ShipTesting & QA

Replay a multi-step checkout flow with the playwright tool to confirm a regression fix.

Example use

ShipCode review

Attach browser evidence to a self-review so you do not merge on editor-only confidence.

Example use

OperateIteration & experiments

Re-verify a hotfixed CSS rule on production-like local dev with an existing tab refresh.

How it compares

Use as a procedural verify gate on top of normal coding chat—not as a substitute for automated CI test suites.

Common Questions / FAQ

Who is expect for?

Indie and solo builders using Claude Code, Cursor, or Codex on React and HTML stacks who need disciplined in-browser QA before they call a task complete.

When should I use expect?

During Build when editing components, styles, or routes; during Ship when testing, validating, or fixing expect-cli failures; and whenever you are asked to find UI bugs with browser evidence.

Is expect safe to install?

Review the Security Audits panel on this Prism page and treat browser automation plus MCP access as privileged—only enable on projects you trust.

SKILL.md

READMESKILL.md - Expect

# Expect

You verify code changes in a real browser before claiming they work. No browser evidence, no completion claim.

Use the expect MCP tools (`open`, `playwright`, `screenshot`, etc.) for all browser interactions. Do not use raw browser tools (Playwright MCP, chrome tools, etc.) unless the user explicitly asks.

## Subagent Usage

Browser verification is best run in a subagent (Task tool) or background shell so the main thread stays free for code edits. This keeps the conversation responsive — you can fix code while the browser test runs in parallel. Strongly prefer launching a subagent for browser work, especially when the test involves multiple steps or long interactions. If the test is truly trivial (single screenshot check), inline is acceptable.

## Resuming Browser State

Before opening a new browser, check if one is already running. Use `browser_tabs` (action `list`) or the expect `screenshot` tool to see if a session is still active. If a tab is already open at the target URL, reuse it — don't close and reopen. When re-verifying after a code fix, prefer navigating or refreshing the existing session over starting from scratch.

## Compounding

The `playwright` tool takes a `code` string with `ref()` to resolve snapshot refs to Locators. One call can do an entire interaction — fills, clicks, AND data collection. Use that.

**BAD — 5 tool calls:**

```
screenshot (snapshot)
playwright: await ref('e3').fill('Jane')
screenshot (snapshot)                        ← WHY? page didn't change
playwright: await ref('e5').fill('jane@example.com')
playwright: await ref('e7').click()
```

**GOOD — 2 tool calls:**

```
screenshot (snapshot)
playwright (snapshotAfter=true):
  await ref('e3').fill('Jane');
  await ref('e5').fill('jane@example.com');
  await ref('e7').click();
  return { title: await page.title(), url: page.url(), errors: (await page.$$('.error')).length };
```

Use `return` to collect data. Response: `{ result: <value>, resultFile: "<tmp path>", snapshot: { tree, refs, stats } }`. The `resultFile` persists until `close` — read or grep it later. Without a return value, responds `"OK"` (or just the snapshot if `snapshotAfter=true`).

**Re-snapshot only across DOM boundaries.** Fills and hovers don't change page structure — keep using the same refs. Navigation, submit, dialog open/close DO change structure — set `snapshotAfter=true`.

## Writing Instructions

**Bad:** `"Check that the login form renders on http://localhost:5173"`
**Good:** `"Submit the login form empty, with invalid email, with wrong password, and with valid credentials. Verify error messages, redirect on success, and console errors on http://localhost:5173"`

## Before Claiming Completion

1. Verify in a browser with adversarial instructions.
2. Read the full output — check failures, accessibility, performance.
3. If ANY failure: fix the code, re-verify immediately. No asking, no waiting.
4. Repeat until 0 failures, then state the claim with passing evidence.

## Rationalizations

- "I'll run the browser test inline, it's quick" — Probably not. Launch a subagent so you can keep editing code in parallel. Only skip the subagent for a single screenshot sanity check.
- "I'll open a fresh browser to re-test" — Check for an existing session first. If the tab is still open, refresh or navigate — don't waste time on a cold start.
- "I'll make one `playwright` call per action" — No. Whole sequence in one call.
- "I need a snapshot between fills" — No. Fills don't change DOM. Batch them.
- "Let me snapshot to see what changed" — Did the page navigate or submit? No? Use `snapshotAfter=true` on the action that does.

What is this skill?

Requires expect MCP tools (open, playwright, screenshot)—not ad-hoc Playwright or Chrome MCP unless the user insists

Blocks completion claims without browser evidence

Strongly prefers a subagent or background shell so fixes and multi-step browser runs stay parallel

Reuses live tabs via browser_tabs list or screenshot before opening a new session

Compounds checks through the playwright tool for repeatable interactions

Skill metadata version 2.4.0

Documents open, playwright, and screenshot MCP tools as the default browser path

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 1.7k installs on skills.sh; 3.5k GitHub stars; 2/3 security scanners passed (skills.sh audits).

Journey fit

Spans multiple journey phases - primary shelf plus alternate fits below.

Primary fit

Also useful

BuildUI/UX & frontend

Where it fits

Example use

BuildUI/UX & frontend

After refactoring a form layout in TSX, run expect screenshots before opening a PR.

Example use

ShipTesting & QA

Replay a multi-step checkout flow with the playwright tool to confirm a regression fix.

Example use

ShipCode review

Attach browser evidence to a self-review so you do not merge on editor-only confidence.

Example use

OperateIteration & experiments

Re-verify a hotfixed CSS rule on production-like local dev with an existing tab refresh.

SKILL.md

READMESKILL.md - Expect

# Expect

You verify code changes in a real browser before claiming they work. No browser evidence, no completion claim.

Use the expect MCP tools (`open`, `playwright`, `screenshot`, etc.) for all browser interactions. Do not use raw browser tools (Playwright MCP, chrome tools, etc.) unless the user explicitly asks.

## Subagent Usage

Browser verification is best run in a subagent (Task tool) or background shell so the main thread stays free for code edits. This keeps the conversation responsive — you can fix code while the browser test runs in parallel. Strongly prefer launching a subagent for browser work, especially when the test involves multiple steps or long interactions. If the test is truly trivial (single screenshot check), inline is acceptable.

## Resuming Browser State

Before opening a new browser, check if one is already running. Use `browser_tabs` (action `list`) or the expect `screenshot` tool to see if a session is still active. If a tab is already open at the target URL, reuse it — don't close and reopen. When re-verifying after a code fix, prefer navigating or refreshing the existing session over starting from scratch.

## Compounding

The `playwright` tool takes a `code` string with `ref()` to resolve snapshot refs to Locators. One call can do an entire interaction — fills, clicks, AND data collection. Use that.

**BAD — 5 tool calls:**

```
screenshot (snapshot)
playwright: await ref('e3').fill('Jane')
screenshot (snapshot)                        ← WHY? page didn't change
playwright: await ref('e5').fill('jane@example.com')
playwright: await ref('e7').click()
```

**GOOD — 2 tool calls:**

```
screenshot (snapshot)
playwright (snapshotAfter=true):
  await ref('e3').fill('Jane');
  await ref('e5').fill('jane@example.com');
  await ref('e7').click();
  return { title: await page.title(), url: page.url(), errors: (await page.$$('.error')).length };
```

Use `return` to collect data. Response: `{ result: <value>, resultFile: "<tmp path>", snapshot: { tree, refs, stats } }`. The `resultFile` persists until `close` — read or grep it later. Without a return value, responds `"OK"` (or just the snapshot if `snapshotAfter=true`).

**Re-snapshot only across DOM boundaries.** Fills and hovers don't change page structure — keep using the same refs. Navigation, submit, dialog open/close DO change structure — set `snapshotAfter=true`.

## Writing Instructions

**Bad:** `"Check that the login form renders on http://localhost:5173"`
**Good:** `"Submit the login form empty, with invalid email, with wrong password, and with valid credentials. Verify error messages, redirect on success, and console errors on http://localhost:5173"`

## Before Claiming Completion

1. Verify in a browser with adversarial instructions.
2. Read the full output — check failures, accessibility, performance.
3. If ANY failure: fix the code, re-verify immediately. No asking, no waiting.
4. Repeat until 0 failures, then state the claim with passing evidence.

## Rationalizations

- "I'll run the browser test inline, it's quick" — Probably not. Launch a subagent so you can keep editing code in parallel. Only skip the subagent for a single screenshot sanity check.
- "I'll open a fresh browser to re-test" — Check for an existing session first. If the tab is still open, refresh or navigate — don't waste time on a cold start.
- "I'll make one `playwright` call per action" — No. Whole sequence in one call.
- "I need a snapshot between fills" — No. Fills don't change DOM. Batch them.
- "Let me snapshot to see what changed" — Did the page navigate or submit? No? Use `snapshotAfter=true` on the action that does.

Overview

Install

What is this skill?

What problem does it solve?

Who is it for?

When should I use this skill?

What do I get? / Deliverables

Recommended Skills

Journey fit

Where it fits

Who is expect for?

When should I use expect?

Is expect safe to install?

SKILL.md

This week for builders

Overview

Install

What is this skill?

What problem does it solve?

Who is it for?

When should I use this skill?

What do I get? / Deliverables

Recommended Skills

Journey fit

Where it fits

Who is expect for?

When should I use expect?

Is expect safe to install?

SKILL.md