Agent Browser

Browser automation is shelved under Build → integrations because it wires an external runtime (Chrome/Chromium + agent-browser) into your agent workflow. Integrations is the canonical home for CLI tools that connect agents to browsers, CDP, and programmatic page interaction—not a one-off script in chat.

Also useful

Also useful

Where it fits

Example use

Click through a landing-page signup on staging before you commit to the full build.

Example use

Drive OAuth or payment iframes via snapshot refs while wiring a SaaS checkout integration.

Example use

Re-run open → snapshot → submit on a release candidate to catch broken selectors.

Example use

GrowContent & marketing

Capture competitor pricing or changelog pages for a lightweight research pass.

How it compares

Use this CLI-integrated browser driver instead of asking the model to guess DOM selectors without snapshot refs or maintaining a separate Playwright repo for every small check.

Common Questions / FAQ

Who is agent-browser for?

Indie and solo builders using Claude Code, Cursor, Codex, or similar agents who need real browser interaction—forms, auth, screenshots, scraping, and light web-app testing—from the terminal.

When should I use agent-browser?

Use it during Validate when prototyping a clickable path on a URL; during Build when integrating or debugging UI flows; and during Ship when you need repeatable browser checks before release—anytime triggers like “open this site”, “fill this form”, or “test this web app” appear.

Is agent-browser safe to install?

It requires global or local CLI install and shell execution against live sites; review the Security Audits panel on this Prism page and treat credentials, cookies, and production data with your usual secrets hygiene.

SKILL.md

READMESKILL.md - Agent Browser

# Browser Automation with agent-browser

The CLI uses Chrome/Chromium via CDP directly. Install via `npm i -g agent-browser`, `brew install agent-browser`, or `cargo install agent-browser`. Run `agent-browser install` to download Chrome. Run `agent-browser upgrade` to update to the latest version.

## Core Workflow

Every browser automation follows this pattern:

1. **Navigate**: `agent-browser open <url>`
2. **Snapshot**: `agent-browser snapshot -i` (get element refs like `@e1`, `@e2`)
3. **Interact**: Use refs to click, fill, select
4. **Re-snapshot**: After navigation or DOM changes, get fresh refs

```bash
agent-browser open https://example.com/form
agent-browser snapshot -i
# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"

agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i  # Check result
```

## Command Chaining

Commands can be chained with `&&` in a single shell invocation. The browser persists between commands via a background daemon, so chaining is safe and more efficient than separate calls.

```bash
# Chain open + wait + snapshot in one call
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i

# Chain multiple interactions
agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "password123" && agent-browser click @e3

# Navigate and capture
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser screenshot page.png
```

**When to chain:** Use `&&` when you don't need to read the output of an intermediate command before proceeding (e.g., open + wait + screenshot). Run commands separately when you need to parse the output first (e.g., snapshot to discover refs, then interact using those refs).

## Handling Authentication

When automating a site that requires login, choose the approach that fits:

**Option 1: Import auth from the user's browser (fastest for one-off tasks)**

```bash
# Connect to the user's running Chrome (they're already logged in)
agent-browser --auto-connect state save ./auth.json
# Use that auth state
agent-browser --state ./auth.json open https://app.example.com/dashboard
```

State files contain session tokens in plaintext -- add to `.gitignore` and delete when no longer needed. Set `AGENT_BROWSER_ENCRYPTION_KEY` for encryption at rest.

**Option 2: Persistent profile (simplest for recurring tasks)**

```bash
# First run: login manually or via automation
agent-browser --profile ~/.myapp open https://app.example.com/login
# ... fill credentials, submit ...

# All future runs: already authenticated
agent-browser --profile ~/.myapp open https://app.example.com/dashboard
```

**Option 3: Session name (auto-save/restore cookies + localStorage)**

```bash
agent-browser --session-name myapp open https://app.example.com/login
# ... login flow ...
agent-browser close  # State auto-saved

# Next time: state auto-restored
agent-browser --session-name myapp open https://app.example.com/dashboard
```

**Option 4: Auth vault (credentials stored encrypted, login by name)**

```bash
echo "$PASSWORD" | agent-browser auth save myapp --url https://app.example.com/login --username user --password-stdin
agent-browser auth login myapp
```

`auth login` navigates with `load`

What is this skill?

Four-step loop: open URL → snapshot -i for @refs → interact (fill/click/select) → re-snapshot after DOM changes

Runs on Chrome/Chromium via CDP; install via npm global, Homebrew, or cargo, plus agent-browser install for Chrome

Snapshot -i outputs stable element refs (@e1, @e2) for fill, click, and wait --load style commands

Covers navigation, forms, buttons, screenshots, scraping, logins, and repeatable web-app test passes

Allowed tooling is Bash scoped to npx agent-browser and agent-browser CLI invocations

Documented 4-step core workflow: navigate, snapshot -i, interact, re-snapshot

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 526 installs on skills.sh; 20.5k GitHub stars; 1/3 security scanners passed (skills.sh audits).

Who is it for?

Solo builders validating flows on staging, automating repetitive browser QA, or harvesting page data when shell-only tools are insufficient.

Skip if: Teams that forbid shell/browser tooling on agents, headless-only API testing with no UI, or jobs that need persistent crawler infrastructure instead of interactive CLI sessions.

What do I get? / Deliverables

Completed browser interactions (navigation, form submit, clicks)

Screenshots or extracted page data

Repeatable CLI command sequences with @element refs

Journey fit

Spans multiple journey phases - primary shelf plus alternate fits below.

Primary fit

Also useful

Also useful

Where it fits

Example use

Click through a landing-page signup on staging before you commit to the full build.

Example use

Drive OAuth or payment iframes via snapshot refs while wiring a SaaS checkout integration.

Example use