
Agent Browser
Install this when your agent needs to drive a real browser—login flows, forms, screenshots, scraping, or end-to-end web app checks via a CDP-based CLI.
Overview
agent-browser is an agent skill most often used in Build (also Validate prototype, Ship testing) that automates Chrome/Chromium browser tasks through a CDP CLI—navigate, snapshot elements, interact, and re-snapshot after
Install
npx skills add https://github.com/everyinc/compound-engineering-plugin --skill agent-browser
What is this skill?
- Four-step loop: open URL → snapshot -i for @refs → interact (fill/click/select) → re-snapshot after DOM changes
- Runs on Chrome/Chromium via CDP; install via npm global, Homebrew, or cargo, plus agent-browser install for Chrome
- snapshot -i outputs element refs (@e1, @e2) that fill, click, and select commands target; re-snapshot for fresh refs after navigation or DOM changes
- Covers navigation, forms, buttons, screenshots, scraping, logins, and repeatable web-app test passes
- Allowed tooling is Bash scoped to npx agent-browser and agent-browser CLI invocations
- Documented 4-step core workflow: navigate, snapshot -i, interact, re-snapshot
Adoption & trust: 526 installs on skills.sh; 20.5k GitHub stars; 1/3 security scanners passed (skills.sh audits).
What problem does it solve?
You need your coding agent to log into a site, submit a form, screenshot UI, or scrape a page, but ad-hoc curl or manual QA cannot exercise real browser behavior.
Who is it for?
Solo builders validating flows on staging, automating repetitive browser QA, or harvesting page data when shell-only tools are insufficient.
Skip if: your team forbids shell or browser tooling on agents, you only need API testing with no UI, or the job needs persistent crawler infrastructure instead of interactive CLI sessions.
When should I use this skill?
User needs to interact with websites—navigate, fill forms, click, screenshot, scrape, test web apps, login, or automate any browser task; triggers include “open a website”, “fill out a form”, “take a screenshot”, “scrape
What do I get? / Deliverables
After the skill runs, your agent executes a repeatable open → snapshot -i → interact → re-snapshot loop and leaves you with completed browser actions, captures, or extracted page data instead of brittle one-off instructi
- Completed browser interactions (navigation, form submit, clicks)
- Screenshots or extracted page data
- Repeatable CLI command sequences with @element refs
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Browser automation is shelved under Build → integrations because it wires an external runtime (Chrome/Chromium + agent-browser) into your agent workflow. Integrations is the canonical home for CLI tools that connect agents to browsers, CDP, and programmatic page interaction—not a one-off script in chat.
Where it fits
Click through a landing-page signup on staging before you commit to the full build.
Drive OAuth or payment iframes via snapshot refs while wiring a SaaS checkout integration.
Re-run open → snapshot → submit on a release candidate to catch broken selectors.
Capture competitor pricing or changelog pages for a lightweight research pass.
How it compares
Use this CLI-integrated browser driver instead of asking the model to guess DOM selectors without snapshot refs or maintaining a separate Playwright repo for every small check.
Common Questions / FAQ
Who is agent-browser for?
Indie and solo builders using Claude Code, Cursor, Codex, or similar agents who need real browser interaction—forms, auth, screenshots, scraping, and light web-app testing—from the terminal.
When should I use agent-browser?
Use it during Validate when prototyping a clickable path on a URL; during Build when integrating or debugging UI flows; and during Ship when you need repeatable browser checks before release—anytime triggers like “open this site”, “fill this form”, or “test this web app” appear.
Is agent-browser safe to install?
It requires global or local CLI install and shell execution against live sites; review the Security Audits panel on this Prism page and treat credentials, cookies, and production data with your usual secrets hygiene.
SKILL.md
# Browser Automation with agent-browser

The CLI uses Chrome/Chromium via CDP directly. Install via `npm i -g agent-browser`, `brew install agent-browser`, or `cargo install agent-browser`. Run `agent-browser install` to download Chrome. Run `agent-browser upgrade` to update to the latest version.

## Core Workflow

Every browser automation follows this pattern:

1. **Navigate**: `agent-browser open <url>`
2. **Snapshot**: `agent-browser snapshot -i` (get element refs like `@e1`, `@e2`)
3. **Interact**: Use refs to click, fill, select
4. **Re-snapshot**: After navigation or DOM changes, get fresh refs

```bash
agent-browser open https://example.com/form
agent-browser snapshot -i
# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"

agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i  # Check result
```

## Command Chaining

Commands can be chained with `&&` in a single shell invocation. The browser persists between commands via a background daemon, so chaining is safe and more efficient than separate calls.

```bash
# Chain open + wait + snapshot in one call
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i

# Chain multiple interactions
agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "password123" && agent-browser click @e3

# Navigate and capture
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser screenshot page.png
```

**When to chain:** Use `&&` when you don't need to read the output of an intermediate command before proceeding (e.g., open + wait + screenshot). Run commands separately when you need to parse the output first (e.g., snapshot to discover refs, then interact using those refs).
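The "run separately, then parse" case above can be sketched in plain shell. This is a minimal illustration, not part of the agent-browser CLI: `ref_for` is a hypothetical helper, and it assumes the snapshot text lists one element per line with the ref (`@e1`, `@e2`, ...) somewhere on that line. The real `snapshot -i` output may be laid out differently, so check it before relying on this. The sample snapshot text is hard-coded so the sketch runs without a browser.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not an agent-browser command): read snapshot text on
# stdin and print the first @eN ref found on a line containing the given
# literal string. Assumes one element per line.
ref_for() {
  local needle="$1"
  grep -F -- "$needle" | grep -o '@e[0-9][0-9]*' | head -n 1
}

# Sample snapshot text, modeled on the example output shown earlier.
# With the real CLI this would be: snapshot=$(agent-browser snapshot -i)
snapshot='@e1 [input type="email"]
@e2 [input type="password"]
@e3 [button] "Submit"'

# Resolve refs by what the element is, rather than hard-coding @e1 or @e3.
email_ref=$(printf '%s\n' "$snapshot" | ref_for 'type="email"')
submit_ref=$(printf '%s\n' "$snapshot" | ref_for 'Submit')

echo "email field: $email_ref, submit button: $submit_ref"

# The interaction step could then be chained, since no output needs parsing:
#   agent-browser fill "$email_ref" "user@example.com" && agent-browser click "$submit_ref"
```

Looking refs up by a visible attribute or label keeps the script working when the numbering shifts between snapshots, which is why the snapshot step is run on its own rather than inside an `&&` chain.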
## Handling Authentication

When automating a site that requires login, choose the approach that fits:

**Option 1: Import auth from the user's browser (fastest for one-off tasks)**

```bash
# Connect to the user's running Chrome (they're already logged in)
agent-browser --auto-connect state save ./auth.json

# Use that auth state
agent-browser --state ./auth.json open https://app.example.com/dashboard
```

State files contain session tokens in plaintext -- add to `.gitignore` and delete when no longer needed. Set `AGENT_BROWSER_ENCRYPTION_KEY` for encryption at rest.

**Option 2: Persistent profile (simplest for recurring tasks)**

```bash
# First run: login manually or via automation
agent-browser --profile ~/.myapp open https://app.example.com/login
# ... fill credentials, submit ...

# All future runs: already authenticated
agent-browser --profile ~/.myapp open https://app.example.com/dashboard
```

**Option 3: Session name (auto-save/restore cookies + localStorage)**

```bash
agent-browser --session-name myapp open https://app.example.com/login
# ... login flow ...
agent-browser close  # State auto-saved

# Next time: state auto-restored
agent-browser --session-name myapp open https://app.example.com/dashboard
```

**Option 4: Auth vault (credentials stored encrypted, login by name)**

```bash
echo "$PASSWORD" | agent-browser auth save myapp --url https://app.example.com/login --username user --password-stdin
agent-browser auth login myapp
```

`auth login` navigates with `load`