
Agent Browser
Drive real browser sessions from the agent CLI to test flows, fill forms, capture screenshots, and extract page data via snapshot refs.
Overview
agent-browser is an agent skill most often used in Ship (also Validate prototype, Launch distribution) that automates browser navigation, interaction, screenshots, and extraction through snapshot-based element refs.
Install
npx skills add https://github.com/vercel-labs/open-agents --skill agent-browserWhat is this skill?
- Quick start: open, snapshot -i, click/fill by @ref, close
- Navigation: open/back/forward/reload/close and CDP connect on port 9222
- Snapshot -i returns interactive elements with stable refs (@e1, @e2)
- Supports https, http, file://, about:, data:// URLs with https default
- Bash allowed-tools scoped to agent-browser:* commands
Adoption & trust: 404k installs on skills.sh; 5.6k GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You need your coding agent to exercise a live web UI, not just read static HTML or mocked APIs.
Who is it for?
Solo builders shipping web apps who want CLI browser automation triggered from Claude Code or similar agents.
Skip if: Headless-only API testing with no DOM, or teams blocked from Bash(agent-browser:*) tool permissions.
When should I use this skill?
User needs to navigate websites, interact with web pages, fill forms, take screenshots, test web applications, or extract information from web pages.
What do I get? / Deliverables
You run repeatable browser sessions—navigate, snapshot, click, fill—and close cleanly for tests, demos, or data pulls.
- Executed browser navigation and interaction sessions
- Interactive element snapshots with @refs
- Screenshots or extracted page data from live sessions
Recommended Skills
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Browser automation is shelved under ship testing as the primary gate before launch, even though it also supports validate prototypes. Snapshot-interact-re-snapshot workflow maps directly to QA and smoke testing of web apps.
Where it fits
Walk through a staging URL to confirm a landing CTA before you commit to full build.
Run open → snapshot -i → click @e1 on checkout to catch regressions pre-release.
Screenshot production pricing page after deploy for launch checklist evidence.
How it compares
CLI snapshot-ref browser driver—not a static checklist skill or an MCP server catalog entry.
Common Questions / FAQ
Who is agent-browser for?
Developers using AI agents to test web applications, complete forms, take screenshots, or extract page information.
When should I use agent-browser?
In ship testing for smoke flows, in validate prototype for clickable demos, or at launch when checking live distribution pages—whenever SKILL.md says navigate, interact, fill forms, screenshot, or extract from web pages.
Is agent-browser safe to install?
Check Security Audits on this page; the skill grants Bash access to agent-browser commands and drives a real browser with network reach.
SKILL.md
READMESKILL.md - Agent Browser
# Browser Automation with agent-browser ## Quick start ```bash agent-browser open <url> # Navigate to page agent-browser snapshot -i # Get interactive elements with refs agent-browser click @e1 # Click element by ref agent-browser fill @e2 "text" # Fill input by ref agent-browser close # Close browser ``` ## Core workflow 1. Navigate: `agent-browser open <url>` 2. Snapshot: `agent-browser snapshot -i` (returns elements with refs like `@e1`, `@e2`) 3. Interact using refs from the snapshot 4. Re-snapshot after navigation or significant DOM changes ## Commands ### Navigation ```bash agent-browser open <url> # Navigate to URL (aliases: goto, navigate) # Supports: https://, http://, file://, about:, data:// # Auto-prepends https:// if no protocol given agent-browser back # Go back agent-browser forward # Go forward agent-browser reload # Reload page agent-browser close # Close browser (aliases: quit, exit) agent-browser connect 9222 # Connect to browser via CDP port ``` ### Snapshot (page analysis) ```bash agent-browser snapshot # Full accessibility tree agent-browser snapshot -i # Interactive elements only (recommended) agent-browser snapshot -c # Compact output agent-browser snapshot -d 3 # Limit depth to 3 agent-browser snapshot -s "#main" # Scope to CSS selector ``` ### Interactions (use @refs from snapshot) ```bash agent-browser click @e1 # Click agent-browser dblclick @e1 # Double-click agent-browser focus @e1 # Focus element agent-browser fill @e2 "text" # Clear and type agent-browser type @e2 "text" # Type without clearing agent-browser press Enter # Press key (alias: key) agent-browser press Control+a # Key combination agent-browser keydown Shift # Hold key down agent-browser keyup Shift # Release key agent-browser hover @e1 # Hover agent-browser check @e1 # Check checkbox agent-browser uncheck @e1 # Uncheck checkbox agent-browser select @e1 "value" # Select dropdown option agent-browser select @e1 "a" "b" # Select multiple options agent-browser scroll down 500 # Scroll page (default: down 300px) agent-browser scrollintoview @e1 # Scroll element into view (alias: scrollinto) agent-browser drag @e1 @e2 # Drag and drop agent-browser upload @e1 file.pdf # Upload files ``` ### Get information ```bash agent-browser get text @e1 # Get element text agent-browser get html @e1 # Get innerHTML agent-browser get value @e1 # Get input value agent-browser get attr @e1 href # Get attribute agent-browser get title # Get page title agent-browser get url # Get current URL agent-browser get count ".item" # Count matching elements agent-browser get box @e1 # Get bounding box agent-browser get styles @e1 # Get computed styles (font, color, bg, etc.) ``` ### Check state ```bash agent-browser is visible @e1 # Check if visible agent-browser is enabled @e1 # Check if enabled agent-browser is checked @e1 # Check if checked ``` ### Screenshots & PDF ```bash agent-browser screenshot # Save to a temporary directory agent-browser screenshot path.png # Save to a specific path agent-browser screenshot --full # Full page agent-browser pdf output.pdf # Save as PDF ``` ### Video recording ```bash agent-browser record start ./demo.webm # Start recording (uses current URL + state) agent-browser click @e1 # Perform actions agent