Agent Browser

Agent Browser is cataloged under Build because its core job is extending what AI agents can do in the product: structured web automation is foundational agent tooling, not a one-off, ship-only test harness. It fits the agent-tooling shelf because it is infrastructure that agents invoke (open, snapshot, click, fill) rather than a human-only frontend framework or a standalone backend API skill.

Where it fits

Example use

Wire an agent to open your staging app, snapshot refs, and fill a signup form before merging an integration PR.

Example use

Run a quick agent-led smoke pass: open production, screenshot key pages, and verify text on a billing screen.

Example use

Prove that a competitor or pricing-page scrape works using snapshot and get text before committing to a full data pipeline.

How it compares

Agent Browser is a browser automation CLI for agents, not a hosted Playwright MCP server or a manual-only QA checklist.

Common Questions / FAQ

Who is agent browser for?

Indie and solo developers using Claude Code, Codex, or Cursor who want agents to automate real browser tasks during build, validate, and ship workflows.

When should I use agent browser?

Use it during Build (agent-tooling) when integrating web flows, during Validate (prototype) when proving scrapers or forms, and during Ship (testing) when agents need UI smoke checks via snapshot and click commands.

Is agent browser safe to install?

It requires Node.js and npm, and it can control a real browser with network and filesystem access, so review the Security Audits panel on this Prism page and pin versions before a global install.

SKILL.md

# Agent Browser

A fast Rust-based headless browser automation CLI with Node.js fallback that enables AI agents to navigate, click, type, and snapshot pages via structured commands.

## Installation

### npm (recommended)

```bash
npm install -g agent-browser
agent-browser install              # download the browser binary
agent-browser install --with-deps  # Linux: also install system dependencies
```

### From Source

```bash
git clone https://github.com/vercel-labs/agent-browser
cd agent-browser
pnpm install
pnpm build
agent-browser install
```

## Quick Start

```bash
agent-browser open example.com
agent-browser snapshot
agent-browser click @e2
agent-browser fill @e3 "test@example.com"
agent-browser get text @e1
agent-browser screenshot page.png
agent-browser close
```
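The quick-start commands can be combined into a small smoke check. The following is a minimal sketch, not part of agent-browser itself: `smoke_check` and the `AGENT_BROWSER` override variable are hypothetical names introduced here, and the sketch assumes `get text` prints the element text on stdout.

```bash
# Hypothetical helper: open a URL, read the text of one ref, compare it.
# AGENT_BROWSER is an assumed override for the binary name (handy for dry runs).
smoke_check() {
  local url="$1" ref="$2" expected="$3" actual
  local ab="${AGENT_BROWSER:-agent-browser}"

  "$ab" open "$url" >/dev/null || return 1
  # Assumption: `get text` writes the element text to stdout.
  actual="$("$ab" get text "$ref")"
  "$ab" close >/dev/null

  if [ "$actual" = "$expected" ]; then
    echo "PASS: $ref"
  else
    echo "FAIL: $ref: expected '$expected', got '$actual'"
    return 1
  fi
}
```

A call might look like `smoke_check example.com @e1 "Example Domain"`, where the ref comes from a prior `agent-browser snapshot`.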

## Using Real Chrome Profile (for OAuth/Logged-in Sessions)

For sites requiring Google/Discord/etc login (like star-swap.com):

**Method 1: Launch Chrome with custom profile, connect via CDP**

```bash
# Terminal 1: Launch Chrome with your real profile and remote debugging
google-chrome --remote-debugging-port=9222 --user-data-dir=/home/willr/.config/google-chrome --profile-directory=Default &

# Terminal 2: Connect agent-browser to that Chrome instance
agent-browser --cdp 9222 open "https://star-swap.com"
agent-browser --cdp 9222 snapshot -i
agent-browser --cdp 9222 click @e2

# This reuses your existing Google session - no re-login needed!
# Works for: Google OAuth, Discord OAuth, any site you're logged into in Chrome
```
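Repeating `--cdp 9222` on every command is easy to get wrong. One option is a thin wrapper, sketched below under the assumption that the flag always precedes the subcommand as in the commands above. `ab_cdp`, `CDP_PORT`, and `AGENT_BROWSER` are hypothetical names, not part of the CLI.

```bash
# Hypothetical wrapper: forward any subcommand to agent-browser over CDP.
# CDP_PORT defaults to 9222, the port used in the Chrome launch command above.
ab_cdp() {
  "${AGENT_BROWSER:-agent-browser}" --cdp "${CDP_PORT:-9222}" "$@"
}
```

With this in place, `ab_cdp snapshot -i` runs `agent-browser --cdp 9222 snapshot -i`.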

**Method 2: Session persistence (first-time manual login)**

```bash
# First time: headed mode, login manually
agent-browser --headed --session starswap open "https://star-swap.com"
# Complete Google OAuth manually in the browser window
# Close when done

# Future runs: cookies persist!
agent-browser --session starswap open "https://star-swap.com"
# Already logged in automatically
```

**am.will.ryan Chrome profile:** `/home/willr/.config/google-chrome/Default`

## Core Commands

### Navigation

```bash
agent-browser open <url>
agent-browser back
agent-browser forward
agent-browser reload
```

### Interaction

```bash
agent-browser click <sel>
agent-browser dblclick <sel>
agent-browser focus <sel>
agent-browser type <sel> <text>
agent-browser fill <sel> <text>
agent-browser clear <sel>
agent-browser press <key>
agent-browser keydown <key>
agent-browser keyup <key>
agent-browser hover <sel>
agent-browser select <sel> <val>
agent-browser check <sel>
agent-browser uncheck <sel>
agent-browser drag <src> <tgt>
agent-browser upload <sel> <files>
```

### Extraction and Info

```bash
agent-browser snapshot
agent-browser get text <sel>
agent-browser get html <sel>
agent-browser get value <sel>
agent-browser get attr <sel> <attr>
agent-browser get title
agent-browser get url
agent-browser get count <sel>
agent-browser get box <sel>
agent-browser screenshot [path]
agent-browser pdf <path>
```
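The extraction commands compose well with ordinary shell capture. A minimal sketch, assuming `get title` and `get url` print their values on stdout. `page_summary` and `AGENT_BROWSER` are hypothetical names introduced for illustration.

```bash
# Hypothetical helper: print a two-line summary of the currently open page.
page_summary() {
  local ab="${AGENT_BROWSER:-agent-browser}"
  # Assumption: both commands write a single value to stdout.
  printf 'title: %s\n' "$("$ab" get title)"
  printf 'url: %s\n' "$("$ab" get url)"
}
```

An agent can run `page_summary` after `agent-browser open <url>` to confirm it landed on the expected page before extracting anything else.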

### Check State

```bash
agent-browser is visible <sel>
agent-browser is enabled <sel>
agent-browser is checked <sel>
```

### Find Elements

- `agent-browser find role <role> <action> [value]`
- `agent-browser find text <text> <action>`
- `agent-browser find label <label> <action> [value]`
- `agent-browser find placeholder <ph> <action> [value]`
- `agent-browser find alt <text> <action>`
- `agent-browser find title <text> <action>`
- `agent-browser find testid <id> <action> [value]`

Actions include click, fill, check, hover, and text.
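As an illustration of the `find` forms above, the sketch below fills a login form by label and clicks a button by its text, without needing refs from a snapshot. The label and button strings ("Email", "Password", "Sign in") and the names `fill_login_form` and `AGENT_BROWSER` are hypothetical; adjust them to the page under test.

```bash
# Hypothetical helper: locate fields by label text and fill them in.
fill_login_form() {
  local email="$1" password="$2"
  local ab="${AGENT_BROWSER:-agent-browser}"

  # find label <label> fill <value>
  "$ab" find label "Email" fill "$email" &&
  "$ab" find label "Password" fill "$password" &&
  # find text <text> click
  "$ab" find text "Sign in" click
}
```

Chaining with `&&` stops at the first step that fails, so a missing field does not lead to a blind click.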

### Wait and Timing

```bash
agent-browser wait <selector>
agent-browser wait <ms>
agent-browser wait --text "Welcome"
agent-browser wait --url "**/dash"
agent-browser wait --load networkidle
```
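Waits are most useful directly before an action that depends on page state. A minimal sketch that combines two of the wait forms above with a screenshot follows. `capture_when_ready` and `AGENT_BROWSER` are hypothetical names, and the marker text is whatever string signals that the page is ready.

```bash
# Hypothetical helper: open a page, wait until it settles, then screenshot it.
capture_when_ready() {
  local url="$1" marker="$2" out="${3:-page.png}"
  local ab="${AGENT_BROWSER:-agent-browser}"

  "$ab" open "$url" &&
  "$ab" wait --load networkidle &&   # wait for network activity to quiet down
  "$ab" wait --text "$marker" &&     # then wait for the marker text to appear
  "$ab" screenshot "$out"
}
```

For example, `capture_when_ready example.com "Welcome" home.png` only takes the screenshot once "Welcome" is on the page.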

### Advanced Control

```bash
agen
```

What is this skill?

Rust-first headless browser CLI with Node.js fallback for agent-driven automation

Structured commands: open, snapshot, click (@refs), fill, get text, screenshot, close

npm global install path plus from-source pnpm build workflow

Real Chrome profile + CDP remote debugging for OAuth and logged-in sessions

read_when triggers: web UI automation, structured extraction, programmatic forms, UI testing

Quick-start documents 7 core CLI steps: open, snapshot, click, fill, get text, screenshot, close

Compatible agents: Codex, Claude Code, Cursor, any compatible agent

Adoption & trust: 1.2k installs on skills.sh; 941 GitHub stars; 1/3 security scanners passed (skills.sh audits).

Who is it for?

Solo builders wiring agents to scrape, test, or operate web UIs from the terminal when structured refs and snapshots beat one-off Puppeteer snippets.

Skip if: you only need static unit tests with no browser, or you want production monitoring without first reviewing install permissions and the Security Audits panel on this page.

What do I get? / Deliverables

Agents run repeatable browser sessions—open, snapshot, click, fill, extract, screenshot—with an optional logged-in Chrome profile path for OAuth-heavy sites.

Automated browser session logs and element snapshots

Screenshots and extracted page text for agent reasoning

Repeatable CLI command sequences for web flows

Journey fit

Spans multiple journey phases - primary shelf plus alternate fits below.

Primary fit

Where it fits

Example use