
Agent Browser
Give coding agents a headless browser CLI to open pages, click, fill forms, snapshot DOM, and scrape structured data during builds and QA.
Overview
Agent Browser is an agent skill most often used in Build (also Ship testing and Validate prototyping) that exposes a Rust/Node headless browser CLI so agents can navigate, interact with, and snapshot web pages via struct
Install
npx skills add https://github.com/am-will/codex-skills --skill agent-browser
What is this skill?
- Rust-first headless browser CLI with Node.js fallback for agent-driven automation
- Structured commands: open, snapshot, click (@refs), fill, get text, screenshot, close
- npm global install path plus from-source pnpm build workflow
- Real Chrome profile + CDP remote debugging for OAuth and logged-in sessions
- read_when triggers: web UI automation, structured extraction, programmatic forms, UI testing
- Quick-start documents 7 core CLI steps: open, snapshot, click, fill, get text, screenshot, close
Adoption & trust: 1.2k installs on skills.sh; 941 GitHub stars; 1/3 security scanners passed (skills.sh audits).
What problem does it solve?
Your agent can write code but cannot reliably open real pages, click controls, or pull structured DOM state without a dedicated browser automation bridge.
Who is it for?
Solo builders wiring agents to scrape, test, or operate web UIs from the terminal when structured refs and snapshots beat one-off Puppeteer snippets.
Skip if: You only need static unit tests with no browser, or you plan to use it for production monitoring without first reviewing the install permissions and the Security Audits panel on this page.
When should I use this skill?
Automating web interactions, extracting structured data from pages, filling forms programmatically, or testing web UIs.
What do I get? / Deliverables
Agents run repeatable browser sessions—open, snapshot, click, fill, extract, screenshot—with an optional logged-in Chrome profile path for OAuth-heavy sites.
- Automated browser session logs and element snapshots
- Screenshots and extracted page text for agent reasoning
- Repeatable CLI command sequences for web flows
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Agent-browser is cataloged under Build because its core job is extending what AI agents can do in the product—structured web automation is foundational agent tooling, not a one-off ship-only test harness. Fits agent-tooling: it is infrastructure agents invoke (open/snapshot/click/fill) rather than a human-only frontend framework or a standalone backend API skill.
Where it fits
Wire an agent to open your staging app, snapshot refs, and fill a signup form before merging an integration PR.
Run a quick agent-led smoke pass: open production, screenshot key pages, and verify text on a billing screen.
Prove a competitor or pricing page scrape works via snapshot and get text before committing to a full data pipeline.
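The smoke-pass scenario above can be sketched as a small wrapper script. This is a minimal sketch, not the skill's own tooling: it only chains commands that appear in the skill's SKILL.md (`open`, `wait --text`, `screenshot`, `close`), and the URL, expected text, and screenshot path are hypothetical placeholders. It also assumes `wait --text` fails when the text never appears, which you should confirm against your installed version. By default the script only prints the commands (dry run) so the sequence can be reviewed first.

```bash
#!/usr/bin/env bash
# Sketch of an agent-led smoke pass using commands listed in SKILL.md.
# Hypothetical values: the URL, the expected text, and the screenshot path.
set -euo pipefail

# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Thin wrapper: echo the command in dry-run mode, otherwise call the real CLI.
ab() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "agent-browser $*"
  else
    agent-browser "$@"
  fi
}

# Open a page, wait for expected text, capture a screenshot, then close.
smoke_check() {
  local url="$1" expected_text="$2" shot="$3"
  ab open "$url"
  ab wait --text "$expected_text"
  ab screenshot "$shot"
  ab close
}

smoke_check "https://staging.example.com/billing" "Invoices" "billing.png"
```

Run it with `DRY_RUN=0` once the printed sequence looks right. Keeping the wrapper in one place makes it easy to add `--session` or `--cdp` options later without touching each step.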
How it compares
Browser automation CLI for agents—not a hosted Playwright MCP server or manual-only QA checklist.
Common Questions / FAQ
Who is agent browser for?
Indie and solo developers using Claude Code, Codex, or Cursor who want agents to automate real browser tasks during build, validate, and ship workflows.
When should I use agent browser?
During Build agent-tooling when integrating web flows; during Validate prototype when proving scrapers or forms; during Ship testing when agents need UI smoke checks via snapshot and click commands.
Is agent browser safe to install?
It requires node/npm and can control a real browser with network and filesystem access—review the Security Audits panel on this Prism page and pin versions before global install.
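One way to follow the pinning advice is to build the npm package spec from an explicit version rather than installing whatever the latest tag resolves to. The helper below is a hypothetical sketch: `pinned_spec` is an illustrative name, and the version you pass is whatever release you have actually reviewed, not a recommendation.

```bash
#!/usr/bin/env bash
# Sketch: refuse to install unless an exact, reviewed version is supplied.
set -euo pipefail

# Prints the npm package spec "agent-browser@<version>".
# Returns non-zero when no version is given, so an unpinned install cannot slip through.
pinned_spec() {
  local version="${1:-}"
  if [ -z "$version" ]; then
    echo "error: pass the exact version you reviewed" >&2
    return 1
  fi
  echo "agent-browser@${version}"
}

# Usage (X.Y.Z is a placeholder for the version you audited):
#   npm install -g "$(pinned_spec "X.Y.Z")"
#   agent-browser install
```

The `package@version` form is standard npm syntax; the follow-up `agent-browser install` step comes from the skill's own installation instructions.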
SKILL.md
# Agent Browser

A fast Rust-based headless browser automation CLI with Node.js fallback that enables AI agents to navigate, click, type, and snapshot pages via structured commands.

## Installation

### npm recommended

```bash
npm install -g agent-browser
agent-browser install
agent-browser install --with-deps
```

### From Source

```bash
git clone https://github.com/vercel-labs/agent-browser
cd agent-browser
pnpm install
pnpm build
agent-browser install
```

## Quick Start

```bash
agent-browser open example.com
agent-browser snapshot
agent-browser click @e2
agent-browser fill @e3 "test@example.com"
agent-browser get text @e1
agent-browser screenshot page.png
agent-browser close
```

## Using Real Chrome Profile (for OAuth/Logged-in Sessions)

For sites requiring Google/Discord/etc login (like star-swap.com):

**Method 1: Launch Chrome with custom profile, connect via CDP**

```bash
# Terminal 1: Launch Chrome with your real profile and remote debugging
google-chrome --remote-debugging-port=9222 --user-data-dir=/home/willr/.config/google-chrome/Default &

# Terminal 2: Connect agent-browser to that Chrome instance
agent-browser --cdp 9222 open "https://star-swap.com"
agent-browser --cdp 9222 snapshot -i
agent-browser --cdp 9222 click e2

# This reuses your existing Google session - no re-login needed!
# Works for: Google OAuth, Discord OAuth, any site you're logged into in Chrome
```

**Method 2: Session persistence (first-time manual login)**

```bash
# First time: headed mode, login manually
agent-browser --headed --session starswap open "https://star-swap.com"
# Complete Google OAuth manually in the browser window
# Close when done

# Future runs: cookies persist!
agent-browser --session starswap open "https://star-swap.com"
# Already logged in automatically
```

**am.will.ryan Chrome profile:** `/home/willr/.config/google-chrome/Default`

## Core Commands

### Navigation

```bash
agent-browser open <url>
agent-browser back
agent-browser forward
agent-browser reload
```

### Interaction

```bash
agent-browser click <sel>
agent-browser dblclick <sel>
agent-browser focus <sel>
agent-browser type <sel> <text>
agent-browser fill <sel> <text>
agent-browser clear <sel>
agent-browser press <key>
agent-browser keydown <key>
agent-browser keyup <key>
agent-browser hover <sel>
agent-browser select <sel> <val>
agent-browser check <sel>
agent-browser uncheck <sel>
agent-browser drag <src> <tgt>
agent-browser upload <sel> <files>
```

### Extraction and Info

```bash
agent-browser snapshot
agent-browser get text <sel>
agent-browser get html <sel>
agent-browser get value <sel>
agent-browser get attr <sel> <attr>
agent-browser get title
agent-browser get url
agent-browser get count <sel>
agent-browser get box <sel>
agent-browser screenshot [path]
agent-browser pdf <path>
```

### Check State

```bash
agent-browser is visible <sel>
agent-browser is enabled <sel>
agent-browser is checked <sel>
```

### Find Elements

- agent-browser find role <role> <action> [value]
- agent-browser find text <text> <action>
- agent-browser find label <label> <action> [value]
- agent-browser find placeholder <ph> <action> [value]
- agent-browser find alt <text> <action>
- agent-browser find title <text> <action>
- agent-browser find testid <id> <action> [value]

Actions include click, fill, check, hover, and text.

### Wait and Timing

```bash
agent-browser wait <selector>
agent-browser wait <ms>
agent-browser wait --text "Welcome"
agent-browser wait --url "**/dash"
agent-browser wait --load networkidle
```

### Advanced Control

```bash
agen