Agent Browser Automation

Name: Agent Browser Automation
Author: aradotso

aradotso/trending-skills

1.9k installs
66 repo stars
Updated July 9, 2026

agent-browser-automation is an agent skill for agent-browser, a headless browser automation CLI for AI agents that ships as a native Rust binary and drives Chrome through the Chrome DevTools Protocol.

About

Skill by ara.so (https://ara.so), Daily 2026 Skills collection.

agent-browser is a headless browser automation CLI built in Rust, designed for AI agents. It wraps Chrome via the Chrome DevTools Protocol (CDP) and exposes a fast, ergonomic command-line interface for navigation, interaction, accessibility snapshots, screenshots, network interception, and more, with no Node.js or Playwright runtime required.

Install it with npm (recommended, global: npm install -g agent-browser), with Homebrew on macOS (brew install agent-browser), or with Cargo (cargo install agent-browser). After any of these, run agent-browser install once to download Chrome for Testing.

The agent-browser-automation agent skill provides documented workflows, prerequisites, triggers, and safety guidance from its SKILL.md source. Agents load it when user requests match the description and follow the step-by-step instructions without inventing capabilities. It integrates with standard agent tooling for the tasks, inputs, outputs, and failure modes described in the repository documentation.

description: Headless browser automation CLI for AI agents using native Rust binary with Chrome DevTools Protocol
- automate browser with agent-browser
- use agent-browser for web scraping
Follow agent-browser-automation SKILL.md steps and documented constraints.

Agent Browser Automation by the numbers

1,950 all-time installs (skills.sh)
+19 installs in the week ending Jul 28, 2026 (Skillselion tracking)
Ranked #614 of 16,659 AI & Agent Building skills by installs in the Skillselion catalog
Security screen: MEDIUM risk (skills.sh audit)
Data as of Jul 28, 2026 (Skillselion catalog sync)

At a glance

agent-browser-automation capabilities & compatibility

Capabilities: description: headless browser automation cli for · automate browser with agent browser · use agent browser for web scraping · follow agent browser automation skill.md steps a
Use cases: orchestration

From the docs

What agent-browser-automation says it does

description: Headless browser automation CLI for AI agents using native Rust binary with Chrome DevTools Protocol
- automate browser with agent-browser
- use agent-browser for web scraping

Source: SKILL.md

npx skills add https://github.com/aradotso/trending-skills --skill agent-browser-automation

Installs	1.9k
repo stars	★ 66
Security audit	1 / 3 scanners passed
Last updated	July 9, 2026
Repository	aradotso/trending-skills ↗

When should an agent use agent-browser-automation and what problem does it solve?

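It gives an agent a command-line way to drive a headless browser. agent-browser is a native Rust binary that controls Chrome through the Chrome DevTools Protocol, so navigation, form filling, snapshots, and screenshots need no Node.js or Playwright runtime.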

Who is it for?

Developers invoking agent-browser-automation as documented in the skill source.

Skip if: your requirements fall outside the documented scope of agent-browser-automation.

When should I use this skill?

Use it when a request calls for automating a browser with agent-browser or for web scraping with agent-browser.

What you get

Outputs aligned with the agent-browser-automation SKILL.md workflow and stated deliverables.

Browser screenshots
Accessibility snapshots
Automated interaction logs

Files

SKILL.md (Markdown, GitHub ↗)

agent-browser

Skill by ara.so — Daily 2026 Skills collection.

agent-browser is a headless browser automation CLI built in Rust, designed for AI agents. It wraps Chrome via the Chrome DevTools Protocol (CDP) and exposes a fast, ergonomic command-line interface for navigation, interaction, accessibility snapshots, screenshots, network interception, and more — with no Node.js or Playwright runtime required.

Installation

Recommended (npm global)

npm install -g agent-browser
agent-browser install  # Download Chrome for Testing (first time only)

macOS (Homebrew)

brew install agent-browser
agent-browser install

Rust / Cargo

cargo install agent-browser
agent-browser install

Local project dependency

npm install agent-browser
# Add to package.json scripts or invoke via npx

Linux (with system dependencies)

agent-browser install --with-deps

Quick Start

agent-browser open https://example.com
agent-browser snapshot                        # Accessibility tree with @refs (best for AI)
agent-browser click @e2                       # Click by ref from snapshot
agent-browser fill @e3 "hello@example.com"   # Fill by ref
agent-browser get text @e1                    # Get text content
agent-browser screenshot page.png
agent-browser close

Core Commands

Navigation

agent-browser open <url>           # Navigate (aliases: goto, navigate)
agent-browser get url              # Get current URL
agent-browser get title            # Get page title
agent-browser close                # Close browser (aliases: quit, exit)

Accessibility Snapshot (recommended for AI agents)

agent-browser snapshot             # Returns accessibility tree with @ref IDs
agent-browser snapshot -i          # Interactive / compact mode

Snapshot output includes @eN refs you can use directly:

@e1 [button] "Submit"
@e2 [textbox] "Email" value=""
@e3 [link] "Sign in"

Then act on them:

agent-browser fill @e2 "user@example.com"
agent-browser click @e1
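
A script often needs to pick a ref out of the snapshot text before acting on it. The sketch below is one possible way to do that, assuming every snapshot line follows the `@eN [role] "name"` shape shown above; real output may be indented or carry extra attributes. `find_ref` is a hypothetical helper, not an agent-browser command, and the sample text stands in for live output.

```shell
# Hypothetical helper (not part of agent-browser): print the first @ref whose
# role and quoted name match. Assumes lines shaped like: @e1 [button] "Submit"
find_ref() {
  # $1 = role (e.g. button), $2 = accessible name; snapshot text on stdin.
  awk -v role="[$1]" -v name="\"$2\"" '
    $1 ~ /^@e[0-9]+$/ && $2 == role && index($0, name) > 0 { print $1; exit }
  '
}

# Sample input copied from the snapshot format shown above.
sample='@e1 [button] "Submit"
@e2 [textbox] "Email" value=""
@e3 [link] "Sign in"'

email_ref=$(printf '%s\n' "$sample" | find_ref textbox Email)
echo "$email_ref"   # prints @e2
```

In a live session the sample would be replaced by real output, for example `agent-browser snapshot | find_ref button Submit`, and the result passed to `agent-browser click`.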

Interaction

agent-browser click <sel>                     # Click element
agent-browser dblclick <sel>                  # Double-click
agent-browser fill <sel> <text>               # Clear and fill input
agent-browser type <sel> <text>               # Type into element
agent-browser press <key>                     # Press key (Enter, Tab, Control+a)
agent-browser keyboard type <text>            # Type at current focus (real keystrokes)
agent-browser keyboard inserttext <text>      # Insert text without key events
agent-browser hover <sel>                     # Hover element
agent-browser select <sel> <value>            # Select dropdown option
agent-browser check <sel>                     # Check checkbox
agent-browser uncheck <sel>                   # Uncheck checkbox
agent-browser scroll down 500                 # Scroll (up/down/left/right, optional px)
agent-browser scroll down --selector "#feed"  # Scroll within element
agent-browser scrollintoview <sel>            # Scroll element into view
agent-browser drag <src> <target>             # Drag and drop
agent-browser upload <sel> /path/file.pdf     # Upload file

Screenshots & PDF

agent-browser screenshot                          # Save to temp dir, print path
agent-browser screenshot page.png                 # Save to path
agent-browser screenshot --full page.png          # Full-page screenshot
agent-browser screenshot --annotate               # Numbered element labels overlay
agent-browser screenshot --screenshot-dir ./shots # Custom output directory
agent-browser screenshot --screenshot-format jpeg --screenshot-quality 80
agent-browser pdf output.pdf                      # Save page as PDF

Getting Element Info

agent-browser get text <sel>           # Text content
agent-browser get html <sel>           # innerHTML
agent-browser get value <sel>          # Input value
agent-browser get attr <sel> <attr>    # Attribute value
agent-browser get count <sel>          # Count matching elements
agent-browser get box <sel>            # Bounding box
agent-browser get styles <sel>         # Computed styles
agent-browser get cdp-url              # CDP WebSocket URL

State Checks

agent-browser is visible <sel>
agent-browser is enabled <sel>
agent-browser is checked <sel>

Semantic Locators (find)

agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "test@example.com"
agent-browser find placeholder "Search..." fill "rust"
agent-browser find testid "login-btn" click
agent-browser find first ".item" click
agent-browser find nth 2 "a" text
agent-browser find role textbox fill "hello" --name "Username"

Actions: click, fill, type, hover, focus, check, uncheck, text

Waiting

agent-browser wait "#modal"                          # Wait for element visible
agent-browser wait 2000                              # Wait N milliseconds
agent-browser wait --text "Welcome back"             # Wait for text
agent-browser wait --url "**/dashboard"              # Wait for URL pattern
agent-browser wait --load networkidle                # Wait for load state
agent-browser wait --fn "window.appReady === true"   # Wait for JS condition
agent-browser wait "#spinner" --state hidden         # Wait for element to disappear

Load states: load, domcontentloaded, networkidle

JavaScript Eval

agent-browser eval "document.title"
agent-browser eval "JSON.stringify(window.__STATE__)"
agent-browser eval -b "BASE64_ENCODED_JS"
echo "return document.body.innerHTML" | agent-browser eval --stdin
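
The `-b` form is useful when a snippet contains quotes that are awkward to escape in the shell. The sketch below shows how the encoded argument could be produced with the standard `base64` tool; it assumes `eval -b` expects ordinary single-line base64, which the source implies but does not spell out. The JavaScript snippet itself is only an illustration.

```shell
# Encode a JavaScript snippet as single-line base64.
js='document.querySelectorAll("a").length'
encoded=$(printf '%s' "$js" | base64 | tr -d '\n')
echo "$encoded"

# Round-trip check: decoding should give back the original snippet.
# (-d is the GNU/BusyBox spelling; some older macOS versions use -D.)
decoded=$(printf '%s' "$encoded" | base64 -d)
[ "$decoded" = "$js" ] && echo "round trip ok"

# Assumed usage, based on the eval -b form shown above:
# agent-browser eval -b "$encoded"
```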

Batch Execution (efficient multi-step)

echo '[
  ["open", "https://example.com"],
  ["snapshot", "-i"],
  ["fill", "@e2", "user@example.com"],
  ["click", "@e1"],
  ["screenshot", "result.png"]
]' | agent-browser batch --json

# Stop on first failure
agent-browser batch --bail < commands.json
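
When batch input is assembled from shell variables, values containing quotes can break hand-written JSON. The following is a minimal sketch of one way to build the array-of-arrays format shown above without extra tools. `json_escape` and `json_cmd` are hypothetical helpers, not agent-browser features, and the escaping covers only backslashes and double quotes.

```shell
# Hypothetical helpers (not part of agent-browser) for building batch input.

# Escape backslashes and double quotes so a value is safe inside a JSON string.
# Control characters such as newlines are NOT handled in this sketch.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Turn the arguments into one JSON array of strings, e.g. ["fill", "#q", "x"].
json_cmd() {
  out='['
  sep=''
  for arg in "$@"; do
    out="$out$sep\"$(json_escape "$arg")\""
    sep=', '
  done
  printf '%s]' "$out"
}

query='say "hi"'
batch="[$(json_cmd open https://example.com), $(json_cmd fill '#q' "$query")]"
echo "$batch"
# prints [["open", "https://example.com"], ["fill", "#q", "say \"hi\""]]
```

The resulting string can be piped to `agent-browser batch --json` in the same way as the literal JSON above. For values that may contain newlines or other control characters, a real JSON tool is the safer choice.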

Tabs & Frames

agent-browser tab                    # List tabs
agent-browser tab new https://...    # New tab with URL
agent-browser tab 2                  # Switch to tab 2
agent-browser tab close              # Close current tab
agent-browser frame "#my-iframe"     # Switch into iframe
agent-browser frame main             # Return to main frame

Cookies & Storage

agent-browser cookies
agent-browser cookies set session_id "abc123"
agent-browser cookies clear

agent-browser storage local
agent-browser storage local set theme dark
agent-browser storage local clear
agent-browser storage session set cart '{"items":[]}'

Network

agent-browser network route "**/api/users" --body '{"users":[]}'  # Mock response
agent-browser network route "**/ads/**" --abort                    # Block requests
agent-browser network unroute                                       # Remove all routes
agent-browser network requests --filter api                        # View requests
agent-browser network har start
agent-browser network har stop recording.har

Browser Settings

agent-browser set viewport 1280 800
agent-browser set viewport 375 812 2        # With device pixel ratio (retina)
agent-browser set device "iPhone 14"
agent-browser set geo 37.7749 -122.4194
agent-browser set offline on
agent-browser set headers '{"X-Custom":"value"}'
agent-browser set credentials admin secret
agent-browser set media dark

Auth State

agent-browser state save ./auth.json    # Save cookies + localStorage
agent-browser state load ./auth.json    # Restore auth state
agent-browser state list                # List saved states
agent-browser state show auth.json      # Summary of saved state

Dialogs

agent-browser dialog accept             # Accept alert/confirm/prompt
agent-browser dialog accept "My input"  # Accept prompt with text
agent-browser dialog dismiss

Clipboard

agent-browser clipboard read
agent-browser clipboard write "Hello, World!"
agent-browser clipboard copy           # Ctrl+C current selection
agent-browser clipboard paste          # Ctrl+V

Diff & Visual Testing

agent-browser diff snapshot                                  # vs last snapshot
agent-browser diff snapshot --baseline before.txt            # vs saved file
agent-browser diff snapshot --selector "#main" --compact
agent-browser diff screenshot --baseline before.png
agent-browser diff screenshot --baseline b.png -o diff.png
agent-browser diff url https://v1.example.com https://v2.example.com
agent-browser diff url https://v1.example.com https://v2.example.com --screenshot
agent-browser diff url https://v1.example.com https://v2.example.com --selector "#content"

Debug & Profiling

agent-browser trace start trace.zip
agent-browser trace stop
agent-browser profiler start
agent-browser profiler stop profile.json
agent-browser console                  # View console messages
agent-browser errors                   # View uncaught JS exceptions
agent-browser highlight "#button"      # Visually highlight element
agent-browser inspect                  # Open Chrome DevTools
agent-browser connect 9222             # Connect to existing browser via CDP port

Common Patterns

Login flow and save session

#!/bin/bash
agent-browser open https://app.example.com/login
agent-browser fill "#email" "$LOGIN_EMAIL"
agent-browser fill "#password" "$LOGIN_PASSWORD"
agent-browser click "[type=submit]"
agent-browser wait --url "**/dashboard"
agent-browser state save ./session.json

AI agent loop with snapshot-driven interaction

#!/bin/bash
agent-browser open https://app.example.com
agent-browser state load ./session.json

# Get snapshot, parse @refs, act
SNAPSHOT=$(agent-browser snapshot)
echo "$SNAPSHOT"

# Agent determines @e5 is the search box
agent-browser fill @e5 "quarterly report"
agent-browser press Enter
agent-browser wait --load networkidle
agent-browser snapshot
agent-browser screenshot results.png

Batch commands from a script (JSON)

cat > commands.json << 'EOF'
[
  ["open", "https://news.ycombinator.com"],
  ["wait", "--load", "networkidle"],
  ["get", "title"],
  ["snapshot"],
  ["screenshot", "hn.png"]
]
EOF

agent-browser batch --json < commands.json

Scrape with mocked network

agent-browser open https://api-heavy-app.example.com
agent-browser network route "**/api/slow-endpoint" --body '{"data":"mocked"}'
agent-browser snapshot
agent-browser network unroute

Full-page screenshot with annotations

agent-browser open https://example.com
agent-browser wait --load networkidle
agent-browser screenshot --full --annotate annotated.png

Connect to already-running Chrome

# Start Chrome with remote debugging
google-chrome --remote-debugging-port=9222 &

agent-browser connect 9222
agent-browser open https://example.com
agent-browser snapshot

Emulate mobile device

agent-browser set device "iPhone 14"
agent-browser open https://example.com
agent-browser screenshot mobile.png

HAR recording for network analysis

agent-browser open https://example.com
agent-browser network har start
agent-browser click "#load-data"
agent-browser wait --load networkidle
agent-browser network har stop session.har

Selector Reference

Format	Example	Notes
`@ref`	`@e1`, `@e12`	From `snapshot` output — preferred for AI
CSS	`#id`, `.class`, `[attr=val]`	Standard CSS selectors
Text	`"Sign In"`	Exact text match
XPath	`//button[@type='submit']`	Full XPath

Troubleshooting

Chrome not found

agent-browser install              # Downloads Chrome for Testing
agent-browser install --with-deps  # Linux: also installs system libs

Element not found / timing issues

agent-browser wait "#my-element"              # Wait for visibility first
agent-browser wait --load networkidle         # Wait for page to settle
agent-browser wait --fn "!!document.querySelector('#app')"
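
If a step still fails intermittently after waiting, a small retry loop around the command is a common shell pattern. The sketch below assumes a failed agent-browser command exits with a non-zero status, which the source does not state explicitly. `retry` and `flaky_step` are hypothetical names; `flaky_step` only stands in for a real command so the example runs without a browser.

```shell
# Hypothetical helper (not part of agent-browser): retry a command a few times.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
  max=$1
  delay=$2
  shift 2
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Stand-in for a flaky step so the sketch runs without a browser:
# it fails twice, then succeeds on the third call.
calls=0
flaky_step() {
  calls=$((calls + 1))
  [ "$calls" -ge 3 ]
}

retry 5 0 flaky_step && echo "succeeded after $calls calls"
```

In practice the stand-in would be replaced by a real step, for example `retry 3 1 agent-browser click "#my-element"`.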

Selector issues — use snapshot refs instead

# Instead of fragile CSS:
agent-browser click ".btn.btn-primary.submit-form"

# Use snapshot refs:
agent-browser snapshot  # Find @e7 = [button] "Submit"
agent-browser click @e7

Debug what's on the page

agent-browser screenshot debug.png        # Visual check
agent-browser snapshot                    # Accessibility tree
agent-browser console                     # JS console output
agent-browser errors                      # Uncaught exceptions
agent-browser eval "document.readyState"

Auth issues between sessions

agent-browser state save ./auth.json   # After successful login
agent-browser state load ./auth.json   # At start of next session

Handling alerts/dialogs

# Set up handler BEFORE the action that triggers dialog
agent-browser dialog accept
agent-browser click "#delete-button"

Performance — use batch for multi-step workflows

# Slow: one process per command
agent-browser open https://example.com
agent-browser fill "#q" "search"
agent-browser click "#submit"

# Fast: single process, multiple commands
echo '[["open","https://example.com"],["fill","#q","search"],["click","#submit"]]' \
  | agent-browser batch --json

How it compares

Choose it over Playwright or Puppeteer when agents need a minimal Rust CLI that runs without a Node.js runtime.

FAQ

What is agent-browser-automation?

Headless browser automation CLI for AI agents using native Rust binary with Chrome DevTools Protocol

When should I use agent-browser-automation?

When a task needs headless browser automation or web scraping from the command line, driven by the agent-browser Rust binary over the Chrome DevTools Protocol.

Is agent-browser-automation safe to install?

Review the Security Audits panel on this page before production use.
