Browser Automation

Name: Browser Automation
Author: web-infra-dev

web-infra-dev/midscene-skills

Run vision-driven Midscene browser automation from your agent to navigate sites, scrape data, fill forms, and QA freshly built UI without brittle DOM selectors.

Overview

browser-automation is an agent skill most often used in Ship (also Build and Validate) that runs vision-driven Midscene browser steps from screenshots to navigate, scrape, interact, and QA web UI.

Install

npx skills add https://github.com/web-infra-dev/midscene-skills --skill browser-automation

What is this skill?

Vision-driven Midscene.js automation from screenshots—no DOM or accessibility labels required
Three run modes: default headless Puppeteer, CDP attach, and Bridge mode for an existing Chrome (does not hijack mouse/k
3 critical workflow rules: never run Midscene in the background, one command at a time, allow each command to finish bef
Bash-invoked flows for browse, scrape, forms, clicks, screenshots, and multi-step web workflows
Connect to user Chrome via CDP, DevTools Protocol, or remote debugging when local inspection matters
3 critical workflow rules (no background runs, one command at a time, allow completion)
3 browser connection modes: headless Puppeteer, CDP, and Bridge

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 3.4k installs on skills.sh; 240 GitHub stars; 2/3 security scanners passed (skills.sh audits).

What problem does it solve?

You need to exercise real websites and fresh UI flows, but DOM selectors, manual clicking, and flaky automation make scraping, forms, and post-build verification slow and unreliable.

Who is it for?

Solo builders shipping web apps who want agent-run browser QA, scraping, and form workflows using Midscene headless Puppeteer or attached Chrome via CDP/Bridge.

Skip if: Teams that need always-on production monitoring, non-web clients only, or fully unattended parallel browser farms—the skill explicitly forbids background and chained Midscene runs.

When should I use this skill?

User wants to browse or navigate web pages, scrape or extract site data, fill forms or click UI, verify or QA frontend behavior, take screenshots, automate multi-step web workflows, test what was just built, or connect t

What do I get? / Deliverables

Your agent completes synchronous Midscene browser commands—navigation, interaction, extraction, and screenshots—with observable output after each step so you can validate UI behavior and finish multi-step web tasks confi

Browser screenshots from each synchronous Midscene step
Scraped or extracted web data from completed flows
Documented pass/fail observations from UI verification runs

Recommended Skills

Agent Browservercel-labs/agent-browser

agent-browser is a Node-installed browser automation CLI built for AI agents that need dependable programmatic web inter…428k installs·35.5k stars

Lark Imlarksuite/cli

Lark IM is a Larksuite agent skill that exposes Feishu/Lark instant messaging to Claude Code, Cursor, and similar agents…210k installs·13.7k stars

Lark Calendarlarksuite/cli

lark-calendar is an agent skill for Feishu/Lark Calendar v4 exposed via lark-cli. Solo builders and small teams who alre…209k installs·13.7k stars

Lark Sheetslarksuite/cli

Skill for programmatic Feishu spreadsheet and worksheet management—create tables, bulk data IO, lookup, and export—using…209k installs·13.7k stars

Lark Vclarksuite/cli

lark-vc is an agent skill for Feishu/Lark video conferencing history and artifacts through lark-cli. After calls end, so…208k installs·13.7k stars

Lark Contactlarksuite/cli

CLI skill for Lark directory lookup: search employees and fetch metadata by open_id, with clear boundaries vs IM, calend…208k installs·13.7k stars

Journey fit

Spans multiple journey phases - primary shelf plus alternate fits below.

Primary fit

The skill’s headline triggers are verify, validate, test, and QA frontend behavior in a browser—work that solo builders do most often right before or after shipping changes. Testing is the canonical shelf because the documented loop is screenshot-analyze-act for checking whether pages and flows behave correctly, not for ideation or long-term ops monitoring.

Also useful

BuildUI/UX & frontend

Also useful

ValidatePrototype & spike

Where it fits

Example use

ValidatePrototype & spike

Walk a staging landing page with headless Midscene to confirm the signup form submits before you scope the full product.

Example use

BuildUI/UX & frontend

After implementing a settings screen, run one synchronous command at a time to click toggles and capture screenshots for your agent to compare against the spec.

Example use

ShipTesting & QA

Regression-test checkout and navigation paths in headless Puppeteer before you tag a release.

Example use

ShipCode review

Collect page screenshots through Bridge mode attached to Chrome as visual evidence during a pre-launch review.

Example use

GrowSupport & community

Reproduce a customer-reported UI bug in the browser and save screenshots to attach to a fix prompt.

How it compares

Use this vision-first Midscene skill package for screenshot-driven web acts—not a generic Playwright script generator or an MCP browser server you leave running in parallel.

Common Questions / FAQ

Who is browser-automation for?

It is for solo and indie builders using Claude Code, Cursor, or Codex who want their agent to browse, scrape, test, and screenshot real web UIs via Midscene without writing selector-heavy automation.

When should I use browser-automation?

Use it during Validate to click through a prototype landing page, during Build to test what you just shipped to staging, and during Ship to QA forms, navigation, and multi-step flows—with headless Puppeteer or your Chrome over CDP/Bridge.

Is browser-automation safe to install?

It invokes Bash and drives a real browser with network access, so review what URLs and credentials your agent uses; check the Security Audits panel on this Prism page before trusting it in production repos.

SKILL.md

READMESKILL.md - Browser Automation

# Browser Automation

> **CRITICAL RULES — VIOLATIONS WILL BREAK THE WORKFLOW:**
>
> 1. **Never run midscene commands in the background.** Each command must run synchronously so you can read its output (especially screenshots) before deciding the next action. Background execution breaks the screenshot-analyze-act loop.
> 2. **Run only one midscene command at a time.** Wait for the previous command to finish, read the screenshot, then decide the next action. Never chain multiple commands together.
> 3. **Allow enough time for each command to complete.** Midscene commands involve AI inference and screen interaction, which can take longer than typical shell commands. A typical command needs about 1 minute; complex `act` commands may need even longer.
> 4. **Always report task results before finishing.** After completing the automation task, you MUST proactively summarize the results to the user — including key data found, actions completed, screenshots taken, and any relevant findings. Never silently end after the last automation step; the user expects a complete response in a single interaction.

Automate web browsing using `npx -y @midscene/web@1`. By default, launches a headless Chrome via Puppeteer that **persists across CLI calls** — no session loss between commands. Also supports **CDP mode** and **Bridge mode** to connect to an existing Chrome browser.

## What `act` Can Do

Inside a single `act` call in the browser, Midscene can click, right-click, double-click, hover, type or clear text, press keys, scroll, drag, long-press, and continue through multi-step page flows based on what is currently visible. When touch input is enabled, it can also handle swipe- or pinch-style interactions on touch-oriented pages.

## When to Use

This skill has three modes. Choose based on the user's intent:

### Mode Selection Guide

| Mode | When to use | How it works |
|------|------------|-------------|
| **Puppeteer (default)** | User wants to browse a URL, scrape data, test UI — no need for their own browser | Launches a new headless Chrome, isolated from user's browser |
| **CDP mode** | User says "connect to my Chrome", "control my browser", "CDP", "remote debugging", or wants to operate their existing browser. Also use when the task **implicitly requires login state** (e.g., "check my orders", "open my dashboard", "look at my account") | Connects to user's Chrome via DevTools Protocol. Requires remote debugging enabled (`chrome://inspect` > "Allow remote debugging"). No extension needed |
| **Bridge mode** | User explicitly mentions "bridge", "extension", or has Midscene Chrome Extension installed and prefers to use it | Connects to user's Chrome via the Midscene Chrome Extension |

**CDP vs Bridge**: Both control the user's real Chrome with login sessions preserved. CDP only needs a Chrome setting toggle; Bridge needs a Chrome Extension installed. If the user doesn't specify, prefer **CDP mode** as it has fewer prerequisites.

### Precheck: detect available CDP target

Before using CDP mode, run a quick precheck to verify Ch

What is this skill?

Vision-driven Midscene.js automation from screenshots—no DOM or accessibility labels required

Three run modes: default headless Puppeteer, CDP attach, and Bridge mode for an existing Chrome (does not hijack mouse/k

3 critical workflow rules: never run Midscene in the background, one command at a time, allow each command to finish bef

Bash-invoked flows for browse, scrape, forms, clicks, screenshots, and multi-step web workflows

Connect to user Chrome via CDP, DevTools Protocol, or remote debugging when local inspection matters

3 critical workflow rules (no background runs, one command at a time, allow completion)

3 browser connection modes: headless Puppeteer, CDP, and Bridge

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 3.4k installs on skills.sh; 240 GitHub stars; 2/3 security scanners passed (skills.sh audits).

Who is it for?

Solo builders shipping web apps who want agent-run browser QA, scraping, and form workflows using Midscene headless Puppeteer or attached Chrome via CDP/Bridge.

Skip if: Teams that need always-on production monitoring, non-web clients only, or fully unattended parallel browser farms—the skill explicitly forbids background and chained Midscene runs.

What do I get? / Deliverables

Browser screenshots from each synchronous Midscene step

Scraped or extracted web data from completed flows

Documented pass/fail observations from UI verification runs

Journey fit

Spans multiple journey phases - primary shelf plus alternate fits below.

Primary fit

Also useful

BuildUI/UX & frontend

Also useful

ValidatePrototype & spike

Where it fits

Example use

ValidatePrototype & spike

Walk a staging landing page with headless Midscene to confirm the signup form submits before you scope the full product.

Example use

BuildUI/UX & frontend

After implementing a settings screen, run one synchronous command at a time to click toggles and capture screenshots for your agent to compare against the spec.

Example use

ShipTesting & QA

Regression-test checkout and navigation paths in headless Puppeteer before you tag a release.

Example use

ShipCode review

Collect page screenshots through Bridge mode attached to Chrome as visual evidence during a pre-launch review.

Example use

GrowSupport & community

Reproduce a customer-reported UI bug in the browser and save screenshots to attach to a fix prompt.

SKILL.md

READMESKILL.md - Browser Automation

# Browser Automation

> **CRITICAL RULES — VIOLATIONS WILL BREAK THE WORKFLOW:**
>
> 1. **Never run midscene commands in the background.** Each command must run synchronously so you can read its output (especially screenshots) before deciding the next action. Background execution breaks the screenshot-analyze-act loop.
> 2. **Run only one midscene command at a time.** Wait for the previous command to finish, read the screenshot, then decide the next action. Never chain multiple commands together.
> 3. **Allow enough time for each command to complete.** Midscene commands involve AI inference and screen interaction, which can take longer than typical shell commands. A typical command needs about 1 minute; complex `act` commands may need even longer.
> 4. **Always report task results before finishing.** After completing the automation task, you MUST proactively summarize the results to the user — including key data found, actions completed, screenshots taken, and any relevant findings. Never silently end after the last automation step; the user expects a complete response in a single interaction.

Automate web browsing using `npx -y @midscene/web@1`. By default, launches a headless Chrome via Puppeteer that **persists across CLI calls** — no session loss between commands. Also supports **CDP mode** and **Bridge mode** to connect to an existing Chrome browser.

## What `act` Can Do

Inside a single `act` call in the browser, Midscene can click, right-click, double-click, hover, type or clear text, press keys, scroll, drag, long-press, and continue through multi-step page flows based on what is currently visible. When touch input is enabled, it can also handle swipe- or pinch-style interactions on touch-oriented pages.

## When to Use

This skill has three modes. Choose based on the user's intent:

### Mode Selection Guide

| Mode | When to use | How it works |
|------|------------|-------------|
| **Puppeteer (default)** | User wants to browse a URL, scrape data, test UI — no need for their own browser | Launches a new headless Chrome, isolated from user's browser |
| **CDP mode** | User says "connect to my Chrome", "control my browser", "CDP", "remote debugging", or wants to operate their existing browser. Also use when the task **implicitly requires login state** (e.g., "check my orders", "open my dashboard", "look at my account") | Connects to user's Chrome via DevTools Protocol. Requires remote debugging enabled (`chrome://inspect` > "Allow remote debugging"). No extension needed |
| **Bridge mode** | User explicitly mentions "bridge", "extension", or has Midscene Chrome Extension installed and prefers to use it | Connects to user's Chrome via the Midscene Chrome Extension |

**CDP vs Bridge**: Both control the user's real Chrome with login sessions preserved. CDP only needs a Chrome setting toggle; Bridge needs a Chrome Extension installed. If the user doesn't specify, prefer **CDP mode** as it has fewer prerequisites.

### Precheck: detect available CDP target

Before using CDP mode, run a quick precheck to verify Ch

Overview

Install

What is this skill?

What problem does it solve?

Who is it for?

When should I use this skill?

What do I get? / Deliverables

Recommended Skills

Journey fit

Where it fits

Who is browser-automation for?

When should I use browser-automation?

Is browser-automation safe to install?

SKILL.md

This week for builders

Overview

Install

What is this skill?

What problem does it solve?

Who is it for?

When should I use this skill?

What do I get? / Deliverables

Recommended Skills

Journey fit

Where it fits

Who is browser-automation for?

When should I use browser-automation?

Is browser-automation safe to install?

SKILL.md