Chrome Bridge Automation

Name: Chrome Bridge Automation
Author: web-infra-dev

web-infra-dev/midscene-skills

Drive vision-based browser automation in the user's logged-in Chrome for QA, scraping, and multi-step web workflows via Midscene Bridge.

Overview

chrome-bridge-automation is an agent skill, most often used for testing in the Ship phase (and for frontend work in the Build phase), that runs vision-driven Midscene Bridge automation in the user's Chrome for QA, scraping, and multi-step workflows.

Install

npx skills add https://github.com/web-infra-dev/midscene-skills --skill chrome-bridge-automation

What is this skill?

Vision-driven automation from screenshots—no DOM or a11y tree required
Bridge mode uses the user's Chrome via Midscene extension—cookies and sessions preserved
CDP-based control without taking over mouse or keyboard
Strict synchronous one-command-at-a-time loop; background runs break screenshot-analyze-act
Supports navigate, scrape, forms, screenshots, and multi-step web workflows
Critical workflow rules include: never run midscene commands in the background, and run only one command at a time

Compatible agents: Claude Code, Cursor, Codex, Windsurf

Adoption & trust: 666 installs on skills.sh; 240 GitHub stars; 1/3 security scanners passed (skills.sh audits).

What problem does it solve?

You need to test or automate a web app that only works with real login state or non-DOM UIs, and headless scrapers miss what you see on screen.

Who is it for?

Builders validating freshly shipped UI, authenticated flows, or heterogeneous frontends who already use Chrome with the Midscene extension.

Skip if: Fully offline CLIs, pure API backends with no browser surface, or workflows that require firing many parallel browser jobs at once.

When should I use this skill?

The user wants to browse, scrape, fill in forms, run UI QA, take screenshots, or automate multi-step web tasks in their own Chrome via Midscene Bridge.

What do I get? / Deliverables

The agent completes synchronous screenshot-driven steps in your Chrome—navigation, interaction, extraction, or QA—with session state intact.

Step-by-step browser actions with screenshot evidence
Extracted page data or completed form workflows
UI validation outcome from real-session Chrome runs

Journey fit

Spans multiple journey phases - primary shelf plus alternate fits below.

Primary fit

Canonical shelf is Ship testing because the skill emphasizes validating and QA-ing UI behavior in a real browser after builds exist. Testing subphase covers frontend verification, screenshots, and automated interaction loops before release confidence.

Also useful

Build (UI/UX & frontend)
Grow (Content & marketing)

Where it fits

Build (UI/UX & frontend): After implementing a dashboard, step through clicks and screenshots in Bridge mode to confirm layout and errors.

Ship (Testing & QA): Run a synchronous Midscene sequence to regression-test checkout or settings flows before release.

Grow (Content & marketing): Extract or verify published marketing pages while staying logged into the CMS in Chrome.

Validate (Prototype & spike): Click through a landing prototype in the user's browser to validate copy and CTA placement visually.

How it compares

Vision + real-user Chrome bridge—not Playwright-only DOM automation or a hosted headless farm.

Common Questions / FAQ

Who is chrome-bridge-automation for?

Solo builders and small teams using Claude Code or similar agents to QA, scrape, or automate web tasks in their own Chrome with Midscene Bridge.

When should I use chrome-bridge-automation?

Use it during Build when validating new frontend work in a real browser, during Ship for UI QA and regression checks, or whenever logged-in scraping and multi-step web automation are required.

Is chrome-bridge-automation safe to install?

It runs Bash-driven Midscene commands against your browser; review the Security Audits panel on this page and avoid automating sensitive accounts without oversight.

SKILL.md

# Chrome Bridge Automation

> **CRITICAL RULES — VIOLATIONS WILL BREAK THE WORKFLOW:**
>
> 1. **Never run midscene commands in the background.** Each command must run synchronously so you can read its output (especially screenshots) before deciding the next action. Background execution breaks the screenshot-analyze-act loop.
> 2. **Run only one midscene command at a time.** Wait for the previous command to finish, read the screenshot, then decide the next action. Never chain multiple commands together.
> 3. **Allow enough time for each command to complete.** Midscene commands involve AI inference and screen interaction, which can take longer than typical shell commands. A typical command needs about 1 minute; complex `act` commands may need even longer.
> 4. **Always report task results before finishing.** After completing the automation task, you MUST proactively summarize the results to the user — including key data found, actions completed, screenshots taken, and any relevant findings. Never silently end after the last automation step; the user expects a complete response in a single interaction.
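
As a rough illustration of rules 1 to 3, the sketch below wraps the documented command prefix in a bash function that runs one command in the foreground and gives it a generous timeout. The function name `run_midscene` and the `MIDSCENE_TIMEOUT` and `MIDSCENE_BIN` variables are hypothetical helpers invented for this example and are not part of Midscene. The sketch also assumes the GNU coreutils `timeout` utility is installed.

```bash
#!/usr/bin/env bash
# Hypothetical knob: seconds to wait before giving up on a command.
# Rule 3 says about 1 minute is typical, so the default leaves headroom.
MIDSCENE_TIMEOUT="${MIDSCENE_TIMEOUT:-180}"

run_midscene() {
  # One invocation runs exactly one subcommand (rule 2).
  if [ "$#" -lt 1 ]; then
    echo "usage: run_midscene <subcommand> [args]" >&2
    return 2
  fi

  # Documented prefix; MIDSCENE_BIN (hypothetical) swaps in a stub for dry runs.
  local -a cmd=(npx @midscene/web@1 --bridge)
  if [ -n "${MIDSCENE_BIN:-}" ]; then
    cmd=("$MIDSCENE_BIN")
  fi

  # No trailing '&': the call blocks until the command exits (rule 1).
  # 'timeout' exits with status 124 if the limit is reached.
  timeout "$MIDSCENE_TIMEOUT" "${cmd[@]}" "$@"
}
```

Because the function blocks until the command exits, its output (including any screenshot path it prints) is available to read before the next call is decided.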

Automate the user's real Chrome browser via the Midscene Chrome Extension (Bridge mode), preserving cookies, sessions, and login state. You (the AI agent) act as the brain, deciding which actions to take based on screenshots.

## What `act` Can Do

Inside a single `act` call in Chrome Bridge mode, Midscene can click, right-click, double-click, hover, type or clear text, press keys, scroll, drag, long-press, and continue through multi-step page flows in the user's real Chrome session based on what is currently visible. When touch input is enabled, it can also handle swipe- or pinch-style interactions on touch-oriented pages.

## Command Format

**CRITICAL — Every command MUST follow this EXACT format. Do NOT modify the command prefix.**

```
npx @midscene/web@1 --bridge <subcommand> [args]
```

- `--bridge` flag is **MANDATORY** here — it activates Bridge mode to connect to the user's desktop Chrome browser
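
One way to guard against an accidentally modified prefix is to check a command line before running it. The following is a minimal sketch: the function name `is_valid_midscene_cmd` and the `MIDSCENE_PREFIX` variable are hypothetical, and the check covers only the prefix and the presence of a subcommand, not whether the subcommand itself is valid.

```bash
#!/usr/bin/env bash
# The exact prefix from the command format above.
MIDSCENE_PREFIX='npx @midscene/web@1 --bridge'

# Succeeds only if the command line starts with the exact prefix
# followed by a space and at least one more character (the subcommand).
is_valid_midscene_cmd() {
  case "$1" in
    "$MIDSCENE_PREFIX "?*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_valid_midscene_cmd 'npx @midscene/web@1 --bridge act'` succeeds, while a command line that drops `--bridge` or the `@1` version tag fails.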

## Prerequisites

The user has already prepared Chrome and the Midscene Extension. Do NOT check browser or extension status before connecting — just connect directly.

Midscene requires models with strong visual grounding capabilities. The following environment variables must be configured — either as system environment variables or in a `.env` file in the current working directory (Midscene loads `.env` automatically):

```bash
MIDSCENE_MODEL_API_KEY="your-api-key"
MIDSCENE_MODEL_NAME="model-name"
MIDSCENE_MODEL_BASE_URL="https://..."
MIDSCENE_MODEL_FAMILY="family-identifier"
```
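
Before starting a session, it can help to confirm that the four variables above are set. The sketch below is one possible check, with a hypothetical function name `missing_midscene_vars`. It inspects only the current shell environment, so values that exist solely in a `.env` file (which Midscene loads by itself) would still be reported as missing here.

```bash
#!/usr/bin/env bash
# Prints the names of required variables that are unset or empty,
# space-separated, and returns non-zero if any are missing.
missing_midscene_vars() {
  local name missing=""
  for name in MIDSCENE_MODEL_API_KEY MIDSCENE_MODEL_NAME \
              MIDSCENE_MODEL_BASE_URL MIDSCENE_MODEL_FAMILY; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name.
    if [ -z "${!name:-}" ]; then
      missing="$missing $name"
    fi
  done
  # Strip the leading space before printing.
  echo "${missing# }"
  [ -z "$missing" ]
}
```

A caller might run `missing_midscene_vars || echo "configure the variables listed above"` and proceed only when the function succeeds.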

Example: Gemini (Gemini-3-Flash)

```bash
MIDSCENE_MODEL_API_KEY="your-google-api-key"
MIDSCENE_MODEL_NAME="gemini-3-flash"
MIDSCENE_MODEL_BASE_URL="https://generativelanguage.googleapis.com/v1beta/openai/"
MIDSCENE_MODEL_FAMILY="gemini"
```

Example: Qwen
