
Chrome Bridge Automation
Drive vision-based browser automation in the user's logged-in Chrome for QA, scraping, and multi-step web workflows via Midscene Bridge.
Overview
chrome-bridge-automation is an agent skill that runs vision-driven Midscene Bridge automation in the user's Chrome for QA, scraping, and multi-step workflows. It is most often used in Ship testing and also fits Build frontend.
Install
npx skills add https://github.com/web-infra-dev/midscene-skills --skill chrome-bridge-automation
What is this skill?
- Vision-driven automation from screenshots—no DOM or a11y tree required
- Bridge mode uses the user's Chrome via Midscene extension—cookies and sessions preserved
- CDP-based control without taking over mouse or keyboard
- Strict synchronous one-command-at-a-time loop; background runs break screenshot-analyze-act
- Supports navigate, scrape, forms, screenshots, and multi-step web workflows
- Two critical workflow rules: no background midscene commands and only one command at a time
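The two workflow rules above can be sketched as a small shell wrapper. This is a minimal illustration, not part of Midscene: the function name `run_midscene` and the `MIDSCENE_DRY_RUN` variable are hypothetical, and only the `npx @midscene/web@1 --bridge` prefix comes from the skill's own SKILL.md.

```shell
# Hypothetical helper (not part of Midscene): run one Bridge command in the foreground.
run_midscene() {
  if [ "$#" -eq 0 ]; then
    echo "usage: run_midscene <subcommand> [args]" >&2
    return 2
  fi
  # MIDSCENE_DRY_RUN is an assumed, illustrative variable: when set, only print the command.
  if [ -n "${MIDSCENE_DRY_RUN:-}" ]; then
    echo "npx @midscene/web@1 --bridge $*"
    return 0
  fi
  # No trailing '&' and no chaining: this call blocks until the command exits,
  # so its output can be read before the next step is chosen.
  npx @midscene/web@1 --bridge "$@"
}
```

With `MIDSCENE_DRY_RUN=1` set, `run_midscene act` prints `npx @midscene/web@1 --bridge act` without touching the browser. Calling the wrapper once per step, and reading its output before the next call, keeps the screenshot-analyze-act loop intact.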
Adoption & trust: 666 installs on skills.sh; 240 GitHub stars; 1/3 security scanners passed (skills.sh audits).
What problem does it solve?
You need to test or automate a web app that only works with real login state or non-DOM UIs, and headless scrapers miss what you see on screen.
Who is it for?
Builders validating freshly shipped UI, authenticated flows, or heterogeneous frontends who already use Chrome with the Midscene extension.
Skip if: Fully offline CLIs, pure API backends with no browser surface, or workflows that require firing many parallel browser jobs at once.
When should I use this skill?
User wants browse, scrape, forms, UI QA, screenshots, or multi-step web automation in their own Chrome via Midscene Bridge.
What do I get? / Deliverables
The agent completes synchronous screenshot-driven steps in your Chrome—navigation, interaction, extraction, or QA—with session state intact.
- Step-by-step browser actions with screenshot evidence
- Extracted page data or completed form workflows
- UI validation outcome from real-session Chrome runs
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
The canonical shelf is Ship testing because the skill emphasizes validating and QA-ing UI behavior in a real browser after builds exist. The Testing subphase covers frontend verification, screenshots, and automated interaction loops that build confidence before release.
Where it fits
After implementing a dashboard, step through clicks and screenshots in Bridge mode to confirm layout and errors.
Run a synchronous Midscene sequence to regression-test checkout or settings flows before release.
Extract or verify published marketing pages while staying logged into the CMS in Chrome.
Click through a landing prototype in the user's browser to validate copy and CTA placement visually.
How it compares
Vision + real-user Chrome bridge—not Playwright-only DOM automation or a hosted headless farm.
Common Questions / FAQ
Who is chrome-bridge-automation for?
Solo builders and small teams using Claude Code or similar agents to QA, scrape, or automate web tasks in their own Chrome with Midscene Bridge.
When should I use chrome-bridge-automation?
Use it during Build when validating new frontend work in a real browser, during Ship for UI QA and regression checks, or whenever logged-in scraping and multi-step web automation are required.
Is chrome-bridge-automation safe to install?
It runs Bash-driven Midscene commands against your browser; review the Security Audits panel on this page and avoid automating sensitive accounts without oversight.
SKILL.md - Chrome Bridge Automation
# Chrome Bridge Automation

> **CRITICAL RULES (VIOLATIONS WILL BREAK THE WORKFLOW):**
>
> 1. **Never run midscene commands in the background.** Each command must run synchronously so you can read its output (especially screenshots) before deciding the next action. Background execution breaks the screenshot-analyze-act loop.
> 2. **Run only one midscene command at a time.** Wait for the previous command to finish, read the screenshot, then decide the next action. Never chain multiple commands together.
> 3. **Allow enough time for each command to complete.** Midscene commands involve AI inference and screen interaction, which can take longer than typical shell commands. A typical command needs about 1 minute; complex `act` commands may need even longer.
> 4. **Always report task results before finishing.** After completing the automation task, you MUST proactively summarize the results to the user, including key data found, actions completed, screenshots taken, and any relevant findings. Never silently end after the last automation step; the user expects a complete response in a single interaction.

Automate the user's real Chrome browser via the Midscene Chrome Extension (Bridge mode), preserving cookies, sessions, and login state. You (the AI agent) act as the brain, deciding which actions to take based on screenshots.

## What `act` Can Do

Inside a single `act` call in Chrome Bridge mode, Midscene can click, right-click, double-click, hover, type or clear text, press keys, scroll, drag, long-press, and continue through multi-step page flows in the user's real Chrome session based on what is currently visible. When touch input is enabled, it can also handle swipe- or pinch-style interactions on touch-oriented pages.

## Command Format

**CRITICAL: Every command MUST follow this EXACT format. Do NOT modify the command prefix.**

```
npx @midscene/web@1 --bridge <subcommand> [args]
```

- `--bridge` flag is **MANDATORY** here: it activates Bridge mode to connect to the user's desktop Chrome browser

## Prerequisites

The user has already prepared Chrome and the Midscene Extension. Do NOT check browser or extension status before connecting; just connect directly.

Midscene requires models with strong visual grounding capabilities. The following environment variables must be configured, either as system environment variables or in a `.env` file in the current working directory (Midscene loads `.env` automatically):

```bash
MIDSCENE_MODEL_API_KEY="your-api-key"
MIDSCENE_MODEL_NAME="model-name"
MIDSCENE_MODEL_BASE_URL="https://..."
MIDSCENE_MODEL_FAMILY="family-identifier"
```

Example: Gemini (Gemini-3-Flash)

```bash
MIDSCENE_MODEL_API_KEY="your-google-api-key"
MIDSCENE_MODEL_NAME="gemini-3-flash"
MIDSCENE_MODEL_BASE_URL="https://generativelanguage.googleapis.com/v1beta/openai/"
MIDSCENE_MODEL_FAMILY="gemini"
```

Example: Qwen