
web-infra-dev/midscene-skills
8 skills, 13.9k installs, 1.9k stars
Install
npx skills add https://github.com/web-infra-dev/midscene-skills

Skills in this repo

1. Browser Automation
browser-automation is an agent skill package that wires Midscene.js vision-guided browser control into Claude Code and similar agents via Bash. It targets solo and indie builders who need real-browser checks (opening pages, scraping structured data, filling forms, clicking through flows, and capturing screenshots) without maintaining fragile CSS or accessibility selectors. Default operation uses headless Puppeteer so the agent does not take over the user’s mouse or keyboard; CDP and Bridge modes let you attach to an already-running Chrome when you want to debug against a familiar profile. The SKILL.md enforces a strict synchronous loop: run one Midscene command, read the screenshot output, then decide the next action. Background or chained runs break the workflow. Use it when you have just built a frontend and want to see if it works, when you need repeatable web QA before launch, or when you want agent-driven site automation while you build and validate prototypes on staging URLs.
3.4k installs

2. Desktop Computer Automation
Desktop Computer Automation is a Midscene-powered agent skill for controlling real desktops with natural language while operating entirely from screenshots. Solo builders shipping Electron, Qt, or other native shells often hit a wall when Playwright-style browser tools cannot see the OS chrome, file dialogs, or multi-window flows. This skill closes that gap on macOS, Windows, and Linux, and extends to remote Windows hosts over RDP when you need to exercise a server-side desktop environment. The documentation is explicit that local mode takes over the user’s actual mouse and keyboard, so you should treat sessions as interactive test runs, not silent CI jobs, unless you run them in isolated VMs. For web-only products, the skill itself tells you to prefer browser automation instead. Powered by Midscene.js, it fits agents that already run Bash and can follow strict sequencing: wait for each command, read screenshot output, then decide the next action.
Use it when you must validate desktop-native UX before the Ship phase, or automate repetitive desktop workflows during Build-phase integrations.
2.9k installs

3. Android Device Automation
Android Device Automation is a Midscene-powered agent skill for solo builders and small teams who need to validate mobile apps on hardware without maintaining fragile automation IDs. It drives the phone through ADB, interprets screenshots with vision models, and executes steps you describe in plain language: open apps, tap controls, scroll, type, and confirm what users actually see. That makes it practical for indie apps, internal dogfooding, and pre-ship smoke tests when you do not have a full Appium suite. The workflow is intentionally sequential: each command runs to completion, you read the latest screenshot, then choose the next action. Parallel or background runs break the loop. Install it when you ship Android apps and want an agent to exercise flows, install builds, or visually verify fixes on a physical device or emulator.
1.9k installs

4. iOS Device Automation
ios-device-automation teaches agents to control real iOS devices and simulators using Midscene’s vision-first CLI rather than brittle selectors. Solo builders use it when they need to tap, swipe, or verify flows from plain language while reading each screenshot before the next action. The skill stresses operational discipline: one synchronous command at a time, no background jobs, and patience for AI-heavy steps so the analyze-act loop stays coherent. That makes it suitable for indie QA on hybrid or custom UI where accessibility trees fail. It pairs with WebDriverAgent-backed setups and Bash invocation from Claude Code, Cursor, or Codex. Use it once simulators or devices are reachable; it is not a project bootstrap skill.
Expect iterative human-or-agent judgment between commands rather than fire-and-forget test scripts.
1.8k installs

5. HarmonyOS Device Automation
harmonyos-device-automation teaches your coding agent to operate HarmonyOS NEXT devices through Midscene and HDC using only what appears on screen. Indie mobile builders shipping Huawei ecosystem builds use it when stack-specific test IDs are missing or hybrid UIs break traditional selectors. The skill encodes non-negotiable rules: run each midscene command synchronously, never parallelize, and always read the latest screenshot before the next action, preserving the screenshot-analyze-act loop that Midscene depends on. Triggers cover harmony, hdc, 鸿蒙, and plain-language QA requests like verifying an app on a Huawei tablet. It complements unit tests by offering end-to-end visual verification on real hardware during ship prep.
1.4k installs

6. Vitest Midscene E2E
Vitest Midscene E2E troubleshooting is an agent skill for solo builders shipping browser-tested SaaS or agent-heavy apps with Midscene’s natural-language test actions on top of Vitest. The bundled guidance targets the most common Ship-phase failures: tests that time out at the default 180 seconds, assertions that run before the app finishes loading, and vague aiAct descriptions that hit the wrong control. You install it when AI E2E is green locally but flaky in CI, or when migrating from selector-based tests without rewriting your mental model for waits and prompts. It matters because Midscene shifts failure modes from missing CSS selectors to timing and language precision; fixing those early avoids false negatives that block releases.
Pair it with your existing Vitest config and Midscene agent setup; this skill focuses on operational debugging patterns rather than initial scaffolding.
1.2k installs

7. Computer Automation
computer-automation is an agent skill that wires Midscene.js into desktop workflows so solo builders can click, type, launch apps, and verify windows using natural language and screen captures. It targets native macOS, Windows, and Linux surfaces and remote Windows hosts over RDP: any UI visible on screen, regardless of stack. The workflow breaks if its sequencing rule is violated: each Midscene command must run synchronously, one at a time, so the agent can read screenshot output before the next action. SKILL.md warns that local mode takes over the user’s real mouse and keyboard, so it is a poor default for ordinary web testing. Install it when you need to exercise Electron, Qt, or other desktop-only clients, or to operate a headless Windows server via RDP where browser automation cannot reach the UI.
745 installs

8. Chrome Bridge Automation
chrome-bridge-automation teaches agents to run Midscene.js Bridge mode against the builder's real desktop Chrome. Instead of headless DOM scripts, it reasons over screenshots so canvas, shadow DOM, and unconventional UIs remain reachable. Sessions stay authentic because the extension bridges the DevTools Protocol while leaving local input alone. The workflow is deliberately sequential: one midscene command, read output (especially screenshots), then decide the next action. Parallel or background runs violate the skill's critical rules. Solo builders use it right after implementing a web feature to sanity-check UI, logged-in flows, data extraction, or regression paths without rebuilding auth fixtures in a sterile browser.
666 installs
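
The sequencing rule shared by the skills listed here (run one command, read its output, then decide the next action) can be sketched as a blocking loop. This is a minimal illustration, not code from the repo: `run_sequential_loop` and `decide_next` are hypothetical names, and the commands in the demo are stand-in Python one-liners rather than real Midscene CLI invocations.

```python
import subprocess
import sys
from typing import Callable, List, Optional, Sequence, Tuple

Command = Sequence[str]


def run_sequential_loop(
    first: Command,
    decide_next: Callable[[Command, str], Optional[Command]],
    max_steps: int = 10,
) -> List[Tuple[Command, str]]:
    """Run one command at a time, inspecting its output before choosing the next."""
    history: List[Tuple[Command, str]] = []
    command: Optional[Command] = first
    while command is not None and len(history) < max_steps:
        # subprocess.run blocks until the command exits, so nothing runs
        # in the background and no two commands are chained together.
        result = subprocess.run(command, capture_output=True, text=True, check=True)
        history.append((command, result.stdout))
        # The next step is chosen only after the full output is available
        # (in the skills above, that output would include a screenshot).
        command = decide_next(command, result.stdout)
    return history


if __name__ == "__main__":
    # Stand-in commands (assumption): each just prints a line.
    pending = [[sys.executable, "-c", "print('second step')"]]

    def decide(previous: Command, output: str) -> Optional[Command]:
        return pending.pop(0) if pending else None

    for cmd, out in run_sequential_loop(
        [sys.executable, "-c", "print('first step')"], decide
    ):
        print(out.strip())
```

The `max_steps` cap is a design choice of this sketch, added so that a decision function that never returns `None` cannot loop forever.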