
Android Device Automation
Run vision-driven Android UI checks and end-to-end flows on a real device with natural-language steps instead of brittle selectors.
Install
npx skills add https://github.com/web-infra-dev/midscene-skills --skill android-device-automation
What is this skill?
- Controls Android over ADB with natural-language Midscene commands; no DOM or accessibility tree is required
- Screenshot-analyze-act loop: taps, swipes, text input, app launch, and screenshot capture
- Works on any UI visible on screen, whether the app is native, WebView-based, or hybrid
- Hard gates: run one synchronous Midscene command at a time; never run commands in the background or in parallel chains
- Trigger phrases for QA: "test on device", "visual verification on mobile", "check the app on Android"
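The hard gate in the list above (one synchronous Midscene command at a time, never backgrounded or chained) can be pictured as a small shell helper. This is a minimal sketch, not part of Midscene or the skill: `run_step` is a hypothetical name, and all it does is run one command in the foreground, wait for it to exit, and report the exit status and elapsed time on stderr so the agent can read the command's own output before choosing the next action.

```bash
# Hypothetical helper (not part of Midscene): run exactly one command in the
# foreground and report how it ended. POSIX-compatible; runs under sh or bash.
# No trailing "&" and no "cmd1 && cmd2" chains: one step per call.
run_step() {
  echo ">>> step: $*" >&2
  start=$(date +%s)       # seconds since the epoch (GNU and BSD date support %s)
  "$@"                    # foreground: the shell blocks until the command exits
  status=$?
  end=$(date +%s)
  echo "<<< exit=$status elapsed=$((end - start))s" >&2
  return "$status"        # propagate the command's own exit status
}
```

A call would look like `run_step npx -y @midscene/android@1 <command> ...`, where `<command> ...` is a placeholder for whichever Midscene CLI command the agent picked; the actual command names are not reproduced here. The helper deliberately has no short timeout, because a single command typically needs about a minute and complex `act` steps can take longer.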
Adoption & trust: 1.9k installs on skills.sh; 240 GitHub stars; 1/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).
Common Questions / FAQ
Is Android Device Automation safe to install?
skills.sh reports 1 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.
SKILL.md
# Android Device Automation

> **CRITICAL RULES (VIOLATIONS WILL BREAK THE WORKFLOW):**
>
> 1. **Never run midscene commands in the background.** Each command must run synchronously so you can read its output (especially screenshots) before deciding the next action. Background execution breaks the screenshot-analyze-act loop.
> 2. **Run only one midscene command at a time.** Wait for the previous command to finish, read the screenshot, then decide the next action. Never chain multiple commands together.
> 3. **Allow enough time for each command to complete.** Midscene commands involve AI inference and screen interaction, which can take longer than typical shell commands. A typical command needs about 1 minute; complex `act` commands may need even longer.
> 4. **Always report task results before finishing.** After completing the automation task, you MUST proactively summarize the results to the user, including key data found, actions completed, screenshots taken, and any relevant findings. Never silently end after the last automation step; the user expects a complete response in a single interaction.

Automate Android devices using `npx -y @midscene/android@1`. Each CLI command maps directly to an MCP tool; you (the AI agent) act as the brain, deciding which actions to take based on screenshots.

## What `act` Can Do

Inside a single `act` call on Android, Midscene can tap, double-tap, long-press, type, clear text, scroll or swipe in any direction, pull to refresh, drag items, zoom with two fingers, press keys, and use system navigation such as Back, Home, or recent apps, all while working from the currently visible screen.

## Prerequisites

Midscene requires models with strong visual grounding capabilities.
The following environment variables must be configured, either as system environment variables or in a `.env` file in the current working directory (Midscene loads `.env` automatically):

```bash
MIDSCENE_MODEL_API_KEY="your-api-key"
MIDSCENE_MODEL_NAME="model-name"
MIDSCENE_MODEL_BASE_URL="https://..."
MIDSCENE_MODEL_FAMILY="family-identifier"
```

Example: Gemini (Gemini-3-Flash)

```bash
MIDSCENE_MODEL_API_KEY="your-google-api-key"
MIDSCENE_MODEL_NAME="gemini-3-flash"
MIDSCENE_MODEL_BASE_URL="https://generativelanguage.googleapis.com/v1beta/openai/"
MIDSCENE_MODEL_FAMILY="gemini"
```

Example: Qwen 3.5

```bash
MIDSCENE_MODEL_API_KEY="your-aliyun-api-key"
MIDSCENE_MODEL_NAME="qwen3.5-plus"
MIDSCENE_MODEL_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
MIDSCENE_MODEL_FAMILY="qwen3.5"
MIDSCENE_MODEL_REASONING_ENABLED="false"
# If using OpenRouter, set:
# MIDSCENE_MODEL_API_KEY="your-openrouter-api-key"
# MIDSCENE_MODEL_NAME="qwen/qwen3.5-plus"
# MIDSCENE_MODEL_BASE_URL="https://openrouter.ai/api/v1"
```

Example: Doubao Seed 2.0 Lite

```bash
MIDSCENE_MODEL_API_KEY="your-doubao-api-key"
MIDSCENE_MODEL_NAME="doubao-seed-2-0-lite"
MIDSCENE_MODEL_BASE_URL="https://ark.cn-beijing.volces.com/api/v3"
MIDSCENE_MODEL_FAMILY="doubao-seed"
```

Commonly used models: Doubao Seed 2.0 Lite, Qwen 3.5, Zhipu GLM-4.6V, Gemini-3-Pro, Gemini-3-Flash.

If the model is not configured, ask the user to set it up.
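Before handing control to Midscene, an agent could run a quick preflight check for the two things this skill depends on: the four required `MIDSCENE_MODEL_*` variables and an Android device reachable over ADB. The sketch below is a set of hypothetical POSIX shell helpers (`load_dotenv`, `check_midscene_env`, and `count_ready_devices` are illustrative names, not part of Midscene). It assumes `.env` contains plain shell-compatible `KEY="value"` lines like the examples in this section, and that `adb devices` prints one `serial<TAB>state` line per device.

```bash
# Preflight sketch (hypothetical helpers, not part of Midscene or the skill).
# POSIX-compatible; variables are global because POSIX sh has no `local`.

# Export the variables defined in ./.env, if the file exists.
# Sourcing runs the file as shell code, so only use it with a trusted .env.
load_dotenv() {
  if [ -f ./.env ]; then
    set -a      # auto-export every variable assigned while sourcing
    . ./.env
    set +a
  fi
}

# Succeed only if all four required model variables are non-empty;
# otherwise list the missing names on stderr and return 1.
check_midscene_env() {
  missing=""
  for name in MIDSCENE_MODEL_API_KEY MIDSCENE_MODEL_NAME \
              MIDSCENE_MODEL_BASE_URL MIDSCENE_MODEL_FAMILY; do
    eval "value=\${$name:-}"   # indirect lookup of the variable named in $name
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing model configuration:$missing" >&2
    return 1
  fi
  return 0
}

# Read `adb devices` output on stdin and print how many devices are in the
# "device" state; "unauthorized" and "offline" entries are not counted.
count_ready_devices() {
  awk '$2 == "device" { n++ } END { print n + 0 }'
}
```

A typical sequence would be `load_dotenv && check_midscene_env`, followed by `adb devices | count_ready_devices`. If the variables are missing, ask the user to configure the model as described above; if the count is 0, ask them to connect and authorize a device before running any Midscene command. `MIDSCENE_MODEL_REASONING_ENABLED` is left out of the check because it only appears in the Qwen example, not in the generic list.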