
iOS Device Automation
Run vision-driven, natural-language iOS device and simulator checks with Midscene CLI when DOM or accessibility trees are unavailable or unreliable.
Overview
ios-device-automation is an agent skill for the Ship phase that runs vision-driven iOS testing with Midscene CLI one synchronous command at a time.
Install
npx skills add https://github.com/web-infra-dev/midscene-skills --skill ios-device-automation
What is this skill?
- Vision-driven control via the Midscene CLI and WebDriverAgent, with no DOM or accessibility labels required
- Hard rules: never run Midscene commands in the background, never chain commands, and allow about 1 minute per step for AI inference
- Triggers cover tap, swipe, navigate, QA, and end-to-end checks on iPhone and iPad
- Works on any on-screen UI, regardless of the native UI stack
- Powered by Midscene.js with natural-language `act` commands
- A typical Midscene command takes about 1 minute to complete
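The hard rules above can be made concrete with a small shell wrapper. This is a minimal sketch assuming a bash shell; `run_midscene_step` is a hypothetical helper name, not part of Midscene. The package spec `@midscene/ios@1` comes from SKILL.md, and the wrapper forwards its arguments unchanged instead of assuming any particular subcommand names.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Midscene): run exactly one Midscene CLI
# command in the foreground so its output can be read before the next step.
# Arguments are passed through unchanged; no subcommand names are assumed.
run_midscene_step() {
  if [ "$#" -eq 0 ]; then
    echo "usage: run_midscene_step <midscene-cli-arguments...>" >&2
    return 2
  fi

  local started status
  started=$(date +%s)

  # Foreground call: no trailing '&', no '&&' or ';' chaining. The shell blocks
  # here until the command exits, which may take a minute or more per step.
  npx -y @midscene/ios@1 "$@"
  status=$?

  # Report elapsed time and exit status on stderr, leaving stdout untouched.
  echo "midscene step finished in $(( $(date +%s) - started ))s with status $status" >&2
  return "$status"
}
```

Call it once, read the printed output and screenshot, then decide on the next call. Do not append `&` to it or join several calls with `&&` or `;`, since either would break the screenshot-analyze-act loop.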
Adoption & trust: 1.8k installs on skills.sh; 240 GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You need to QA your iOS app on a device or simulator, but traditional automation cannot see or target the UI you care about.
Who is it for?
Solo builders doing visual E2E checks on iPhone/iPad when Midscene and WebDriverAgent are already configured.
Skip if: you need fast parallel CI suites, you test only on Android, or your workflow requires chaining many shell commands without reading screenshots between steps.
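Because the skill assumes Midscene is already configured, a quick pre-flight check can confirm that the four model variables SKILL.md lists as required (`MIDSCENE_MODEL_API_KEY`, `MIDSCENE_MODEL_NAME`, `MIDSCENE_MODEL_BASE_URL`, `MIDSCENE_MODEL_FAMILY`) are set. This is a minimal sketch assuming bash; `check_midscene_env` is a hypothetical helper name. It inspects only the current shell environment, so values supplied solely through a `.env` file (which Midscene loads on its own) will not be seen.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of Midscene): report which model
# variables are missing from the current shell environment.
# Values that exist only in a .env file are NOT visible to this check.
check_midscene_env() {
  local missing=0 var
  for var in MIDSCENE_MODEL_API_KEY MIDSCENE_MODEL_NAME \
             MIDSCENE_MODEL_BASE_URL MIDSCENE_MODEL_FAMILY; do
    # ${!var} is bash indirect expansion: the value of the variable named by $var.
    if [ -z "${!var:-}" ]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done

  # A .env file in the working directory may still provide the missing values.
  if [ "$missing" -ne 0 ] && [ -f .env ]; then
    echo "note: ./.env exists; Midscene may still load these values from it" >&2
  fi
  return "$missing"
}
```

For example, `check_midscene_env || echo "Model not configured; ask the user to set it up"` mirrors the SKILL.md guidance for an unconfigured model.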
When should I use this skill?
iOS, iPhone, iPad, tap/swipe on device, mobile app testing, visual verification, end-to-end test on iOS, QA on iPhone or iPad.
What do I get? / Deliverables
You complete natural-language iOS interactions with screenshot-guided steps so you can verify flows visually before shipping.
- Screenshot-informed sequence of iOS UI actions
- Visual verification outcome for stated natural-language test goals
Journey fit
End-to-end visual verification on iPhone/iPad belongs in Ship, when you are proving the app works before release, not when you are only scaffolding UI. The skill enforces a screenshot-analyze-act test loop with synchronous Midscene commands, which is core mobile QA and testing discipline.
How it compares
Vision-loop mobile QA via the Midscene CLI, not XCTest unit tests or Argent MCP simulator bootstrapping alone.
Common Questions / FAQ
Who is ios-device-automation for?
Indie developers and small teams who want AI agents to operate iOS screens from screenshots and natural language during manual-style QA.
When should I use ios-device-automation?
During Ship-phase testing, when you need to tap, swipe, navigate, or verify an iOS app on an iPhone, iPad, or simulator, and during build integration spikes when you are validating agent-driven mobile checks.
Is ios-device-automation safe to install?
The skill allows the Bash tool and drives a real device UI. Review the Security Audits panel on this Prism page, and run it only on devices and accounts you control.
SKILL.md
# iOS Device Automation

> **CRITICAL RULES (VIOLATIONS WILL BREAK THE WORKFLOW):**
>
> 1. **Never run midscene commands in the background.** Each command must run synchronously so you can read its output (especially screenshots) before deciding the next action. Background execution breaks the screenshot-analyze-act loop.
> 2. **Run only one midscene command at a time.** Wait for the previous command to finish, read the screenshot, then decide the next action. Never chain multiple commands together.
> 3. **Allow enough time for each command to complete.** Midscene commands involve AI inference and screen interaction, which can take longer than typical shell commands. A typical command needs about 1 minute; complex `act` commands may need even longer.
> 4. **Always report task results before finishing.** After completing the automation task, you MUST proactively summarize the results to the user, including key data found, actions completed, screenshots taken, and any relevant findings. Never silently end after the last automation step; the user expects a complete response in a single interaction.

Automate iOS devices using `npx -y @midscene/ios@1`. Each CLI command maps directly to an MCP tool: you (the AI agent) act as the brain, deciding which actions to take based on screenshots.

## What `act` Can Do

Inside a single `act` call on iOS, Midscene can tap, double-tap, long-press, type, clear text, scroll, drag items, zoom with two fingers, press keys, and use system navigation such as Home or the app switcher while working from the current visible screen.

## Prerequisites

Midscene requires models with strong visual grounding capabilities. The following environment variables must be configured, either as system environment variables or in a `.env` file in the current working directory (Midscene loads `.env` automatically):

```bash
MIDSCENE_MODEL_API_KEY="your-api-key"
MIDSCENE_MODEL_NAME="model-name"
MIDSCENE_MODEL_BASE_URL="https://..."
MIDSCENE_MODEL_FAMILY="family-identifier"
```

Example: Gemini (Gemini-3-Flash)

```bash
MIDSCENE_MODEL_API_KEY="your-google-api-key"
MIDSCENE_MODEL_NAME="gemini-3-flash"
MIDSCENE_MODEL_BASE_URL="https://generativelanguage.googleapis.com/v1beta/openai/"
MIDSCENE_MODEL_FAMILY="gemini"
```

Example: Qwen 3.5

```bash
MIDSCENE_MODEL_API_KEY="your-aliyun-api-key"
MIDSCENE_MODEL_NAME="qwen3.5-plus"
MIDSCENE_MODEL_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
MIDSCENE_MODEL_FAMILY="qwen3.5"
MIDSCENE_MODEL_REASONING_ENABLED="false"
# If using OpenRouter, set:
# MIDSCENE_MODEL_API_KEY="your-openrouter-api-key"
# MIDSCENE_MODEL_NAME="qwen/qwen3.5-plus"
# MIDSCENE_MODEL_BASE_URL="https://openrouter.ai/api/v1"
```

Example: Doubao Seed 2.0 Lite

```bash
MIDSCENE_MODEL_API_KEY="your-doubao-api-key"
MIDSCENE_MODEL_NAME="doubao-seed-2-0-lite"
MIDSCENE_MODEL_BASE_URL="https://ark.cn-beijing.volces.com/api/v3"
MIDSCENE_MODEL_FAMILY="doubao-seed"
```

Commonly used models: Doubao Seed 2.0 Lite, Qwen 3.5, Zhipu GLM-4.6V, Gemini-3-Pro, Gemini-3-Flash.

If the model is not configured, ask the user to set it up. See [Model Configuration](https://midscenejs.com/model-common-config) for supported providers.

## Commands

### Connect to Device

```bas