
HarmonyOS Device Automation
Run vision-driven, natural-language UI checks on HarmonyOS NEXT phones and tablets over HDC when DOM or accessibility trees are unavailable.
Overview
harmonyos-device-automation is an agent skill for the Ship phase. It controls HarmonyOS NEXT devices through Midscene vision AI and HDC, running one synchronous command at a time.
Install
npx skills add https://github.com/web-infra-dev/midscene-skills --skill harmonyos-device-automation
What is this skill?
- Vision-driven automation from screenshots—no DOM or accessibility labels required
- HarmonyOS NEXT control via HDC: tap, swipe, text input, app launch, screenshots
- Strict workflow: one Midscene command at a time, never background execution
- Natural-language commands for QA on 鸿蒙 / Huawei devices
- Powered by Midscene.js inference plus on-device interaction
- 3 critical workflow rules: no background runs, one command at a time, wait for AI+device completion
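The workflow rules above can be sketched as a small shell wrapper. This is a minimal illustration, not part of the skill: `run_step` and `STEP_TIMEOUT` are hypothetical names, the 180-second default is an assumed ceiling chosen to be generous, and `timeout` is assumed to be available from GNU coreutils.

```shell
#!/usr/bin/env bash
# Sketch only: run each device step in the foreground, one at a time.
# STEP_TIMEOUT and run_step are hypothetical names, not Midscene features.
STEP_TIMEOUT="${STEP_TIMEOUT:-180}"  # seconds; steps include AI inference, so allow plenty

run_step() {
  local status=0
  # No trailing '&': the call blocks until the command exits, so its
  # output can be read before the next step is chosen.
  # `timeout` (GNU coreutils, assumed installed) exits with 124 on expiry.
  timeout "$STEP_TIMEOUT" "$@" || status=$?
  if [ "$status" -ne 0 ]; then
    echo "step failed (exit $status): $*" >&2
  fi
  return "$status"
}
```

Calling `run_step` once per Midscene command, and reading that command's screenshot before issuing the next call, keeps the screenshot-analyze-act loop intact.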
Adoption & trust: 1.4k installs on skills.sh; 240 GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You cannot reliably automate or QA your HarmonyOS app with DOM-based tools because the UI exposes no DOM or lacks stable accessibility labels.
Who is it for?
Solo builders with HDC set up who need conversational, screenshot-grounded E2E checks on HarmonyOS NEXT before store or sideload release.
Skip if: your team has no physical HarmonyOS hardware, no HDC access, or no tolerance for AI-driven device steps that run slower than headless unit tests.
When should I use this skill?
User mentions harmony, harmonyos, 鸿蒙, hdc, Huawei device QA, or wants to test or verify an app on HarmonyOS with Midscene.
What do I get? / Deliverables
You complete scripted taps, swipes, inputs, and visual checks on a connected Harmony device with agent-readable screenshots after each synchronous Midscene step.
- Step-by-step device actions with screenshot evidence
- Natural-language-driven test sessions on Harmony hardware
Journey fit
Device-level verification belongs in Ship because you are proving the app works on real Harmony hardware before release. Testing is the canonical shelf for synchronous screenshot-analyze-act loops that validate taps, swipes, and flows on production-like builds.
How it compares
Use for vision-first Harmony QA instead of Appium-style selector tests when labels and web views are unreliable.
Common Questions / FAQ
Who is harmonyos-device-automation for?
Indie and small-team mobile developers shipping on HarmonyOS NEXT who want agent-driven device tests without maintaining brittle selector maps.
When should I use harmonyos-device-automation?
Use it in Ship during testing when you need to verify flows on a Huawei phone or tablet, reproduce a bug visually, or run end-to-end checks described in natural language.
Is harmonyos-device-automation safe to install?
Check the Security Audits panel on this page; the skill allows Bash and drives real devices, so review commands and device data exposure before running on production accounts.
SKILL.md
# HarmonyOS Device Automation

> **CRITICAL RULES (VIOLATIONS WILL BREAK THE WORKFLOW):**
>
> 1. **Never run midscene commands in the background.** Each command must run synchronously so you can read its output (especially screenshots) before deciding the next action. Background execution breaks the screenshot-analyze-act loop.
> 2. **Run only one midscene command at a time.** Wait for the previous command to finish, read the screenshot, then decide the next action. Never chain multiple commands together.
> 3. **Allow enough time for each command to complete.** Midscene commands involve AI inference and screen interaction, which can take longer than typical shell commands. A typical command needs about 1 minute; complex `act` commands may need even longer.

Automate HarmonyOS NEXT devices using `npx -y @midscene/harmony@1`. Each CLI command maps directly to an MCP tool: you (the AI agent) act as the brain, deciding which actions to take based on screenshots.

## What `act` Can Do

Inside a single `act` call on HarmonyOS, Midscene can tap, double-tap, long-press, type, clear text, scroll, drag items, press keys, and use system navigation such as Back, Home, or recent apps while working from the current visible screen. Two-finger zoom is not available because the underlying HarmonyOS automation layer does not expose multi-touch input.

## Prerequisites

Midscene requires models with strong visual grounding capabilities. The following environment variables must be configured, either as system environment variables or in a `.env` file in the current working directory (Midscene loads `.env` automatically):

```bash
MIDSCENE_MODEL_API_KEY="your-api-key"
MIDSCENE_MODEL_NAME="model-name"
MIDSCENE_MODEL_BASE_URL="https://..."
MIDSCENE_MODEL_FAMILY="family-identifier"
```

Example: Gemini (Gemini-3-Flash)

```bash
MIDSCENE_MODEL_API_KEY="your-google-api-key"
MIDSCENE_MODEL_NAME="gemini-3-flash"
MIDSCENE_MODEL_BASE_URL="https://generativelanguage.googleapis.com/v1beta/openai/"
MIDSCENE_MODEL_FAMILY="gemini"
```

Example: Qwen 3.5

```bash
MIDSCENE_MODEL_API_KEY="your-aliyun-api-key"
MIDSCENE_MODEL_NAME="qwen3.5-plus"
MIDSCENE_MODEL_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
MIDSCENE_MODEL_FAMILY="qwen3.5"
MIDSCENE_MODEL_REASONING_ENABLED="false"
# If using OpenRouter, set:
# MIDSCENE_MODEL_API_KEY="your-openrouter-api-key"
# MIDSCENE_MODEL_NAME="qwen/qwen3.5-plus"
# MIDSCENE_MODEL_BASE_URL="https://openrouter.ai/api/v1"
```

Example: Doubao Seed 2.0 Lite

```bash
MIDSCENE_MODEL_API_KEY="your-doubao-api-key"
MIDSCENE_MODEL_NAME="doubao-seed-2-0-lite"
MIDSCENE_MODEL_BASE_URL="https://ark.cn-beijing.volces.com/api/v3"
MIDSCENE_MODEL_FAMILY="doubao-seed"
```

Commonly used models: Doubao Seed 2.0 Lite, Qwen 3.5, Zhipu GLM-4.6V, Gemini-3-Pro, Gemini-3-Flash.

If the model is not configured, ask the user to set it up. See [Model Configuration](https://midscenejs.com/model-common-config) for supported providers.

## HDC Setup

HDC (HarmonyOS Device Connector) must be installed and accessible. Common setup:

- Install via [DevEco Studio](https://developer.huawei.com/consumer/cn/deveco-studio/)
- Or set `HDC_HOME` environm