
Open Autoglm Phone Agent
Deploy and configure Open-AutoGLM so a vision-language model drives real phone actions on Android, HarmonyOS, or iOS from natural language tasks.
Overview
Open-AutoGLM Phone Agent is an agent skill most often used in Build (also Ship testing) that guides setup of a vision-language phone agent controlling Android, HarmonyOS, or iOS via ADB, HDC, or WebDriverAgent.
Install
npx skills add https://github.com/aradotso/trending-skills --skill open-autoglm-phone-agentWhat is this skill?
- Natural language → AutoGLM-Phone-9B (or Multilingual) → screen perception → device actions
- Android via ADB, HarmonyOS NEXT via HDC, iOS via WebDriverAgent
- Self-host inference with vLLM or SGLang, or remote BigModel/ModelScope APIs
- Multi-step tasks such as opening apps and completing in-app searches from one prompt
- Python 3.10+ agent framework oriented to vision-language phone use
- AutoGLM-Phone-9B vision-language model (9B parameters)
- Device bridges: ADB, HDC, WebDriverAgent
- Python 3.10+ requirement
Adoption & trust: 1.3k installs on skills.sh; 31 GitHub stars; 1/3 security scanners passed (skills.sh audits).
What problem does it solve?
You want to automate real mobile workflows from plain English but writing brittle coordinate-based scripts does not survive UI changes or multi-app journeys.
Who is it for?
Builders prototyping mobile AI agents, scripted demos on devices, or VLM-based smoke tests with Python 3.10+ and working ADB/HDC/WDA tooling.
Skip if: Products that only need Play Store compliance checklists without device control, or teams that cannot run local GPUs/APIs and physical device policies.
When should I use this skill?
Set up AutoGLM phone agent, control Android/HarmonyOS/iOS with AI, automate phone tasks in natural language, deploy AutoGLM for phone automation, configure ADB/HDC agents, or run Python phone-use agents with a vision mod
What do I get? / Deliverables
You have Open-AutoGLM configured with a served AutoGLM model and device bridge so tasks like opening apps and completing searches run from natural language with screenshot-driven actions.
- Configured Open-AutoGLM agent connecting model serving to device control
- Documented device and platform bridge setup (ADB/HDC/WDA)
- Runnable natural-language task examples with screenshot-action loop
Recommended Skills
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Canonical shelf is Build/agent-tooling because the skill is about standing up the AutoGLM phone agent stack—model serving, device bridges, and Python agent wiring—not one-off manual QA clicks. Agent-tooling captures VLM-driven device control (screenshot in, structured actions out) as reusable automation infrastructure for your product or internal bots.
Where it fits
Connect vLLM-hosted AutoGLM-Phone-9B to ADB so your agent can complete in-app purchase test flows from a single sentence.
Run nightly natural-language smoke tasks on a staging build before store submission.
Replay scripted phone workflows to verify a production hotfix without manual QA repetition.
How it compares
Use for VLM-driven phone automation with Open-AutoGLM—not for desktop-only browser agents or a single MCP tool with no mobile bridge.
Common Questions / FAQ
Who is open-autoglm-phone-agent for?
Indie developers and agent builders automating Android, HarmonyOS NEXT, or iOS devices with Open-AutoGLM and AutoGLM vision-language models.
When should I use open-autoglm-phone-agent?
During Build when wiring phone-use agents; during Ship/testing when running natural-language mobile smoke flows; whenever triggers match set up AutoGLM, control Android with AI, or configure ADB phone agents.
Is open-autoglm-phone-agent safe to install?
Phone agents can access apps, accounts, and network actions on paired devices—review the Security Audits panel on this Prism page, lock down API keys, and use test devices before production accounts.
SKILL.md
READMESKILL.md - Open Autoglm Phone Agent
# Open-AutoGLM Phone Agent > Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection. Open-AutoGLM is an open-source AI phone agent framework that enables natural language control of Android, HarmonyOS NEXT, and iOS devices. It uses the AutoGLM vision-language model (9B parameters) to perceive screen content and execute multi-step tasks like "open Meituan and search for nearby hot pot restaurants." ## Architecture Overview ``` User Natural Language → AutoGLM VLM → Screen Perception → ADB/HDC/WebDriverAgent → Device Actions ``` - **Model**: AutoGLM-Phone-9B (Chinese-optimized) or AutoGLM-Phone-9B-Multilingual - **Device control**: ADB (Android), HDC (HarmonyOS NEXT), WebDriverAgent (iOS) - **Model serving**: vLLM or SGLang (self-hosted) or BigModel/ModelScope API - **Input**: Screenshot + task description → Output: structured action commands ## Installation ### Prerequisites - Python 3.10+ - ADB installed and in PATH (Android) or HDC (HarmonyOS) or WebDriverAgent (iOS) - Android device with Developer Mode + USB Debugging enabled - ADB Keyboard APK installed on Android device (for text input) ### Install the framework ```bash git clone https://github.com/zai-org/Open-AutoGLM.git cd Open-AutoGLM pip install -r requirements.txt pip install -e . ``` ### Verify ADB connection ```bash # Android adb devices # Expected: emulator-5554 device # HarmonyOS NEXT hdc list targets # Expected: 7001005458323933328a01bce01c2500 ``` ## Model Deployment Options ### Option A: Third-party API (Recommended for quick start) **BigModel (ZhipuAI)** ```bash export BIGMODEL_API_KEY="your-bigmodel-api-key" python main.py \ --base-url https://open.bigmodel.cn/api/paas/v4 \ --model "autoglm-phone" \ --apikey $BIGMODEL_API_KEY \ "打开美团搜索附近的火锅店" ``` **ModelScope** ```bash export MODELSCOPE_API_KEY="your-modelscope-api-key" python main.py \ --base-url https://api-inference.modelscope.cn/v1 \ --model "ZhipuAI/AutoGLM-Phone-9B" \ --apikey $MODELSCOPE_API_KEY \ "open Meituan and find nearby hotpot" ``` ### Option B: Self-hosted with vLLM ```bash # Install vLLM (or use official Docker: docker pull vllm/vllm-openai:v0.12.0) pip install vllm # Start model server (strictly follow these parameters) python3 -m vllm.entrypoints.openai.api_server \ --served-model-name autoglm-phone-9b \ --allowed-local-media-path / \ --mm-encoder-tp-mode data \ --mm_processor_cache_type shm \ --mm_processor_kwargs '{"max_pixels":5000000}' \ --max-model-len 25480 \ --chat-template-content-format string \ --limit-mm-per-prompt '{"image":10}' \ --model zai-org/AutoGLM-Phone-9B \ --port 8000 ``` ### Option C: Self-hosted with SGLang ```bash # Install SGLang or use: docker pull lmsysorg/sglang:v0.5.6.post1 # Inside container: pip install nvidia-cudnn-cu12==9.16.0.29 python3 -m sglang.launch_server \ --model-path zai-org/AutoGLM-Phone-9B \ --served-model-name autoglm-phone-9b \ --context-length 25480 \ --mm-enable-dp-encoder \ --mm-process-config '{"image":{"max_pixels":5000000}}' \ --port 8000 ``` ### Verify deployment ```bash python scripts/check_deployment_cn.py \ --base-url http://localhost:8000/v1 \ --model autoglm-phone-9b ``` Expected output includes a `<think>...</think>` block followed by `<answer>do(action="Launch", app="...")`. **If the chain-of-thought is very short or garbled, the model deployment has failed.** ## Running the Agent ### Basic CLI usage ```bash # Android device