Open Autoglm Phone Agent

Name: Open Autoglm Phone Agent
Author: aradotso

aradotso/trending-skills

1.4k installs
66 repo stars
Updated July 9, 2026

open-autoglm-phone-agent is an expert agent skill for Open-AutoGLM, an AI phone agent framework that controls Android, HarmonyOS, and iOS devices via natural language using the AutoGLM vision-language model.

About

open-autoglm-phone-agent is an agent skill from aradotso/trending-skills: an expert skill for Open-AutoGLM, an open-source AI phone agent framework that enables natural language control of Android, HarmonyOS NEXT, and iOS devices. The framework uses the AutoGLM vision-language model (9B parameters) to perceive screen content and execute multi-step tasks.

Developers invoke open-autoglm-phone-agent during idea and discovery work on AI & agent building tasks. The skill documents triggers, prerequisites, and step-by-step workflows grounded in SKILL.md. It is compatible with Claude Code, Cursor, and Codex agent runtimes that load marketplace skills.

Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection.
User Natural Language → AutoGLM VLM → Screen Perception → ADB/HDC/WebDriverAgent → Device Actions
Model: AutoGLM-Phone-9B (Chinese-optimized) or AutoGLM-Phone-9B-Multilingual
Device control: ADB (Android), HDC (HarmonyOS NEXT), WebDriverAgent (iOS)
Model serving: vLLM or SGLang (self-hosted) or BigModel/ModelScope API

Open Autoglm Phone Agent by the numbers

1,356 all-time installs (skills.sh)
+8 installs in the week ending Jul 28, 2026 (Skillselion tracking)
Ranked #850 of 16,659 AI & Agent Building skills by installs in the Skillselion catalog
Security screen: MEDIUM risk (skills.sh audit)
Data as of Jul 28, 2026 (Skillselion catalog sync)

At a glance

open-autoglm-phone-agent capabilities & compatibility

Use cases: orchestration

npx skills add https://github.com/aradotso/trending-skills --skill open-autoglm-phone-agent


Installs	1.4k
repo stars	★ 66
Security audit	1 / 3 scanners passed
Last updated	July 9, 2026
Repository	aradotso/trending-skills ↗

What it does

Expert skill for Open-AutoGLM, an AI phone agent framework that controls Android/HarmonyOS/iOS devices via natural language using the AutoGLM vision-language model

Who is it for?

Developers working on AI & agent building tasks during the idea phase.

Skip if: your task falls outside the AI & Agent Building scope described in SKILL.md.

When should I use this skill?

Use it when a task involves Open-AutoGLM, an AI phone agent framework that controls Android/HarmonyOS/iOS devices via natural language using the AutoGLM vision-language model.

What you get

A completed AI & agent building workflow aligned with the SKILL.md steps.

Configured phone agent
ADB device bridge
Natural-language automation runbook

By the numbers

Supports Android, HarmonyOS, and iOS device control

Files

SKILL.md · Markdown · GitHub ↗

Open-AutoGLM Phone Agent

Skill by ara.so — Daily 2026 Skills collection.

Open-AutoGLM is an open-source AI phone agent framework that enables natural language control of Android, HarmonyOS NEXT, and iOS devices. It uses the AutoGLM vision-language model (9B parameters) to perceive screen content and execute multi-step tasks like "open Meituan and search for nearby hot pot restaurants."

Architecture Overview

User Natural Language → AutoGLM VLM → Screen Perception → ADB/HDC/WebDriverAgent → Device Actions

Model: AutoGLM-Phone-9B (Chinese-optimized) or AutoGLM-Phone-9B-Multilingual
Device control: ADB (Android), HDC (HarmonyOS NEXT), WebDriverAgent (iOS)
Model serving: vLLM or SGLang (self-hosted) or BigModel/ModelScope API
Input: Screenshot + task description → Output: structured action commands
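The pipeline above can be read as a perceive-act loop. The following is a minimal sketch of that loop, not the framework's actual implementation: `run_episode`, `capture`, `ask_model`, `execute`, and `max_steps` are hypothetical names introduced here, and the three callables are injected so the loop can be exercised without a device or a model server.

```python
import re
from typing import Callable, Optional


def run_episode(
    task: str,
    capture: Callable[[], bytes],
    ask_model: Callable[[str, bytes], str],
    execute: Callable[[str], None],
    max_steps: int = 20,  # hypothetical step budget, not a framework setting
) -> Optional[str]:
    """Screenshot -> model -> action, until Finish or the step budget runs out."""
    for _ in range(max_steps):
        screenshot = capture()               # screen perception input
        reply = ask_model(task, screenshot)  # expected: <think>...</think><answer>do(...)
        match = re.search(r"<answer>(.*?)(?:</answer>|$)", reply, re.DOTALL)
        if match is None:
            return None                      # unparseable reply: stop rather than guess
        action = match.group(1).strip()
        if action.startswith('do(action="Finish"'):
            return action                    # the model reports the task as complete
        execute(action)                      # ADB / HDC / WebDriverAgent side effect
    return None                              # budget exhausted without a Finish action
```

Passing the device and model as callables is a design choice for this sketch only; it keeps the control flow visible and makes the loop easy to test with stubs.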

Installation

Prerequisites

Python 3.10+
ADB installed and in PATH (Android) or HDC (HarmonyOS) or WebDriverAgent (iOS)
Android device with Developer Mode + USB Debugging enabled
ADB Keyboard APK installed on Android device (for text input)

Install the framework

git clone https://github.com/zai-org/Open-AutoGLM.git
cd Open-AutoGLM
pip install -r requirements.txt
pip install -e .

Verify ADB connection

# Android
adb devices
# Expected: emulator-5554   device

# HarmonyOS NEXT
hdc list targets
# Expected: 7001005458323933328a01bce01c2500
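When the check has to run from a script, the `adb devices` listing can be parsed into serial/state pairs. This is a small sketch under the assumption that the output follows the usual layout: a "List of devices attached" header, then one line per device with serial and state separated by whitespace. `parse_adb_devices` is a hypothetical helper, not part of the framework.

```python
from typing import Dict


def parse_adb_devices(output: str) -> Dict[str, str]:
    """Map each device serial to its reported state (device, unauthorized, offline, ...)."""
    devices = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, the header line, and "* daemon ..." notices from adb itself
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices
```

A device is ready for the agent only when its state is `device`; a state of `unauthorized` usually means the USB debugging dialog on the phone has not been accepted yet.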

Model Deployment Options

Option A: Third-party API (Recommended for quick start)

BigModel (ZhipuAI)

export BIGMODEL_API_KEY="your-bigmodel-api-key"
python main.py \
  --base-url https://open.bigmodel.cn/api/paas/v4 \
  --model "autoglm-phone" \
  --apikey $BIGMODEL_API_KEY \
  "打开美团搜索附近的火锅店"

ModelScope

export MODELSCOPE_API_KEY="your-modelscope-api-key"
python main.py \
  --base-url https://api-inference.modelscope.cn/v1 \
  --model "ZhipuAI/AutoGLM-Phone-9B" \
  --apikey $MODELSCOPE_API_KEY \
  "open Meituan and find nearby hotpot"

Option B: Self-hosted with vLLM

# Install vLLM (or use official Docker: docker pull vllm/vllm-openai:v0.12.0)
pip install vllm

# Start model server (strictly follow these parameters)
python3 -m vllm.entrypoints.openai.api_server \
  --served-model-name autoglm-phone-9b \
  --allowed-local-media-path / \
  --mm-encoder-tp-mode data \
  --mm_processor_cache_type shm \
  --mm_processor_kwargs '{"max_pixels":5000000}' \
  --max-model-len 25480 \
  --chat-template-content-format string \
  --limit-mm-per-prompt '{"image":10}' \
  --model zai-org/AutoGLM-Phone-9B \
  --port 8000

Option C: Self-hosted with SGLang

# Install SGLang or use: docker pull lmsysorg/sglang:v0.5.6.post1
# Inside container: pip install nvidia-cudnn-cu12==9.16.0.29

python3 -m sglang.launch_server \
  --model-path zai-org/AutoGLM-Phone-9B \
  --served-model-name autoglm-phone-9b \
  --context-length 25480 \
  --mm-enable-dp-encoder \
  --mm-process-config '{"image":{"max_pixels":5000000}}' \
  --port 8000

Verify deployment

python scripts/check_deployment_cn.py \
  --base-url http://localhost:8000/v1 \
  --model autoglm-phone-9b

Expected output includes a <think>...</think> block followed by <answer>do(action="Launch", app="..."). If the chain-of-thought is very short or garbled, the model deployment has failed.
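The health criterion described above can be approximated in code. The sketch below assumes the reply is available as a string; `looks_healthy` and the `min_think_chars` threshold are hypothetical, and the threshold in particular is a guess to tune, since the source does not say how short is too short.

```python
import re


def looks_healthy(reply: str, min_think_chars: int = 4) -> bool:
    """True if the reply has a non-trivial <think> block and an <answer>do(action=...) call."""
    think = re.search(r"<think>(.*?)</think>", reply, re.DOTALL)
    answer = re.search(r'<answer>\s*do\(action="[^"]+"', reply)
    if think is None or answer is None:
        return False  # missing structure suggests a broken deployment
    return len(think.group(1).strip()) >= min_think_chars
```

This only checks the shape of the output; it cannot tell whether the chosen action is sensible for the task.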

Running the Agent

Basic CLI usage

# Android device (default)
python main.py \
  --base-url http://localhost:8000/v1 \
  --model autoglm-phone-9b \
  "打开小红书搜索美食"

# HarmonyOS device
python main.py \
  --base-url http://localhost:8000/v1 \
  --model autoglm-phone-9b \
  --device-type hdc \
  "打开设置查看WiFi"

# Multilingual model for English apps
python main.py \
  --base-url http://localhost:8000/v1 \
  --model autoglm-phone-9b-multilingual \
  "Open Instagram and search for travel photos"

Key CLI parameters

Parameter	Description	Default
`--base-url`	Model service endpoint	Required
`--model`	Model name on server	Required
`--apikey`	API key for third-party services	None
`--device-type`	`adb` (Android), `hdc` (HarmonyOS), or `ios` (iOS via WebDriverAgent)	`adb`
`--device-id`	Specific device serial number	Auto-detect
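For wrapper scripts that need to accept the same flags, the table can be mirrored with `argparse`. This is a hypothetical reconstruction from the table and the examples in this document, not the parser in the project's `main.py`; the `ios` choice is taken from the iOS Setup example, and the positional `task` argument is inferred from the CLI examples.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical mirror of the documented flags; the real main.py may differ."""
    parser = argparse.ArgumentParser(description="Phone agent CLI (illustrative sketch)")
    parser.add_argument("--base-url", required=True, help="Model service endpoint")
    parser.add_argument("--model", required=True, help="Model name on server")
    parser.add_argument("--apikey", default=None, help="API key for third-party services")
    parser.add_argument("--device-type", default="adb", choices=["adb", "hdc", "ios"])
    parser.add_argument("--device-id", default=None, help="Device serial; None means auto-detect")
    parser.add_argument("task", help="Natural-language task description")
    return parser
```

`argparse` converts dashes to underscores, so the values are read as `args.base_url`, `args.device_type`, and `args.device_id`.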

Python API Usage

Basic agent invocation

from phone_agent import PhoneAgent
from phone_agent.config import AgentConfig

config = AgentConfig(
    base_url="http://localhost:8000/v1",
    model="autoglm-phone-9b",
    device_type="adb",  # or "hdc" for HarmonyOS
)

agent = PhoneAgent(config)

# Run a task
result = agent.run("打开淘宝搜索蓝牙耳机")
print(result)

Custom task with device selection

from phone_agent import PhoneAgent
from phone_agent.config import AgentConfig
import os

config = AgentConfig(
    base_url=os.environ["MODEL_BASE_URL"],
    model=os.environ["MODEL_NAME"],
    apikey=os.environ.get("MODEL_API_KEY"),
    device_type="adb",
    device_id="emulator-5554",  # specific device
)

agent = PhoneAgent(config)

# Task with sensitive operation confirmation
result = agent.run(
    "在京东购买最便宜的蓝牙耳机",
    confirm_sensitive=True  # prompt user before purchase actions
)

Direct model API call (for testing/integration)

import openai
import base64
import os
from pathlib import Path

client = openai.OpenAI(
    base_url=os.environ["MODEL_BASE_URL"],
    api_key=os.environ.get("MODEL_API_KEY", "dummy"),
)

# Load screenshot
screenshot_path = "screenshot.png"
with open(screenshot_path, "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="autoglm-phone-9b",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
                {
                    "type": "text",
                    "text": "Task: 搜索附近的咖啡店\nCurrent step: Navigate to search",
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
# Output format: <think>...</think>\n<answer>do(action="...", ...)

Parsing model action output

import re

def parse_action(model_output: str) -> dict:
    """Parse AutoGLM model output into structured action."""
    # Extract answer block
    answer_match = re.search(r'<answer>(.*?)(?:</answer>|$)', model_output, re.DOTALL)
    if not answer_match:
        return {"action": "unknown"}
    
    answer = answer_match.group(1).strip()
    
    # Parse do() call
    # Format: do(action="ActionName", param1="value1", param2="value2")
    # Greedy (.*) runs to the last ")" so values that contain ")" are not cut short
    action_match = re.search(r'do\(action="([^"]+)"(.*)\)', answer, re.DOTALL)
    if not action_match:
        return {"action": "unknown", "raw": answer}
    
    action_name = action_match.group(1)
    params_str = action_match.group(2)
    
    # Parse parameters
    params = {}
    for param_match in re.finditer(r'(\w+)="([^"]*)"', params_str):
        params[param_match.group(1)] = param_match.group(2)
    
    return {"action": action_name, **params}

# Example usage
output = '<think>需要启动京东</think>\n<answer>do(action="Launch", app="京东")'
action = parse_action(output)
# {"action": "Launch", "app": "京东"}

ADB Device Control Patterns

Common ADB operations used by the agent

import subprocess

def take_screenshot(device_id: str = None) -> bytes:
    """Capture current device screen."""
    cmd = ["adb"]
    if device_id:
        cmd.extend(["-s", device_id])
    cmd.extend(["exec-out", "screencap", "-p"])
    # check=True raises if adb fails instead of silently returning empty bytes
    result = subprocess.run(cmd, capture_output=True, check=True)
    return result.stdout

def send_tap(x: int, y: int, device_id: str = None):
    """Tap at screen coordinates."""
    cmd = ["adb"]
    if device_id:
        cmd.extend(["-s", device_id])
    cmd.extend(["shell", "input", "tap", str(x), str(y)])
    subprocess.run(cmd)

def send_text_adb_keyboard(text: str, device_id: str = None):
    """Send text via ADB Keyboard (must be installed and enabled)."""
    import shlex  # local import keeps this function self-contained

    cmd = ["adb"]
    if device_id:
        cmd.extend(["-s", device_id])
    # Enable ADB keyboard first
    cmd_enable = cmd + ["shell", "ime", "set", "com.android.adbkeyboard/.AdbIME"]
    subprocess.run(cmd_enable)
    # Send text. adb joins these arguments into one command line for the
    # device shell, so quote the message to keep spaces and symbols intact.
    cmd_text = cmd + ["shell", "am", "broadcast", "-a", "ADB_INPUT_TEXT",
                      "--es", "msg", shlex.quote(text)]
    subprocess.run(cmd_text)

def swipe(x1: int, y1: int, x2: int, y2: int, duration_ms: int = 300, device_id: str = None):
    """Swipe gesture on screen."""
    cmd = ["adb"]
    if device_id:
        cmd.extend(["-s", device_id])
    cmd.extend(["shell", "input", "swipe",
                str(x1), str(y1), str(x2), str(y2), str(duration_ms)])
    subprocess.run(cmd)

def press_back(device_id: str = None):
    """Press Android back button."""
    cmd = ["adb"]
    if device_id:
        cmd.extend(["-s", device_id])
    cmd.extend(["shell", "input", "keyevent", "KEYCODE_BACK"])
    subprocess.run(cmd)

def launch_app(package_name: str, device_id: str = None):
    """Launch app by package name."""
    cmd = ["adb"]
    if device_id:
        cmd.extend(["-s", device_id])
    cmd.extend(["shell", "monkey", "-p", package_name, "-c",
                "android.intent.category.LAUNCHER", "1"])
    subprocess.run(cmd)
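Each helper above repeats the same `-s <device_id>` prefix logic. One way to factor that out is a small command builder, sketched below; `adb_cmd` and `tap_cmd` are hypothetical names and are not part of the framework. The builder returns the argument list rather than running it, which makes it easy to inspect or test before handing it to `subprocess.run`.

```python
from typing import List, Optional


def adb_cmd(*args: str, device_id: Optional[str] = None) -> List[str]:
    """Build an adb argument list, targeting a specific device when one is given."""
    cmd = ["adb"]
    if device_id:
        cmd += ["-s", device_id]
    cmd += list(args)
    return cmd


def tap_cmd(x: int, y: int, device_id: Optional[str] = None) -> List[str]:
    """Argument list for a tap at screen coordinates (x, y)."""
    return adb_cmd("shell", "input", "tap", str(x), str(y), device_id=device_id)
```

A caller would run the result with, for example, `subprocess.run(tap_cmd(540, 1200), check=True)`.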

Midscene.js Integration

For JavaScript/TypeScript automation using AutoGLM:

// .env configuration
// MIDSCENE_MODEL_NAME=autoglm-phone
// MIDSCENE_OPENAI_BASE_URL=https://open.bigmodel.cn/api/paas/v4
// MIDSCENE_OPENAI_API_KEY=your-api-key

import { AndroidAgent } from "@midscene/android";

const agent = new AndroidAgent();
await agent.aiAction("打开微信发送消息给张三");
await agent.aiQuery("当前页面显示的消息内容是什么？");

Remote ADB (WiFi Debugging)

# Connect device via USB first, then enable TCP/IP mode
adb tcpip 5555

# Get device IP address
adb shell ip addr show wlan0

# Connect wirelessly (disconnect USB after this)
adb connect 192.168.1.100:5555

# Verify connection
adb devices
# 192.168.1.100:5555   device

# Use with agent
python main.py \
  --base-url http://model-server:8000/v1 \
  --model autoglm-phone-9b \
  --device-id "192.168.1.100:5555" \
  "打开支付宝查看余额"

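The IP lookup and the `adb connect` target in the steps above can be derived programmatically. The sketch below assumes the text comes from `adb shell ip addr show wlan0` and that the interface has an IPv4 line of the form `inet <address>/<prefix>`; `wlan_ipv4` and `connect_target` are hypothetical helpers.

```python
import re
from typing import Optional


def wlan_ipv4(ip_addr_output: str) -> Optional[str]:
    """Extract the first IPv4 address from `ip addr show` output, or None if absent."""
    # "inet" followed by whitespace excludes the "inet6" lines
    match = re.search(r"^\s*inet\s+(\d{1,3}(?:\.\d{1,3}){3})/", ip_addr_output, re.MULTILINE)
    return match.group(1) if match else None


def connect_target(ip_addr_output: str, port: int = 5555) -> Optional[str]:
    """Build the host:port string to pass to `adb connect` and `--device-id`."""
    ip = wlan_ipv4(ip_addr_output)
    return f"{ip}:{port}" if ip else None
```

The default port matches the `adb tcpip 5555` step; a different port there needs the same value here.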
Common Action Types

The AutoGLM model outputs structured actions:

Action	Description	Example
`Launch`	Open an app	`do(action="Launch", app="微信")`
`Tap`	Tap screen element	`do(action="Tap", element="搜索框")`
`Type`	Input text	`do(action="Type", text="火锅")`
`Swipe`	Scroll/swipe	`do(action="Swipe", direction="up")`
`Back`	Press back button	`do(action="Back")`
`Home`	Go to home screen	`do(action="Home")`
`Finish`	Task complete	`do(action="Finish", result="已完成搜索")`
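Once an action has been parsed into a dictionary such as `{"action": "Launch", "app": "微信"}`, it has to be routed to device code. A minimal dispatch-table sketch follows; `make_dispatcher` and the handler signatures are hypothetical, and the framework's own routing may look different.

```python
from typing import Callable, Dict


def make_dispatcher(handlers: Dict[str, Callable[..., object]]) -> Callable[[dict], object]:
    """Return a function that routes a parsed action dict to the matching handler."""
    def dispatch(action: dict) -> object:
        name = action.get("action", "unknown")
        handler = handlers.get(name)
        if handler is None:
            raise ValueError(f"no handler for action {name!r}")
        # Everything except the action name is passed through as keyword arguments
        params = {key: value for key, value in action.items() if key != "action"}
        return handler(**params)
    return dispatch
```

Raising on an unknown action name, rather than ignoring it, makes a misbehaving model visible early instead of letting the agent continue with a skipped step.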

Model Selection Guide

Model	Use Case	Languages
`AutoGLM-Phone-9B`	Chinese apps (WeChat, Taobao, Meituan)	Chinese-optimized
`AutoGLM-Phone-9B-Multilingual`	International apps, mixed content	Chinese + English + others

HuggingFace: zai-org/AutoGLM-Phone-9B / zai-org/AutoGLM-Phone-9B-Multilingual
ModelScope: ZhipuAI/AutoGLM-Phone-9B / ZhipuAI/AutoGLM-Phone-9B-Multilingual

Environment Variables Reference

# Model service
export MODEL_BASE_URL="http://localhost:8000/v1"
export MODEL_NAME="autoglm-phone-9b"
export MODEL_API_KEY=""  # Required for BigModel/ModelScope APIs

# BigModel API
export BIGMODEL_API_KEY=""
export BIGMODEL_BASE_URL="https://open.bigmodel.cn/api/paas/v4"

# ModelScope API
export MODELSCOPE_API_KEY=""
export MODELSCOPE_BASE_URL="https://api-inference.modelscope.cn/v1"

# Device configuration
export ADB_DEVICE_ID=""      # Leave empty for auto-detect
export HDC_DEVICE_ID=""      # HarmonyOS device ID
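These variables can be gathered into one settings object in Python. The sketch below is an assumption-laden illustration: `ModelSettings` and `load_settings` are hypothetical, the fallback values are copied from the exports above, and the source does not say whether the framework itself reads these variables.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ModelSettings:
    base_url: str
    model: str
    api_key: Optional[str]
    device_id: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> ModelSettings:
    """Read settings from a mapping, treating empty strings as unset."""
    return ModelSettings(
        base_url=env.get("MODEL_BASE_URL", "http://localhost:8000/v1"),
        model=env.get("MODEL_NAME", "autoglm-phone-9b"),
        api_key=env.get("MODEL_API_KEY") or None,
        device_id=env.get("ADB_DEVICE_ID") or None,  # None stands for auto-detect
    )
```

Taking the environment as a parameter lets the same function be called with a plain dictionary in tests.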

Troubleshooting

Model output is garbled or very short chain-of-thought

Cause: Incorrect vLLM/SGLang startup parameters.
Fix: For vLLM, ensure --chat-template-content-format string and --mm_processor_kwargs with max_pixels set to 5000000 are passed. For SGLang, ensure --mm-process-config sets max_pixels to 5000000. Also check transformers version compatibility.

`adb devices` shows no devices

Fix:
1. Verify the USB cable supports data transfer (not charge-only)
2. Accept the "Allow USB debugging" dialog on the phone
3. Try adb kill-server && adb start-server
4. Some devices require a reboot after enabling developer options

Text input not working on Android

Fix: ADB Keyboard must be installed AND enabled:

adb shell ime enable com.android.adbkeyboard/.AdbIME
adb shell ime set com.android.adbkeyboard/.AdbIME
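To confirm the first command took effect, the list of enabled input methods can be checked. The sketch below assumes the text comes from `adb shell ime list -s`, which is expected to print one enabled input method id per line; `adb_keyboard_enabled` is a hypothetical helper.

```python
ADB_KEYBOARD_IME = "com.android.adbkeyboard/.AdbIME"


def adb_keyboard_enabled(ime_list_output: str) -> bool:
    """True if the ADB Keyboard id appears as its own line in the enabled-IME list."""
    enabled = {line.strip() for line in ime_list_output.splitlines()}
    return ADB_KEYBOARD_IME in enabled
```

Being enabled is not the same as being the active keyboard; the `ime set` command is what makes it current.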

Agent stuck in a loop

Cause: The model cannot identify a path to complete the task.
Fix: The framework includes sensitive operation confirmation; ensure confirm_sensitive=True for purchase/delete tasks. For login/CAPTCHA screens, the agent supports human takeover.

vLLM CUDA out of memory

Fix: AutoGLM-Phone-9B requires ~20GB VRAM. Use --tensor-parallel-size 2 for multi-GPU, or use the API service instead.
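A back-of-the-envelope check of that figure, assuming the weights are stored in 16-bit precision (2 bytes per parameter) and that "9B" means 9 × 10^9 parameters; neither is stated in the source. The sketch counts weights only and ignores KV cache, activations, and runtime overhead, which is presumably where the rest of the ~20GB goes.

```python
def weight_memory_gb(params: float, bytes_per_param: int = 2) -> float:
    """Memory for model weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


total = weight_memory_gb(9e9)  # weights held on a single GPU
per_gpu = total / 2            # even split assumed under --tensor-parallel-size 2
print(f"{total:.1f} GB total, about {per_gpu:.1f} GB of weights per GPU")
```

Under these assumptions the weights alone come to 18 GB, which is why a 24 GB card is tight and why splitting across two GPUs roughly halves the per-device weight footprint.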

Connection refused to model server

Fix: Check firewall rules. For remote server:

# Test connectivity
curl http://YOUR_SERVER_IP:8000/v1/models
# Should return model list JSON
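The same check can be done from Python with only the standard library. This sketch assumes the server returns the OpenAI-compatible listing shape, a JSON object with a `data` array whose entries carry an `id`; `model_ids` and `fetch_model_ids` are hypothetical helpers.

```python
import json
import urllib.request
from typing import List


def model_ids(payload: str) -> List[str]:
    """Extract model names from an OpenAI-style /v1/models JSON response."""
    body = json.loads(payload)
    return [entry["id"] for entry in body.get("data", [])]


def fetch_model_ids(base_url: str, timeout: float = 5.0) -> List[str]:
    """Query <base_url>/models; raises urllib.error.URLError if the server is unreachable."""
    url = f"{base_url.rstrip('/')}/models"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return model_ids(response.read().decode("utf-8"))
```

If `fetch_model_ids("http://YOUR_SERVER_IP:8000/v1")` succeeds, the returned names should include the value passed to `--served-model-name`, which is what `--model` must match.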

HDC device not recognized (HarmonyOS)

Fix: HarmonyOS NEXT (not earlier versions) is required. Enable developer mode in Settings → About → Version Number (tap 10 times rapidly).

iOS Setup

For iPhone automation, see the dedicated setup guide:

# After configuring WebDriverAgent per docs/ios_setup/ios_setup.md
python main.py \
  --base-url http://localhost:8000/v1 \
  --model autoglm-phone-9b-multilingual \
  --device-type ios \
  "Open Maps and navigate to Central Park"

Related skills

Setup Matt Pocock Skills: Scaffold the per-repo configuration that Matt Pocock’s engineering agent skills rely on so they understand the issue tracker, triage labels, and domain documentation la · 462k · 185k

Lark Skill Maker: Quickly turn any Lark/Feishu OpenAPI call or multi-step workflow into a reusable agent skill with its own SKILL.md. · 379k · 15.8k

Caveman: Slash token usage by roughly 75% while keeping every technical detail intact when working with Claude Code, Cursor or similar agents. · 378k · 92.5k

Lark Apps: Connect Claude, Cursor or custom agents directly to Lark (Feishu) for messaging, document automation, approval workflows and enterprise data access. · 375k

Running Claude Code Via Litellm Copilot: Run Claude Code at a fraction of the cost by routing requests through LiteLLM to the GitHub Copilot Chat API. · 270k · 72

Codex Pet: Generate a complete Codex Pet spritesheet and metadata from one reference image without needing an OpenAI key or Codex Pro. · 246k · 8

How it compares

Choose open-autoglm-phone-agent when you need a vision-language agent that drives real phone UI via ADB rather than script-only mobile test frameworks.

FAQ

What does open-autoglm-phone-agent do?

Expert skill for Open-AutoGLM, an AI phone agent framework that controls Android/HarmonyOS/iOS devices via natural language using the AutoGLM vision-language model

When should I use open-autoglm-phone-agent?

During idea and discovery work on AI & agent building.

Is open-autoglm-phone-agent safe to install?

Review the Security Audits panel on this listing before production use.

AI & Agent Building · agents
