Name: Open Autoglm Phone Agent
Author: Aradotso
Source file: SKILL.md (markdown, 486 lines)
---
name: open-autoglm-phone-agent
description: Expert skill for Open-AutoGLM, an AI phone agent framework that controls Android/HarmonyOS/iOS devices via natural language using the AutoGLM vision-language model
triggers:
  - set up AutoGLM phone agent
  - control android phone with AI
  - automate phone tasks with natural language
  - deploy AutoGLM model for phone automation
  - configure ADB phone agent
  - run phone agent with AutoGLM
  - phone use agent python setup
  - automate mobile device with vision model
---

# Open-AutoGLM Phone Agent

> Skill by [ara.so](https://ara.so) - Daily 2026 Skills collection.

Open-AutoGLM is an open-source AI phone agent framework that enables natural language control of Android, HarmonyOS NEXT, and iOS devices. It uses the AutoGLM vision-language model (9B parameters) to perceive screen content and execute multi-step tasks like "open Meituan and search for nearby hot pot restaurants."

## Architecture Overview

```
User Natural Language → AutoGLM VLM → Screen Perception → ADB/HDC/WebDriverAgent → Device Actions
```

- **Model**: AutoGLM-Phone-9B (Chinese-optimized) or AutoGLM-Phone-9B-Multilingual
- **Device control**: ADB (Android), HDC (HarmonyOS NEXT), WebDriverAgent (iOS)
- **Model serving**: vLLM or SGLang (self-hosted) or BigModel/ModelScope API
- **Input**: Screenshot + task description → Output: structured action commands

## Installation

### Prerequisites

- Python 3.10+
- ADB installed and in PATH (Android) or HDC (HarmonyOS) or WebDriverAgent (iOS)
- Android device with Developer Mode + USB Debugging enabled
- ADB Keyboard APK installed on Android device (for text input)

### Install the framework

```bash
git clone https://github.com/zai-org/Open-AutoGLM.git
cd Open-AutoGLM
pip install -r requirements.txt
pip install -e .
```

### Verify ADB connection

```bash
# Android
adb devices
# Expected: emulator-5554   device

# HarmonyOS NEXT
hdc list targets
# Expected: 7001005458323933328a01bce01c2500
```

## Model Deployment Options

### Option A: Third-party API (Recommended for quick start)

**BigModel (ZhipuAI)**
```bash
export BIGMODEL_API_KEY="your-bigmodel-api-key"
python main.py \
  --base-url https://open.bigmodel.cn/api/paas/v4 \
  --model "autoglm-phone" \
  --apikey $BIGMODEL_API_KEY \
  "打开美团搜索附近的火锅店"
```

**ModelScope**
```bash
export MODELSCOPE_API_KEY="your-modelscope-api-key"
python main.py \
  --base-url https://api-inference.modelscope.cn/v1 \
  --model "ZhipuAI/AutoGLM-Phone-9B" \
  --apikey $MODELSCOPE_API_KEY \
  "open Meituan and find nearby hotpot"
```

### Option B: Self-hosted with vLLM

```bash
# Install vLLM (or use official Docker: docker pull vllm/vllm-openai:v0.12.0)
pip install vllm

# Start model server (strictly follow these parameters)
python3 -m vllm.entrypoints.openai.api_server \
  --served-model-name autoglm-phone-9b \
  --allowed-local-media-path / \
  --mm-encoder-tp-mode data \
  --mm_processor_cache_type shm \
  --mm_processor_kwargs '{"max_pixels":5000000}' \
  --max-model-len 25480 \
  --chat-template-content-format string \
  --limit-mm-per-prompt '{"image":10}' \
  --model zai-org/AutoGLM-Phone-9B \
  --port 8000
```

### Option C: Self-hosted with SGLang

```bash
# Install SGLang or use: docker pull lmsysorg/sglang:v0.5.6.post1
# Inside container: pip install nvidia-cudnn-cu12==9.16.0.29

python3 -m sglang.launch_server \
  --model-path zai-org/AutoGLM-Phone-9B \
  --served-model-name autoglm-phone-9b \
  --context-length 25480 \
  --mm-enable-dp-encoder \
  --mm-process-config '{"image":{"max_pixels":5000000}}' \
  --port 8000
```

### Verify deployment

```bash
python scripts/check_deployment_cn.py \
  --base-url http://localhost:8000/v1 \
  --model autoglm-phone-9b
```

Expected output includes a `<think>...</think>` block followed by `<answer>do(action="Launch", app="...")`. **If the chain-of-thought is very short or garbled, the model deployment has failed.**

## Running the Agent

### Basic CLI usage

```bash
# Android device (default)
python main.py \
  --base-url http://localhost:8000/v1 \
  --model autoglm-phone-9b \
  "打开小红书搜索美食"

# HarmonyOS device
python main.py \
  --base-url http://localhost:8000/v1 \
  --model autoglm-phone-9b \
  --device-type hdc \
  "打开设置查看WiFi"

# Multilingual model for English apps
python main.py \
  --base-url http://localhost:8000/v1 \
  --model autoglm-phone-9b-multilingual \
  "Open Instagram and search for travel photos"
```

### Key CLI parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| `--base-url` | Model service endpoint | Required |
| `--model` | Model name on server | Required |
| `--apikey` | API key for third-party services | None |
| `--device-type` | `adb` (Android) or `hdc` (HarmonyOS) | `adb` |
| `--device-id` | Specific device serial number | Auto-detect |

## Python API Usage

### Basic agent invocation

```python
from phone_agent import PhoneAgent
from phone_agent.config import AgentConfig

config = AgentConfig(
    base_url="http://localhost:8000/v1",
    model="autoglm-phone-9b",
    device_type="adb",  # or "hdc" for HarmonyOS
)

agent = PhoneAgent(config)

# Run a task
result = agent.run("打开淘宝搜索蓝牙耳机")
print(result)
```

### Custom task with device selection

```python
from phone_agent import PhoneAgent
from phone_agent.config import AgentConfig
import os

config = AgentConfig(
    base_url=os.environ["MODEL_BASE_URL"],
    model=os.environ["MODEL_NAME"],
    apikey=os.environ.get("MODEL_API_KEY"),
    device_type="adb",
    device_id="emulator-5554",  # specific device
)

agent = PhoneAgent(config)

# Task with sensitive operation confirmation
result = agent.run(
    "在京东购买最便宜的蓝牙耳机",
    confirm_sensitive=True  # prompt user before purchase actions
)
```

### Direct model API call (for testing/integration)

```python
import openai
import base64
import os
from pathlib import Path

client = openai.OpenAI(
    base_url=os.environ["MODEL_BASE_URL"],
    api_key=os.environ.get("MODEL_API_KEY", "dummy"),
)

# Load screenshot
screenshot_path = "screenshot.png"
with open(screenshot_path, "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="autoglm-phone-9b",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
                {
                    "type": "text",
                    "text": "Task: 搜索附近的咖啡店\nCurrent step: Navigate to search",
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
# Output format: <think>...</think>\n<answer>do(action="...", ...)
```

### Parsing model action output

```python
import re

def parse_action(model_output: str) -> dict:
    """Parse AutoGLM model output into structured action."""
    # Extract answer block
    answer_match = re.search(r'<answer>(.*?)(?:</answer>|$)', model_output, re.DOTALL)
    if not answer_match:
        return {"action": "unknown"}

    answer = answer_match.group(1).strip()

    # Parse do() call
    # Format: do(action="ActionName", param1="value1", param2="value2")
    action_match = re.search(r'do\(action="([^"]+)"(.*?)\)', answer, re.DOTALL)
    if not action_match:
        return {"action": "unknown", "raw": answer}

    action_name = action_match.group(1)
    params_str = action_match.group(2)

    # Parse parameters
    params = {}
    for param_match in re.finditer(r'(\w+)="([^"]*)"', params_str):
        params[param_match.group(1)] = param_match.group(2)

    return {"action": action_name, **params}

# Example usage
output = '<think>需要启动京东</think>\n<answer>do(action="Launch", app="京东")'
action = parse_action(output)
# {"action": "Launch", "app": "京东"}
```

## ADB Device Control Patterns

### Common ADB operations used by the agent

```python
import subprocess

def take_screenshot(device_id: str = None) -> bytes:
    """Capture current device screen."""
    cmd = ["adb"]
    if device_id:
        cmd.extend(["-s", device_id])
    cmd.extend(["exec-out", "screencap", "-p"])
    result = subprocess.run(cmd, capture_output=True)
    return result.stdout

def send_tap(x: int, y: int, device_id: str = None):
    """Tap at screen coordinates."""
    cmd = ["adb"]
    if device_id:
        cmd.extend(["-s", device_id])
    cmd.extend(["shell", "input", "tap", str(x), str(y)])
    subprocess.run(cmd)

def send_text_adb_keyboard(text: str, device_id: str = None):
    """Send text via ADB Keyboard (must be installed and enabled)."""
    cmd = ["adb"]
    if device_id:
        cmd.extend(["-s", device_id])
    # Enable ADB keyboard first
    cmd_enable = cmd + ["shell", "ime", "set", "com.android.adbkeyboard/.AdbIME"]
    subprocess.run(cmd_enable)
    # Send text
    cmd_text = cmd + ["shell", "am", "broadcast", "-a", "ADB_INPUT_TEXT",
                      "--es", "msg", text]
    subprocess.run(cmd_text)

def swipe(x1: int, y1: int, x2: int, y2: int, duration_ms: int = 300, device_id: str = None):
    """Swipe gesture on screen."""
    cmd = ["adb"]
    if device_id:
        cmd.extend(["-s", device_id])
    cmd.extend(["shell", "input", "swipe",
                str(x1), str(y1), str(x2), str(y2), str(duration_ms)])
    subprocess.run(cmd)

def press_back(device_id: str = None):
    """Press Android back button."""
    cmd = ["adb"]
    if device_id:
        cmd.extend(["-s", device_id])
    cmd.extend(["shell", "input", "keyevent", "KEYCODE_BACK"])
    subprocess.run(cmd)

def launch_app(package_name: str, device_id: str = None):
    """Launch app by package name."""
    cmd = ["adb"]
    if device_id:
        cmd.extend(["-s", device_id])
    cmd.extend(["shell", "monkey", "-p", package_name, "-c",
                "android.intent.category.LAUNCHER", "1"])
    subprocess.run(cmd)
```

## Midscene.js Integration

For JavaScript/TypeScript automation using AutoGLM:

```javascript
// .env configuration
// MIDSCENE_MODEL_NAME=autoglm-phone
// MIDSCENE_OPENAI_BASE_URL=https://open.bigmodel.cn/api/paas/v4
// MIDSCENE_OPENAI_API_KEY=your-api-key

import { AndroidAgent } from "@midscene/android";

const agent = new AndroidAgent();
await agent.aiAction("打开微信发送消息给张三");
await agent.aiQuery("当前页面显示的消息内容是什么？");
```

## Remote ADB (WiFi Debugging)

```bash
# Connect device via USB first, then enable TCP/IP mode
adb tcpip 5555

# Get device IP address
adb shell ip addr show wlan0

# Connect wirelessly (disconnect USB after this)
adb connect 192.168.1.100:5555

# Verify connection
adb devices
# 192.168.1.100:5555   device

# Use with agent
python main.py \
  --base-url http://model-server:8000/v1 \
  --model autoglm-phone-9b \
  --device-id "192.168.1.100:5555" \
  "打开支付宝查看余额"
```

## Common Action Types

The AutoGLM model outputs structured actions:

| Action | Description | Example |
|--------|-------------|---------|
| `Launch` | Open an app | `do(action="Launch", app="微信")` |
| `Tap` | Tap screen element | `do(action="Tap", element="搜索框")` |
| `Type` | Input text | `do(action="Type", text="火锅")` |
| `Swipe` | Scroll/swipe | `do(action="Swipe", direction="up")` |
| `Back` | Press back button | `do(action="Back")` |
| `Home` | Go to home screen | `do(action="Home")` |
| `Finish` | Task complete | `do(action="Finish", result="已完成搜索")` |

## Model Selection Guide

| Model | Use Case | Languages |
|-------|----------|-----------|
| `AutoGLM-Phone-9B` | Chinese apps (WeChat, Taobao, Meituan) | Chinese-optimized |
| `AutoGLM-Phone-9B-Multilingual` | International apps, mixed content | Chinese + English + others |

- HuggingFace: `zai-org/AutoGLM-Phone-9B` / `zai-org/AutoGLM-Phone-9B-Multilingual`
- ModelScope: `ZhipuAI/AutoGLM-Phone-9B` / `ZhipuAI/AutoGLM-Phone-9B-Multilingual`

## Environment Variables Reference

```bash
# Model service
export MODEL_BASE_URL="http://localhost:8000/v1"
export MODEL_NAME="autoglm-phone-9b"
export MODEL_API_KEY=""  # Required for BigModel/ModelScope APIs

# BigModel API
export BIGMODEL_API_KEY=""
export BIGMODEL_BASE_URL="https://open.bigmodel.cn/api/paas/v4"

# ModelScope API
export MODELSCOPE_API_KEY=""
export MODELSCOPE_BASE_URL="https://api-inference.modelscope.cn/v1"

# Device configuration
export ADB_DEVICE_ID=""      # Leave empty for auto-detect
export HDC_DEVICE_ID=""      # HarmonyOS device ID
```

## Troubleshooting

### Model output is garbled or very short chain-of-thought
**Cause**: Incorrect vLLM/SGLang startup parameters.
**Fix**: Ensure `--chat-template-content-format string` (vLLM) and `--mm-process-config` with `max_pixels:5000000` are set. Check transformers version compatibility.

### `adb devices` shows no devices
**Fix**:
1. Verify USB cable supports data transfer (not charge-only)
2. Accept "Allow USB debugging" dialog on phone
3. Try `adb kill-server && adb start-server`
4. Some devices require reboot after enabling developer options

### Text input not working on Android
**Fix**: ADB Keyboard must be installed AND enabled:
```bash
adb shell ime enable com.android.adbkeyboard/.AdbIME
adb shell ime set com.android.adbkeyboard/.AdbIME
```

### Agent stuck in a loop
**Cause**: Model cannot identify a path to complete the task.
**Fix**: The framework includes sensitive operation confirmation; ensure `confirm_sensitive=True` for purchase/delete tasks. For login/CAPTCHA screens, the agent supports human takeover.

### vLLM CUDA out of memory
**Fix**: AutoGLM-Phone-9B requires ~20GB VRAM. Use `--tensor-parallel-size 2` for multi-GPU, or use the API service instead.

### Connection refused to model server
**Fix**: Check firewall rules. For remote server:
```bash
# Test connectivity
curl http://YOUR_SERVER_IP:8000/v1/models
# Should return model list JSON
```

### HDC device not recognized (HarmonyOS)
**Fix**: HarmonyOS NEXT (not earlier versions) is required. Enable developer mode in Settings → About → Version Number (tap 10 times rapidly).

## iOS Setup

For iPhone automation, see the dedicated setup guide:
```bash
# After configuring WebDriverAgent per docs/ios_setup/ios_setup.md
python main.py \
  --base-url http://localhost:8000/v1 \
  --model autoglm-phone-9b-multilingual \
  --device-type ios \
  "Open Maps and navigate to Central Park"
```
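
## Example: Mapping Parsed Actions to ADB Commands (sketch)

The action table and the ADB helpers earlier in this file describe two ends of the same pipeline: the model emits `do(action=...)` and the device receives an `adb` invocation. The block below is a minimal sketch of the glue between them, not the framework's actual dispatcher. The function name `build_adb_command` and the `APP_PACKAGES` lookup are hypothetical, introduced only for illustration; the framework's own app-name resolution may differ. The function returns the argv list instead of running it, so it can be inspected without a connected device.

```python
from typing import Optional

# Hypothetical app-name -> package-name map (an assumption for this sketch).
# Confirm real package names on the device, e.g. `adb shell pm list packages`.
APP_PACKAGES = {"微信": "com.tencent.mm"}


def build_adb_command(action: dict, device_id: Optional[str] = None) -> Optional[list]:
    """Translate a parsed action dict into an adb argv list.

    Returns None for actions this sketch does not cover.
    """
    base = ["adb"] + (["-s", device_id] if device_id else [])
    name = action.get("action")

    if name == "Back":
        return base + ["shell", "input", "keyevent", "KEYCODE_BACK"]
    if name == "Home":
        return base + ["shell", "input", "keyevent", "KEYCODE_HOME"]
    if name == "Type":
        # Same ADB Keyboard broadcast used by send_text_adb_keyboard().
        return base + ["shell", "am", "broadcast", "-a", "ADB_INPUT_TEXT",
                       "--es", "msg", action.get("text", "")]
    if name == "Launch":
        package = APP_PACKAGES.get(action.get("app", ""))
        if package is None:
            return None  # unknown app name: leave the decision to the caller
        return base + ["shell", "monkey", "-p", package, "-c",
                       "android.intent.category.LAUNCHER", "1"]

    # Tap and Swipe need screen coordinates, and Finish ends the loop,
    # so they are intentionally left to the caller in this sketch.
    return None


cmd = build_adb_command({"action": "Back"}, device_id="emulator-5554")
print(cmd)
# ['adb', '-s', 'emulator-5554', 'shell', 'input', 'keyevent', 'KEYCODE_BACK']
```

Returning `None` for unsupported or unresolved actions keeps the failure visible to the caller, which can then ask the model for another step or hand control to a human. To execute a result, pass the list to `subprocess.run(cmd)` as the helper functions in the ADB section do.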