
Pyautogui Automation
Drive Windows/macOS desktop apps through a JSON-returning PyAutoGUI CLI for UI checks, WeChat publisher flows, or repetitive click-type tasks without Selenium.
Overview
pyautogui-automation is an agent skill most often used in Ship (also Operate, Build) that runs PyAutoGUI desktop actions via a JSON CLI for screenshots, input, and on-screen image matching.
Install
npx skills add https://github.com/steelan9199/wechat-publisher --skill pyautogui-automation
What is this skill?
- Eight capability areas: screenshot, mouse, color, keyboard, image recognition, dialogs, system info, wait/pause
- Single entrypoint: python scripts/automation.py <action> with JSON results for agents
- Screenshot modes: full screen, region, and clipboard (pywin32 on Windows)
- Image recognition: locate on screen and wait for appear/disappear
- Keyboard shortcuts: copy, paste, select-all via dedicated actions
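The wait-for-appear and wait-for-disappear behavior listed above usually comes down to polling a locate call until a deadline. The following is a minimal sketch of that pattern, not the skill's actual implementation: `wait_for` and the `locate` callable are hypothetical names, and `locate` stands in for whatever returns a match (or `None`) for the target image.

```python
import time
from typing import Callable, Optional, Tuple

Box = Tuple[int, int, int, int]  # assumed layout: (left, top, width, height)


def wait_for(locate: Callable[[], Optional[Box]],
             appear: bool = True,
             timeout: float = 10.0,
             interval: float = 0.5) -> Optional[Box]:
    """Poll `locate` until the image appears (or disappears).

    Returns the match when waiting for appearance, None when waiting
    for disappearance. Raises TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        match = locate()
        if appear and match is not None:
            return match
        if not appear and match is None:
            return None
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1fs" % timeout)
        time.sleep(interval)
```

If `locate` wraps PyAutoGUI directly, note that recent versions of `pyautogui.locateOnScreen` may raise `ImageNotFoundException` instead of returning `None`, so the wrapper would need to catch that and return `None`.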
Adoption & trust: 607 installs on skills.sh; 5 GitHub stars; 1/3 security scanners passed (skills.sh audits).
What problem does it solve?
You need to automate a native desktop app or repeat UI steps that web drivers cannot reach, and you want machine-readable results for an agent loop.
Who is it for?
Solo builders automating WeChat Desktop or other GUI tools on Windows/macOS who already run Python in the repo.
Skip if: Headless CI on Linux without a display, cross-platform web apps better served by Playwright, or unattended production bots without human oversight.
When should I use this skill?
When the user needs to automate desktop applications, perform UI testing, or execute repetitive GUI tasks with PyAutoGUI (per skill frontmatter).
What do I get? / Deliverables
Each automation.py invocation returns structured JSON (paths for screenshots, coordinates found, success flags) so you can script verification or publisher workflows step by step.
- PNG screenshots at specified paths or clipboard
- JSON action results including coordinates and recognition status
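To consume those JSON results from an agent loop, a thin wrapper around the CLI is usually enough. This is a sketch under stated assumptions: it assumes the script prints exactly one JSON document on stdout, and `build_command`, `parse_result`, and `run_action` are hypothetical helper names, not part of the skill.

```python
import json
import subprocess
import sys
from typing import Any, Dict, List

SCRIPT = "scripts/automation.py"  # entrypoint path as documented by the skill


def build_command(action: str, **params: Any) -> List[str]:
    """Turn an action name and keyword params into a CLI argument list."""
    cmd = [sys.executable, SCRIPT, action]
    for name, value in params.items():
        cmd += ["--" + name, str(value)]
    return cmd


def parse_result(stdout: str) -> Dict[str, Any]:
    """Parse the JSON printed by the script (assumed to be all of stdout)."""
    return json.loads(stdout)


def run_action(action: str, **params: Any) -> Dict[str, Any]:
    """Run one action in a subprocess and return its parsed JSON result."""
    proc = subprocess.run(build_command(action, **params),
                          capture_output=True, text=True)
    return parse_result(proc.stdout)
```

For example, `run_action("get_screen_size")` would be expected to return a dictionary shaped like the `{"width": 1920, "height": 1080}` sample in SKILL.md. The exact fields of other actions are not shown here, so inspect the output before relying on any key.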
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Desktop UI automation is canonically shelved under Ship → testing because the skill is triggered for UI tests and repeatable desktop verification before you trust a release. Testing subphase fits screenshot, click, and image-on-screen assertions used to validate desktop clients—not backend unit tests.
Where it fits
Capture a region screenshot after clicking through a desktop publish dialog to confirm the preview loaded.
Replay keyboard shortcuts and clicks to post weekly updates when the web API is unavailable.
Bridge a WeChat publisher pipeline by locating tray icons with locateOnScreen and typing into native fields.
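The first scenario above (click, pause, then capture a region) can be written as an ordered list of CLI invocations. This sketch only renders the command lines and does not execute them. The coordinates, region, and output file name are placeholder values, `plan_commands` is a hypothetical helper, and the flags are those shown in the SKILL.md examples.

```python
import shlex
from typing import Dict, List, Tuple

Step = Tuple[str, Dict[str, object]]

# Placeholder values for illustration only; real coordinates depend on the app.
PUBLISH_CHECK: List[Step] = [
    ("click", {"x": 100, "y": 200}),
    ("sleep", {"seconds": 2}),
    ("screenshot", {"output": "preview.png", "region": "100,100,400,300"}),
]


def plan_commands(steps: List[Step]) -> List[str]:
    """Render each (action, params) step as a POSIX shell command string."""
    lines = []
    for action, params in steps:
        parts = ["python", "scripts/automation.py", action]
        for name, value in params.items():
            parts += ["--" + name, str(value)]
        lines.append(" ".join(shlex.quote(p) for p in parts))
    return lines


if __name__ == "__main__":
    for line in plan_commands(PUBLISH_CHECK):
        print(line)
```

Keeping the steps as data makes it easy to log the plan, review it before anything touches the mouse, and replay the same sequence later.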
How it compares
Desktop coordinate automation via PyAutoGUI—not browser E2E (Playwright) and not OS-level RPA with vendor IDE lock-in.
Common Questions / FAQ
Who is pyautogui-automation for?
Developers and indie operators who publish or test via desktop clients and want agent-driven PyAutoGUI scripts with JSON output.
When should I use pyautogui-automation?
In Ship testing for UI smoke tests; in Operate iterate when repeating publisher clicks; in Build integrations when scripting desktop-only tools alongside WeChat publisher skills.
Is pyautogui-automation safe to install?
It can control mouse, keyboard, and screen capture on your machine—check Security Audits on this page and run only in controlled environments.
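One cheap guard in that spirit is to reject click coordinates that fall outside the reported screen before sending them. A minimal sketch, assuming the `{"width": ..., "height": ...}` shape that SKILL.md shows for `get_screen_size`; `in_bounds` is a hypothetical helper and not part of the skill.

```python
from typing import Mapping


def in_bounds(x: int, y: int, screen: Mapping[str, int], margin: int = 0) -> bool:
    """Return True if (x, y) is on screen, at least `margin` px from every edge."""
    return (margin <= x < screen["width"] - margin
            and margin <= y < screen["height"] - margin)


screen = {"width": 1920, "height": 1080}  # example values from SKILL.md
assert in_bounds(100, 200, screen)
assert not in_bounds(1920, 200, screen)   # x is one past the last column
```

PyAutoGUI's own fail-safe, enabled by default, aborts when the pointer reaches a screen corner, so a small `margin` also keeps scripted clicks from tripping it by accident.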
SKILL.md
# PyAutoGUI Automation

## Feature Overview

| Category          | Supported actions                                                        |
| ----------------- | ------------------------------------------------------------------------ |
| Screenshot        | Full screen, region, screenshot to clipboard                             |
| Mouse control     | Click, double-click, move, relative move, drag, scroll, press/release    |
| Color             | Get pixel color, find color position                                     |
| Keyboard          | Type text, key press, hotkeys, shortcuts (copy/paste/select all, etc.)   |
| Image recognition | Locate an image on screen, wait for an image to appear/disappear         |
| Dialogs           | Alert, confirm, and input dialogs                                        |
| System info       | Screen resolution, mouse position, window info                           |
| Utilities         | Wait, pause                                                              |

## Quick Start

### Basic Usage

```bash
python scripts/automation.py <action> [args...]
```

All actions return their results as JSON.

**Script location**: [scripts/automation.py](scripts/automation.py)

## Action Reference

### Screenshots

```bash
# Full-screen screenshot (file name generated automatically)
python scripts/automation.py screenshot

# Specify the output path
python scripts/automation.py screenshot --output my_screenshot.png

# Region screenshot
python scripts/automation.py screenshot --output region.png --region 100,100,400,300

# Screenshot to clipboard (requires pywin32)
python scripts/automation.py screenshot_to_clipboard
python scripts/automation.py screenshot_to_clipboard --region 100,100,400,300
```

### Mouse Clicks

```bash
# Left-click at (100, 200)
python scripts/automation.py click --x 100 --y 200

# Double-click with the right button
python scripts/automation.py click --x 100 --y 200 --button right --clicks 2

# Double-click shortcut
python scripts/automation.py double_click --x 100 --y 200
```

### Color

```bash
# Get the color at a given coordinate
python scripts/automation.py get_pixel_color --x 100 --y 200
# Returns: {"rgb": [255, 255, 255], "hex": "#ffffff"}

# Find a color's position (exact match)
python scripts/automation.py find_color --rgb 255,255,255

# Find a color (with tolerance)
python scripts/automation.py find_color --rgb 255,255,255 --tolerance 10

# Search within a region
python scripts/automation.py find_color --rgb 255,0,0 --region 0,0,800,600
```

### Mouse Control

```bash
# Get the current mouse position
python scripts/automation.py get_mouse_position

# Move the mouse (instant)
python scripts/automation.py move_mouse --x 500 --y 300

# Move the mouse (animated, 0.5 seconds)
python scripts/automation.py move_mouse --x 500 --y 300 --duration 0.5

# Move the mouse relative to its current position
python scripts/automation.py move_mouse_rel --x 100 --y -50
python scripts/automation.py move_mouse_rel --x 100 --y -50 --duration 0.5

# Drag the mouse
python scripts/automation.py drag_mouse --x 800 --y 600 --duration 1.0

# Press a mouse button (without releasing)
python scripts/automation.py mouse_down --button left

# Release a mouse button
python scripts/automation.py mouse_up --button left

# Scroll (positive scrolls up, negative scrolls down)
python scripts/automation.py scroll --amount 500
python scripts/automation.py scroll --amount -500 --x 500 --y 300
```

### Screen Info

```bash
# Get the screen resolution
python scripts/automation.py get_screen_size
# Returns: {"width": 1920, "height": 1080}

# Get info about the active window (requires pywin32)
python scripts/automation.py get_active_window
# Returns: {"title": "Window Title", "left": 100, "top": 100, "width": 800, "height": 600}

# List all visible windows (requires pywin32)
python scripts/automation.py get_all_windows
# Returns: {"count": 5, "windows": [...]}
```

### Wait

```bash
# Wait 2 seconds
python scripts/automation.py sleep --seconds 2
```

### Keyboard

```bash
# Type text
python scripts/automation.py type_text --text "Hello World"

# Type text (with an interval between keystrokes)
python scripts/automation.py type_text --text "Hello" --interval 0.1

# Press a key
python scripts/automation.py press_key --key enter
python scripts/automation.py press_key --key esc

# Hotkeys
python scripts/automation.py hotkey --keys ctrl,c
python scripts/automation.py hotkey --keys ctrl,shift,esc

# Shortcuts
python scripts/automation.py copy        # Ctrl+C
python scripts/automation.py paste       # Ctrl+V
python scripts/automation.py cut         # Ctrl+X
python scripts/automation.py select_all  # Ctrl+A
python scripts/automation.py undo        # Ctrl+Z
python scripts/automation.py redo        # Ctrl+Y
python scripts/automation.py save        # Ctrl+S
```

Common key names: `enter`, `esc`, `tab`, `space