
Computer Control
Run Claude Computer Use against a real or virtual display when a workflow has no API and you need screenshot-driven GUI automation with explicit opt-in.
Install
npx skills add https://github.com/athola/claude-night-market --skill computer-controlWhat is this skill?
- Three-layer stack: phantom.display (xdotool/scrot), phantom.loop (API conversation cycle), phantom.cli (tasks and readin
- Explicit opt-in per inclusive-defaults: screenshots and synthesized input are never default-on
- Targets GUI workflows lacking CLI alternatives with visual verification in the loop
- Documented NOT FOR cases: prefer CLI/API or Playwright/CDP for browser-only automation
- model_hint standard for routine computer-use tasks
Adoption & trust: 1 installs on skills.sh; 304 GitHub stars; 2/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).
Recommended Skills
Journey fit
Agent tooling is the canonical shelf because the skill wires phantom.display, phantom.loop, and phantom.cli for Computer Use—not a one-off app feature. Computer control extends what coding agents can do on the desktop; it belongs with other agent-runtime capabilities rather than generic frontend or docs work.
Common Questions / FAQ
Is Computer Control safe to install?
skills.sh reports 2 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.
SKILL.md
READMESKILL.md - Computer Control
# Computer Control Skill Use Claude's Computer Use API to see and control desktop environments through screenshots and mouse/keyboard actions. ## When To Use - Automating GUI-based workflows that lack CLI alternatives - Testing web applications through visual interaction - Filling forms, navigating menus, or interacting with desktop apps - Building automation pipelines that need visual verification ## When NOT To Use - Tasks achievable through CLI or API (no GUI needed) - Browser automation better served by Playwright or CDP > **Why this stays opt-in.** Per > [docs/inclusive-defaults.md][inc] (TRUE-exception > category 4), Computer Use takes screenshots and > synthesizes keyboard/mouse input: cross-process side > effects that must always be explicitly invoked, never > default-on. [inc]: ../../../../docs/inclusive-defaults.md ## Architecture The computer use system has three layers: 1. **Display Toolkit** (`phantom.display`) - executes OS-level actions via xdotool/scrot on the real or virtual display 2. **Agent Loop** (`phantom.loop`) - manages the conversation cycle between Claude API and the display toolkit 3. **CLI** (`phantom.cli`) - command-line interface for running tasks or checking environment readiness ``` User Task | v Agent Loop <----> Claude API (beta) | | v v Display Toolkit tool_use responses | (click, type, screenshot) v OS Commands (xdotool, scrot) | v Display (X11 / Xvfb / WSLg) ``` ## Quick Start ### Check environment ```bash cd plugins/phantom uv run python -m phantom.cli --check ``` ### Run a task ```bash export ANTHROPIC_API_KEY="sk-ant-..." uv run python -m phantom.cli "Open Firefox and search for Claude AI" ``` ### Use in Python ```python from phantom.display import DisplayConfig, DisplayToolkit from phantom.loop import LoopConfig, run_loop result = run_loop( task="Take a screenshot of the desktop", api_key="sk-ant-...", loop_config=LoopConfig( model="claude-sonnet-4-6", max_iterations=10, ), display_config=DisplayConfig(width=1920, height=1080), ) print(f"Done in {result.iterations} iterations") print(result.final_text) ``` ## API Versions | Model | Tool Version | Beta Flag | |-------|-------------|-----------| | Opus 4.6, Sonnet 4.6, Opus 4.5 | `computer_20251124` | `computer-use-2025-11-24` | | Sonnet 4.5, Haiku 4.5, older | `computer_20250124` | `computer-use-2025-01-24` | The `resolve_tool_version()` function handles this mapping automatically based on the model name. ## Available Actions **All versions:** - `screenshot` - capture display - `left_click` - click at `[x, y]` - `type` - type text string - `key` - press key combo (e.g., `ctrl+s`) - `mouse_move` - move cursor **Enhanced (20250124+):** - `scroll` - scroll with direction and amount - `left_click_drag` - drag between coordinates - `right_click`, `middle_click`, `double_click`, `triple_click` - `hold_key` - hold key for duration - `wait` - pause between actions **Latest (20251124):** - `zoom` - inspect screen region at full resolution ## Safety Computer use carries risks. Follow these guidelines: 1. **Use a sandbox**: Run in Docker or a VM, not your main OS 2. **Limit access**: Do not provide login credentials unless necessary, and never for banking or sensitive services 3. **Set iteration caps**: Always use `max_iterations` to prevent runaway API costs 4. **Human approval**: For actions with real-world consequences, add confirmation callbacks via `on_action` 5. **Close sensitive apps**: Claude sees the full screen via screenshots; close anything private before starting ## Environment Requirements **Linux (native or WSL2 with WSLg):