Computer Control

Name: Computer Control
Author: athola

athola/claude-night-market

Run Claude Computer Use against a real or virtual display when a workflow has no API and you need screenshot-driven GUI automation with explicit opt-in.

Install

npx skills add https://github.com/athola/claude-night-market --skill computer-control

What is this skill?

Three-layer stack: phantom.display (xdotool/scrot), phantom.loop (API conversation cycle), phantom.cli (tasks and readin
Explicit opt-in per inclusive-defaults: screenshots and synthesized input are never default-on
Targets GUI workflows lacking CLI alternatives with visual verification in the loop
Documented NOT FOR cases: prefer CLI/API or Playwright/CDP for browser-only automation
model_hint standard for routine computer-use tasks

Adoption & trust: 1 installs on skills.sh; 304 GitHub stars; 2/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).

Recommended Skills

Agent Browservercel-labs/agent-browser

agent-browser is a Node-installed browser automation CLI built for AI agents that need dependable programmatic web inter…428k installs·35.5k stars

Lark Imlarksuite/cli

Lark IM is a Larksuite agent skill that exposes Feishu/Lark instant messaging to Claude Code, Cursor, and similar agents…210k installs·13.7k stars

Lark Calendarlarksuite/cli

lark-calendar is an agent skill for Feishu/Lark Calendar v4 exposed via lark-cli. Solo builders and small teams who alre…209k installs·13.7k stars

Lark Sheetslarksuite/cli

Skill for programmatic Feishu spreadsheet and worksheet management—create tables, bulk data IO, lookup, and export—using…209k installs·13.7k stars

Lark Vclarksuite/cli

lark-vc is an agent skill for Feishu/Lark video conferencing history and artifacts through lark-cli. After calls end, so…208k installs·13.7k stars

Lark Contactlarksuite/cli

CLI skill for Lark directory lookup: search employees and fetch metadata by open_id, with clear boundaries vs IM, calend…208k installs·13.7k stars

Journey fit

Primary fit

BuildAgent skills & templates

Agent tooling is the canonical shelf because the skill wires phantom.display, phantom.loop, and phantom.cli for Computer Use—not a one-off app feature. Computer control extends what coding agents can do on the desktop; it belongs with other agent-runtime capabilities rather than generic frontend or docs work.

Common Questions / FAQ

Is Computer Control safe to install?

skills.sh reports 2 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.

SKILL.md

READMESKILL.md - Computer Control

# Computer Control Skill

Use Claude's Computer Use API to see and control desktop
environments through screenshots and mouse/keyboard actions.

## When To Use

- Automating GUI-based workflows that lack CLI alternatives
- Testing web applications through visual interaction
- Filling forms, navigating menus, or interacting with desktop apps
- Building automation pipelines that need visual verification

## When NOT To Use

- Tasks achievable through CLI or API (no GUI needed)
- Browser automation better served by Playwright or CDP

> **Why this stays opt-in.** Per
> [docs/inclusive-defaults.md][inc] (TRUE-exception
> category 4), Computer Use takes screenshots and
> synthesizes keyboard/mouse input: cross-process side
> effects that must always be explicitly invoked, never
> default-on.

[inc]: ../../../../docs/inclusive-defaults.md

## Architecture

The computer use system has three layers:

1. **Display Toolkit** (`phantom.display`) - executes OS-level
   actions via xdotool/scrot on the real or virtual display
2. **Agent Loop** (`phantom.loop`) - manages the conversation
   cycle between Claude API and the display toolkit
3. **CLI** (`phantom.cli`) - command-line interface for running
   tasks or checking environment readiness

```
User Task
    |
    v
Agent Loop  <---->  Claude API (beta)
    |                   |
    v                   v
Display Toolkit    tool_use responses
    |              (click, type, screenshot)
    v
OS Commands (xdotool, scrot)
    |
    v
Display (X11 / Xvfb / WSLg)
```

## Quick Start

### Check environment

```bash
cd plugins/phantom
uv run python -m phantom.cli --check
```

### Run a task

```bash
export ANTHROPIC_API_KEY="sk-ant-..."
uv run python -m phantom.cli "Open Firefox and search for Claude AI"
```

### Use in Python

```python
from phantom.display import DisplayConfig, DisplayToolkit
from phantom.loop import LoopConfig, run_loop

result = run_loop(
    task="Take a screenshot of the desktop",
    api_key="sk-ant-...",
    loop_config=LoopConfig(
        model="claude-sonnet-4-6",
        max_iterations=10,
    ),
    display_config=DisplayConfig(width=1920, height=1080),
)

print(f"Done in {result.iterations} iterations")
print(result.final_text)
```

## API Versions

| Model | Tool Version | Beta Flag |
|-------|-------------|-----------|
| Opus 4.6, Sonnet 4.6, Opus 4.5 | `computer_20251124` | `computer-use-2025-11-24` |
| Sonnet 4.5, Haiku 4.5, older | `computer_20250124` | `computer-use-2025-01-24` |

The `resolve_tool_version()` function handles this mapping
automatically based on the model name.

## Available Actions

**All versions:**
- `screenshot` - capture display
- `left_click` - click at `[x, y]`
- `type` - type text string
- `key` - press key combo (e.g., `ctrl+s`)
- `mouse_move` - move cursor

**Enhanced (20250124+):**
- `scroll` - scroll with direction and amount
- `left_click_drag` - drag between coordinates
- `right_click`, `middle_click`, `double_click`, `triple_click`
- `hold_key` - hold key for duration
- `wait` - pause between actions

**Latest (20251124):**
- `zoom` - inspect screen region at full resolution

## Safety

Computer use carries risks. Follow these guidelines:

1. **Use a sandbox**: Run in Docker or a VM, not your main OS
2. **Limit access**: Do not provide login credentials unless
   necessary, and never for banking or sensitive services
3. **Set iteration caps**: Always use `max_iterations` to
   prevent runaway API costs
4. **Human approval**: For actions with real-world consequences,
   add confirmation callbacks via `on_action`
5. **Close sensitive apps**: Claude sees the full screen via
   screenshots; close anything private before starting

## Environment Requirements

**Linux (native or WSL2 with WSLg):

What is this skill?

Three-layer stack: phantom.display (xdotool/scrot), phantom.loop (API conversation cycle), phantom.cli (tasks and readin

Explicit opt-in per inclusive-defaults: screenshots and synthesized input are never default-on

Targets GUI workflows lacking CLI alternatives with visual verification in the loop

Documented NOT FOR cases: prefer CLI/API or Playwright/CDP for browser-only automation

model_hint standard for routine computer-use tasks

Adoption & trust: 1 installs on skills.sh; 304 GitHub stars; 2/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).

Journey fit

Primary fit

BuildAgent skills & templates

SKILL.md

READMESKILL.md - Computer Control

# Computer Control Skill

Use Claude's Computer Use API to see and control desktop
environments through screenshots and mouse/keyboard actions.

## When To Use

- Automating GUI-based workflows that lack CLI alternatives
- Testing web applications through visual interaction
- Filling forms, navigating menus, or interacting with desktop apps
- Building automation pipelines that need visual verification

## When NOT To Use

- Tasks achievable through CLI or API (no GUI needed)
- Browser automation better served by Playwright or CDP

> **Why this stays opt-in.** Per
> [docs/inclusive-defaults.md][inc] (TRUE-exception
> category 4), Computer Use takes screenshots and
> synthesizes keyboard/mouse input: cross-process side
> effects that must always be explicitly invoked, never
> default-on.

[inc]: ../../../../docs/inclusive-defaults.md

## Architecture

The computer use system has three layers:

1. **Display Toolkit** (`phantom.display`) - executes OS-level
   actions via xdotool/scrot on the real or virtual display
2. **Agent Loop** (`phantom.loop`) - manages the conversation
   cycle between Claude API and the display toolkit
3. **CLI** (`phantom.cli`) - command-line interface for running
   tasks or checking environment readiness

```
User Task
    |
    v
Agent Loop  <---->  Claude API (beta)
    |                   |
    v                   v
Display Toolkit    tool_use responses
    |              (click, type, screenshot)
    v
OS Commands (xdotool, scrot)
    |
    v
Display (X11 / Xvfb / WSLg)
```

## Quick Start

### Check environment

```bash
cd plugins/phantom
uv run python -m phantom.cli --check
```

### Run a task

```bash
export ANTHROPIC_API_KEY="sk-ant-..."
uv run python -m phantom.cli "Open Firefox and search for Claude AI"
```

### Use in Python

```python
from phantom.display import DisplayConfig, DisplayToolkit
from phantom.loop import LoopConfig, run_loop

result = run_loop(
    task="Take a screenshot of the desktop",
    api_key="sk-ant-...",
    loop_config=LoopConfig(
        model="claude-sonnet-4-6",
        max_iterations=10,
    ),
    display_config=DisplayConfig(width=1920, height=1080),
)

print(f"Done in {result.iterations} iterations")
print(result.final_text)
```

## API Versions

| Model | Tool Version | Beta Flag |
|-------|-------------|-----------|
| Opus 4.6, Sonnet 4.6, Opus 4.5 | `computer_20251124` | `computer-use-2025-11-24` |
| Sonnet 4.5, Haiku 4.5, older | `computer_20250124` | `computer-use-2025-01-24` |

The `resolve_tool_version()` function handles this mapping
automatically based on the model name.

## Available Actions

**All versions:**
- `screenshot` - capture display
- `left_click` - click at `[x, y]`
- `type` - type text string
- `key` - press key combo (e.g., `ctrl+s`)
- `mouse_move` - move cursor

**Enhanced (20250124+):**
- `scroll` - scroll with direction and amount
- `left_click_drag` - drag between coordinates
- `right_click`, `middle_click`, `double_click`, `triple_click`
- `hold_key` - hold key for duration
- `wait` - pause between actions

**Latest (20251124):**
- `zoom` - inspect screen region at full resolution

## Safety

Computer use carries risks. Follow these guidelines:

1. **Use a sandbox**: Run in Docker or a VM, not your main OS
2. **Limit access**: Do not provide login credentials unless
   necessary, and never for banking or sensitive services
3. **Set iteration caps**: Always use `max_iterations` to
   prevent runaway API costs
4. **Human approval**: For actions with real-world consequences,
   add confirmation callbacks via `on_action`
5. **Close sensitive apps**: Claude sees the full screen via
   screenshots; close anything private before starting

## Environment Requirements

**Linux (native or WSL2 with WSLg):

Install

What is this skill?

Recommended Skills

Journey fit

Is Computer Control safe to install?

SKILL.md

This week for builders

Install

What is this skill?

Recommended Skills

Journey fit

Is Computer Control safe to install?

SKILL.md