
Computer Use
Drive local desktop apps through Orca’s computer-use CLI—read UI via accessibility trees, screenshot, click, type, and scroll—when Codex-native desktop tools are not the right surface.
Overview
computer-use is an agent skill for the Build phase that controls local desktop apps via Orca’s computer-use CLI using accessibility trees, screenshots, and guarded UI actions.
Install
npx skills add https://github.com/stablyai/orca --skill computer-use
What is this skill?
- Uses `orca computer` with `--json` for agent-driven app state and actions
- Lists apps, reads accessibility trees, takes screenshots (path returned in JSON), and performs click, type, key-press, scroll, drag, and set-value actions
- Requires `orca status` and `orca computer capabilities` checks before interaction
- Explicit guardrails: no submit, purchase, delete, or account changes unless the user asked
- Local dev path: `./config/scripts/orca-dev computer` in the Orca worktree
- Documented actions include list apps, get state, click, type, press keys, scroll, drag, and set value
- JSON responses omit inline screenshot bytes; the image is written to the path given in `screenshot.path`
Adoption & trust: 1.2k installs on skills.sh; 4.4k GitHub stars.
What problem does it solve?
Your agent must interact with native desktop apps, but native computer tools and ad hoc scripts are brittle or lack structured state.
Who is it for?
Solo builders on Orca who need JSON-friendly desktop automation for reading UI and safe clicks or typing during agent workflows.
Skip if: Headless servers without a desktop session, destructive or send/purchase flows without explicit user approval, or tasks better handled by official app APIs.
When should I use this skill?
Triggers include computer use, orca computer, list apps, get app state, read desktop UI, click, type, press key, scroll, drag, or set value on local apps.
What do I get? / Deliverables
Orca returns structured app state and performs only the UI actions the user authorized, with capabilities verified first.
- JSON app state and capability report
- Executed UI actions or screenshot path per request
Journey fit
Its canonical shelf is Build because it extends how your agent interacts with real desktop software during product and tooling work. The agent-tooling category fits because Orca acts as a controlled CLI bridge for app state, capability checks, and safe UI actions.
How it compares
Use Orca’s documented computer CLI instead of raw OS automation when you need capability checks and JSON app state.
Common Questions / FAQ
Who is computer-use for?
Developers using Orca locally who want agents to inspect and act on desktop apps through a single CLI surface.
When should I use computer-use?
Use it for agent-tooling work in the Build phase when the task involves computer use, listing apps, getting app state, reading Slack or Spotify UI, clicking, typing, scrolling, or setting values, and only within user-approved scope.
Is computer-use safe to install?
It can drive real UI actions on your machine; review the Security Audits panel on this page and keep destructive or outbound actions behind explicit user requests.
SKILL.md
# Computer Use

Use this skill when the task should operate through Orca's desktop computer-use surface rather than native Codex computer tools, raw AppleScript, ad hoc screenshots, or direct app internals.

## Preconditions

- Prefer the public `orca computer ...` command.
- In this Orca worktree, use `./config/scripts/orca-dev computer ...` when testing the local dev runtime.
- Prefer `--json` for agent-driven calls. Screenshot image bytes are omitted from JSON and written to `screenshot.path` when present.
- Do not push, submit forms, send messages, buy items, delete data, or change account settings unless the user explicitly asked for that specific action.
- If an app contains sensitive content, read only what the user requested and avoid unnecessary screenshots or logs.

Check runtime availability first:

```bash
orca status --json
orca computer capabilities --json
```

For local development against this worktree:

```bash
./config/scripts/orca-dev status --json
```

## Core Workflow

Use a snapshot-act-snapshot loop:

1. Discover apps:

   ```bash
   orca computer list-apps --json
   ```

2. Get a fresh state for the target app:

   ```bash
   orca computer get-app-state --app com.spotify.client --json
   ```

3. Choose an element from that state.

4. Perform one action:

   ```bash
   orca computer click --app com.spotify.client --element-index 42 --json
   ```

5. Inspect the action result before deciding whether to act again. Actions return a fresh state:

   ```bash
   orca computer click --app com.spotify.client --element-index 42 --json
   ```

Element indexes are scoped to the current app state. They can go stale after navigation, focus changes, scrolling, window changes, or app re-rendering. Never carry indexes across unrelated steps without refreshing state.

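
The preflight checks and the snapshot-act-snapshot loop above can be wrapped in a few small shell helpers. This is a minimal sketch, not part of Orca: the `ORCA` variable and the function names `orca_computer`, `preflight`, `snapshot`, and `click_once` are hypothetical, and only the subcommands and flags already shown in this skill are used. It assumes each command exits non-zero on failure.

```bash
#!/usr/bin/env bash
# Sketch only: helper names and the ORCA variable are hypothetical, not part of Orca.
# Every subcommand and flag below is taken from the commands shown in this skill.

# Command to invoke; override with ./config/scripts/orca-dev for the local dev runtime.
ORCA="${ORCA:-orca}"

# Run one `computer` subcommand and always request JSON output.
orca_computer() {
  "$ORCA" computer "$@" --json
}

# Preflight: stop early if the runtime or the computer-use surface is unavailable.
# Assumes a non-zero exit status signals failure.
preflight() {
  "$ORCA" status --json >/dev/null || return 1
  orca_computer capabilities >/dev/null || return 1
}

# Snapshot: print a fresh state for one app (bundle ID, name, or pid:<number>).
snapshot() {
  orca_computer get-app-state --app "$1"
}

# Act: click exactly one element; the printed result is the state to inspect next.
click_once() {
  orca_computer click --app "$1" --element-index "$2"
}
```

A typical sequence would be `preflight && snapshot com.spotify.client`, then choosing an index from that output and calling `click_once com.spotify.client 42`, reading the returned state before deciding on any further action. Keeping each helper to a single command makes it harder to reuse a stale element index by accident.
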
## App Selectors

Prefer bundle IDs returned by `list-apps`:

```bash
orca computer get-app-state --app com.microsoft.edgemac --json
orca computer get-app-state --app com.spotify.client --json
```

Names are acceptable when unambiguous:

```bash
orca computer get-app-state --app Spotify --json
```

Use `pid:<number>` only when bundle ID or name matching is ambiguous:

```bash
orca computer get-app-state --app pid:12345 --json
```

## Commands

```bash
orca computer permissions --json
orca computer capabilities --json
orca computer list-apps --json
orca computer list-windows --app <app> --json
orca computer get-app-state --app <app> --json
orca computer click --app <app> --element-index <index> --json
orca computer perform-secondary-action --app <app> --element-index <index> --action <name> --json
orca computer set-value --app <app> --element-index <index> --value "text" --json
orca computer type-text --app <app> --text "text" --json
orca computer press-key --app <app> --key Return --json
orca computer hotkey --app <app> --key CmdOrCtrl+A --json
orca computer paste-text --app <app> --text "text" --json
orca computer scroll --app <app> (--element-index <index> | --x <x> --y <y>) --direction down --json
orca computer drag --app <app> --from-x 100 --from-y 100 --to-x 300 --to-y 300 --json
```

Use `--no-screenshot` only when pixels are not needed. Screenshots are often the only useful signal for Electron, WebView, or canvas-heavy apps with shallow accessibility trees.

Coordinates are window-local. Use coordinates from the latest screenshot/state for the same target window.

Use `--text-stdin` or `--value-stdin` for sensitive text so payloa