
Opencli Browser
Drive a real Chrome session through opencli so your coding agent can inspect pages, fill forms, and complete logged-in flows with structured match envelopes instead of brittle screenshots.
Overview
opencli-browser is an agent skill most often used in Build (also Ship testing, Validate prototype) that drives live Chrome via opencli using session-based commands and structured match envelopes.
Install
npx skills add https://github.com/jackwener/opencli --skill opencli-browserWhat is this skill?
- Selector-first target contract with confidence and stale-ref guidance in every CLI envelope
- Session lifecycle: stable `<session>` names for multi-step flows and parallel isolated sessions
- Compound form fields, network capture, and agent-native structured responses—not human-oriented logs
- Hard prerequisite gate via `opencli doctor` before any browser commands
- Explicit split from adapter authoring—use opencli-adapter-author for reusable `~/.opencli/clis/` packages
Adoption & trust: 11.6k installs on skills.sh; 23.8k GitHub stars; 1/3 security scanners passed (skills.sh audits).
What problem does it solve?
You need your agent to complete a real browser task on a logged-in or dynamic page, but ad-hoc clicking and guessing selectors wastes tokens and breaks on stale DOM refs.
Who is it for?
Solo builders automating one-off or recurring browser workflows (forms, warmups, data extraction) through opencli while keeping the agent on official CLI contracts.
Skip if: Authoring reusable per-site CLI adapters under ~/.opencli/clis/—use opencli-adapter-author instead—or headless-only scraping with no Chrome extension setup.
When should I use this skill?
An agent needs to drive a real Chrome window via opencli to inspect a page, fill forms, click logged-in flows, or extract data ad hoc—not when writing reusable adapters.
What do I get? / Deliverables
After `opencli doctor` passes, your agent runs a named browser session with selector-first actions and reads structured envelopes that say what matched, how confident the match is, and what to retry next.
- Structured opencli browser command envelopes documenting matches and next steps
- Completed multi-step browser session outcomes (forms submitted, data extracted)
Recommended Skills
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Browser driving is where solo builders wire agents to real SaaS UIs and auth-gated sites during implementation and integration work. Integrations subphase is the canonical shelf for CLI tools that connect agents to external systems—in this case live browser automation rather than REST alone.
Where it fits
Click through a staging signup funnel in a named session to confirm the prototype before you harden backend APIs.
Extract dashboard state from a vendor that has no public API by driving the logged-in UI with network capture.
Re-run a multi-step checkout or settings path in Chrome to verify a release when only the UI exposes the behavior.
How it compares
Task-focused opencli CLI driver for agents, not a generic Playwright skill pack or MCP browser server abstraction.
Common Questions / FAQ
Who is opencli-browser for?
Indie developers and small teams using Claude Code, Cursor, or Codex who already use opencli and want the agent to operate Chrome reliably with structured responses.
When should I use opencli-browser?
During Build integrations when wiring UI-only flows, during Validate prototype checks on a staging site, or during Ship testing when you need a human-paced logged-in path verified in a real window.
Is opencli-browser safe to install?
It grants shell access to opencli and can drive a browser with your logged-in session; review the Security Audits panel on this Prism page and restrict sessions to non-production accounts when possible.
SKILL.md
READMESKILL.md - Opencli Browser
# opencli-browser The first reader of this CLI is an agent, not a human. Every subcommand returns a structured envelope that tells you exactly what matched, how confident the match is, and what to do if it didn't. Lean on those envelopes — do not guess. This skill is for **driving a live browser** to accomplish an agent task. If you are building a reusable adapter under `~/.opencli/clis/<site>/` use `opencli-adapter-author` instead. --- ## Prerequisites ```bash opencli doctor ``` Until `doctor` is green, nothing else will work. Typical failures: Chrome not running, extension not installed, debug port blocked by 1Password / other extensions. The doctor output tells you which. --- ## Session lifecycle - `opencli browser *` commands require a `<session>` positional immediately after `browser`. Use the same session name for a multi-step flow; use a different name to isolate parallel browser work. - Use a stable session name for any multi-command or human-paced browser workflow. Example: `opencli browser fb-yaya-warmup open https://example.com`, then reuse `opencli browser fb-yaya-warmup state`, `extract`, `click`, etc. - Owned browser sessions keep a tab lease alive between calls. Release it with `opencli browser <session> close` or let the idle timeout expire. - `opencli browser <session> bind` binds the Chrome tab you already have open to that session. Use this for logged-in pages, SSO flows, or pages you manually positioned before handing control to the agent. - `--window foreground|background` (or `OPENCLI_WINDOW=foreground|background`) chooses whether OpenCLI creates/focuses a foreground browser window or uses a background browser window for owned sessions. ### Bind Tab ```bash opencli browser gmail bind opencli browser gmail state opencli browser gmail click "Search" opencli browser gmail network opencli browser gmail unbind ``` Binding never owns the user window and never closes the user tab. It fails closed if the tab is closed or becomes non-debuggable. Re-run `opencli browser <session> bind` when you switch to a different real tab. Navigation is allowed on bound sessions because the session now represents explicit agent ownership of that tab. Tab mutation (`tab new`, `tab select`, `tab close`) is still blocked for bound sessions. Use an owned session when you want OpenCLI to manage tab lifecycle. Bound sessions have no OpenCLI idle-close timer; the binding lasts until `unbind`, tab close, window close, or daemon restart. --- ## Mental model 1. **Selector-first target contract.** Every interaction command (`click`, `type`, `select`, `get text/value/attributes`) takes one `<target>`, which is *either* a numeric ref from `state`/`find` *or* a CSS selector. Use `--nth <n>` to disambiguate multiple CSS matches. 2. **Every envelope reports `matches_n` and `match_level`.** `match_level` is `exact`, `stable`, or `reidentified` — the CLI already rescued moderate DOM drift for you, but the level tells you how confident to be. 3. **Compact output first, full payload on demand.** `state` is a budget-aware snapshot; `get html --as json` supports `--depth/--children-max/--text-max`; `network` returns shape previews and you re-fetch a single body with `--detail <key>`. If you emit a giant payload you are burning context you did not need to burn. 4. **Structured errors are machine-readable.** On failure the CLI emits `{error: {code, message, hint?, candidates?}}`. Branch on `code`, not on message strings. --- ## Critical rules 1. **Always inspect before you act.** Ru