
Browser Automation
Let your coding agent drive real browser sessions with Playwright—navigate, click, fill forms, screenshot, and run page JS—instead of brittle built-in browser shortcuts.
Overview
browser-automation is an agent skill most often used in Build (also Validate and Ship) that runs Playwright browser sessions for navigate-click-fill-screenshot workflows via reusable handle functions.
Install
npx skills add https://github.com/sophieguanongit/openclaw-browser-automation --skill browser-automation
What is this skill?
- Playwright-based navigation, click, fill, type, select, check, upload, hover, scroll, and key press via named handle* functions
- Reuses an existing tab by default and keeps cookies in a dedicated user-data directory for persistent logins
- CDP attach to an open Chrome or launch a fresh Chromium instance
- Screenshot (full page or selector), wait for elements/navigation, get text/value/attributes, and evaluate arbitrary JavaScript
- Documented trigger: prefer this skill over the agent’s built-in browser tool when users ask for 打开网页 (open a webpage), 点击 (click), 填表 (fill in a form), or 截图 (take a screenshot)
- Documented handler groups cover page ops, interaction, wait/get, and advanced actions (evaluate, upload, hover, scroll).
- Default behavior reuses an existing page instead of opening a new window each invocation.
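The handle* functions listed above are called through a `node -e` one-liner, according to the invocation line in the SKILL.md reproduced further down this page. The sketch below shows one way to assemble that one-liner for a given handler. It assumes `index.js` exports promise-returning `handle*` functions as that invocation line suggests; `buildScript` is a hypothetical helper, not part of the skill.

```javascript
// Sketch: build the `node -e` script for one documented handler call.
// Assumption: index.js exports handle* functions that return promises.
// buildScript is a hypothetical helper, not part of the skill itself.
function buildScript(handler, args) {
  // Accept only names shaped like the documented handlers (handleNavigate, handleClick, ...).
  if (!/^handle[A-Z][A-Za-z]*$/.test(handler)) {
    throw new Error(`Unexpected handler name: ${handler}`);
  }
  // JSON text is also a valid JavaScript expression, so the arguments can be inlined.
  const literal = args === undefined ? '' : JSON.stringify(args);
  return `const h=require('./index.js'); h.${handler}(${literal}).then(console.log)`;
}

const script = buildScript('handleNavigate', { url: 'https://example.com' });
console.log(script);

// To run it, hand the script to Node from the directory where the skill is installed, e.g.
//   require('node:child_process').execFileSync('node', ['-e', script], { cwd: skillDir })
// where skillDir is a placeholder for your own install path.
```

Passing the script as an argument array to `execFileSync` rather than through a shell string avoids having to escape the double quotes that `JSON.stringify` produces.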
Adoption & trust: 2.8k installs on skills.sh; 1/3 security scanners passed (skills.sh audits).
What problem does it solve?
You need dependable browser actions with saved login state, but built-in agent browser tools open fresh contexts and break on real forms.
Who is it for?
Solo builders automating repeatable web UI steps, demos, or agent-driven smoke checks against staging sites.
Skip if: you only need a one-off curl to a REST API, or your flow must run entirely inside the IDE without Node/Playwright on disk.
When should I use this skill?
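When the user says 打开网页 (open a webpage), 点击 (click), 填写表单 (fill in a form), 截图 (screenshot), 网页操作 (web page operations), 自动填表 (auto-fill a form), or 浏览器 (browser), or makes any equivalent request to open a site, click, fill in a form, or take a screenshot. Prefer this skill over the built-in browser tool.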
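How an agent actually decides to pick a skill is up to the agent runtime, so the following is only an illustration of the trigger list as a plain keyword check. The phrases are copied from the list above; `matchesBrowserTrigger` is a hypothetical name, and English equivalents would need their own list.

```javascript
// Trigger phrases copied from the skill's documented trigger list.
const TRIGGERS = ['打开网页', '点击', '填写表单', '截图', '网页操作', '自动填表', '浏览器'];

// Hypothetical helper: true when the message contains any documented trigger phrase.
function matchesBrowserTrigger(message) {
  return TRIGGERS.some((phrase) => message.includes(phrase));
}

console.log(matchesBrowserTrigger('帮我截图看看首页')); // true
console.log(matchesBrowserTrigger('run the unit tests')); // false
```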
What do I get? / Deliverables
Your agent reuses a controlled Chromium or CDP Chrome page, completes the requested web actions, and returns content or screenshots from the skill’s handle API.
- Executed browser actions with returned page content, element text/values, or image screenshots
- Persistent session state via the skill’s user-data directory when re-run
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Browser control is wired into the product and agent toolchain during implementation, when you need repeatable web flows and authenticated sessions. Playwright handlers and CDP/Chromium hooks are third-party automation integrations the agent invokes from the repo skill package.
Where it fits
Walk through a competitor signup funnel with handleNavigate and handleFill before you commit to your own UX.
Script admin-console clicks and uploads after deploying your SaaS backend.
Capture handleScreenshot evidence of critical paths on staging ahead of launch.
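A flow like the ones above can be written down as a list of handler calls and checked before anything touches a browser. The sketch below validates such a plan against the required parameters given in the SKILL.md function list. `validatePlan`, the URL, and the selectors are all hypothetical placeholders, and only a subset of handlers is covered.

```javascript
// Required parameters per handler, taken from the signatures listed in SKILL.md.
// Only a subset of handlers is included; optional parameters are omitted.
const REQUIRED = {
  handleNavigate: ['url'],
  handleFill: ['selector', 'value'],
  handleClick: ['selector'],
  handleUpload: ['selector', 'filePath'],
  handleScreenshot: [],
};

// Hypothetical helper: returns a list of problems, empty when the plan looks well-formed.
function validatePlan(steps) {
  const problems = [];
  steps.forEach((step, i) => {
    if (!Object.prototype.hasOwnProperty.call(REQUIRED, step.handler)) {
      problems.push(`step ${i}: unknown handler ${step.handler}`);
      return;
    }
    for (const key of REQUIRED[step.handler]) {
      if (!(key in (step.args || {}))) {
        problems.push(`step ${i}: ${step.handler} is missing "${key}"`);
      }
    }
  });
  return problems;
}

// Placeholder URL and selectors for an imaginary staging signup page.
const plan = [
  { handler: 'handleNavigate', args: { url: 'https://staging.example.com/signup' } },
  { handler: 'handleFill', args: { selector: 'input[name="email"]', value: 'test@example.com' } },
  { handler: 'handleClick', args: { selector: 'text=Sign up' } },
  { handler: 'handleScreenshot', args: { fullPage: true } },
];

console.log(validatePlan(plan)); // []
```

Keeping the plan as data makes it easy to review or log the exact steps an agent intends to run before executing them.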
How it compares
Use this Playwright skill package for persistent sessions and rich DOM ops—not ad-hoc built-in browser snippets with no cookie profile.
Common Questions / FAQ
Who is browser-automation for?
Indie and solo developers using OpenClaw-style agent stacks who want Playwright-grade control when users ask for webpage opens, clicks, forms, or screenshots in chat.
When should I use browser-automation?
During Validate when you prototype a checkout or onboarding flow in a real browser; during Build when you integrate agent-driven ops against admin UIs; during Ship when you capture screenshots or regression evidence before release.
Is browser-automation safe to install?
It runs local Node against real browsers and can execute page JavaScript—review the Security Audits panel on this Prism page and inspect `index.js` before pointing it at production accounts.
SKILL.md
# Browser Automation Skill

Let OpenClaw control a browser for automated operations!

## Features

- **Page reuse**: reuses the existing page by default instead of opening a new window every time
- **Persistent cookies**: uses a dedicated user-data directory, so login state is saved across runs
- **Full set of operations**: supports clicking, form filling, screenshots, running JS, and more

## Capabilities

- Navigate to a URL
- Click elements
- Fill in forms
- Take screenshots
- Get page content
- Wait for elements
- Execute JavaScript

## When to use

When the user says:

- "Open xxx.com"
- "Click the login button"
- "Fill in this form for me"
- "Take a screenshot"
- "What's on the page?"

## Available functions

Invocation: `cd C:/Users/admin/.openclaw/skills/browser-automation && node -e "const h=require('./index.js'); h.handleXXX({...}).then(console.log)"`

### Page operations

- `handleNavigate({url})` - Navigate to a URL (reuses the existing page)
- `handleNewPage({url})` - Open a new page
- `handleScreenshot({selector?, fullPage?})` - Take a screenshot
- `handleGetContent({selector?})` - Get page content
- `handleClose()` - Close the current page

### Interaction

- `handleClick({selector})` - Click an element
- `handleFill({selector, value, clear?})` - Fill in a form field
- `handleType({selector, text, delay?})` - Simulate typing
- `handleSelect({selector, value})` - Choose a dropdown option
- `handleCheck({selector, checked?})` - Check or uncheck a checkbox

### Waiting and reading

- `handleWait({selector, timeout?})` - Wait for an element to appear
- `handleWaitForNavigation({timeout?})` - Wait for a page navigation
- `handleGetText({selector})` - Get an element's text
- `handleGetValue({selector})` - Get a form field's value
- `handleGetAttribute({selector, attribute})` - Get an attribute

### Advanced operations

- `handleEvaluate({script})` - Execute JavaScript
- `handleUpload({selector, filePath})` - Upload a file
- `handlePress({key})` - Press a key
- `handleHover({selector})` - Hover the mouse over an element
- `handleScroll({direction, amount?})` - Scroll the page

### Status

- `handleStatus()` - Get the current browser status
- `handleCloseBrowser()` - Close the browser (it is relaunched on the next call)

## Selector syntax

CSS selectors and text selectors are supported:

- CSS: `#login-btn`, `.submit-button`, `input[name="email"]`
- Text: `text=登录` (log in), `text=提交` (submit)
- Combined: `button:has-text("提交")`

## Example

```
User: Open github.com
Agent: [calls handleNavigate({url: 'https://github.com'})] Opened GitHub...

User: Click sign in
Agent: [calls handleClick({selector: 'text=Sign in'})] Clicked sign in...

User: Fill in the email test@example.com
Agent: [calls handleFill({selector: '#login_field', value: 'test@example.com'})] Filled in the email...

User: Take a screenshot so I can see
Agent: [calls handleScreenshot()] [returns the screenshot]
```

    node_modules/
    *.log
    .DS_Store
    Thumbs.db

    {
      "name": "browser-automation",
      "version": "1.0.0",
      "lockfileVersion": 3,
      "requires": true,
      "packages": {
        "": {
          "name": "browser-automation",
          "version": "1.0.0",
          "dependencies": { "playwright": "^1.48.0" }
        },
        "node_modules/fsevents": {
          "version": "2.3.2",
          "resolved": "https://registry.npmjs.org/fsevents/-/fsevents-2.3.2.tgz",
          "integrity": "sha512-xiqMQR4xAeHTuB9uWm+fFRcIOgKBMiOBP+eXiyT7jsgVCq1bkVygt00oASowB7EdtpOHaaPgKt812P9ab+DDKA==",
          "hasInstallScript": true,
          "license": "MIT",
          "optional": true,
          "os": [ "darwin" ],
          "engines": { "node": "^8.16.0 || ^10.6.0 || >=11.0.0" }
        },
        "node_modules/playwright": {
          "version": "1.58.2",
          "resolved": "https://registry.npmjs.org/playwright/-/playwright-1.58.2.tgz",
          "integrity": "sha512-vA30H8Nvkq/cPBnNw4Q8TWz1EJyqgpuinBcHET0YVJVFldr8JDNiU9LaWAE1KqSkRYazuaBhTpB5ZzShOezQ6A==",
          "license": "Apache-2.0",
          "dependencies": { "playwright-core": "1.58.2" },
          "bin": { "playwright": "cli.js" },
          "engines": { "node": ">=18" },
          "optionalDependencies": { "fsevents": "2.3.2" }
        },
        "node_modules/playwright-core": {
          "version": "1.58.2",
          "resolved": "https://registry.npmjs.org/playwright-core/-/playwright-core-1.58.2.tgz",
          "integrity": "sha512-yZkEtftgwS8CsfYo7nm0KE8jsvm6i/PTgVtB8DL726wNf6H2IMsDuxCpJj59KDaxCtSnrWan2AeDqM7JBaultg==",
          "license": "Apache-2.0",
          "bin": { "playwright-core": "cli.js" },
          "engines": { "node": ">=18" }
        }
      }
    }

# Browser Automation Skill Tools

## Overview

This skill uses Playwright to control a browser for automated operations.

**Features**:

- Reuses the existing page by default and does not open a new window
- Cookies and login state are saved persistently
- Supports all common browser operations

## Usage

Invocation: `cd C:/Users/admin/.opencla