
Scrape Webpage
Let an agent fetch and normalize public webpage content for competitive research, copy checks during validation, or build-time integration scaffolding when live HTML is the source of truth.
Overview
Scrape Webpage is an agent skill that lets coding agents retrieve and work with live public webpage content for research and automation. It is most often used in the Idea phase and also fits Validate and Build.
Install
npx skills add https://github.com/adobe/skills --skill scrape-webpage
What is this skill?
- A skill from the Adobe skills monorepo for agent-driven webpage retrieval (Stardust v2.0.0 release line on skills.sh)
- Fits workflows that need structured capture from live URLs rather than manual copy-paste
- Pairs with the broader Adobe agent skills catalog for content and experience automation
- Use when research or integration steps require current page state from the open web
- Treat network fetches as explicit side effects: scope URLs and respect site terms
Adoption & trust: 696 installs on skills.sh; 122 GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You need accurate text or structure from a live URL and do not want to manually copy HTML or write one-off fetch scripts for every research task.
Who is it for?
Builders running Adobe-oriented agent stacks who repeatedly pull public marketing pages, docs, or listings into planning or content pipelines.
Skip if: you need authenticated portals or bulk crawling at scale, or you work in jurisdictions or on sites where automated scraping violates terms. Use official APIs or licensed data instead.
When should I use this skill?
When an agent workflow needs structured content extracted from a public webpage URL (confirm exact trigger in SKILL.md on install).
What do I get? / Deliverables
Your agent runs a repeatable webpage scrape workflow so research notes, competitive snapshots, or integration inputs reflect what is currently on the page.
- Extracted webpage text or structured capture suitable for agent follow-up steps
- Documented source URL and fetch timestamp for research citations
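The deliverables above (normalized text plus a documented source URL and fetch timestamp) could be represented roughly as follows. This is a minimal sketch using only the Python standard library, not the skill's actual implementation or output format; `Capture`, `html_to_text`, and `make_capture` are hypothetical names introduced for illustration, and the HTML is assumed to have been fetched already.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    _SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Reduce HTML to whitespace-normalized text.

    Simplification: text nodes are joined with a space, so a word split
    by inline tags (e.g. <b>Pro</b>fessional) gains a space.
    """
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


@dataclass(frozen=True)
class Capture:
    """Hypothetical record shape; the skill's real output may differ."""
    source_url: str
    fetched_at: str  # ISO 8601 timestamp, UTC
    text: str


def make_capture(source_url: str, html: str, fetched_at=None) -> Capture:
    """Bundle normalized text with its source URL and fetch time."""
    if fetched_at is None:
        fetched_at = datetime.now(timezone.utc)
    return Capture(
        source_url=source_url,
        fetched_at=fetched_at.isoformat(),
        text=html_to_text(html),
    )
```

Keeping the URL and timestamp on the same record as the text means a research note can cite exactly which page state it summarizes.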
Journey fit
Spans multiple journey phases: a primary shelf plus the alternate fits below.
The primary shelf is Idea because solo builders most often install webpage scraping to research markets and competitors before committing to a build, though the same capability reappears in later phases. In research, live, uncached pages replace guesswork: pricing pages, docs, and landing copy you need to cite or summarize.
Where it fits
- Idea: Snapshot a competitor pricing page into structured notes before you commit to your own tier model.
- Validate: Verify a partner’s public docs still describe the API surface you plan to integrate in a prototype.
- Build: Pull a reference HTML fragment from a documented public page to seed a parser or content migration script.
How it compares
This is a skill-packaged fetch workflow, not a self-hosted headless-browser MCP; if you need one, compose it separately.
Common Questions / FAQ
Who is scrape-webpage for?
Solo builders and indie teams using Adobe skills with AI coding agents who need live webpage content during research or lightweight automation.
When should I use scrape-webpage?
In Idea/research for competitor and pricing pages; in Validate/scope when validating claims against live sites; in Build/integrations when a feature needs fresh HTML snippets from allowed public URLs.
Is scrape-webpage safe to install?
It performs network fetches to third-party sites. Review the Security Audits panel on this Prism page, restrict it to a URL allowlist in your agent policy, and never point it at URLs that carry credentials or at private admin consoles.
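One way to enforce that kind of URL scoping is a small pre-fetch check in whatever wrapper calls the skill. The sketch below is an assumption about how such a policy could look, not a feature of scrape-webpage itself; `ALLOWED_HOSTS` and `is_allowed` are hypothetical names, and the hosts are placeholders.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; replace with the hosts your policy permits.
ALLOWED_HOSTS = frozenset({"example.com", "docs.example.com"})


def is_allowed(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Return True only for https URLs on an exactly matching allowed host.

    Rejects URLs with embedded credentials (user:pass@host) and anything
    that fails to parse.
    """
    try:
        parts = urlsplit(url)
        if parts.scheme != "https":
            return False
        if parts.username is not None or parts.password is not None:
            return False
        host = parts.hostname or ""  # hostname is already lowercased
    except ValueError:
        return False
    return host in allowed_hosts
```

Exact host matching is deliberately strict: subdomains such as `admin.example.com` are rejected unless listed, which keeps private consoles out of scope by default.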
SKILL.md - Scrape Webpage
{"extends": "../../../../../release.config.cjs"}

# [2.0.0](https://github.com/adobe/skills/compare/scrape-webpage-v1.0.0...scrape-webpage-v2.0.0) (2026-05-14)

* feat(stardust)!: sweep — drop doctor, fossils, opt-out flags, live integration ([5d47099](https://github.com/adobe/skills/commit/5d47099881990beebabc039dc35428eb1e4776bb))

### Bug Fixes

* address PR review feedback ([17a795c](https://github.com/adobe/skills/commit/17a795cbc14ef1754cc2abcb0390e5d27e085af9))
* address rombert and abhishekgarg18 review feedback ([26c2c41](https://github.com/adobe/skills/commit/26c2c4164fadf394529693381259a127b1487941))
* **aem-eds:** align references/ with spec and silence orphan warnings ([b48cf80](https://github.com/adobe/skills/commit/b48cf802a0974eccfa74fa85898113d66f29f64d))
* **aem-eds:** register da-auth and ue-component-model in tile ([eb296bc](https://github.com/adobe/skills/commit/eb296bc360222f92821cedb266058dcb77f718a4))
* **aem-workflow:** correct workflow-model-design XML format and step resourceTypes (AEMaaCS) ([#123](https://github.com/adobe/skills/issues/123)) ([1c93e54](https://github.com/adobe/skills/commit/1c93e5418e3b3690f93243b22e1688f7175887a8))
* **appbuilder-project-init:** trim SKILL.md description under 1024-char limit ([f4b8ed5](https://github.com/adobe/skills/commit/f4b8ed5b0dddd7435f6d54ad7c80a8e464ee61b2))
* **marketplace:** rename project-management to aem-project-management in marketplace.json ([0a442dd](https://github.com/adobe/skills/commit/0a442dd550a3cfd502eae7e35e90f75b535b0709))
* remove unsupported user-invocable frontmatter field from aem-rde skill ([01a3493](https://github.com/adobe/skills/commit/01a3493b64ba8a6c8393d6eb2534fbfc7bd509ab))
* rename plugin to 'aem-project-management' ([c4e89d9](https://github.com/adobe/skills/commit/c4e89d98e0431fb8c0c53a835a0253f28cfa4c11))
* **replicate-content:** condense SKILL.md to pass tessl-review ([#119](https://github.com/adobe/skills/issues/119)) ([962fd1d](https://github.com/adobe/skills/commit/962fd1dbee913917a23352e9ea9405f3900d356a)), closes [#117](https://github.com/adobe/skills/issues/117) [#117](https://github.com/adobe/skills/issues/117)
* **stardust:** fix tile.json to pass tessl-lint ([#116](https://github.com/adobe/skills/issues/116)) ([eb78c8a](https://github.com/adobe/skills/commit/eb78c8a148b22af98874b1a232ff8b7782d544ab))
* **stardust:** make migrate output genuinely zip-and-deploy portable ([#133](https://github.com/adobe/skills/issues/133)) ([1ab6833](https://github.com/adobe/skills/commit/1ab6833f1edfc87db31020cbe3f3b8b47f8e11fd))
* support second-level plugins in CODEOWNERS ([05a79b2](https://github.com/adobe/skills/commit/05a79b2f69ba05b01f7cf7ace95699e45c73b959))
* update GH username ([d9bbfda](https://github.com/adobe/skills/commit/d9bbfdaef1f68e910340b124ecb5788e396957d4))

### Features

* add Adobe for creativity plugin skeleton ([0ea30a4](https://github.com/adobe/skills/commit/0ea30a4c5643c93b9c62e7903c85e34d90238a02))
* add aem-rde skill under new aem-cloud-service plugin ([b40dffd](https://github.com/adobe/skills/commit/b40dffd4cccd56aeaafc80e34ed7bba941a0c37b))
* adding aem content distribution skill in AEMaaCS skills ([#84](https://github.com/adobe/skills/issues/84)) ([4c49279](https://github.com/adobe/skills/commit/4c492792d8dca63eec6755525074501d03f48127)), closes [#72](https://github.com/adobe/skills/issues/72) [adobe/skills#72](https://github.com/adobe/skills/issues/72) [#76](https://github.com/adobe/skills/issues/76)
* **adobe-for-creativity:** add multiple new skills for photo and video editing ([a33c79a](https://github.com/adobe/skills/commit/a33c79a46a6ca1ceff49fd25a8f560254887485d))
* **appbuilder-project-init:** agentic Console project/workspace/API bootstrap ([b7ac7ca](https://github.com/adobe/skills/commit/b7ac7caabea313724c7cf9549919e5b15aae6713))
* **da-auth:** add da-auth skill and surface DA auth in CDD workflow ([#89](https://github.com/adobe/skills/issues/89)) ([fc7a2e8](https://github.com/adobe/ski