
Web Crawler
Scrape JS-heavy pages and pull public social posts, transcripts, and engagement data when plain web fetch fails.
Overview
web-crawler is an agent skill, most often used in the Idea phase (also Grow and Build), that scrapes blocked pages and extracts public social and transcript data.
Install
npx skills add https://github.com/starchild-ai-agent/official-skills --skill web-crawler
What is this skill?
- All-in-one router picks scraping backend automatically—user does not choose APIs
- Social coverage: YouTube transcripts, TikTok, Instagram, LinkedIn, Reddit, Threads, and ad-library style sources per SKI
- Falls back when native web_fetch hits boilerplate or bot blocks
- Requires a Python binary; starchild metadata lists version 2.2.1
- Scoped paid fallbacks—prefer free fetch first for simple pages
- skill version 2.2.1
- 15+ social and platform tags in metadata
Adoption & trust: 2.1k installs on skills.sh; 13 GitHub stars; 1/3 security scanners passed (skills.sh audits).
What problem does it solve?
Plain agent fetch returns empty or blocked HTML and you still need transcripts, threads, or profile posts for research or content.
Who is it for?
Builders validating niches with Reddit/YouTube signal or repurposing public transcripts when sites block simple HTTP fetch.
Skip if: you need logged-in private data, plan bulk scraping without rate limits, or operate in a jurisdiction where scraping those targets is prohibited.
When should I use this skill?
Use it when web_fetch fails, or when the user asks to scrape social profiles, posts, comments, transcripts, ads, trending content, or engagement from the listed platforms.
What do I get? / Deliverables
You get extracted text or structured social records, fetched through the appropriate scraper backend by a Python-backed run.
- Extracted page or post text/structured fields
- Transcripts or comment dumps scoped to the request
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Public data extraction most often starts in idea research before you commit to a niche or content angle. Routes YouTube, TikTok, Reddit, LinkedIn, and more when you need primary-source market or audience signals.
Where it fits
Pull Reddit and YouTube discussions to validate pain points in a niche.
Sample public creator posts to see language and hooks your ICP uses.
Harvest transcripts and threads to draft newsletters or short-form scripts.
Prototype an ingestion step that feeds a local dataset from blocked marketing sites.
How it compares
Skill-packaged scraping router—not a self-hosted Playwright MCP you operate separately.
Common Questions / FAQ
Who is web-crawler for?
Solo founders and content operators using Claude Code or Cursor who need social and web extraction inside the agent workflow.
When should I use web-crawler?
In Idea research for audience threads, in Grow for content mining, or in Build when prototyping pipelines that ingest public pages.
Is web-crawler safe to install?
It runs Python and may call paid scrape APIs; review Security Audits on this page and your platform ToS before crawling third-party sites.
SKILL.md
# Web crawler

All-in-one scraping skill. Routes requests to the best backend automatically; the user does not need to know which API is used.

Use this when:

- Normal `web_fetch` fails, returns boilerplate, or a site blocks basic fetching
- The user asks for YouTube video content/transcript
- The user wants to scrape, fetch, or extract data from any social media platform
- The user mentions social media profiles, posts, comments, transcripts, ads, trending content, or engagement metrics

Prefer native `web_fetch` first for simple pages; paid fallback calls should be deliberate and scoped.

## What each service is for

### ScrapeCreators: Social media data extraction (27+ platforms)

Use for any request involving social media profiles, posts, videos, comments, transcripts, search, ads, trending content, or engagement metrics. Covers TikTok, Instagram, YouTube, LinkedIn, Facebook, Twitter/X, Reddit, Threads, Bluesky, Pinterest, Snapchat, Twitch, Kick, Truth Social, TikTok Shop, Google search, and link-in-bio services (Linktree, Komi, Pillar, Linkbio, Linkme, Amazon Shop).

**Base URL:** `https://api.scrapecreators.com`

**Auth:** No user-supplied key needed. sc-proxy injects platform credentials automatically; just send the request. The `x-api-key` header can be any value or omitted entirely. Do NOT bail out or ask the user for a key if `$SCRAPECREATORS_API_KEY` looks unset; that env var is intentionally not required.

**Method:** All endpoints use GET requests with query params. Responses are JSON.

### Firecrawl: Fallback web page scraper

Only a fallback crawler for one web page when ordinary fetching fails. Use `POST /v2/scrape` with a single `url` and focused formats like `markdown`, `html`, `rawHtml`, `links`, `summary`, or constrained `json`/`question`/`highlights` extraction. Do not use Firecrawl crawl/map/search/agent/browser endpoints. Do not request screenshots, audio, branding, images, or browser actions unless the proxy policy is expanded later.
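The two request shapes described above can be sketched in Python as follows. This is only an illustration, not the skill's actual implementation: the helper names `scrapecreators_url` and `firecrawl_scrape_body` are hypothetical, the `formats` key in the Firecrawl body is an assumption about the request schema, and only the plain string formats are covered (the constrained `json`/`question`/`highlights` modes are left out). The example path `/v1/tiktok/profile` and its `handle` parameter come from the routing tables in this document. Nothing is sent over the network; the helpers only build the URL and the JSON body.

```python
from urllib.parse import urlencode

# Base URL as stated in the skill text.
SCRAPECREATORS_BASE = "https://api.scrapecreators.com"

# Plain string formats the skill text allows for single-page fallback scrapes.
ALLOWED_FIRECRAWL_FORMATS = {"markdown", "html", "rawHtml", "links", "summary"}


def scrapecreators_url(path: str, **params: str) -> str:
    """Build a GET URL for a ScrapeCreators endpoint (hypothetical helper)."""
    if not path.startswith("/"):
        raise ValueError(f"endpoint path must start with '/', got {path!r}")
    query = urlencode(params)  # percent-encodes values such as profile URLs
    url = f"{SCRAPECREATORS_BASE}{path}"
    return f"{url}?{query}" if query else url


def firecrawl_scrape_body(url: str, formats: list[str]) -> dict:
    """Build a JSON body for POST /v2/scrape (hypothetical helper).

    Assumes the body carries a single `url` plus a `formats` list, and
    rejects any format outside the allowed set (e.g. screenshots).
    """
    rejected = [f for f in formats if f not in ALLOWED_FIRECRAWL_FORMATS]
    if rejected:
        raise ValueError(f"formats outside the fallback set: {rejected}")
    return {"url": url, "formats": list(formats)}


if __name__ == "__main__":
    print(scrapecreators_url("/v1/tiktok/profile", handle="stoolpresidente"))
    print(firecrawl_scrape_body("https://example.com", ["markdown", "links"]))
```

Rejecting disallowed formats before the request is built keeps paid fallback calls scoped, in line with the guidance to avoid screenshots, audio, and browser actions.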
---

## ScrapeCreators: Intent routing

Map user intent to the right endpoint. Endpoint paths use the pattern `/v1/platform/action`.

**Important:** After selecting an endpoint from the tables below, fetch its OpenAPI spec at `https://docs.scrapecreators.com/{path}/openapi.json` for full parameter details, types, and example response before making the actual API call. For example: `https://docs.scrapecreators.com/v1/tiktok/profile/openapi.json`

### Profiles / User Info

| Platform | Endpoint | Primary Param | Example |
|----------|----------|---------------|---------|
| TikTok | `/v1/tiktok/profile` | handle | `stoolpresidente` |
| Instagram | `/v1/instagram/profile` | handle | `jane` |
| YouTube | `/v1/youtube/channel` | handle, channelId, or url | `ThePatMcAfeeShow` |
| LinkedIn (person) | `/v1/linkedin/profile` | url | `https://www.linkedin.com/in/parrsam/` |
| LinkedIn (company) | `/v1/linkedin/company` | url | `https://linkedin.com/company/shopify` |
| Facebook | `/v1/facebook/profile` | url | `https://www.facebook.com/mantraindianfolsom` |
| Twitter/X | `/v1/twitter/profile` | handle | `elonmusk` |
| Reddit | `/v1/reddit/subre