Webclaw

Name: Webclaw
Author: io.github.0xMassi

Give Claude Code or Cursor a stdio MCP server that turns any public URL into clean markdown for competitor research, docs ingestion, and content reuse without hand-copying pages.

Overview

io.github.0xMassi/webclaw is an MCP server for the Idea phase that scrapes, crawls, extracts, and summarizes public URLs into clean markdown for coding agents.

What is this MCP server?

Scrape a single URL to structured, agent-friendly markdown
Crawl linked pages for deeper site captures when one page is not enough
Extract and summarize page content so agents get signal instead of raw HTML noise
Stdio MCP transport via npm package create-webclaw (v0.1.3) for Claude Code, Cursor, and other MCP hosts
Fits research, validation, and launch workflows that start from someone else’s website
Server schema version 0.1.3
npm registry identifier create-webclaw with stdio MCP transport
Capabilities described as scrape, crawl, extract, and summarize to markdown

Compatible agents: Claude Code, Cursor, Codex, any compatible agent
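With stdio transport, the MCP host launches webclaw as a child process and exchanges newline-delimited JSON-RPC 2.0 messages over that process's stdin and stdout. The Python sketch below builds the messages a host typically sends first: the `initialize` request, the `notifications/initialized` notification, a `tools/list` request, and a `tools/call` request. It does not start webclaw itself. The protocol version string, the client name `example-host`, and the tool name `scrape` with its `url` argument are illustrative assumptions. Read the tools webclaw actually exposes from its `tools/list` response.

```python
import json

# MCP protocol revision the client proposes. This is an assumption; host and
# server negotiate the version during initialize.
PROTOCOL_VERSION = "2024-11-05"


def frame(message: dict) -> bytes:
    """Encode one JSON-RPC message for the stdio transport as a single line
    of UTF-8 JSON followed by a newline. json.dumps escapes any newlines
    inside strings, so an encoded message never spans two lines."""
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def handshake_messages(client_name: str = "example-host") -> list:
    """Messages a host sends right after spawning the server process."""
    return [
        {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "initialize",
            "params": {
                "protocolVersion": PROTOCOL_VERSION,
                "capabilities": {},
                "clientInfo": {"name": client_name, "version": "0.0.1"},
            },
        },
        # A notification has no "id", so the server sends no response to it.
        {"jsonrpc": "2.0", "method": "notifications/initialized"},
        # Ask the server which tools it exposes.
        {"jsonrpc": "2.0", "id": 2, "method": "tools/list"},
    ]


def tool_call(request_id: int, url: str, tool_name: str = "scrape") -> dict:
    """Build a tools/call request. The tool name "scrape" and the "url"
    argument are hypothetical; take the real names and input schemas from
    the server's tools/list response."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": {"url": url}},
    }


if __name__ == "__main__":
    messages = handshake_messages() + [tool_call(3, "https://example.com")]
    # A host would write these bytes to the server process's stdin.
    for message in messages:
        print(frame(message).decode("utf-8"), end="")
```

The server answers each request that has an `id` with one line on stdout carrying the same `id`. That is how the host matches responses to requests.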

What problem does it solve?

Solo builders waste agent turns copying messy HTML or fighting one-off scrapers when they need readable web content in the chat.

Who is it for?

Indie builders doing competitor and audience research, ingesting public docs, or summarizing marketing pages inside Claude Code, Cursor, or Codex with stdio MCP.

Skip if: Teams that need authenticated sessions, heavy anti-bot bypass, or a full managed crawling platform with SLAs and compliance review.

What do I get? / Deliverables

After you add webclaw to your MCP host, agents can pull normalized markdown from target URLs so research and spec work stays in one thread.

Clean markdown representations of single URLs or crawled pages
Summarized extractions suitable for notes, specs, and agent follow-up
Repeatable web capture workflows without bespoke scraper scripts per site

Recommended MCP Servers

1stDibs

The 1stDibs MCP server exposes browse-and-search capabilities against the 1stDibs luxury goods marketplace through a hos…

2Captcha MCP (aruxojuyu665/2Captcha-MCP)

2Captcha MCP exposes the commercial 2Captcha API to MCP hosts with 43 tools—31 focused on captcha solving plus managemen…

4fetch

4fetch is a hosted MCP server that fetches a URL and returns clean Markdown with metadata so coding agents can quote pag…

Acrawl (Mingye-Lu/AgenticCrawler)

acrawl (Agentic Crawler) is a Model Context Protocol server that packages autonomous web browsing into a single local bi… (5 stars)

Agentfetch (bch1212/agentfetch-mcp)

Agentfetch MCP is a token-budgeted web retrieval server for AI coding agents. Solo builders doing idea-phase competitor …

AgenticTotem Web Extractor

AgenticTotem Web Extractor is a hosted MCP server for AI web extraction: you supply URLs and a JSON Schema, and the serv…

Journey fit

Primary fit

Idea: Opportunity & market research

Extracting arbitrary URLs is usually the first web-automation need in a solo builder's journey. It arises before you have a product URL of your own, while you are reading competitors, docs, and landing pages during opportunity research. The research subphase is where builders collect and normalize off-site information, and markdown output maps directly into notes, specs, and agent context.

How it compares

Webclaw is a stdio MCP web-scraping integration, not an in-repo agent skill or a curated skills marketplace.

Common Questions / FAQ

Who is Webclaw for?

Solo and indie builders who use MCP-enabled coding agents and need URL-to-markdown extraction for research, validation, and content tasks without custom scraper code.

When should I use Webclaw?

Use it when an answer depends on a live web page—competitor sites, docs, or articles—and you want crawl, extract, or summarize output as clean markdown in the agent session.

How do I add Webclaw to my agent?

Install the npm package create-webclaw (the registry identifier in the server metadata). Add an MCP server entry that uses stdio transport to the config of Claude Code, Cursor, or your other MCP host. Then restart the client so the tools load.
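Many MCP hosts read a JSON file with an `mcpServers` object whose entries name a `command` and its `args`. Cursor, for example, reads a project-level `.cursor/mcp.json`. The Python sketch below merges such an entry into an existing config. It assumes the server starts with `npx -y create-webclaw`. That command is a guess based only on the npm identifier above, so confirm the actual launch command in the package README and the config file location in your host's documentation. The entry name `webclaw` and the helper functions are hypothetical.

```python
import json
from pathlib import Path

# Assumed launch command; verify it against the create-webclaw README.
DEFAULT_COMMAND = "npx"
DEFAULT_ARGS = ["-y", "create-webclaw"]


def add_webclaw_entry(config: dict, name: str = "webclaw") -> dict:
    """Return a copy of an MCP host config with a stdio server entry added,
    keeping any servers that are already configured."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers[name] = {"command": DEFAULT_COMMAND, "args": list(DEFAULT_ARGS)}
    updated["mcpServers"] = servers
    return updated


def write_config(path: Path) -> None:
    """Merge the entry into a JSON config file, creating the file if needed."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_webclaw_entry(existing), indent=2) + "\n")


if __name__ == "__main__":
    # Print the entry instead of writing it. Call
    # write_config(Path(".cursor") / "mcp.json") to update a Cursor project.
    print(json.dumps(add_webclaw_entry({}), indent=2))
```

The helper copies the existing config instead of overwriting it, so servers you already rely on stay in place.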
