Web Content Fetcher

Idea is the primary shelf because fetching and reading external pages is the backbone of early opportunity and competitor research before you commit to building. Research is where URL ingestion lives: blog posts, news, docs, and WeChat Official Account (微信公众号) pages become inputs to your notes and specs.

Where it fits

Example use

Pull a competitor launch post into Markdown before you sketch positioning.

Example use

Read a long product teardown article to sanity-check feature scope against market write-ups.

Example use

Ingest a third-party API guide as Markdown while wiring an integration.

Example use

Grow: Content & marketing

Fetch a reference article to outline your own SEO post without losing link structure.

How it compares

This is an agent-side URL-to-Markdown skill, not a hosted MCP browser; pair it with summarization skills after the fetch.

Common Questions / FAQ

Who is web-content-fetcher for?

Solo builders and indie hackers who want their coding agent to read external articles and documentation as Markdown without writing scrape glue each time.

When should I use web-content-fetcher?

When you need to fetch, read, extract, or summarize a URL—in Idea competitor research, Validate scope reading, Build doc ingestion, or Grow content reference pulls—including WeChat and documentation pages.

Is web-content-fetcher safe to install?

It runs network fetches and optional local Python scripts; review the Security Audits panel on this page and avoid sending authenticated or sensitive URLs through third-party readers.

SKILL.md

# Web Content Fetcher

Given a URL, return its main content as clean Markdown — headings, links, images, lists, code blocks all preserved.

## Extraction Strategy

Always try **one method per URL** — don't cascade blindly. Pick the right one upfront.

```
URL
 │
 ├─ 1. Scrapling script (preferred)
 │     Run fetch.py — check the domain routing table to decide fast vs --stealth.
 │     Works for most sites. Returns clean Markdown directly.
 │
 └─ 2. Jina Reader (fallback — only if Scrapling fails or dependencies not installed)
       web_fetch("https://r.jina.ai/<url>")
       Free tier: 200 req/day. Fast (~1-2s), good Markdown output.
       Does NOT work for: WeChat (403), some Chinese platforms.
```
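The fallback branch above can be sketched in Python as follows. This is only an illustration: `jina_reader_url` and `JINA_UNSUPPORTED_HOSTS` are hypothetical names, and the unsupported-host set lists only WeChat because that is the one host the diagram names explicitly.

```python
from typing import Optional
from urllib.parse import urlparse

JINA_PREFIX = "https://r.jina.ai/"

# Assumption: only WeChat is listed, since it is the one host the diagram names.
JINA_UNSUPPORTED_HOSTS = {"mp.weixin.qq.com"}


def jina_reader_url(url: str) -> Optional[str]:
    """Return the Jina Reader URL for `url`, or None for a known-unsupported host."""
    host = (urlparse(url).hostname or "").lower()
    if host in JINA_UNSUPPORTED_HOSTS:
        # Jina Reader returns 403 for WeChat, so do not spend a request on it.
        return None
    # Jina Reader takes the target URL appended directly to its prefix.
    return JINA_PREFIX + url
```

An agent would pass the returned string to `web_fetch`, and treat `None` as "no fallback available for this URL".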

### Scrapling script

```bash
python3 <SKILL_DIR>/scripts/fetch.py "<url>" [max_chars] [--stealth]
```

`<SKILL_DIR>` is the directory where this SKILL.md lives. Resolve it before calling the script.

The script has two modes built in:
- **Default (fast):** HTTP fetch, ~1-3s, works for most sites
- **`--stealth`:** Headless browser, ~5-15s, for JS-rendered or anti-scraping sites

When run without `--stealth`, the script automatically falls back to stealth if the fast result has too little content. So you rarely need to specify `--stealth` manually — the only reason to force it is when you already know the site needs it (see routing table), which saves the initial fast attempt.
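The automatic fallback described above might look roughly like the sketch below. It is not the code in `fetch.py`: the function name, the injected `fast_fetch` / `stealth_fetch` callables, and the `MIN_CHARS` threshold are all assumptions made for illustration, since the skill docs only say "too little content".

```python
from typing import Callable, Tuple

# Hypothetical cutoff: the skill docs do not state the real threshold.
MIN_CHARS = 200


def fetch_with_fallback(
    url: str,
    fast_fetch: Callable[[str], str],
    stealth_fetch: Callable[[str], str],
    force_stealth: bool = False,
    min_chars: int = MIN_CHARS,
) -> Tuple[str, str]:
    """Return (mode, markdown), trying the fast path first unless forced."""
    if force_stealth:
        # Known JS-heavy site: skip the fast attempt entirely.
        return "stealth", stealth_fetch(url)
    markdown = fast_fetch(url)
    if len(markdown.strip()) >= min_chars:
        return "fast", markdown
    # The fast result was too thin, so try once more with the headless browser.
    return "stealth", stealth_fetch(url)
```

Passing the fetchers in as arguments keeps the sketch independent of any particular scraping library and makes the fallback decision easy to test with stubs.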

## Domain Routing

Use this table to pick the right mode on the first call:

| Domain | Command | Why |
|--------|---------|-----|
| `mp.weixin.qq.com` | `fetch.py <url> --stealth` | JS-rendered content |
| `zhuanlan.zhihu.com` | `fetch.py <url> --stealth` | Anti-scraping + JS |
| `juejin.cn` | `fetch.py <url> --stealth` | JS-rendered SPA |
| `sspai.com` | `fetch.py <url>` | Static HTML |
| `blog.csdn.net` | `fetch.py <url>` | Static HTML |
| `ruanyifeng.com` | `fetch.py <url>` | Static blog |
| `openai.com` | `fetch.py <url>` | Static HTML |
| `blog.google` | `fetch.py <url>` | Static HTML |
| Everything else | `fetch.py <url>` | Auto-fallback handles it |
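
For agents that prefer to encode the table rather than look it up by eye, a minimal sketch follows. The helper name `build_command` and the subdomain matching are assumptions; the table itself only lists exact hostnames.

```python
from typing import List
from urllib.parse import urlparse

# Domains the routing table marks as needing --stealth on the first call.
STEALTH_DOMAINS = {"mp.weixin.qq.com", "zhuanlan.zhihu.com", "juejin.cn"}


def build_command(url: str, skill_dir: str) -> List[str]:
    """Build the fetch.py argument list for `url`, adding --stealth when routed."""
    host = (urlparse(url).hostname or "").lower()
    # Assumption: subdomains of a listed domain are routed the same way.
    needs_stealth = any(
        host == domain or host.endswith("." + domain) for domain in STEALTH_DOMAINS
    )
    command = ["python3", f"{skill_dir}/scripts/fetch.py", url]
    if needs_stealth:
        command.append("--stealth")
    return command
```

Every domain not in the set gets the plain command, which matches the "Everything else" row and leaves the rest to the script's auto-fallback.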

## Script Options

```bash
# Basic — auto-selects fast or stealth
python3 <SKILL_DIR>/scripts/fetch.py "https://sspai.com/post/73145"

# Force stealth for known JS-heavy sites
python3 <SKILL_DIR>/scripts/fetch.py "https://mp.weixin.qq.com/s/xxx" --stealth

# Limit output to 15000 characters (default: 30000)
python3 <SKILL_DIR>/scripts/fetch.py "https://example.com/article" 15000

# JSON output with metadata (url, mode, selector, content_length)
python3 <SKILL_DIR>/scripts/fetch.py "https://example.com" --json
```
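
When `--json` is used, the metadata can be read with the standard `json` module. The sketch below assumes the script prints a single JSON object to stdout and that the four metadata fields named above are top-level keys. Neither detail is confirmed by the docs, and the helper name is hypothetical.

```python
import json
from typing import Any, Dict

# Metadata fields named in the comment for the --json option.
METADATA_KEYS = ("url", "mode", "selector", "content_length")


def parse_fetch_metadata(stdout: str) -> Dict[str, Any]:
    """Extract the documented metadata fields from the script's JSON output."""
    data = json.loads(stdout)
    # Missing keys become None rather than raising, since the schema is assumed.
    return {key: data.get(key) for key in METADATA_KEYS}
```

In practice `stdout` would come from something like `subprocess.run(..., capture_output=True, text=True).stdout`, with the command built as shown in the options above.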

## Install Dependencies

First use only — the script checks and tells you if anything is missing:

```bash
pip install scrapling html2text
```

If you are on a system-managed Python (macOS/Linux), add `--break-system-packages` to the `pip install` command or use a virtual environment (venv).
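
A dependency check of the kind the script performs could be sketched as below. This is an assumption about the approach, not the script's actual code, and it assumes the import names match the pip package names `scrapling` and `html2text`.

```python
import importlib.util
from typing import Iterable, List

# Assumption: import names are the same as the pip package names.
REQUIRED_MODULES = ("scrapling", "html2text")


def missing_dependencies(modules: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the top-level modules that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in modules if importlib.util.find_spec(name) is None]
```

If the returned list is non-empty, the agent can print the `pip install` line above with just the missing names, or go straight to the Jina Reader fallback.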

## Failure Rules

- Same URL fails once → give up, tell the user "unable to extract content from this URL"
- Do not retry — each failed call wastes context tokens
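
One way to enforce these rules in an agent wrapper is to remember failed URLs and short-circuit later calls. The class below is a hypothetical sketch; `OneShotFetcher` and its injected `fetch` callable are illustrative names, not part of the skill.

```python
from typing import Callable, Optional, Set

FAILURE_MESSAGE = "unable to extract content from this URL"


class OneShotFetcher:
    """Wrap a fetch function so that each URL is attempted at most once on failure."""

    def __init__(self, fetch: Callable[[str], Optional[str]]) -> None:
        self._fetch = fetch
        self._failed: Set[str] = set()

    def get(self, url: str) -> str:
        if url in self._failed:
            # Already failed once: do not spend more context tokens on it.
            return FAILURE_MESSAGE
        try:
            result = self._fetch(url)
        except Exception:
            result = None
        if not result:
            self._failed.add(url)
            return FAILURE_MESSAGE
        return result
```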


<div align="center">

# Web Content Fetcher

**Web page content extraction · Free forever · Supports WeChat Official Accounts**

[![Python](https://img.shields.io/badge/Python-3.10%2B-blue?logo=python&logoColor=white)](https://www.python.org/)
[![License](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)

</div>

---

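## Introduction

Web Content Fetcher is a lightweight web page content extraction tool that automatically converts any web page into clean Markdown, preserving headings, links, images, and list structure.

**Key advantages:**
- Scrapling-first extraction, with built-in fast / ste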

What is this skill?

Primary path: Scrapling fetch.py with domain-aware fast vs --stealth routing for clean Markdown

Fallback: Jina Reader (r.jina.ai) when Scrapling fails or deps are missing—~1–2s on simple pages

Preserves headings, links, images, lists, and code blocks in Markdown output

Optional max_chars cap on extracted body length

Bilingual triggers: fetch/read/scrape URLs plus 帮我读一下这篇文章 ("help me read this article"), 抓取这个网页 ("scrape this web page"), 提取正文 ("extract the main text")

Two-tier strategy: Scrapling script primary, Jina Reader fallback

Jina free tier noted at 200 requests per day in skill docs

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 2.7k installs on skills.sh; 579 GitHub stars; 2/3 security scanners passed (skills.sh audits).

Journey fit

Spans multiple journey phases - primary shelf plus alternate fits below.
