
Web Fetch
Fetch a live URL and extract the main article body as clean markdown so your coding agent can cite docs, competitors, or changelogs without copying nav chrome.
Overview
web-fetch is an agent skill for the Idea phase that downloads a URL and returns main-content markdown by parsing HTML, removing navigation chrome, and converting with Turndown.
Install
npx skills add https://github.com/0xbigboss/claude-code --skill web-fetchWhat is this skill?
- HTTP fetch with a ClaudeCode-compatible User-Agent header
- DOM parse via linkedom and heuristic main-content selection across article, main, role=main, and .content/#content
- Strips nav, header, footer, scripts, sidebars, menus, TOC, and breadcrumbs before conversion
- Turndown-based HTML-to-markdown pipeline for token-efficient agent reads
- Bun CLI entry: `bun fetch.ts <url>` with explicit non-OK status handling
- 8 removeSelectors categories (nav, header, footer, script, style, noscript, navigation role, sidebar/nav/menu/toc/breadc
Adoption & trust: 858 installs on skills.sh; 49 GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You need official docs or competitor copy in your agent session but raw fetch dumps menus, footers, and scripts that waste context and confuse summaries.
Who is it for?
Solo builders researching documentation, blog posts, or static marketing pages where the readable content lives in semantic HTML blocks.
Skip if: Heavy client-rendered apps, authenticated paywalled pages, or workflows that need full browser automation instead of HTTP plus DOM cleanup.
When should I use this skill?
User supplies a URL and needs article-style markdown for agent context without manual copy-paste.
What do I get? / Deliverables
You get a focused markdown extraction from the dominant content element so downstream research, spec, or implementation prompts cite the right text.
- Markdown stdout derived from the dominant content element
- Clear fetch failure exit codes for non-OK HTTP responses
Recommended Skills
Journey fit
Canonical shelf is Idea/research because the primary job is pulling readable web content into the agent context before you commit to a build. research fits competitive and documentation discovery workflows where HTML must be reduced to the content-rich region (article/main) for summarization and comparison.
How it compares
Use instead of pasting whole page HTML into chat when you only need article-body markdown from a known URL.
Common Questions / FAQ
Who is web-fetch for?
Indie developers and agent users who want a repeatable CLI fetch-and-extract step while researching products, APIs, or competitors before coding.
When should I use web-fetch?
Use it during Idea research and discovery when you have a URL and need clean markdown; it also helps during Build or Ship when you pull changelogs or integration docs into the repo.
Is web-fetch safe to install?
It performs outbound HTTP fetches—review the Security Audits panel on this Prism page and only point it at URLs you trust before running in your environment.
SKILL.md
READMESKILL.md - Web Fetch
node_modules/ *.lock import { parseHTML } from "linkedom"; import TurndownService from "turndown"; const url = process.argv[2]; if (!url) { console.error("Usage: bun fetch.ts <url>"); process.exit(1); } // Step 1: Fetch const response = await fetch(url, { headers: { "User-Agent": "Mozilla/5.0 (compatible; ClaudeCode/1.0)", }, }); if (!response.ok) { console.error(`Fetch failed: ${response.status} ${response.statusText}`); process.exit(1); } const html = await response.text(); // Step 2: Parse DOM const { document } = parseHTML(html); // Step 3: Find the content-rich element const candidates = [ ...document.querySelectorAll("article"), ...document.querySelectorAll("main"), ...document.querySelectorAll('[role="main"]'), ...document.querySelectorAll(".content"), ...document.querySelectorAll("#content"), ]; let contentEl: Element | null = null; let maxLength = 0; for (const el of candidates) { const len = el.textContent?.length || 0; if (len > maxLength) { maxLength = len; contentEl = el; } } if (!contentEl) { contentEl = document.body; } // Step 4: Clean up the content element before conversion // Remove navigation elements const removeSelectors = [ "nav", "header", "footer", "script", "style", "noscript", '[role="navigation"]', ".sidebar", ".nav", ".menu", ".toc", '[aria-label="breadcrumb"]', ]; for (const selector of removeSelectors) { contentEl.querySelectorAll(selector).forEach((el) => el.remove()); } // Step 5: Convert to Markdown with Turndown const turndown = new TurndownService({ headingStyle: "atx", codeBlockStyle: "fenced", }); // Better code block handling turndown.addRule("fencedCodeBlock", { filter: (node) => { return ( node.nodeName === "PRE" && node.firstChild && node.firstChild.nodeName === "CODE" ); }, replacement: (content, node) => { const el = node as Element; const code = el.querySelector("code"); const className = code?.className || ""; const lang = className.match(/language-(\w+)/)?.[1] || ""; const text = code?.textContent || ""; return `\n\`\`\`${lang}\n${text}\n\`\`\`\n`; }, }); // Handle pre without code child turndown.addRule("preBlock", { filter: (node) => { return ( node.nodeName === "PRE" && (!node.firstChild || node.firstChild.nodeName !== "CODE") ); }, replacement: (content, node) => { const text = (node as Element).textContent || ""; return `\n\`\`\`\n${text}\n\`\`\`\n`; }, }); // Remove "Copy page" buttons and similar UI elements turndown.addRule("removeButtons", { filter: (node) => { if (node.nodeName === "BUTTON") return true; const el = node as Element; if (el.getAttribute?.("aria-label")?.includes("Copy")) return true; return false; }, replacement: () => "", }); const markdown = turndown.turndown(contentEl.innerHTML); // Step 6: Clean up the output const cleaned = markdown // Remove Loading... placeholders .replace(/^Loading\.\.\.$/gm, "") // Remove Copy buttons .replace(/^Copy page$/gm, "") .replace(/^Copy$/gm, "") // Fix empty headings (## \n\nActual heading -> ## Actual heading) .replace(/^(#{1,6})\s*\n\n+([A-Z])/gm, "$1 $2") // Remove completely empty headings .replace(/^#{1,6}\s*$/gm, "") // Collapse multiple newlines .replace(/\n{3,}/g, "\n\n") .trim(); // Output with title const title = document.title || "Untitled"; console.log(`# ${title}\n`); console.log(cleaned); { "name": "web-fetch", "version": "1.0.0", "type": "module", "description": "Intelligent web content extraction to clean markdown", "dependencies": { "linkedom": "^0.18.0", "turndown": "^7.2.0" } } --- name: web-fetch description: Fetches web content as clean markdown by preferring markdown-native responses and falling back to selector-based HTML extraction. Use for documentation, articles, and reference pages at http/https URLs. --- # Web Content Fetching Fetch web con