Web Fetch

Name: Web Fetch
Author: 0xbigboss

0xbigboss/claude-code

Fetch a live URL and extract the main article body as clean markdown so your coding agent can cite docs, competitors, or changelogs without copying nav chrome.

Overview

web-fetch is an agent skill for the Idea phase that downloads a URL and returns main-content markdown by parsing HTML, removing navigation chrome, and converting with Turndown.

Install

npx skills add https://github.com/0xbigboss/claude-code --skill web-fetch

What is this skill?

HTTP fetch with a ClaudeCode-compatible User-Agent header
DOM parse via linkedom and heuristic main-content selection across article, main, role=main, and .content/#content
Strips nav, header, footer, scripts, sidebars, menus, TOC, and breadcrumbs before conversion
Turndown-based HTML-to-markdown pipeline for token-efficient agent reads
Bun CLI entry: `bun fetch.ts <url>` with explicit non-OK status handling
8 removeSelectors categories (nav, header, footer, script, style, noscript, navigation role, sidebar/nav/menu/toc/breadcrumb)
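
The main-content selection listed above comes down to "keep the candidate with the most text, otherwise fall back to the body". A minimal sketch of that rule, written against plain objects so it runs without linkedom; the names `TextNodeLike`, `Candidate`, and `pickDominant` are illustrative and do not appear in the skill's source:

```typescript
// Minimal shape of what the heuristic reads from each candidate node.
interface TextNodeLike {
  textContent: string | null;
}

// Illustrative candidate type: a tag label plus its text.
interface Candidate extends TextNodeLike {
  tag: string;
}

// Returns the candidate with the most text, or the fallback when no
// candidate carries any text (mirrors the document.body fallback).
function pickDominant<T extends TextNodeLike>(candidates: T[], fallback: T): T {
  let best: T | null = null;
  let maxLength = 0;
  for (const el of candidates) {
    const len = el.textContent?.length ?? 0;
    // Strict ">" means the earliest candidate wins a tie.
    if (len > maxLength) {
      maxLength = len;
      best = el;
    }
  }
  return best ?? fallback;
}

const sampleNodes: Candidate[] = [
  { tag: "article", textContent: "Short teaser." },
  { tag: "main", textContent: "A much longer body of text that wraps the article and more." },
];
const sampleBody: Candidate = { tag: "body", textContent: "everything on the page" };

console.log(pickDominant(sampleNodes, sampleBody).tag); // prints "main"
```

One consequence of comparing raw text length: a `main` that wraps an `article` plus extra material will always beat the `article`, so any leftover chrome inside `main` has to be caught by the removal selectors.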

Compatible agents: Claude Code, Codex, Cursor, any compatible agent

Adoption & trust: 858 installs on skills.sh; 49 GitHub stars; 2/3 security scanners passed (skills.sh audits).

What problem does it solve?

You need official docs or competitor copy in your agent session, but a raw fetch dumps menus, footers, and scripts that waste context and confuse summaries.

Who is it for?

Solo builders researching documentation, blog posts, or static marketing pages where the readable content lives in semantic HTML blocks.

Skip if: Heavy client-rendered apps, authenticated paywalled pages, or workflows that need full browser automation instead of HTTP plus DOM cleanup.
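
One rough way to spot a client-rendered page before relying on the extraction is to check how much readable text the raw HTML carries once scripts and styles are set aside. The sketch below is an assumption for illustration, not part of web-fetch: `visibleTextLength`, `looksClientRendered`, and the 200-character threshold are all hypothetical, and regex-based tag stripping is only a crude estimate.

```typescript
// Hypothetical helper: estimates how much readable text a raw HTML
// response contains before any JavaScript has run.
function visibleTextLength(html: string): number {
  const withoutCode = html
    .replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, " ")
    .replace(/<style\b[^>]*>[\s\S]*?<\/style>/gi, " ");
  // Drop remaining tags and collapse whitespace; crude, but enough for a size estimate.
  const text = withoutCode.replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim();
  return text.length;
}

// Hypothetical threshold check: very little text suggests an empty app shell.
function looksClientRendered(html: string, minChars = 200): boolean {
  return visibleTextLength(html) < minChars;
}

const spaShell =
  '<html><body><div id="root"></div><script>window.__DATA__ = {"a": 1};</script></body></html>';
const staticArticle = "<article><p>" + "word ".repeat(60) + "</p></article>";

console.log(looksClientRendered(spaShell)); // prints true
console.log(looksClientRendered(staticArticle)); // prints false
```

A page flagged this way is a candidate for a browser-automation tool rather than HTTP plus DOM cleanup.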

When should I use this skill?

Use it when a user supplies a URL and needs article-style markdown for agent context without manual copy-paste.

What do I get? / Deliverables

You get a focused markdown extraction from the dominant content element so downstream research, spec, or implementation prompts cite the right text.

Markdown stdout derived from the dominant content element
Clear fetch failure exit codes for non-OK HTTP responses

Journey fit

Primary fit

Idea: Opportunity & market research

The canonical shelf is Idea/research because the primary job is pulling readable web content into the agent context before you commit to a build. It fits competitive and documentation discovery workflows where HTML must be reduced to the content-rich region (article/main) for summarization and comparison.

Also useful

Build: Integrations & version control

How it compares

Use it instead of pasting whole-page HTML into chat when you only need article-body markdown from a known URL.

Common Questions / FAQ

Who is web-fetch for?

Indie developers and agent users who want a repeatable CLI fetch-and-extract step while researching products, APIs, or competitors before coding.

When should I use web-fetch?

Use it during Idea research and discovery when you have a URL and need clean markdown; it also helps during Build or Ship when you pull changelogs or integration docs into the repo.

Is web-fetch safe to install?

It performs outbound HTTP fetches. Review the Security Audits panel on this Prism page, and only point it at URLs you trust before running it in your environment.
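
If an agent passes URLs to the fetcher on its own, a pre-flight check can enforce the "URLs you trust" advice mechanically. The sketch below is an assumption, not something the skill ships: `isFetchableUrl` is a hypothetical helper that accepts only http/https and rejects localhost and private IPv4 literals. It does not resolve DNS, so a public-looking hostname that points at an internal address would still pass.

```typescript
// Hypothetical pre-flight check, not part of fetch.ts.
function isFetchableUrl(raw: string): boolean {
  let parsed: URL;
  try {
    parsed = new URL(raw);
  } catch {
    return false; // not an absolute URL
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") return false;

  const host = parsed.hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) return false;
  // URL.hostname keeps the brackets around IPv6 literals.
  if (host === "[::1]") return false;

  const ipv4 = host.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  if (ipv4) {
    const a = Number(ipv4[1]);
    const b = Number(ipv4[2]);
    if (a === 0 || a === 10 || a === 127) return false; // this-network, private, loopback
    if (a === 169 && b === 254) return false; // link-local
    if (a === 172 && b >= 16 && b <= 31) return false; // private
    if (a === 192 && b === 168) return false; // private
  }
  return true;
}

console.log(isFetchableUrl("https://example.com/docs")); // prints true
console.log(isFetchableUrl("http://192.168.1.10/admin")); // prints false
console.log(isFetchableUrl("file:///etc/passwd")); // prints false
```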

SKILL.md


node_modules/
*.lock


import { parseHTML } from "linkedom";
import TurndownService from "turndown";

const url = process.argv[2];
if (!url) {
  console.error("Usage: bun fetch.ts <url>");
  process.exit(1);
}

// Step 1: Fetch
const response = await fetch(url, {
  headers: {
    "User-Agent": "Mozilla/5.0 (compatible; ClaudeCode/1.0)",
  },
});

if (!response.ok) {
  console.error(`Fetch failed: ${response.status} ${response.statusText}`);
  process.exit(1);
}

const html = await response.text();

// Step 2: Parse DOM
const { document } = parseHTML(html);

// Step 3: Find the content-rich element
const candidates = [
  ...document.querySelectorAll("article"),
  ...document.querySelectorAll("main"),
  ...document.querySelectorAll('[role="main"]'),
  ...document.querySelectorAll(".content"),
  ...document.querySelectorAll("#content"),
];

let contentEl: Element | null = null;
let maxLength = 0;

for (const el of candidates) {
  const len = el.textContent?.length || 0;
  if (len > maxLength) {
    maxLength = len;
    contentEl = el;
  }
}

if (!contentEl) {
  contentEl = document.body;
}

// Step 4: Clean up the content element before conversion
// Remove navigation elements
const removeSelectors = [
  "nav",
  "header",
  "footer",
  "script",
  "style",
  "noscript",
  '[role="navigation"]',
  ".sidebar",
  ".nav",
  ".menu",
  ".toc",
  '[aria-label="breadcrumb"]',
];

for (const selector of removeSelectors) {
  contentEl.querySelectorAll(selector).forEach((el) => el.remove());
}

// Step 5: Convert to Markdown with Turndown
const turndown = new TurndownService({
  headingStyle: "atx",
  codeBlockStyle: "fenced",
});

// Better code block handling
turndown.addRule("fencedCodeBlock", {
  filter: (node) => {
    // Match <pre> whose first child is <code>; optional chaining keeps the result a strict boolean
    return node.nodeName === "PRE" && node.firstChild?.nodeName === "CODE";
  },
  replacement: (content, node) => {
    const el = node as Element;
    const code = el.querySelector("code");
    const className = code?.className || "";
    const lang = className.match(/language-(\w+)/)?.[1] || "";
    const text = code?.textContent || "";
    return `\n\`\`\`${lang}\n${text}\n\`\`\`\n`;
  },
});

// Handle pre without code child
turndown.addRule("preBlock", {
  filter: (node) => {
    return (
      node.nodeName === "PRE" &&
      (!node.firstChild || node.firstChild.nodeName !== "CODE")
    );
  },
  replacement: (content, node) => {
    const text = (node as Element).textContent || "";
    return `\n\`\`\`\n${text}\n\`\`\`\n`;
  },
});

// Remove "Copy page" buttons and similar UI elements
turndown.addRule("removeButtons", {
  filter: (node) => {
    if (node.nodeName === "BUTTON") return true;
    const el = node as Element;
    if (el.getAttribute?.("aria-label")?.includes("Copy")) return true;
    return false;
  },
  replacement: () => "",
});

const markdown = turndown.turndown(contentEl.innerHTML);

// Step 6: Clean up the output
const cleaned = markdown
  // Remove Loading... placeholders
  .replace(/^Loading\.\.\.$/gm, "")
  // Remove Copy buttons
  .replace(/^Copy page$/gm, "")
  .replace(/^Copy$/gm, "")
  // Fix empty headings (## \n\nActual heading -> ## Actual heading)
  .replace(/^(#{1,6})\s*\n\n+([A-Z])/gm, "$1 $2")
  // Remove completely empty headings
  .replace(/^#{1,6}\s*$/gm, "")
  // Collapse multiple newlines
  .replace(/\n{3,}/g, "\n\n")
  .trim();

// Output with title
const title = document.title || "Untitled";
console.log(`# ${title}\n`);
console.log(cleaned);
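
To see what the Step 6 cleanup does to Turndown output, the same regex chain can be wrapped in a function and run on a small sample. The wrapper name `cleanMarkdown` and the sample string are illustrative; the regexes are copied from the listing above.

```typescript
// Same regex chain as Step 6 of fetch.ts, wrapped in a function for testing.
function cleanMarkdown(markdown: string): string {
  return markdown
    .replace(/^Loading\.\.\.$/gm, "") // placeholder lines
    .replace(/^Copy page$/gm, "") // leftover button labels
    .replace(/^Copy$/gm, "")
    .replace(/^(#{1,6})\s*\n\n+([A-Z])/gm, "$1 $2") // rejoin split headings
    .replace(/^#{1,6}\s*$/gm, "") // drop headings with no text
    .replace(/\n{3,}/g, "\n\n") // collapse blank-line runs
    .trim();
}

const rawSample =
  "## \n\nInstall\n\nCopy page\n\nLoading...\n\n\n\nRun the command.\n\n### \n\nCopy\n";

console.log(JSON.stringify(cleanMarkdown(rawSample)));
// prints "## Install\n\nRun the command."
```

Because the rejoin rule only looks for an uppercase letter, a split heading whose text starts with a lowercase letter or a digit is not rejoined: the empty-heading rule removes the marker and the text is left as an ordinary paragraph.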


{
  "name": "web-fetch",
  "version": "1.0.0",
  "type": "module",
  "description": "Intelligent web content extraction to clean markdown",
  "dependencies": {
    "linkedom": "^0.18.0",
    "turndown": "^7.2.0"
  }
}


---
name: web-fetch
description: Fetches web content as clean markdown by preferring markdown-native responses and falling back to selector-based HTML extraction. Use for documentation, articles, and reference pages at http/https URLs.
---

# Web Content Fetching

Fetch web con
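
The SKILL.md description mentions preferring markdown-native responses before falling back to HTML extraction, but the fetch.ts listing on this page does not show that step. The sketch below is a guess at how such a preference could be expressed with content negotiation; the `Accept` value, `isMarkdownContentType`, and `planFor` are assumptions for illustration, not the skill's actual code.

```typescript
// Assumed Accept header: ask for markdown first, HTML as the fallback.
const MARKDOWN_FIRST_ACCEPT = "text/markdown, text/html;q=0.9";

// True when a Content-Type header value names a markdown media type.
function isMarkdownContentType(contentType: string | null): boolean {
  if (!contentType) return false;
  // Ignore parameters such as "; charset=utf-8".
  const mediaType = (contentType.split(";")[0] ?? "").trim().toLowerCase();
  return mediaType === "text/markdown" || mediaType === "text/x-markdown";
}

type FetchPlan = "use-body-as-is" | "extract-from-html";

// Decides which path to take from the response's Content-Type header.
function planFor(contentType: string | null): FetchPlan {
  return isMarkdownContentType(contentType) ? "use-body-as-is" : "extract-from-html";
}

console.log(planFor("text/markdown; charset=utf-8")); // prints "use-body-as-is"
console.log(planFor("text/html; charset=utf-8")); // prints "extract-from-html"
```

In a fetch call this would mean sending `MARKDOWN_FIRST_ACCEPT` as the `Accept` header and passing `response.headers.get("content-type")` to `planFor` before deciding whether to run the DOM pipeline.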
