Defuddle

The canonical shelf is Idea research, because the default workflow is optimized for reading and saving source material before you build or publish. The research subphase covers URL and HTML extraction, clutter removal, and a summary of title, author, and word count for later citation.

Where it fits

Example use

Extract a competitor’s manifesto post into Markdown for your positioning notes.

Example use

Pull a long-form source article cleanly before outlining your newsletter or blog rewrite.

Example use

Archive a third-party API guide as Markdown while writing integration docs.

How it compares

Defuddle is an opinionated single-URL article extractor with a save workflow. It is not a headless browser automation suite or a generic wget mirror.

Common Questions / FAQ

Who is defuddle for?

Solo builders and agents who read the open web or local HTML and need clutter-free Markdown plus metadata for notes, research briefs, or content drafts.

When should I use defuddle?

Use it in Idea research when capturing articles, in Grow content when repurposing sources, or in Build docs when archiving reference pages. It applies whenever a request includes a trigger such as "extract article", "clean this page", or "get content from URL".

Is defuddle safe to install?

The skill runs a global npm CLI that fetches URLs you provide; review the Security Audits panel on this Prism page and pin or audit the defuddle and jsdom packages before use on sensitive networks.

SKILL.md

# Defuddle - Web Content Extraction

Extract main article content from web pages, removing ads, sidebars, navigation, and other clutter. Output clean Markdown with metadata.

## Prerequisites

Before first use, check whether `defuddle` is installed, and install it together with `jsdom` if it is missing:

```bash
command -v defuddle >/dev/null 2>&1 || npm install -g defuddle jsdom
```

## Default Workflow

When the user provides a URL, follow this workflow:

### Step 1: Extract content as Markdown + JSON metadata

Always use both `-m` and `-j` flags to get markdown content with full metadata:

```bash
defuddle parse "<url>" -m -j
```
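As a rough sketch, Step 1 can be wrapped in a small shell function so later steps read the result from a file. The function name `extract_article` and its scratch-file argument are illustrative and not part of defuddle; only `parse`, `-m`, `-j`, and `-o` come from the CLI reference in this document.

```bash
# Hypothetical helper: run the Step 1 command and store its JSON output in a file.
# Usage: extract_article <url-or-html-path> <output-json-file>
extract_article() {
  local src="$1" out="$2"
  if [ -z "$src" ] || [ -z "$out" ]; then
    echo "usage: extract_article <source> <output-file>" >&2
    return 2
  fi
  # -m converts content to Markdown, -j adds full metadata, -o writes to a file.
  defuddle parse "$src" -m -j -o "$out" || {
    echo "defuddle failed for: $src" >&2
    return 1
  }
}
```

Keeping the output in a file means the summary and save steps can reuse it without running the extraction again.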

### Step 2: Present a summary to the user

Show the user:
- **Title**: from JSON `title` field
- **Author**: from JSON `author` field
- **Source**: domain
- **Word count**: from JSON `wordCount` field
- A brief preview (first 2-3 sentences)

### Step 3: Ask where to save

If this is the **first time** using defuddle in this conversation, ask the user:
> "Save to which directory? (e.g. `~/Documents`, `~/Desktop`, or a custom path)"

Remember the user's chosen directory for subsequent uses in the same conversation.
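For a scripted version of this step, the "ask once, then remember" behaviour might look like the sketch below. The variable `SAVE_DIR` and the function `choose_save_dir` are hypothetical names, and creating the directory with `mkdir -p` is an assumption rather than something this skill specifies.

```bash
# Hypothetical session state: empty until the user has answered once.
SAVE_DIR="${SAVE_DIR:-}"

# Ask for a directory only on first use, then reuse the remembered answer.
# Call it directly (not in a subshell) so SAVE_DIR persists for later calls.
choose_save_dir() {
  if [ -z "$SAVE_DIR" ]; then
    printf '%s ' 'Save to which directory? (e.g. ~/Documents, ~/Desktop, or a custom path)' >&2
    read -r SAVE_DIR || true
  fi
  [ -n "$SAVE_DIR" ] || return 1
  # Expand a leading "~" by hand, because typed input is not tilde-expanded.
  case "$SAVE_DIR" in
    "~")   SAVE_DIR="$HOME" ;;
    "~/"*) SAVE_DIR="$HOME/${SAVE_DIR#??}" ;;  # drop the two characters "~/"
  esac
  # Assumption: create the directory if it does not exist yet.
  mkdir -p "$SAVE_DIR"
}
```

After `choose_save_dir` succeeds, `$SAVE_DIR` holds the expanded path for the rest of the session.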

### Step 4: Save as Markdown file

Write the file with frontmatter + full content:

```markdown
---
title: {title}
author: {author}
source: {url}
date: {published or "Unknown"}
clipped: {today's date YYYY-MM-DD}
wordCount: {wordCount}
---

# {title}

{markdown content}
```
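One way to produce the layout shown above from a script is sketched here. The function `write_clip` and its argument order are hypothetical. It assumes the metadata values have already been pulled out of the JSON and the Markdown body sits in its own file.

```bash
# Hypothetical helper: write frontmatter plus content in the layout shown above.
# Usage: write_clip <out-file> <title> <author> <url> <published> <wordCount> <markdown-file>
write_clip() {
  local out="$1" title="$2" author="$3" url="$4" published="$5" words="$6" body="$7"
  {
    printf -- '---\n'
    printf 'title: %s\n' "$title"
    printf 'author: %s\n' "$author"
    printf 'source: %s\n' "$url"
    # Fall back to "Unknown" when no publication date was extracted.
    printf 'date: %s\n' "${published:-Unknown}"
    printf 'clipped: %s\n' "$(date +%Y-%m-%d)"
    printf 'wordCount: %s\n' "$words"
    printf -- '---\n\n# %s\n\n' "$title"
    cat "$body"
  } > "$out"
}
```

Values are written unquoted, as in the template. A title containing a colon may need YAML quoting, which the template does not address.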

**File naming**: Use the article title as the filename, sanitized for the filesystem:
- Replace special characters with spaces
- Trim whitespace
- Example: `The Shape of the Essay Field.md`
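The naming rules above could be sketched as a shell function. The name `sanitize_filename` is hypothetical. Treating `/ \ : * ? " < > |` as the "special characters" and collapsing repeated spaces are both assumptions, since the rules do not define either.

```bash
# Hypothetical helper: turn an article title into a Markdown filename.
# Assumption: "special characters" means / \ : * ? " < > |
sanitize_filename() {
  printf '%s\n' "$1" \
    | sed -e 's#[/\\:*?"<>|]# #g' \
          -e 's/  */ /g' \
          -e 's/^ //' -e 's/ $//' \
          -e 's/$/.md/'
}
```

The first expression replaces the special characters with spaces and the second collapses runs of spaces (the assumed extra). The next two trim the ends, and the last appends the `.md` extension.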

### Step 5: Confirm to user

Tell the user the file path where it was saved.

## CLI Reference

```bash
defuddle parse <source> [options]
```

**Arguments:**
- `<source>` — URL (`https://...`) or local HTML file path

**Options:**
| Flag | Description |
|------|-------------|
| `-m, --markdown` | Convert content to Markdown |
| `-j, --json` | Output as JSON with full metadata |
| `-o, --output <file>` | Write to file instead of stdout |
| `-p, --property <name>` | Extract single property (title, description, domain, author, published, wordCount, content) |
| `--debug` | Verbose logging |
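When only one field is needed, the `-p` flag from the table can be wrapped as in the sketch below. The function `get_property` is a hypothetical name. It accepts only the property names listed in the table. Whether the CLI accepts other names is not stated here, so the sketch rejects them.

```bash
# Hypothetical helper: fetch one property, rejecting names the table does not list.
# Usage: get_property <url-or-html-path> <name>
get_property() {
  local src="$1" name="$2"
  case "$name" in
    title|description|domain|author|published|wordCount|content) ;;
    *)
      echo "unknown property: $name" >&2
      return 2
      ;;
  esac
  # -p prints a single property instead of the full output.
  defuddle parse "$src" -p "$name"
}
```

For example, `get_property page.html title` would print just the title of a local HTML file. For several fields at once, the `-j` output is likely the better fit.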

## JSON Response Fields

When using `-j`, the response includes:
- `title` — Article title
- `author` — Author name
- `published` — Publication date
- `description` — Meta description
- `content` — Extracted Markdown (when `-m` used)
- `domain` — Source domain
- `favicon` — Favicon URL
- `image` — Featured image URL
- `site` — Site name
- `wordCount` — Word count
- `parseTime` — Processing time in ms

## Notes
- Requires Node.js and npm
- `jsdom` is required as a peer dependency
- Works best with article-style pages (blogs, news, documentation)
- Not designed for SPAs or JavaScript-heavy pages (e.g. WeChat articles need browser rendering)

What is this skill?

Strips ads, sidebars, nav, and clutter to return main article content

Default CLI: defuddle parse "<url>" -m -j for Markdown plus JSON metadata

Five-step workflow: extract, present a summary preview, ask for a save directory on first use, save the Markdown file, and confirm the saved path

Triggers: defuddle, extract article, clean this page, strip clutter, web extract

One-time global npm install of defuddle and jsdom when the binary is missing

Compatible agents: Claude Code, Cursor, Codex, Windsurf

Adoption & trust: 1.1k installs on skills.sh; 103 GitHub stars; 1/3 security scanners passed (skills.sh audits).
