
Read
Fetch clean Markdown from arbitrary URLs when your coding agent needs page text without opening a browser yourself.
Overview
Read is a journey-wide agent skill that converts arbitrary URLs into clean Markdown via a four-step proxy cascade and GitHub-aware fallbacks—usable whenever a solo builder needs faithful page text before committing agent
Install
npx skills add https://github.com/tw93/waza --skill read
What is this skill?
- Four-step proxy cascade: defuddle.md, r.jina.ai, optional web-search plugin reader, then local agent-fetch or defuddle p
- Treats empty, error, or sub-five-line proxy responses as failure and advances to the next method
- GitHub blob URLs bypass HTML proxies via raw.githubusercontent.com or gh api plus base64 decode for private repos
- agent-fetch --json requires extracting the Markdown field; raw JSON is explicitly invalid as final /read output
- Works as procedural reference for curl, npx, and gh commands rather than a hosted MCP server
- Four-step cascade that tries hosted proxies before local tools
- Proxy failure heuristic when output has fewer than five lines
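The failure heuristic and ordered fallback described in the list above can be sketched in Python, the language of the skill's helper scripts. This is a minimal illustration, not the skill's own code: `looks_failed` and `first_usable` are hypothetical names, counting only non-blank lines is an assumption, and detecting error pages is left out because the source does not say how they are recognized.

```python
from typing import Callable, Iterable, Optional, Tuple


def looks_failed(output: str, min_lines: int = 5) -> bool:
    """Return True when a response is empty or shorter than min_lines.

    Assumption: only non-blank lines count toward the threshold.
    """
    text = output.strip()
    if not text:
        return True
    non_blank = [line for line in text.splitlines() if line.strip()]
    return len(non_blank) < min_lines


def first_usable(
    fetchers: Iterable[Tuple[str, Callable[[str], str]]], url: str
) -> Tuple[Optional[str], str]:
    """Try each (name, fetch) pair in order and return the first usable output."""
    for name, fetch in fetchers:
        try:
            output = fetch(url)
        except Exception:
            # A fetcher that raises is treated like a failed proxy.
            continue
        if not looks_failed(output):
            return name, output
    return None, ""
```

Each fetcher would wrap one cascade step (for example a `curl` call to a proxy), so the order of the list is the order of the cascade.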
Adoption & trust: 6.4k installs on skills.sh; 5.6k GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You need your agent to read a live URL but one-shot curl or HTML dumps return nav junk, empty JS shells, or unreadable GitHub blob pages.
Who is it for?
Indie builders who wire research, doc ingestion, or link summaries into agent workflows with shell and curl available.
Skip if: you need authenticated interactive browsing, screenshot capture, or a managed SaaS scraper with no local network execution.
When should I use this skill?
An agent task requires reading an external URL or GitHub file and you need ordered fallbacks instead of a single brittle fetch.
What do I get? / Deliverables
You get a repeatable cascade order, GitHub raw/gh shortcuts, and local parse commands so the agent returns Markdown-bearing content instead of errors or raw JSON.
- Markdown body suitable for agent context
- Documented fallback attempt order for failed reads
Journey fit
Useful at every journey phase - explore requirements and options before committing to a direction.
Where it fits
Pull a competitor feature page through defuddle.md before updating your positioning notes.
Ingest a vendor API doc via r.jina.ai when the official site is JS-heavy.
Archive a pricing page as Markdown to compare tiers without manual copy-paste.
Capture changelog text from a dependency’s release URL for release notes.
Summarize a status or postmortem URL after an outage using the local defuddle parse fallback.
How it compares
Use this procedural read cascade instead of ad-hoc single-proxy curls that fail silently on GitHub or JavaScript-rendered pages.
Common Questions / FAQ
Who is read for?
Solo and indie builders running Claude Code, Cursor, Codex, or similar agents who must pull external web or GitHub file content into context reliably.
When should I use read?
Use it during idea research to ingest competitor pages, during build when copying integration docs, at ship for release-note sources, and during operate when summarizing incident or vendor status URLs.
Is read safe to install?
The skill only documents curl, npx, and gh patterns that hit third-party URLs and may run local fetch tools; review the Security Audits panel on this page before enabling network-heavy agent runs.
SKILL.md
# Read Methods Reference

## Proxy Cascade

Try in order. Success = non-empty output with readable content. If a proxy returns empty, an error page, or fewer than 5 lines, treat it as failed and try the next:

### 1. defuddle.md

```bash
curl -sL "https://defuddle.md/{url}"
```

Cleaner output with YAML frontmatter. Try this first.

### 2. r.jina.ai

```bash
curl -sL "https://r.jina.ai/{url}"
```

Wide coverage, preserves image links. Use if defuddle.md returns empty or errors.

### 3. Web search plugin reader (if available)

If a web search plugin is installed (e.g., PipeLLM), the cascade tries its reader tool before local fallback. Handles JavaScript-rendered pages better than free proxies.

### 4. Local tools

```bash
npx agent-fetch "{url}" --json
# or
defuddle parse "{url}" -m
```

Last resort if both proxies fail. `agent-fetch --json` returns JSON, so extract the Markdown-bearing field before returning or saving the result. `defuddle parse -m` outputs Markdown directly. Raw JSON is not a valid final output for `/read`.

## GitHub URLs

GitHub file URLs (`github.com/user/repo/blob/...`) render heavy HTML. The proxy cascade often returns partial or nav-heavy content. Prefer:

```bash
# Raw file content (fastest)
curl -sL "https://raw.githubusercontent.com/{user}/{repo}/{branch}/{path}"

# Via gh CLI (works with private repos)
gh api repos/{user}/{repo}/contents/{path} --jq '.content' | base64 -d
```

Use the proxy cascade only as a fallback for GitHub pages that are not raw file views (e.g., issue threads, README renders).
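The rewrite from a GitHub blob URL to its raw counterpart can be sketched as a small helper. This is an illustrative sketch rather than part of the skill: `github_blob_to_raw` is a hypothetical name, and it assumes the branch name is a single path segment (a branch such as `feature/x` cannot be told apart from the file path by URL alone).

```python
from typing import Optional
from urllib.parse import urlparse


def github_blob_to_raw(url: str) -> Optional[str]:
    """Map github.com/{user}/{repo}/blob/{branch}/{path} to raw.githubusercontent.com.

    Returns None for anything that is not a blob URL, so the caller can
    fall back to the proxy cascade (issue threads, README renders, ...).
    """
    parsed = urlparse(url)
    if parsed.netloc not in ("github.com", "www.github.com"):
        return None
    parts = parsed.path.strip("/").split("/")
    # Expected shape: user / repo / "blob" / branch / path...
    if len(parts) < 5 or parts[2] != "blob":
        return None
    user, repo, _, branch = parts[:4]
    path = "/".join(parts[4:])
    return f"https://raw.githubusercontent.com/{user}/{repo}/{branch}/{path}"
```

A `None` result signals that the URL should go through the normal cascade instead.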
## PDF to Markdown

### Remote PDF URL

r.jina.ai handles PDF URLs directly:

```bash
curl -sL "https://r.jina.ai/{pdf_url}"
```

If that fails, download and extract locally:

```bash
curl -sL "{pdf_url}" -o /tmp/input.pdf
pdftotext -layout /tmp/input.pdf -
```

### Local PDF file

```bash
# Best quality (requires: pip install marker-pdf)
marker_single /path/to/file.pdf --output_dir "${READ_OUTPUT_DIR:-/tmp/waza-read}"

# Fast, text-heavy PDFs (requires: brew install poppler)
pdftotext -layout /path/to/file.pdf - | sed 's/\f/\n---\n/g'

# No-dependency fallback
python3 -c "
import pypdf, sys
r = pypdf.PdfReader(sys.argv[1])
print('\n\n'.join(p.extract_text() for p in r.pages))
" /path/to/file.pdf
```

Use `marker` when layout matters (papers, tables). Use `pdftotext` for speed.

## Feishu / Lark Document

Resolve the built-in helper script directory once. This works from a single-skill install, the packaged dispatcher, or the source repo root:

```bash
READ_SCRIPT_DIR=""
for candidate in \
  "${CLAUDE_SKILL_DIR:+$CLAUDE_SKILL_DIR/scripts}" \
  "${CLAUDE_SKILL_DIR:+$CLAUDE_SKILL_DIR/skills/read/scripts}" \
  "./skills/read/scripts"; do
  if [ -n "$candidate" ] && [ -f "$candidate/fetch_feishu.py" ]; then
    READ_SCRIPT_DIR="$candidate"
    break
  fi
done
if [ -z "$READ_SCRIPT_DIR" ]; then
  echo "read helper scripts not found; set CLAUDE_SKILL_DIR or run from the Waza repo root" >&2
  exit 1
fi
```

Requires `requests` and Feishu app credentials:

```bash
pip install requests  # one-time setup
export FEISHU_APP_ID=your_app_id
export FEISHU_APP_SECRET=your_app_secret
python3 "$READ_SCRIPT_DIR/fetch_feishu.py" "{url}"
```

Supports: docx and wiki pages. Legacy `/docs/` pages are not supported by this script; convert them to docx first, or use a public-page fallback if the document is accessible without the API. App needs `docx:document:readonly` and `wiki:wiki:readonly` permissions.

Output: YAML frontmatter (title, document_id, url) + Markdown body.
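A consumer of that output may want the metadata and the Markdown body separately. The sketch below shows one way to split them without a YAML dependency. It is not taken from the helper script: `split_frontmatter` is a hypothetical name, and it assumes the frontmatter is a flat block of `key: value` lines delimited by `---`, which the source does not spell out.

```python
from typing import Dict, Tuple


def split_frontmatter(document: str) -> Tuple[Dict[str, str], str]:
    """Split '---'-delimited frontmatter from the Markdown body.

    Assumption: frontmatter holds only flat `key: value` pairs
    (e.g. title, document_id, url); nested YAML is not handled.
    """
    lines = document.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, document
    for end in range(1, len(lines)):
        if lines[end].strip() == "---":
            break
    else:
        # No closing delimiter: treat the whole text as body.
        return {}, document
    meta: Dict[str, str] = {}
    for line in lines[1:end]:
        # Split on the first colon only, so URL values stay intact.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip("\"'")
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return meta, body
```

Text without a leading `---` line comes back unchanged with an empty metadata dict, so the same call is safe on plain Markdown.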
## WeChat Public Account

Use the proxy cascade (r.jina.ai / defuddle.md). Works for most articles without any extra tools.

If the proxy is blocked, use the built-in Playwright script as a last resort (requires ~300 MB one-time install):

```bash
pip install playwright beautifulsoup4 lxml && playwright install chromium
python3 "$READ_SCRIPT_DIR/fetch_weixin.py" "{url}"
```

#!/usr/bin/env python3
"""Fetch Feishu/Lark document as Markdown via Feishu Open AP