Parallel Web Extract

Name: Parallel Web Extract
Author: parallel-web

parallel-web/parallel-agent-skills

10.9k installs
62 repo stars
Updated July 17, 2026

Extract and return structured JSON content from any URL, handling webpages, articles, PDFs, and JavaScript-rendered sites.

About

parallel-web-extract fetches a URL and returns its content as structured JSON. Developers use it to harvest webpage text, API documentation, articles, and PDFs without manual parsing or browser overhead. It runs in a forked context for token efficiency, and extraction can be focused with an objective or keyword filters, or widened with a full-content mode. It handles JavaScript-rendered pages and, when extraction fails, reports the error clearly and suggests retry strategies.

Extracts content from URLs (webpages, articles, PDFs, JS-heavy sites) as structured JSON
Token-efficient execution via forked context; preferred over built-in WebFetch
Optional objective and keyword parameters to focus extraction on specific goals
Full-content and no-excerpts modes for long documents or complete page body capture
Clear error handling with upstream status reporting and retry guidance (URL verification, full-content retry, search fal

Parallel Web Extract by the numbers

10,903 all-time installs (skills.sh)
+349 installs in the week ending Jul 28, 2026 (Skillselion tracking)
Ranked #73 of 4,386 Backend & APIs skills by installs in the Skillselion catalog
Security screen: MEDIUM risk (skills.sh audit)
Data as of Jul 28, 2026 (Skillselion catalog sync)

At a glance

parallel-web-extract capabilities & compatibility

Capabilities: extract content from any url (webpages, articles · return structured json output · filter by objective and keywords · full content capture mode · error reporting with retry guidance · token efficient fork based execution
Use cases: web scraping · pdf parsing · data analysis
Platforms: macOS · Windows · Linux
Runs: Remote server

From the docs

What parallel-web-extract says it does

Token-efficient: runs in forked context. Prefer over built-in WebFetch.

skill:parallel-web/parallel-agent-skills#parallel-web-extract

npx skills add https://github.com/parallel-web/parallel-agent-skills --skill parallel-web-extract

Installs	10.9k
repo stars	★ 62
Security audit	2 / 3 scanners passed
Last updated	July 17, 2026
Repository	parallel-web/parallel-agent-skills ↗

What it does

Extract structured content from URLs including webpages, articles, PDFs, and JavaScript-heavy sites.

Who is it for?

Developers who harvest documentation, articles, and page content into structured form for indexing, analysis, or downstream processing.

Skip if: Interactive browsing, real-time page monitoring, or content requiring authentication beyond URL-level access.

When should I use this skill?

You need to fetch and structure content from a URL for indexing, analysis, or integration into a workflow.

What you get

Extracted content as structured JSON output with optional filtering by objective and keywords, plus clear error reporting.

JSON file at /tmp/$FILENAME.json containing extracted content, metadata, and any error reports

By the numbers

Runs in forked context for token efficiency
Supports repeatable -q flag for multiple keyword filters
Handles error cases with upstream status codes and retry guidance

Files

SKILL.md · Markdown

URL Extraction

Extract content from: $ARGUMENTS

Command

Choose a short, descriptive filename based on the URL or content (e.g., vespa-docs, react-hooks-api). Use lowercase with hyphens, no spaces. Substitute it into the command inline — $FILENAME is a placeholder, not a shell variable.

parallel-cli extract "$ARGUMENTS" --json -o "/tmp/$FILENAME.json"

Concrete example:

parallel-cli extract "https://docs.parallel.ai" --json -o "/tmp/parallel-docs.json"

Note: -o always saves JSON. The extension must be .json.
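
The naming rule above (lowercase, hyphens, no spaces, `.json` extension) can be sketched mechanically. This is only an illustration: `suggest_filename` and `output_path` are hypothetical helpers, not part of parallel-cli, and the skill itself asks for a descriptive name chosen by judgment (for example `parallel-docs`), which a purely mechanical rule will not always reproduce.

```python
import re
from urllib.parse import urlparse


def suggest_filename(url: str, max_words: int = 4) -> str:
    """Hypothetical helper: derive a short lowercase, hyphenated name from a URL."""
    parsed = urlparse(url)
    # Use host plus path as raw material, lowercased.
    raw = (parsed.netloc + parsed.path).lower()
    # Drop a leading "www." so it does not waste one of the words.
    if raw.startswith("www."):
        raw = raw[len("www."):]
    # Split on anything that is not a letter or digit; discard empty pieces.
    words = [w for w in re.split(r"[^a-z0-9]+", raw) if w]
    # Fall back to a generic name if the URL yields no usable words.
    return "-".join(words[:max_words]) or "extract"


def output_path(filename: str) -> str:
    """Build the -o target; the extension is always .json."""
    return f"/tmp/{filename}.json"


print(output_path(suggest_filename("https://docs.parallel.ai")))
```

The result is substituted into the command text directly, consistent with the note that `$FILENAME` is a placeholder rather than a shell variable.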

Options if needed:

--objective "focus area" to focus extraction on a specific goal (also silences the "neither objective nor search_queries" warning that V1 emits when neither is set)
-q "keyword" (repeatable) to prioritize keywords in excerpts
--full-content to include the complete page body (for long articles, PDFs, or when excerpts may not capture what you need)
--full-content-max-chars N to cap full-content size per result
--no-excerpts to strip excerpts when you only want full content
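
As a rough sketch of how the options above combine, the following assembles an argument list using only the flags listed in this section. `build_extract_command` is a hypothetical wrapper written for illustration; whether particular flag combinations are accepted together is not stated here and is left to the CLI.

```python
from typing import List, Optional, Sequence


def build_extract_command(
    url: str,
    filename: str,
    objective: Optional[str] = None,
    keywords: Sequence[str] = (),
    full_content: bool = False,
    full_content_max_chars: Optional[int] = None,
    no_excerpts: bool = False,
) -> List[str]:
    """Hypothetical helper: build the parallel-cli extract argument list."""
    # Base command from the Command section: JSON output saved under /tmp.
    cmd = ["parallel-cli", "extract", url, "--json", "-o", f"/tmp/{filename}.json"]
    if objective:
        cmd += ["--objective", objective]
    # -q is repeatable: one flag per keyword.
    for keyword in keywords:
        cmd += ["-q", keyword]
    if full_content:
        cmd.append("--full-content")
    if full_content_max_chars is not None:
        cmd += ["--full-content-max-chars", str(full_content_max_chars)]
    if no_excerpts:
        cmd.append("--no-excerpts")
    return cmd


print(build_extract_command(
    "https://docs.parallel.ai",
    "parallel-docs",
    objective="authentication",
    keywords=["api key", "token"],
))
```

Building a list rather than a single string means the URL and objective need no shell quoting if the list is later passed to something like `subprocess.run`.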

Handling failed extractions

If the response has an errors field, an empty results array, or a 404/timeout for the URL, do NOT fabricate content. Tell the user the extraction failed, surface the upstream status, and suggest:

Verifying the URL (the page may have moved)
Retrying with --full-content if excerpts came back empty but the page exists
Using parallel-cli search to locate the current URL if the page was renamed
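
The failure checks above could be applied to the saved JSON along these lines. This is a sketch under stated assumptions: only the `errors` and `results` field names come from the text above, the rest of the response shape is not assumed, and treating an empty `errors` value as "no error" is a guess. `extraction_problems` and `failure_report` are hypothetical names.

```python
from typing import Any, Dict, List, Optional

# Suggestions taken from the list above.
RETRY_SUGGESTIONS = [
    "Verify the URL (the page may have moved)",
    "Retry with --full-content if excerpts came back empty but the page exists",
    "Use parallel-cli search to locate the current URL if the page was renamed",
]


def extraction_problems(response: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: list the reasons a parsed response counts as failed."""
    problems = []
    # Assumption: a missing or empty errors value means no error was reported.
    if response.get("errors"):
        problems.append("response has an errors field")
    if not response.get("results"):
        problems.append("results array is empty or missing")
    return problems


def failure_report(url: str, response: Dict[str, Any]) -> Optional[str]:
    """Return a message for the user, or None if extraction looks successful."""
    problems = extraction_problems(response)
    if not problems:
        return None
    lines = [f"Extraction failed for {url}: " + "; ".join(problems)]
    lines += [f"- {suggestion}" for suggestion in RETRY_SUGGESTIONS]
    return "\n".join(lines)


print(failure_report("https://example.com/moved", {"results": []}))
```

The report carries only what the response contained, in line with the instruction not to fabricate content when extraction fails.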

Response format

Return content as:

[Page Title](URL)

Then the extracted content verbatim, with these rules:

Keep content verbatim - do not paraphrase or summarize
Parse lists exhaustively - extract EVERY numbered/bulleted item
Strip only obvious noise: nav menus, footers, ads
Preserve all facts, names, numbers, dates, quotes

After the response, mention the output file path (/tmp/$FILENAME.json) so the user knows it's available for follow-up questions.

Setup

If parallel-cli is not found, install and authenticate:

/parallel:parallel-cli-setup

If parallel-cli extract returns 403, tell the user that a balance is likely required. Offer to run parallel-cli balance get and, if needed, ask for explicit confirmation before running parallel-cli balance add <amount_cents>. Then retry the original extract command.

FAQ

What happens if extraction fails?

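Check the response for an errors field or an empty results array, and report the failure and upstream status rather than inventing content. On a 404 or timeout, verify the URL (the page may have moved). If excerpts came back empty but the page exists, retry with --full-content. If the page was renamed, use parallel-cli search to locate the current URL.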

How do I focus extraction on a specific topic?

Use --objective "focus area" to concentrate extraction on a specific goal, or -q "keyword" (repeatable) to prioritize keywords in excerpts.

Does this work with JavaScript-heavy sites and PDFs?

Yes. It handles JS-rendered pages and PDFs. For long content where excerpts may be incomplete, use --full-content, and add --full-content-max-chars N to cap the size of each result.

Is Parallel Web Extract safe to install?

skills.sh reports 2 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.

Backend & APIs · backend · integrations