
Jina Reader
Turn URLs into clean markdown, run search-with-full-text extraction, or ground claims—via a bash wrapper around Jina Reader modes.
Overview
Jina Reader is an agent skill most often used in Idea/research (also Validate/scope, Grow/content) that extracts web pages and search results into markdown via Jina Reader bash modes.
Install
npx skills add https://github.com/sundial-org/awesome-openclaw-skills --skill jina-readerWhat is this skill?
- Three modes: read (URL→markdown), search (full content), ground (fact-check)
- Output formats markdown, html, text, or screenshot
- CSS selector, wait, remove, geo-proxy, and nocache controls
- Bash CLI with --json for agent pipelines
- Geo-proxy country code for region-specific pages
- 3 CLI modes: read, search, ground
- 4 output formats: markdown, html, text, screenshot
Adoption & trust: 958 installs on skills.sh; 609 GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
Your agent needs readable web evidence but raw fetches return noisy HTML, broken SPAs, or unstructured search snippets.
Who is it for?
Solo builders automating competitor teardowns, doc ingestion, and claim verification inside agent or OpenClaw-style skill stacks.
Skip if: High-volume crawling behind auth walls where you need a dedicated browser automation farm instead of a reader API.
When should I use this skill?
An agent needs a URL or query converted via Jina Reader with optional selectors, proxy, or JSON output.
What do I get? / Deliverables
You get markdown, text, html, or screenshot artifacts—and grounded checks in search/ground modes—for downstream summarization or spec validation.
- Markdown or text extraction
- Search results with full content
- Grounding output for fact-check flows
Recommended Skills
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Idea/research is the canonical shelf because the default read mode and search mode exist to ingest and compare external web evidence before you commit to build scope. Research subphase fits URL-to-markdown and search extraction used for competitor pages, docs, and news—not shipping production monitoring.
Where it fits
Read a competitor pricing page to markdown with --wait for the pricing table selector.
Run ground mode on a market-size claim before locking the MVP brief.
Search mode on a niche query, then summarize extracted articles for a weekly post.
Nocache read of third-party API docs into markdown for a pre-release integration checklist.
How it compares
Jina Reader API bash bridge, not a self-hosted Playwright scraper or generic curl-to-file dump.
Common Questions / FAQ
Who is jina-reader for?
Indie developers and agent authors who want scripted URL-to-markdown, search extraction, and grounding without writing a custom scraper each time.
When should I use jina-reader?
Use it in Idea/research to capture competitor pages, in Validate/scope to ground pricing or feature claims, and in Grow/content to pull source articles for summaries—pass URLs or queries to reader.sh with the right --mode.
Is jina-reader safe to install?
Review the Security Audits panel on this page, protect Jina API keys in env vars, and treat fetched URLs as untrusted input in shell pipelines.
SKILL.md
READMESKILL.md - Jina Reader
#!/usr/bin/env bash set -euo pipefail # Jina AI Reader - Web content extraction # Usage: reader.sh [options] "URL or query" MODE="read" SELECTOR="" WAIT_SELECTOR="" REMOVE_SELECTOR="" PROXY_COUNTRY="" NO_CACHE=false FORMAT="markdown" JSON_OUTPUT=false INPUT="" usage() { echo "Usage: $(basename "$0") [options] \"URL or query\"" echo "" echo "Modes:" echo " --mode read Convert URL to markdown (default)" echo " --mode search Web search with full content extraction" echo " --mode ground Fact-check a statement" echo "" echo "Options:" echo " --selector CSS Extract specific region" echo " --wait CSS Wait for element before extraction" echo " --remove CSS Remove elements (comma-separated)" echo " --proxy CC Country code for geo-proxy" echo " --nocache Force fresh content" echo " --format FMT markdown|html|text|screenshot" echo " --json Raw JSON output" exit 1 } while [[ $# -gt 0 ]]; do case "$1" in --mode) MODE="$2"; shift 2 ;; --selector) SELECTOR="$2"; shift 2 ;; --wait) WAIT_SELECTOR="$2"; shift 2 ;; --remove) REMOVE_SELECTOR="$2"; shift 2 ;; --proxy) PROXY_COUNTRY="$2"; shift 2 ;; --nocache) NO_CACHE=true; shift ;; --format) FORMAT="$2"; shift 2 ;; --json) JSON_OUTPUT=true; shift ;; --help|-h) usage ;; -*) echo "Unknown option: $1" >&2; usage ;; *) INPUT="$1"; shift ;; esac done if [[ -z "$INPUT" ]]; then echo "Error: No URL or query provided" >&2 usage fi # API key is optional but recommended for higher rate limits API_KEY="${JINA_API_KEY:-}" # Build headers array HEADERS=() if [[ -n "$API_KEY" ]]; then HEADERS+=(-H "Authorization: Bearer $API_KEY") fi HEADERS+=(-H "Accept: application/json") if [[ -n "$SELECTOR" ]]; then HEADERS+=(-H "x-target-selector: $SELECTOR") fi if [[ -n "$WAIT_SELECTOR" ]]; then HEADERS+=(-H "x-wait-for-selector: $WAIT_SELECTOR") fi if [[ -n "$REMOVE_SELECTOR" ]]; then HEADERS+=(-H "x-remove-selector: $REMOVE_SELECTOR") fi if [[ -n "$PROXY_COUNTRY" ]]; then HEADERS+=(-H "x-country-proxy: $PROXY_COUNTRY") fi if [[ "$NO_CACHE" == true ]]; then HEADERS+=(-H "x-no-cache: true") fi if [[ "$FORMAT" != "markdown" ]]; then HEADERS+=(-H "x-respond-with: $FORMAT") fi # Execute based on mode case "$MODE" in read) RESPONSE=$(curl -sS "${HEADERS[@]}" "https://r.jina.ai/${INPUT}" 2>&1) ;; search) ENCODED_QUERY=$(printf '%s' "$INPUT" | jq -sRr @uri) RESPONSE=$(curl -sS "${HEADERS[@]}" "https://s.jina.ai/${ENCODED_QUERY}" 2>&1) ;; ground) RESPONSE=$(curl -sS "${HEADERS[@]}" \ -H "Content-Type: application/json" \ -X POST "https://g.jina.ai/" \ -d "$(jq -n --arg s "$INPUT" '{statement: $s}')" 2>&1) ;; *) echo "Error: Invalid mode '$MODE'. Use: read, search, ground" >&2 exit 1 ;; esac # Check for errors (Jina returns code 200 on success, 4xx/5xx on error) RESP_CODE=$(echo "$RESPONSE" | jq -r '.code // empty' 2>/dev/null) if [[ -n "$RESP_CODE" && "$RESP_CODE" != "200" ]]; then ERROR_MSG=$(echo "$RESPONSE" | jq -r '.message // .readableMessage // "Unknown error"') echo "Error ($RESP_CODE): $ERROR_MSG" >&2 exit 1 fi # Output if [[ "$JSON_OUTPUT" == true ]]; then echo "$RESPONSE" | jq . exit 0 fi # Format output based on mode case "$MODE" in read) TITLE=$(echo "$RESPONSE" | jq -r '.data.title // empty') URL=$(echo "$RESPONSE" | jq -r '.data.url // empty') CONTENT=$(echo "$RESPONSE" | jq -r '.data.content // .data // empty') if [[ -n "$TITLE" ]]; then echo "# $TITLE" echo "" fi if [[ -n "$URL" ]]; then echo "Source: $URL" echo "" fi if [[ -n "$CONTENT" ]]; then echo "$CONTENT" else echo "Error: No content extracted" >&2 echo "$RESPONSE" | jq . >&2 exit 1 fi ;; search) RESULTS=$(echo "$RESPONSE" | jq -r '.data // empty') if [[ -z "$RESULTS" || "$RESULTS" == "null" ]]; the