Jina Reader

Name: Jina Reader
Author: sundial-org

sundial-org/awesome-openclaw-skills

1k installs
635 repo stars
Updated March 7, 2026
sundial-org/awesome-openclaw-skills

jina-reader is a bash CLI skill that converts URLs or search queries into LLM-ready markdown or JSON via Jina AI Reader for developers building research and agent ingestion pipelines.

About

jina-reader is an awesome-openclaw-skills bash script that wraps Jina AI Reader for web content extraction. Run reader.sh with a URL or query to get markdown by default, or switch modes: read converts a URL to markdown, search performs web search with full content extraction, and ground fact-checks a statement. Options include CSS selector targeting, wait-for-element delays, remove-selector cleanup, proxy country selection, cache control, and JSON output. Developers reach for it when agents need clean page text without building custom scrapers. The script is shell-native and suitable for OpenClaw or Claude agent toolchains.

Supports three modes: read (URL to markdown), search (web search with extraction), and ground (fact-check statements)
Advanced extraction options including CSS selector, wait-for-element, element removal, geo-proxy, and no-cache
Output formats: markdown, html, text, or screenshot with optional raw JSON
Designed as a reusable MCP-compatible CLI for agentic research loops

Jina Reader by the numbers

1,001 all-time installs (skills.sh)
+2 installs in the week ending Jul 28, 2026 (Skillselion tracking)
Ranked #1,013 of 16,659 AI & Agent Building skills by installs in the Skillselion catalog
Security screen: MEDIUM risk (skills.sh audit)
Data as of Jul 28, 2026 (Skillselion catalog sync)

npx skills add https://github.com/sundial-org/awesome-openclaw-skills --skill jina-reader

Add your badge

Show developers this skill is listed on Skillselion. Paste this into your README.

[![Listed on Skillselion](https://skillselion.com/badge/skills/sundial-org/awesome-openclaw-skills/jina-reader.svg)](https://skillselion.com/skills/sundial-org/awesome-openclaw-skills/jina-reader)

Installs	1k
repo stars	★ 635
Security audit	2 / 3 scanners passed
Last updated	March 7, 2026
Repository	sundial-org/awesome-openclaw-skills ↗

How do you extract clean markdown from URLs for LLMs?

Turn any web URL or search query into clean, LLM-ready markdown or structured data for research and agent workflows.

Who is it for?

Developers wiring web ingestion into agent workflows who want Jina Reader modes without custom scraper code.

Skip if: Authenticated behind-login pages, heavy JavaScript SPAs without wait selectors, or workflows that need raw HTML DOM manipulation.

When should I use this skill?

User needs to read a URL as markdown, search the web with full content extraction, or fact-check a statement for an agent.

What you get

LLM-ready markdown or JSON page content, search results with extracted bodies, or grounded fact-check output.

markdown page content
JSON structured extraction
grounded fact-check results

By the numbers

Exposes 3 extraction modes: read, search, and ground

Files

SKILL.mdMarkdownGitHub ↗

Jina Reader

Extract clean web content via Jina AI — without exposing your server IP.

Read a URL

{baseDir}/scripts/reader.sh "https://example.com/article"

Search the web (top 5 results with full content)

{baseDir}/scripts/reader.sh --mode search "latest AI news 2025"

Fact-check a statement

{baseDir}/scripts/reader.sh --mode ground "OpenAI was founded in 2015"

Options

Flag	Description	Default
`--mode`	`read`, `search`, `ground`	`read`
`--selector`	CSS selector to extract specific region	—
`--wait`	CSS selector to wait for before extraction	—
`--remove`	CSS selectors to remove (comma-separated)	—
`--proxy`	Country code for geo-proxy (`br`, `us`, etc.)	—
`--nocache`	Force fresh content (skip cache)	off
`--format`	`markdown`, `html`, `text`, `screenshot`	`markdown`
`--json`	Raw JSON output	off

Examples

# Extract article content
{baseDir}/scripts/reader.sh "https://blog.example.com/post"

# Extract specific section via CSS selector
{baseDir}/scripts/reader.sh --selector "article.main" "https://example.com"

# Remove nav and ads before extraction
{baseDir}/scripts/reader.sh --remove "nav,footer,.ads" "https://example.com"

# Search with JSON output
{baseDir}/scripts/reader.sh --mode search --json "AI enterprise trends"

# Read via Brazil proxy
{baseDir}/scripts/reader.sh --proxy br "https://example.com.br"

# Fact-check a claim
{baseDir}/scripts/reader.sh --mode ground "Tesla is the most valuable car company"

API Key

export JINA_API_KEY="jina_..."

Free tier: 10M tokens (no signup needed). Get key at https://jina.ai/reader/

Pricing

Read: ~$0.005/page (standard) | 3x for ReaderLM-v2
Search: 10K tokens fixed + variable per result
Ground: ~300K tokens/request (~30s latency)

Why Jina Reader?

IP protection — requests route through Jina's infra, not your server
Clean markdown — readability extraction + optional ReaderLM-v2
Dynamic content — headless Chrome renders JavaScript
Structured extraction — JSON schema support for data extraction

#!/usr/bin/env bash
set -euo pipefail

# Jina AI Reader - Web content extraction
# Usage: reader.sh [options] "URL or query"

MODE="read"
SELECTOR=""
WAIT_SELECTOR=""
REMOVE_SELECTOR=""
PROXY_COUNTRY=""
NO_CACHE=false
FORMAT="markdown"
JSON_OUTPUT=false
INPUT=""

usage() {
  echo "Usage: $(basename "$0") [options] \"URL or query\""
  echo ""
  echo "Modes:"
  echo "  --mode read     Convert URL to markdown (default)"
  echo "  --mode search   Web search with full content extraction"
  echo "  --mode ground   Fact-check a statement"
  echo ""
  echo "Options:"
  echo "  --selector CSS    Extract specific region"
  echo "  --wait CSS        Wait for element before extraction"
  echo "  --remove CSS      Remove elements (comma-separated)"
  echo "  --proxy CC        Country code for geo-proxy"
  echo "  --nocache         Force fresh content"
  echo "  --format FMT      markdown|html|text|screenshot"
  echo "  --json            Raw JSON output"
  exit 1
}

while [[ $# -gt 0 ]]; do
  case "$1" in
    --mode) MODE="$2"; shift 2 ;;
    --selector) SELECTOR="$2"; shift 2 ;;
    --wait) WAIT_SELECTOR="$2"; shift 2 ;;
    --remove) REMOVE_SELECTOR="$2"; shift 2 ;;
    --proxy) PROXY_COUNTRY="$2"; shift 2 ;;
    --nocache) NO_CACHE=true; shift ;;
    --format) FORMAT="$2"; shift 2 ;;
    --json) JSON_OUTPUT=true; shift ;;
    --help|-h) usage ;;
    -*) echo "Unknown option: $1" >&2; usage ;;
    *) INPUT="$1"; shift ;;
  esac
done

if [[ -z "$INPUT" ]]; then
  echo "Error: No URL or query provided" >&2
  usage
fi

# API key is optional but recommended for higher rate limits
API_KEY="${JINA_API_KEY:-}"

# Build headers array
HEADERS=()

if [[ -n "$API_KEY" ]]; then
  HEADERS+=(-H "Authorization: Bearer $API_KEY")
fi

HEADERS+=(-H "Accept: application/json")

if [[ -n "$SELECTOR" ]]; then
  HEADERS+=(-H "x-target-selector: $SELECTOR")
fi

if [[ -n "$WAIT_SELECTOR" ]]; then
  HEADERS+=(-H "x-wait-for-selector: $WAIT_SELECTOR")
fi

if [[ -n "$REMOVE_SELECTOR" ]]; then
  HEADERS+=(-H "x-remove-selector: $REMOVE_SELECTOR")
fi

if [[ -n "$PROXY_COUNTRY" ]]; then
  HEADERS+=(-H "x-country-proxy: $PROXY_COUNTRY")
fi

if [[ "$NO_CACHE" == true ]]; then
  HEADERS+=(-H "x-no-cache: true")
fi

if [[ "$FORMAT" != "markdown" ]]; then
  HEADERS+=(-H "x-respond-with: $FORMAT")
fi

# Execute based on mode
case "$MODE" in
  read)
    RESPONSE=$(curl -sS "${HEADERS[@]}" "https://r.jina.ai/${INPUT}" 2>&1)
    ;;
  search)
    ENCODED_QUERY=$(printf '%s' "$INPUT" | jq -sRr @uri)
    RESPONSE=$(curl -sS "${HEADERS[@]}" "https://s.jina.ai/${ENCODED_QUERY}" 2>&1)
    ;;
  ground)
    RESPONSE=$(curl -sS "${HEADERS[@]}" \
      -H "Content-Type: application/json" \
      -X POST "https://g.jina.ai/" \
      -d "$(jq -n --arg s "$INPUT" '{statement: $s}')" 2>&1)
    ;;
  *)
    echo "Error: Invalid mode '$MODE'. Use: read, search, ground" >&2
    exit 1
    ;;
esac

# Check for errors (Jina returns code 200 on success, 4xx/5xx on error)
RESP_CODE=$(echo "$RESPONSE" | jq -r '.code // empty' 2>/dev/null)
if [[ -n "$RESP_CODE" && "$RESP_CODE" != "200" ]]; then
  ERROR_MSG=$(echo "$RESPONSE" | jq -r '.message // .readableMessage // "Unknown error"')
  echo "Error ($RESP_CODE): $ERROR_MSG" >&2
  exit 1
fi

# Output
if [[ "$JSON_OUTPUT" == true ]]; then
  echo "$RESPONSE" | jq .
  exit 0
fi

# Format output based on mode
case "$MODE" in
  read)
    TITLE=$(echo "$RESPONSE" | jq -r '.data.title // empty')
    URL=$(echo "$RESPONSE" | jq -r '.data.url // empty')
    CONTENT=$(echo "$RESPONSE" | jq -r '.data.content // .data // empty')

    if [[ -n "$TITLE" ]]; then
      echo "# $TITLE"
      echo ""
    fi
    if [[ -n "$URL" ]]; then
      echo "Source: $URL"
      echo ""
    fi
    if [[ -n "$CONTENT" ]]; then
      echo "$CONTENT"
    else
      echo "Error: No content extracted" >&2
      echo "$RESPONSE" | jq . >&2
      exit 1
    fi
    ;;

  search)
    RESULTS=$(echo "$RESPONSE" | jq -r '.data // empty')
    if [[ -z "$RESULTS" || "$RESULTS" == "null" ]]; then
      echo "Error: No search results" >&2
      echo "$RESPONSE" | jq . >&2
      exit 1
    fi

    echo "$RESPONSE" | jq -r '.data[] | "## \(.title // "Untitled")\nURL: \(.url // "N/A")\n\n\(.content // .description // "No content")\n\n---\n"'
    ;;

  ground)
    FACTUALITY=$(echo "$RESPONSE" | jq -r '.data.factuality // empty')
    RESULT=$(echo "$RESPONSE" | jq -r '.data.result // empty')
    REASON=$(echo "$RESPONSE" | jq -r '.data.reason // empty')

    echo "Factuality: $FACTUALITY"
    echo "Result: $RESULT"
    echo ""
    if [[ -n "$REASON" ]]; then
      echo "$REASON"
    fi

    REFERENCES=$(echo "$RESPONSE" | jq -r '.data.references[]?.url // empty' 2>/dev/null)
    if [[ -n "$REFERENCES" ]]; then
      echo ""
      echo "---"
      echo "Sources:"
      echo "$REFERENCES" | while IFS= read -r url; do
        echo "  - $url"
      done
    fi
    ;;
esac

Related skills

Setup Matt Pocock SkillsScaffold the per-repo configuration that Matt Pocock’s engineering agent skills rely on so they understand the issue tracker, triage labels, and domain documentation la462k185k

Lark Skill MakerQuickly turn any Lark/Feishu OpenAPI call or multi-step workflow into a reusable agent skill with its own SKILL.md.379k15.8k

CavemanSlash token usage by roughly 75% while keeping every technical detail intact when working with Claude Code, Cursor or similar agents.378k92.5k

Lark AppsConnect Claude, Cursor or custom agents directly to Lark (Feishu) for messaging, document automation, approval workflows and enterprise data access.375k

Running Claude Code Via Litellm CopilotRun Claude Code at a fraction of the cost by routing requests through LiteLLM to the GitHub Copilot Chat API.270k72

Codex PetGenerate a complete Codex Pet spritesheet and metadata from one reference image without needing an OpenAI key or Codex Pro.246k8

FAQ

What modes does jina-reader support?

jina-reader supports three modes via reader.sh: read converts a URL to markdown, search runs web search with full content extraction, and ground fact-checks a statement. Default output format is markdown with optional JSON.

How do you target part of a page with jina-reader?

jina-reader accepts --selector to extract a CSS region, --wait to delay until an element appears, and --remove-selector to strip unwanted sections. These flags run through the bash reader.sh wrapper around Jina AI Reader.

Is Jina Reader safe to install?

skills.sh reports 2 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.

AI & Agent Buildingagentsresearchautomation

About

Jina Reader by the numbers

Add your badge

How do you extract clean markdown from URLs for LLMs?

Who is it for?

When should I use this skill?

What you get

By the numbers

Files

Jina Reader

Read a URL

Search the web (top 5 results with full content)

Fact-check a statement

Options

Examples

API Key

Pricing

Why Jina Reader?

Related skills

FAQ

What modes does jina-reader support?

How do you target part of a page with jina-reader?

Is Jina Reader safe to install?

This week in AI coding