PaddleOCR Text Recognition

Name: PaddleOCR Text Recognition
Author: aidenwu0209

aidenwu0209/paddleocr-skills

Wire PaddleOCR text recognition into agent workflows with a predictable JSON envelope from ocr_caller.py.

Overview

PaddleOCR Text Recognition is an agent skill for the Build phase that standardizes PaddleOCR ocr_caller.py JSON output so agents reliably read extracted text and error codes.

Install

npx skills add https://github.com/aidenwu0209/paddleocr-skills --skill paddleocr-text-recognition

What is this skill?

Stable ok/text/result/error JSON envelope for every OCR run
Three error codes: INPUT_ERROR, CONFIG_ERROR, API_ERROR
By default, saves the JSON under the system temp directory and prints the path to stderr; --output and --stdout override this
Raw provider OCR results preserved in result for version-specific fields

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 3.4k installs on skills.sh; 25 GitHub stars; 2/3 security scanners passed (skills.sh audits).

What problem does it solve?

You invoked PaddleOCR from a script or agent but cannot trust or parse inconsistent provider JSON across runs.

Who is it for?

Solo builders piping scans, screenshots, or PDFs through PaddleOCR inside Claude Code or Cursor automation.

Skip if: you only need one-off manual OCR in a desktop app and no agent has to parse the output.

When should I use this skill?

Use it after running or wrapping ocr_caller.py, when you need to interpret the saved JSON or the --stdout output.

What do I get? / Deliverables

Every OCR invocation returns the same envelope fields so your agent reads text on success and acts on INPUT_ERROR, CONFIG_ERROR, or API_ERROR on failure.

Parsed text string from envelope
Structured error handling by code
Optional path to saved JSON file

Journey fit

Primary fit

Build: Integrations & version control

OCR is integrated while building features that ingest images or PDFs, not during idea or launch work. The skill documents the ocr_caller.py contract—CLI args, temp output paths, and provider payloads—which is third-party service integration.

How it compares

Use it when you want a documented CLI output shape instead of parsing Paddle API responses ad hoc in chat.

Common Questions / FAQ

Who is paddleocr-text-recognition for?

Indie builders and agent authors who call ocr_caller.py and need a fixed JSON contract for text extraction and errors.

When should I use paddleocr-text-recognition?

During Build when integrating document or image OCR—before wiring downstream validation, search indexing, or RAG ingest on extracted text.

Is paddleocr-text-recognition safe to install?

Review the Security Audits panel on this Prism page and treat OCR inputs as sensitive; the skill describes output only and may imply network calls to OCR APIs when used with live credentials.

SKILL.md

# PaddleOCR Text Recognition Output Schema

This document defines the output envelope returned by `ocr_caller.py`.

By default, `ocr_caller.py` saves the JSON envelope to a unique file under the system temp directory and prints the absolute saved path to `stderr`. Use `--output` when you need a custom destination, or `--stdout` when you want to skip file saving and print JSON directly.

## Output Envelope

`ocr_caller.py` wraps the provider response in a stable structure:

```json
{
  "ok": true,
  "text": "Extracted text from all pages",
  "result": { ... },  // raw provider response
  "error": null
}
```

On error:

```json
{
  "ok": false,
  "text": "",
  "result": null,
  "error": {
    "code": "ERROR_CODE",
    "message": "Human-readable message"
  }
}
```
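
The two envelope shapes above can be consumed with a single check on `ok`. The following is a minimal sketch, assuming the JSON has already been read from the saved file or from `--stdout` output. `OcrError` and `read_envelope` are hypothetical helper names, not part of `ocr_caller.py`, and the `"UNKNOWN"` fallback code is an illustrative choice for malformed envelopes.

```python
import json


class OcrError(Exception):
    """Hypothetical exception carrying the envelope's error code and message."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def read_envelope(raw: str) -> str:
    """Return the extracted text, or raise OcrError when `ok` is false."""
    envelope = json.loads(raw)
    if envelope.get("ok"):
        return envelope.get("text", "")
    # On failure the envelope carries an `error` object with `code` and `message`.
    error = envelope.get("error") or {}
    # "UNKNOWN" is an illustrative fallback, not a documented error code.
    raise OcrError(error.get("code", "UNKNOWN"), error.get("message", ""))


# Sample envelopes modeled on the two shapes documented above.
success = '{"ok": true, "text": "First line\\nSecond line", "result": {}, "error": null}'
failure = (
    '{"ok": false, "text": "", "result": null, '
    '"error": {"code": "INPUT_ERROR", "message": "Human-readable message"}}'
)

print(read_envelope(success))
try:
    read_envelope(failure)
except OcrError as exc:
    print(exc.code)
```

Raising on failure keeps the success path free of `if` checks; a caller that prefers return values could return the `error` object instead.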

## Error Codes

| Code           | Description                                                                |
| -------------- | -------------------------------------------------------------------------- |
| `INPUT_ERROR`  | Invalid or unusable input (arguments, file source, format, types).         |
| `CONFIG_ERROR` | Missing or invalid API / client configuration.                             |
| `API_ERROR`    | Request or response handling failed (network, HTTP, body parsing, schema). |
## Raw Result Notes

The `result` field contains raw provider output.  
Raw fields may vary by model version and endpoint.

## Raw Result Example

```json
{
  "logId": "request-uuid",
  "errorCode": 0,
  "errorMsg": "Success",
  "result": {
    "ocrResults": [
      {
        "prunedResult": {
          "rec_texts": ["First line", "Second line"],
          "rec_scores": [0.98, 0.95],
          "...": "other OCR fields"
        },
        "ocrImage": "https://...",
        "inputImage": "https://...",
        "...": "other model-specific fields"
      }
    ],
    "dataInfo": {
      "numPages": 1,
      "type": "pdf",
      "...": "other metadata"
    },
    "...": "other top-level fields"
  }
}
```

## Stable Fields for Downstream Use

Paths are relative to the output envelope root.

- `result.result.ocrResults[n].prunedResult`
  Structured OCR data for page `n`.

- `result.result.ocrResults[n].prunedResult.rec_texts`
  Recognized text lines for page `n`.

- `result.result.ocrResults[n].prunedResult.rec_scores`
  Confidence scores for the recognized text lines on page `n`.
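
The paths above can be walked defensively, since raw fields may vary by model version. This sketch pairs each recognized line with its score and flags low-confidence lines. `iter_lines` is a hypothetical helper, and the `0.96` threshold is an arbitrary value chosen for illustration.

```python
from typing import Any, Iterator


def iter_lines(envelope: dict[str, Any]) -> Iterator[tuple[int, str, float]]:
    """Yield (page_index, text, score) for every recognized line."""
    provider = envelope.get("result") or {}
    pages = (provider.get("result") or {}).get("ocrResults") or []
    for page_index, page in enumerate(pages):
        pruned = page.get("prunedResult") or {}
        texts = pruned.get("rec_texts") or []
        scores = pruned.get("rec_scores") or []
        # zip stops at the shorter list, so a line without a score is skipped.
        for text, score in zip(texts, scores):
            yield page_index, text, score


# Sample envelope modeled on the raw result example in this document.
sample = {
    "ok": True,
    "text": "First line\nSecond line",
    "result": {
        "result": {
            "ocrResults": [
                {
                    "prunedResult": {
                        "rec_texts": ["First line", "Second line"],
                        "rec_scores": [0.98, 0.95],
                    }
                }
            ]
        }
    },
    "error": None,
}

low_confidence = [(page, text) for page, text, score in iter_lines(sample) if score < 0.96]
print(low_confidence)  # [(0, 'Second line')]
```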

## Text Extraction

`ocr_caller.py` extracts top-level `text` from `result.result.ocrResults[n].prunedResult.rec_texts`, joins lines with `\n`, and joins pages with `\n\n`.
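
The joining rule described above can be reproduced as follows. This is a sketch of the stated rule only, not the script's actual code: `join_text` is a hypothetical name, and how `ocr_caller.py` treats empty pages or missing fields is not documented here.

```python
def join_text(envelope: dict) -> str:
    """Join lines with a newline and pages with a blank line."""
    provider = envelope.get("result") or {}
    pages = (provider.get("result") or {}).get("ocrResults") or []
    page_texts = []
    for page in pages:
        lines = (page.get("prunedResult") or {}).get("rec_texts") or []
        page_texts.append("\n".join(lines))
    return "\n\n".join(page_texts)


# Two-page sample shaped like the raw result example in this document.
two_pages = {
    "result": {
        "result": {
            "ocrResults": [
                {"prunedResult": {"rec_texts": ["First line", "Second line"]}},
                {"prunedResult": {"rec_texts": ["Third line"]}},
            ]
        }
    }
}

print(repr(join_text(two_pages)))  # 'First line\nSecond line\n\nThird line'
```

Comparing this value against the envelope's top-level `text` is a cheap sanity check when a provider version changes.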

## Command Examples

```bash
# OCR from URL (result auto-saves to the system temp directory)
uv run scripts/ocr_caller.py --file-url "URL" --pretty

# OCR local file (result auto-saves to the system temp directory)
uv run scripts/ocr_caller.py --file-path "doc.pdf" --pretty

# OCR with explicit file type
uv run scripts/ocr_caller.py --file-url "URL" --file-type 1 --pretty

# Save result to a custom file path
uv run scripts/ocr_caller.py --file-url "URL" --output "./result.json" --pretty

# Print JSON to stdout without saving a file
uv run scripts/ocr_caller.py --file-url "URL" --stdout --pretty
```
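
The commands above can also be driven from Python. This sketch uses only the flags shown in the examples and assumes that, with `--stdout`, stdout carries nothing but the JSON envelope, including on failure. `build_command` and `run_ocr` are hypothetical wrappers; the "exactly one source" check is this wrapper's own policy, not a documented rule of the script.

```python
import json
import subprocess
from typing import Optional


def build_command(
    *,
    file_url: Optional[str] = None,
    file_path: Optional[str] = None,
    file_type: Optional[int] = None,
) -> list[str]:
    """Build an argv list that asks ocr_caller.py to print JSON to stdout."""
    # Wrapper policy (an assumption): require exactly one input source.
    if (file_url is None) == (file_path is None):
        raise ValueError("pass exactly one of file_url or file_path")
    cmd = ["uv", "run", "scripts/ocr_caller.py"]
    if file_url is not None:
        cmd += ["--file-url", file_url]
    else:
        cmd += ["--file-path", str(file_path)]
    if file_type is not None:
        cmd += ["--file-type", str(file_type)]
    cmd.append("--stdout")
    return cmd


def run_ocr(**kwargs) -> dict:
    """Run the script and parse its stdout as the JSON envelope."""
    # check=False: the script's exit code on error is not documented here,
    # so the envelope's `ok` field is treated as the source of truth.
    proc = subprocess.run(
        build_command(**kwargs), capture_output=True, text=True, check=False
    )
    return json.loads(proc.stdout)


print(build_command(file_url="URL", file_type=1))
```

Passing an argv list rather than a shell string avoids quoting problems with URLs and file names that contain spaces.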


"""
PaddleOCR Text Recognition Library

Simple OCR API wrapper for PaddleOCR text recognition.
"""

import base64
import logging
import math
import os
from pathlib import Path
from typing import Any, Optional
from urllib.parse import unquote, urlparse

import httpx

logger = logging.getLogger(__name__)

# =============================================================================
# Constants
# =============================================================================

DEFAULT_TIMEOUT = 120  # seconds
API_GUIDE_URL = "https://paddleocr.com"
FILE_TYPE_PDF = 0
FILE_TYPE_IMAGE = 1
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".bmp", ".tiff", ".tif", ".webp")


# =============================================================================
# Environment
# =============================================================================


def _get_env(key: str) -> str:
    """Get environment vari
