
Paddleocr Text Recognition
Wire PaddleOCR text recognition into agent workflows with a predictable JSON envelope from ocr_caller.py.
Overview
PaddleOCR Text Recognition is an agent skill for the Build phase that standardizes PaddleOCR ocr_caller.py JSON output so agents reliably read extracted text and error codes.
Install
npx skills add https://github.com/aidenwu0209/paddleocr-skills --skill paddleocr-text-recognitionWhat is this skill?
- Stable ok/text/result/error JSON envelope for every OCR run
- Three error codes: INPUT_ERROR, CONFIG_ERROR, API_ERROR
- Default saves JSON under system temp and prints path to stderr; --output and --stdout overrides
- Raw provider OCR results preserved in result for version-specific fields
- 3 documented error codes: INPUT_ERROR, CONFIG_ERROR, API_ERROR
Adoption & trust: 3.4k installs on skills.sh; 25 GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You invoked PaddleOCR from a script or agent but cannot trust or parse inconsistent provider JSON across runs.
Who is it for?
Solo builders piping scans, screenshots, or PDFs through PaddleOCR inside Claude Code or Cursor automation.
Skip if: Teams that only need one-off manual OCR in a desktop app with no agent parsing requirement.
When should I use this skill?
After running or wrapping ocr_caller.py when you need to interpret saved JSON or --stdout output.
What do I get? / Deliverables
Every OCR invocation returns the same envelope fields so your agent reads text on success and acts on INPUT_ERROR, CONFIG_ERROR, or API_ERROR on failure.
- Parsed text string from envelope
- Structured error handling by code
- Optional path to saved JSON file
Recommended Skills
Journey fit
OCR is integrated while building features that ingest images or PDFs, not during idea or launch work. The skill documents the ocr_caller.py contract—CLI args, temp output paths, and provider payloads—which is third-party service integration.
How it compares
Use for documented CLI output shape instead of ad-hoc Paddle API response parsing in chat.
Common Questions / FAQ
Who is paddleocr-text-recognition for?
Indie builders and agent authors who call ocr_caller.py and need a fixed JSON contract for text extraction and errors.
When should I use paddleocr-text-recognition?
During Build when integrating document or image OCR—before wiring downstream validation, search indexing, or RAG ingest on extracted text.
Is paddleocr-text-recognition safe to install?
Review the Security Audits panel on this Prism page and treat OCR inputs as sensitive; the skill describes output only and may imply network calls to OCR APIs when used with live credentials.
SKILL.md
READMESKILL.md - Paddleocr Text Recognition
# PaddleOCR Text Recognition Output Schema This document defines the output envelope returned by `ocr_caller.py`. By default, `ocr_caller.py` saves the JSON envelope to a unique file under the system temp directory and prints the absolute saved path to `stderr`. Use `--output` when you need a custom destination, or `--stdout` when you want to skip file saving and print JSON directly. ## Output Envelope `ocr_caller.py` wraps provider response in a stable structure: ```json { "ok": true, "text": "Extracted text from all pages", "result": { ... }, // raw provider response "error": null } ``` On error: ```json { "ok": false, "text": "", "result": null, "error": { "code": "ERROR_CODE", "message": "Human-readable message" } } ``` ## Error Codes | Code | Description | | -------------- | --------------------------------------------------------------------------- | | `INPUT_ERROR` | Invalid or unusable input (arguments, file source, format, types). | | `CONFIG_ERROR` | Missing or invalid API / client configuration. | | `API_ERROR` | Request or response handling failed (network, HTTP, body parsing, schema). | ## Raw Result Notes The `result` field contains raw provider output. Raw fields may vary by model version and endpoint. ## Raw Result Example ```json { "logId": "request-uuid", "errorCode": 0, "errorMsg": "Success", "result": { "ocrResults": [ { "prunedResult": { "rec_texts": ["First line", "Second line"], "rec_scores": [0.98, 0.95], "...": "other OCR fields" }, "ocrImage": "https://...", "inputImage": "https://...", "...": "other model-specific fields" } ], "dataInfo": { "numPages": 1, "type": "pdf", "...": "other metadata" }, "...": "other top-level fields" } } ``` ## Stable Fields for Downstream Use Paths are relative to the output envelope root. - `result.result.ocrResults[n].prunedResult` Structured OCR data for page `n`. - `result.result.ocrResults[n].prunedResult.rec_texts` Recognized text lines for page `n`. - `result.result.ocrResults[n].prunedResult.rec_scores` Confidence scores for recognized text lines. ## Text Extraction `ocr_caller.py` extracts top-level `text` from `result.result.ocrResults[n].prunedResult.rec_texts`, joins lines with `\n`, and joins pages with `\n\n`. ## Command Examples ```bash # OCR from URL (result auto-saves to the system temp directory) uv run scripts/ocr_caller.py --file-url "URL" --pretty # OCR local file (result auto-saves to the system temp directory) uv run scripts/ocr_caller.py --file-path "doc.pdf" --pretty # OCR with explicit file type uv run scripts/ocr_caller.py --file-url "URL" --file-type 1 --pretty # Save result to a custom file path uv run scripts/ocr_caller.py --file-url "URL" --output "./result.json" --pretty # Print JSON to stdout without saving a file uv run scripts/ocr_caller.py --file-url "URL" --stdout --pretty ``` """ PaddleOCR Text Recognition Library Simple OCR API wrapper for PaddleOCR text recognition. """ import base64 import logging import math import os from pathlib import Path from typing import Any, Optional from urllib.parse import unquote, urlparse import httpx logger = logging.getLogger(__name__) # ============================================================================= # Constants # ============================================================================= DEFAULT_TIMEOUT = 120 # seconds API_GUIDE_URL = "https://paddleocr.com" FILE_TYPE_PDF = 0 FILE_TYPE_IMAGE = 1 IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".bmp", ".tiff", ".tif", ".webp") # ============================================================================= # Environment # ============================================================================= def _get_env(key: str) -> str: """Get environment vari