
Silicon Paddle Ocr
Extract text from images in agent workflows using PaddleOCR through the SiliconFlow API with batch, JSON, and custom-prompt modes.
Overview
silicon-paddle-ocr is an agent skill for the Build phase that runs OCR on images through the SiliconFlow PaddleOCR API via a Python script.
Install
npx skills add https://github.com/aotenjou/silicon-paddleocr --skill silicon-paddle-ocrWhat is this skill?
- Single-image and glob batch paths via `ocr_skill.py`
- JSON output mode and optional `--output` file for agent-consumable results
- Custom `-p` prompts for tasks like Markdown table extraction from photos
- SiliconFlow API key via `SILICONFLOW_API_KEY` environment variable
- Example invocations for default, batch, JSON, prompt, and save-to-file flows
- 5 documented example invocation patterns in the usage script
Adoption & trust: 604 installs on skills.sh; 1 GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
Your agent workflow has images or scans but no reliable way to turn them into text without running heavy local OCR yourself.
Who is it for?
Indie builders adding screenshot, invoice, or document ingestion to an agent or Python automation with a hosted OCR API.
Skip if: Teams that require fully offline OCR, strict data residency without third-party APIs, or real-time video OCR at scale without rate-limit planning.
When should I use this skill?
You need OCR from images in an agent workflow and will call the bundled Python script with image paths, optional `--json`, `-p` prompt, or batch globs.
What do I get? / Deliverables
Image paths are processed through SiliconFlow and return plain or JSON text (optionally saved to a file) for the next automation or coding step.
- Console OCR text output
- Optional JSON extraction payload
- Optional results file via `--output`
Recommended Skills
Journey fit
OCR is an external API integration you wire into the product or agent pipeline during Build, not a full-journey methodology. The skill centers on calling SiliconFlow with SILICONFLOW_API_KEY and a Python CLI script—classic third-party integration work.
How it compares
An API-backed OCR shell skill—not a local Tesseract install or a vision MCP server with persistent session state.
Common Questions / FAQ
Who is silicon-paddle-ocr for?
Solo developers and agent users who need quick text extraction from image files using SiliconFlow’s hosted PaddleOCR model from a scriptable skill.
When should I use silicon-paddle-ocr?
During Build when you integrate OCR into tooling: single images, batch folders with globs, JSON for parsers, or custom prompts for tables—after you have a SiliconFlow API key configured.
Is silicon-paddle-ocr safe to install?
It requires a network API key and runs Python against local image paths; check the Security Audits panel on this Prism page and avoid sending sensitive documents to third-party APIs if your policy forbids it.
SKILL.md
READMESKILL.md - Silicon Paddle Ocr
#!/bin/bash # Example usage script for OCR skill # Set API key (or load from environment) export SILICONFLOW_API_KEY="your_api_key_here" # Path to the OCR script OCR_SCRIPT="$(dirname "$0")/../scripts/ocr_skill.py" echo "=== OCR Skill Examples ===" echo "" # Example 1: Single image recognition echo "Example 1: Single image" # python3 "$OCR_SCRIPT" /path/to/test.jpg echo "python3 $OCR_SCRIPT /path/to/test.jpg" echo "" # Example 2: Batch processing with glob pattern echo "Example 2: Batch processing" # python3 "$OCR_SCRIPT" /path/to/images/*.png echo "python3 $OCR_SCRIPT /path/to/images/*.png" echo "" # Example 3: JSON output format echo "Example 3: JSON format output" # python3 "$OCR_SCRIPT" --json /path/to/image.jpg echo "python3 $OCR_SCRIPT --json /path/to/image.jpg" echo "" # Example 4: Custom prompt for specific task echo "Example 4: Custom prompt for table extraction" # python3 "$OCR_SCRIPT" -p "Please extract and format as Markdown table" /path/to/table.jpg echo "python3 $OCR_SCRIPT -p \"Please extract and format as Markdown table\" /path/to/table.jpg" echo "" # Example 5: Save results to file echo "Example 5: Save results to file" # python3 "$OCR_SCRIPT" --json --output results.json /path/to/images/*.jpg echo "python3 $OCR_SCRIPT --json --output results.json /path/to/images/*.jpg" echo "" { "id": "silicon-paddle-ocr", "title": "Silicon PaddleOCR", "description": "OCR skill using PaddleOCR model via SiliconFlow API for text extraction from images", "author": "aotenjou", "version": "1.0.0", "license": "MIT", "tags": [ "ocr", "text-recognition", "image-processing", "paddleocr", "siliconflow" ], "category": "text-processing", "home": "https://skills.sh/aotenjou/silicon-paddle-ocr", "repository": "https://github.com/aotenjou/silicon-PaddleOCR", "skills": { "ocr": { "name": "ocr", "description": "Extract text from images using PaddleOCR via SiliconFlow API", "whenToUse": "When the user wants to recognize text from an image, extract text from a photo, OCR a screenshot, or mentions PaddleOCR, image text recognition, or text extraction from images", "entry": "scripts/ocr_skill.py", "compatibleAgents": ["claude"] } }, "capabilities": [ "single-image-ocr", "batch-image-ocr", "json-output", "custom-prompts", "multiple-image-formats", "text-bounding-boxes", "structured-output" ], "dependencies": { "python": ">3.7", "packages": ["openai>=1.0.0", "Pillow>=8.0.0"] }, "environment": { "required": ["SILICONFLOW_API_KEY"], "optional": [] }, "supportedFormats": ["jpg", "jpeg", "png", "webp", "bmp", "gif"] } # API Configuration ## SiliconFlow API This skill uses the SiliconFlow API for OCR operations. ### API Endpoint ``` https://api.siliconflow.cn/v1 ``` ### Authentication Use the SILICONFLOW_API_KEY environment variable: ```bash export SILICONFLOW_API_KEY="sk-xxxxxxxxxxxxx" ``` ### Default Model ``` PaddlePaddle/PaddleOCR-VL-1.5 ``` ### Supported Models - `PaddlePaddle/PaddleOCR-VL-1.5` (default) - Other multilingual VL models compatible with the API ### Rate Limits - Requests per minute: 60 (configurable) - Max concurrent requests: 5 ### API Response Format The API returns a chat completion JSON with the recognized text in the message content. #!/usr/bin/env python3 # -*- coding: utf-8 -*- """ OCR Skill - 使用 PaddleOCR 识别图片中的文字 """ import base64 import json import re import sys from pathlib import Path from typing import Dict, Any, List, Optional, Tuple try: from openai import OpenAI except ImportError: print("错误: 需要安装 openai 库") print("运行: pip install openai") sys.exit(1) def image_to_base64(image_path: str) -> str: """将图片文件转换为 base64 字符串""" with open(image_path, "rb") as f: return base64.b64encode(f.read()).decode("utf-8") def get_image_size(image_path: str) -> Tuple[int, int]: """获取图片尺寸""" from PIL import Image with Imag