Silicon Paddle Ocr

Name: Silicon Paddle Ocr
Author: aotenjou

aotenjou/silicon-paddleocr

Extract text from images in agent workflows using PaddleOCR through the SiliconFlow API with batch, JSON, and custom-prompt modes.

Overview

silicon-paddle-ocr is an agent skill for the Build phase that runs OCR on images through the SiliconFlow PaddleOCR API via a Python script.

Install

npx skills add https://github.com/aotenjou/silicon-paddleocr --skill silicon-paddle-ocr

What is this skill?

Single-image and glob batch paths via `ocr_skill.py`
JSON output mode and optional `--output` file for agent-consumable results
Custom `-p` prompts for tasks like Markdown table extraction from photos
SiliconFlow API key via `SILICONFLOW_API_KEY` environment variable
Example invocations for default, batch, JSON, prompt, and save-to-file flows
5 documented example invocation patterns in the usage script

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 604 installs on skills.sh; 1 GitHub stars; 2/3 security scanners passed (skills.sh audits).

What problem does it solve?

Your agent workflow has images or scans but no reliable way to turn them into text without running heavy local OCR yourself.

Who is it for?

Indie builders adding screenshot, invoice, or document ingestion to an agent or Python automation with a hosted OCR API.

Skip if: Teams that require fully offline OCR, strict data residency without third-party APIs, or real-time video OCR at scale without rate-limit planning.

When should I use this skill?

You need OCR from images in an agent workflow and will call the bundled Python script with image paths, optional `--json`, `-p` prompt, or batch globs.

What do I get? / Deliverables

Image paths are processed through SiliconFlow and return plain or JSON text (optionally saved to a file) for the next automation or coding step.

Console OCR text output
Optional JSON extraction payload
Optional results file via `--output`

Recommended Skills

Microsoft Foundrymicrosoft/azure-skills

Microsoft Foundry skill guides agents through the full Azure AI Foundry lifecycle—containerizing agents, pushing to ACR,…377k installs·1.2k stars

Azure Aimicrosoft/azure-skills

azure-ai is a Prism-oriented quick reference for Microsoft Azure AI work, with the published body centered on the Azure …375k installs·1.2k stars

Azure Hosted Copilot Sdkmicrosoft/azure-skills

Azure Hosted Copilot SDK is Microsoft's entry skill for repos using @github/copilot-sdk—it detects CopilotClient usage, …346k installs·1.2k stars

Lark Eventlarksuite/cli

Lark real-time subscription skill via lark-cli event consume for building bots and streaming webhook-style agent workers…208k installs·13.7k stars

Running Claude Code Via Litellm Copilotxixu-me/skills

Running Claude Code via LiteLLM Copilot walks through pointing Claude Code at a local LiteLLM proxy that forwards Anthro…200k installs·61 stars

Setup Matt Pocock Skillsmattpocock/skills

One-time per-repo setup so Matt Pocock engineering skills share correct issue tracker, triage strings, and domain docume…180k installs·121k stars

Journey fit

Primary fit

BuildIntegrations & version control

OCR is an external API integration you wire into the product or agent pipeline during Build, not a full-journey methodology. The skill centers on calling SiliconFlow with SILICONFLOW_API_KEY and a Python CLI script—classic third-party integration work.

How it compares

An API-backed OCR shell skill—not a local Tesseract install or a vision MCP server with persistent session state.

Common Questions / FAQ

Who is silicon-paddle-ocr for?

Solo developers and agent users who need quick text extraction from image files using SiliconFlow’s hosted PaddleOCR model from a scriptable skill.

When should I use silicon-paddle-ocr?

During Build when you integrate OCR into tooling: single images, batch folders with globs, JSON for parsers, or custom prompts for tables—after you have a SiliconFlow API key configured.

Is silicon-paddle-ocr safe to install?

It requires a network API key and runs Python against local image paths; check the Security Audits panel on this Prism page and avoid sending sensitive documents to third-party APIs if your policy forbids it.

SKILL.md

READMESKILL.md - Silicon Paddle Ocr

#!/bin/bash
# Example usage script for OCR skill

# Set API key (or load from environment)
export SILICONFLOW_API_KEY="your_api_key_here"

# Path to the OCR script
OCR_SCRIPT="$(dirname "$0")/../scripts/ocr_skill.py"

echo "=== OCR Skill Examples ==="
echo ""

# Example 1: Single image recognition
echo "Example 1: Single image"
# python3 "$OCR_SCRIPT" /path/to/test.jpg
echo "python3 $OCR_SCRIPT /path/to/test.jpg"
echo ""

# Example 2: Batch processing with glob pattern
echo "Example 2: Batch processing"
# python3 "$OCR_SCRIPT" /path/to/images/*.png
echo "python3 $OCR_SCRIPT /path/to/images/*.png"
echo ""

# Example 3: JSON output format
echo "Example 3: JSON format output"
# python3 "$OCR_SCRIPT" --json /path/to/image.jpg
echo "python3 $OCR_SCRIPT --json /path/to/image.jpg"
echo ""

# Example 4: Custom prompt for specific task
echo "Example 4: Custom prompt for table extraction"
# python3 "$OCR_SCRIPT" -p "Please extract and format as Markdown table" /path/to/table.jpg
echo "python3 $OCR_SCRIPT -p \"Please extract and format as Markdown table\" /path/to/table.jpg"
echo ""

# Example 5: Save results to file
echo "Example 5: Save results to file"
# python3 "$OCR_SCRIPT" --json --output results.json /path/to/images/*.jpg
echo "python3 $OCR_SCRIPT --json --output results.json /path/to/images/*.jpg"
echo ""


{
  "id": "silicon-paddle-ocr",
  "title": "Silicon PaddleOCR",
  "description": "OCR skill using PaddleOCR model via SiliconFlow API for text extraction from images",
  "author": "aotenjou",
  "version": "1.0.0",
  "license": "MIT",
  "tags": [
    "ocr",
    "text-recognition",
    "image-processing",
    "paddleocr",
    "siliconflow"
  ],
  "category": "text-processing",
  "home": "https://skills.sh/aotenjou/silicon-paddle-ocr",
  "repository": "https://github.com/aotenjou/silicon-PaddleOCR",
  "skills": {
    "ocr": {
      "name": "ocr",
      "description": "Extract text from images using PaddleOCR via SiliconFlow API",
      "whenToUse": "When the user wants to recognize text from an image, extract text from a photo, OCR a screenshot, or mentions PaddleOCR, image text recognition, or text extraction from images",
      "entry": "scripts/ocr_skill.py",
      "compatibleAgents": ["claude"]
    }
  },
  "capabilities": [
    "single-image-ocr",
    "batch-image-ocr",
    "json-output",
    "custom-prompts",
    "multiple-image-formats",
    "text-bounding-boxes",
    "structured-output"
  ],
  "dependencies": {
    "python": ">3.7",
    "packages": ["openai>=1.0.0", "Pillow>=8.0.0"]
  },
  "environment": {
    "required": ["SILICONFLOW_API_KEY"],
    "optional": []
  },
  "supportedFormats": ["jpg", "jpeg", "png", "webp", "bmp", "gif"]
}


# API Configuration

## SiliconFlow API

This skill uses the SiliconFlow API for OCR operations.

### API Endpoint

```
https://api.siliconflow.cn/v1
```

### Authentication

Use the SILICONFLOW_API_KEY environment variable:
```bash
export SILICONFLOW_API_KEY="sk-xxxxxxxxxxxxx"
```

### Default Model

```
PaddlePaddle/PaddleOCR-VL-1.5
```

### Supported Models

- `PaddlePaddle/PaddleOCR-VL-1.5` (default)
- Other multilingual VL models compatible with the API

### Rate Limits

- Requests per minute: 60 (configurable)
- Max concurrent requests: 5

### API Response Format

The API returns a chat completion JSON with the recognized text in the message content.


#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
OCR Skill - 使用 PaddleOCR 识别图片中的文字
"""

import base64
import json
import re
import sys
from pathlib import Path
from typing import Dict, Any, List, Optional, Tuple

try:
    from openai import OpenAI
except ImportError:
    print("错误: 需要安装 openai 库")
    print("运行: pip install openai")
    sys.exit(1)


def image_to_base64(image_path: str) -> str:
    """将图片文件转换为 base64 字符串"""
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


def get_image_size(image_path: str) -> Tuple[int, int]:
    """获取图片尺寸"""
    from PIL import Image
    with Imag

What is this skill?

Single-image and glob batch paths via `ocr_skill.py`

JSON output mode and optional `--output` file for agent-consumable results

Custom `-p` prompts for tasks like Markdown table extraction from photos

SiliconFlow API key via `SILICONFLOW_API_KEY` environment variable

Example invocations for default, batch, JSON, prompt, and save-to-file flows

5 documented example invocation patterns in the usage script

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 604 installs on skills.sh; 1 GitHub stars; 2/3 security scanners passed (skills.sh audits).

Journey fit

Primary fit

BuildIntegrations & version control

SKILL.md

READMESKILL.md - Silicon Paddle Ocr

#!/bin/bash
# Example usage script for OCR skill

# Set API key (or load from environment)
export SILICONFLOW_API_KEY="your_api_key_here"

# Path to the OCR script
OCR_SCRIPT="$(dirname "$0")/../scripts/ocr_skill.py"

echo "=== OCR Skill Examples ==="
echo ""

# Example 1: Single image recognition
echo "Example 1: Single image"
# python3 "$OCR_SCRIPT" /path/to/test.jpg
echo "python3 $OCR_SCRIPT /path/to/test.jpg"
echo ""

# Example 2: Batch processing with glob pattern
echo "Example 2: Batch processing"
# python3 "$OCR_SCRIPT" /path/to/images/*.png
echo "python3 $OCR_SCRIPT /path/to/images/*.png"
echo ""

# Example 3: JSON output format
echo "Example 3: JSON format output"
# python3 "$OCR_SCRIPT" --json /path/to/image.jpg
echo "python3 $OCR_SCRIPT --json /path/to/image.jpg"
echo ""

# Example 4: Custom prompt for specific task
echo "Example 4: Custom prompt for table extraction"
# python3 "$OCR_SCRIPT" -p "Please extract and format as Markdown table" /path/to/table.jpg
echo "python3 $OCR_SCRIPT -p \"Please extract and format as Markdown table\" /path/to/table.jpg"
echo ""

# Example 5: Save results to file
echo "Example 5: Save results to file"
# python3 "$OCR_SCRIPT" --json --output results.json /path/to/images/*.jpg
echo "python3 $OCR_SCRIPT --json --output results.json /path/to/images/*.jpg"
echo ""


{
  "id": "silicon-paddle-ocr",
  "title": "Silicon PaddleOCR",
  "description": "OCR skill using PaddleOCR model via SiliconFlow API for text extraction from images",
  "author": "aotenjou",
  "version": "1.0.0",
  "license": "MIT",
  "tags": [
    "ocr",
    "text-recognition",
    "image-processing",
    "paddleocr",
    "siliconflow"
  ],
  "category": "text-processing",
  "home": "https://skills.sh/aotenjou/silicon-paddle-ocr",
  "repository": "https://github.com/aotenjou/silicon-PaddleOCR",
  "skills": {
    "ocr": {
      "name": "ocr",
      "description": "Extract text from images using PaddleOCR via SiliconFlow API",
      "whenToUse": "When the user wants to recognize text from an image, extract text from a photo, OCR a screenshot, or mentions PaddleOCR, image text recognition, or text extraction from images",
      "entry": "scripts/ocr_skill.py",
      "compatibleAgents": ["claude"]
    }
  },
  "capabilities": [
    "single-image-ocr",
    "batch-image-ocr",
    "json-output",
    "custom-prompts",
    "multiple-image-formats",
    "text-bounding-boxes",
    "structured-output"
  ],
  "dependencies": {
    "python": ">3.7",
    "packages": ["openai>=1.0.0", "Pillow>=8.0.0"]
  },
  "environment": {
    "required": ["SILICONFLOW_API_KEY"],
    "optional": []
  },
  "supportedFormats": ["jpg", "jpeg", "png", "webp", "bmp", "gif"]
}


# API Configuration

## SiliconFlow API

This skill uses the SiliconFlow API for OCR operations.

### API Endpoint

```
https://api.siliconflow.cn/v1
```

### Authentication

Use the SILICONFLOW_API_KEY environment variable:
```bash
export SILICONFLOW_API_KEY="sk-xxxxxxxxxxxxx"
```

### Default Model

```
PaddlePaddle/PaddleOCR-VL-1.5
```

### Supported Models

- `PaddlePaddle/PaddleOCR-VL-1.5` (default)
- Other multilingual VL models compatible with the API

### Rate Limits

- Requests per minute: 60 (configurable)
- Max concurrent requests: 5

### API Response Format

The API returns a chat completion JSON with the recognized text in the message content.


#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
OCR Skill - 使用 PaddleOCR 识别图片中的文字
"""

import base64
import json
import re
import sys
from pathlib import Path
from typing import Dict, Any, List, Optional, Tuple

try:
    from openai import OpenAI
except ImportError:
    print("错误: 需要安装 openai 库")
    print("运行: pip install openai")
    sys.exit(1)


def image_to_base64(image_path: str) -> str:
    """将图片文件转换为 base64 字符串"""
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


def get_image_size(image_path: str) -> Tuple[int, int]:
    """获取图片尺寸"""
    from PIL import Image
    with Imag

Overview

Install

What is this skill?

What problem does it solve?

Who is it for?

When should I use this skill?

What do I get? / Deliverables

Recommended Skills

Journey fit

Who is silicon-paddle-ocr for?

When should I use silicon-paddle-ocr?

Is silicon-paddle-ocr safe to install?

SKILL.md

This week for builders

Overview

Install

What is this skill?

What problem does it solve?

Who is it for?

When should I use this skill?

What do I get? / Deliverables

Recommended Skills

Journey fit

Who is silicon-paddle-ocr for?

When should I use silicon-paddle-ocr?

Is silicon-paddle-ocr safe to install?

SKILL.md