Smart Ocr

Name: Smart Ocr
Author: skills.volces.com


Pull readable text, bounding boxes, and confidence scores from screenshots, scans, and photos using PaddleOCR inside an agent workflow.

Overview

Smart OCR is an agent skill for the Build phase that extracts text from images and scanned documents using PaddleOCR with multilingual support and position metadata.

Install

npx skills add https://github.com/skills.volces.com --skill smart-ocr

What is this skill?

PaddleOCR-based extraction with angle classification for skewed scans
100+ language support including mixed Chinese and English prompts
Returns per-line text, quadrilateral boxes, and confidence scores
Works on screenshots, scanned PDFs, business cards, and handwritten images
Example Python init and result parsing included in the skill
PaddleOCR library referenced at ~69k GitHub stars in skill metadata

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 1 install on skills.sh; 1/1 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).

What problem does it solve?

You have screenshots, scans, or photos full of text but no fast way to feed that content into search, agents, or databases.

Who is it for?

Indie builders adding document ingestion, screenshot parsing, or multilingual OCR to Python-backed agent tools.

Skip if: you need production OCR at scale but do not have your own hosting, compliance review, and image pre-processing pipeline.

When should I use this skill?

User provides an image or scanned document and asks to extract, read, or OCR text with optional language hints.

What do I get? / Deliverables

You receive machine-readable text lines with bounding boxes and confidence so downstream code, RAG, or agents can cite or transform the content.

Structured OCR lines with text, boxes, and confidence scores
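A minimal sketch of what those deliverables can look like in Python, assuming the PaddleOCR 2.x result shape (`[[box, (text, confidence)], ...]` per image). The `OcrLine` record and `to_records` helper are hypothetical names for illustration, not part of PaddleOCR or the skill, and the sample page is hand-written data rather than real OCR output.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class OcrLine:
    """One recognized text line (hypothetical record type)."""
    text: str
    confidence: float
    box: List[Point]  # four corners; PaddleOCR orders them TL, TR, BR, BL


def to_records(page: Optional[Sequence]) -> List[OcrLine]:
    """Convert one page of PaddleOCR 2.x output into OcrLine records.

    `page` is result[0] from ocr.ocr(...); it is None when no text was found.
    """
    records = []
    for box, (text, conf) in page or []:
        records.append(
            OcrLine(
                text=text,
                confidence=float(conf),
                box=[(float(x), float(y)) for x, y in box],
            )
        )
    return records


# Hand-written page in the same shape as real output (not actual OCR results)
fake_page = [
    [[[10, 5], [120, 5], [120, 30], [10, 30]], ("Invoice #42", 0.98)],
    [[[10, 40], [200, 40], [200, 62], [10, 62]], ("Total: $19.99", 0.91)],
]
records = to_records(fake_page)
# JSON is a convenient hand-off format for RAG indexes or other agents
print(json.dumps([asdict(r) for r in records], indent=2))
```

Keeping the box alongside the text lets downstream code cite where a line came from on the page, not just what it said.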

Journey fit

Primary fit

Build: Integrations & version control

Build is where you wire document and image ingestion into products and agent tools. Integrations fits because the skill attaches a third-party OCR engine (PaddleOCR) to your pipeline or agent session.

How it compares

Skill-wrapped PaddleOCR integration, not a hosted SaaS OCR API with managed SLAs.

Common Questions / FAQ

Who is smart-ocr for?

Solo developers and agent users who need local or scriptable OCR from PaddleOCR during Build integrations.

When should I use smart-ocr?

While building features that ingest scans, screenshots, or photos—especially when you need 100+ languages or mixed-language business cards and forms.

Is smart-ocr safe to install?

OCR runs Python and reads image files; confirm dependency sources and review the Security Audits panel on this Prism page before enabling in sensitive environments.

SKILL.md


# Smart OCR Skill

## Overview

This skill enables text extraction from images and scanned documents using **PaddleOCR**, an open-source OCR engine supporting 100+ languages. It extracts text from photos, screenshots, scanned PDFs, and handwritten documents; accuracy is typically highest on clear printed text and lower on handwriting.

## How to Use

1. Provide the image or scanned document
2. Optionally specify language(s) to detect
3. I'll extract text with position and confidence data

**Example prompts:**
- "Extract all text from this screenshot"
- "OCR this scanned PDF document"
- "Read the text from this business card photo"
- "Extract Chinese and English text from this image"
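The prompts above mix single images and multi-page PDFs, which need different code paths. A small routing helper could choose the path from the file suffix; the suffix sets and the `input_kind` name are assumptions for illustration, so match them to what your pipeline actually accepts.

```python
from pathlib import Path

# Hypothetical accepted suffixes; adjust to your own pipeline
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}
PDF_SUFFIXES = {".pdf"}


def input_kind(path: str) -> str:
    """Return 'image' or 'pdf' for a user-supplied file, based on its suffix."""
    suffix = Path(path).suffix.lower()
    if suffix in IMAGE_SUFFIXES:
        return "image"
    if suffix in PDF_SUFFIXES:
        return "pdf"
    raise ValueError(f"unsupported input type: {suffix or '(none)'}")


for name in ["screenshot.PNG", "scan.pdf", "card.jpeg"]:
    print(name, "->", input_kind(name))
```

Rejecting unknown types up front gives the user a clear message instead of an opaque image-loading error later.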

## Domain Knowledge

### PaddleOCR Fundamentals

```python
from paddleocr import PaddleOCR

# Initialize OCR engine
ocr = PaddleOCR(use_angle_cls=True, lang='en')

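# Run OCR on image (PaddleOCR 2.x API; cls=True applies the angle classifier)
result = ocr.ocr('image.png', cls=True)

# result holds one entry per input image; result[0] is this image's lines,
# each [box, (text, confidence)], or None when no text was found
for line in result[0] or []:
    box = line[0]      # [[x1,y1], [x2,y2], [x3,y3], [x4,y4]]
    text = line[1][0]  # Extracted text
    conf = line[1][1]  # Confidence score in [0, 1]
    print(f"{text} ({conf:.2f})")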
```
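Downstream code often needs an axis-aligned rectangle instead of a quadrilateral, a confidence cutoff, and a sensible reading order for multi-column layouts such as business cards. The helpers below are a hypothetical sketch over the same `result[0]` structure; the 0.8 cutoff and 10-pixel row tolerance are arbitrary defaults, and the row grouping is a simple heuristic rather than PaddleOCR's own box sorting.

```python
from typing import List, Sequence, Tuple

Box = Sequence[Sequence[float]]  # four [x, y] corners as returned by PaddleOCR


def quad_to_rect(box: Box) -> Tuple[float, float, float, float]:
    """Axis-aligned (x_min, y_min, x_max, y_max) enclosing a quadrilateral."""
    xs = [p[0] for p in box]
    ys = [p[1] for p in box]
    return min(xs), min(ys), max(xs), max(ys)


def confident_text(page, min_conf: float = 0.8) -> List[str]:
    """Keep texts whose confidence is at least min_conf (threshold is arbitrary)."""
    return [text for _box, (text, conf) in (page or []) if conf >= min_conf]


def reading_order(page, line_tol: float = 10.0) -> List[str]:
    """Group lines into rows (top edges within line_tol px), each row left to right."""
    items = sorted(page or [], key=lambda line: quad_to_rect(line[0])[1])
    rows: List[list] = []
    current: list = []
    row_top = 0.0
    for line in items:
        top = quad_to_rect(line[0])[1]
        if current and top - row_top > line_tol:
            rows.append(current)  # close the previous row
            current = []
        if not current:
            row_top = top  # first line of a new row sets its reference top
        current.append(line)
    if current:
        rows.append(current)
    ordered = []
    for row in rows:
        row.sort(key=lambda line: quad_to_rect(line[0])[0])
        ordered.extend(line[1][0] for line in row)
    return ordered


# Hand-written sample page (not real OCR output), deliberately out of order
sample_page = [
    [[[150, 12], [260, 12], [260, 30], [150, 30]], ("World", 0.95)],
    [[[10, 50], [90, 50], [90, 70], [10, 70]], ("Second", 0.60)],
    [[[10, 10], [120, 10], [120, 30], [10, 30]], ("Hello", 0.99)],
]
print(reading_order(sample_page))   # ['Hello', 'World', 'Second']
print(confident_text(sample_page))  # ['World', 'Hello']
```

Grouping by a row tolerance rather than sorting strictly on `y` keeps slightly misaligned words from the same visual line together.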

### Supported Languages

```python
from paddleocr import PaddleOCR

# Common PaddleOCR 2.x `lang` codes
languages = {
    'en': 'English',
    'ch': 'Chinese (Simplified), also reads English',
    'chinese_cht': 'Chinese (Traditional)',
    'japan': 'Japanese',
    'korean': 'Korean',
    'french': 'French',
    'german': 'German',
    'es': 'Spanish',
    'ru': 'Russian',
    'arabic': 'Arabic',
    'hi': 'Hindi',
    'vi': 'Vietnamese',
    'th': 'Thai',  # may not exist in every release; check your installed version
    # ... 100+ languages supported
}

# Use a specific language (each instance loads one recognition model)
ocr = PaddleOCR(lang='ch')     # Chinese, including mixed Chinese/English text
ocr = PaddleOCR(lang='japan')  # Japanese
# There is no auto-detect mode: choose the code that matches the image
```
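Because each `PaddleOCR` instance loads one recognition model, free-form language hints from a prompt ("Chinese and English") have to collapse to a single `lang` code. A minimal sketch, assuming PaddleOCR 2.x language codes; `HINT_TO_LANG` and `resolve_lang` are hypothetical names, and the "first non-English hint wins" rule is an arbitrary choice.

```python
from typing import Iterable, Optional

# Hypothetical mapping from plain-English hints to PaddleOCR 2.x `lang` codes
HINT_TO_LANG = {
    "english": "en",
    "chinese": "ch",
    "simplified chinese": "ch",
    "traditional chinese": "chinese_cht",
    "japanese": "japan",
    "korean": "korean",
    "french": "french",
    "german": "german",
    "spanish": "es",
    "russian": "ru",
    "arabic": "arabic",
    "hindi": "hi",
    "vietnamese": "vi",
}


def resolve_lang(hints: Optional[Iterable[str]] = None) -> str:
    """Collapse zero or more language hints into one PaddleOCR `lang` code."""
    codes = []
    for hint in hints or []:
        key = hint.strip().lower()
        if key not in HINT_TO_LANG:
            raise ValueError(f"no language code known for hint: {hint!r}")
        codes.append(HINT_TO_LANG[key])
    non_english = [c for c in codes if c != "en"]
    if not non_english:
        return "en"  # no hints, or English only
    if "ch" in non_english:
        return "ch"  # the Chinese model also recognizes English
    # Otherwise use the first non-English hint; other languages in the
    # same image may not be recognized by that single model
    return non_english[0]


print(resolve_lang(["Chinese", "English"]))  # ch
```

The result then feeds the constructor, e.g. `PaddleOCR(use_angle_cls=True, lang=resolve_lang(["Chinese", "English"]))`.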

### Configuration Options

```python
from paddleocr import PaddleOCR

# Keyword names below follow the PaddleOCR 2.x API; 3.x renamed several of them
ocr = PaddleOCR(
    # Detection settings
    det_model_dir=None,         # Custom detection model
    det_limit_side_len=960,     # Max side length for detection
    det_db_thresh=0.3,          # Binarization threshold
    det_db_box_thresh=0.5,      # Box score threshold
    
    # Recognition settings
    rec_model_dir=None,         # Custom recognition model
    rec_char_dict_path=None,    # Custom character dictionary
    
    # Angle classification
    use_angle_cls=True,         # Enable angle classification
    cls_model_dir=None,         # Custom classification model
    
    # Language
    lang='en',                  # Language code
    
    # Performance
    use_gpu=True,               # Use GPU if available
    gpu_mem=500,                # GPU memory limit (MB)
    enable_mkldnn=True,         # CPU optimization
    
    # Output
    show_log=False,             # Suppress logs
)
```
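`det_limit_side_len` caps the longer image side before text detection, so small text in a large scan can shrink below what the detector finds. The arithmetic below approximates the 2.x detection resize under the assumption of the default `det_limit_type='max'`: scale down only when the longer side exceeds the limit, then round each side to a multiple of 32. `det_input_size` is a hypothetical helper for reasoning about settings, not PaddleOCR's own code.

```python
from typing import Tuple


def det_input_size(height: int, width: int, limit_side_len: int = 960) -> Tuple[int, int]:
    """Approximate the (height, width) fed to the text detector.

    Assumes det_limit_type='max': only images whose longer side exceeds
    limit_side_len are scaled down; both sides are then rounded to a
    multiple of 32 (minimum 32).
    """
    longest = max(height, width)
    ratio = limit_side_len / longest if longest > limit_side_len else 1.0
    resize_h = int(height * ratio)
    resize_w = int(width * ratio)
    resize_h = max(int(round(resize_h / 32) * 32), 32)
    resize_w = max(int(round(resize_w / 32) * 32), 32)
    return resize_h, resize_w


print(det_input_size(1080, 1920))  # (544, 960): a 1080p screenshot is halved
print(det_input_size(3000, 4000))  # (704, 960): a large scan shrinks about 4x
```

If fine print in large scans is being missed, raising `det_limit_side_len` keeps more resolution for the detector at the cost of speed and memory.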

### Processing Different Sources

#### Image Files
```python
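# Single image (reuses the `ocr` instance created above)
result = ocr.ocr('image.png')

# Multiple images; result[0] is None when an image contains no text
images = ['img1.png', 'img2.png', 'img3.png']
for img in images:
    result = ocr.ocr(img)
    lines = result[0] or []
    print(img, [line[1][0] for line in lines])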
```
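In a batch loop, one unreadable file should not abort the rest. A hedged sketch: `ocr_many` is a hypothetical helper that takes the OCR call as a parameter (pass `ocr.ocr` in real use, a stub in tests), and since a bad path may surface either as an exception or as an empty or `None` result, it records both as failures. The `fake_ocr` stub returns invented data in the 2.x result shape.

```python
from typing import Callable, Dict, List, Tuple


def ocr_many(
    paths: List[str], run_ocr: Callable
) -> Tuple[Dict[str, List[str]], Dict[str, str]]:
    """Run OCR over several files; return (texts per file, errors per file)."""
    texts: Dict[str, List[str]] = {}
    errors: Dict[str, str] = {}
    for path in paths:
        try:
            result = run_ocr(path)
        except Exception as exc:  # e.g. missing or corrupt file
            errors[path] = str(exc)
            continue
        if not result:
            errors[path] = "no result returned (unreadable image?)"
            continue
        # result[0] is None when the image was read but held no text
        texts[path] = [line[1][0] for line in result[0] or []]
    return texts, errors


def fake_ocr(path):
    """Stand-in for ocr.ocr with the 2.x return shape (hypothetical data)."""
    if path == "missing.png":
        raise FileNotFoundError(path)
    if path == "blank.png":
        return [None]
    box = [[0, 0], [50, 0], [50, 20], [0, 20]]
    return [[[box, (f"text in {path}", 0.9)]]]


texts, errors = ocr_many(["a.png", "blank.png", "missing.png"], fake_ocr)
print(texts)   # {'a.png': ['text in a.png'], 'blank.png': []}
print(errors)  # {'missing.png': 'missing.png'}
```

In real use the call is `texts, errors = ocr_many(images, ocr.ocr)`; keeping failures separate from "no text found" lets an agent report exactly which files need a retry.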

#### PDF Files (Scanned)
```python
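import os
import tempfile

from pdf2image import convert_from_path  # needs the Poppler utilities installed


def ocr_pdf(pdf_path):
    """OCR a scanned PDF and return the text of every page."""
    # Convert PDF pages to PIL images
    images = convert_from_path(pdf_path)

    all_text = []
    with tempfile.TemporaryDirectory() as tmp_dir:
        for i, img in enumerate(images):
            # Save each page as a temporary PNG (deleted along with tmp_dir)
            temp_path = os.path.join(tmp_dir, f'page_{i}.png')
            img.save(temp_path)

            # OCR the page with the `ocr` instance created above;
            # result[0] is None when the page contains no text
            result = ocr.ocr(temp_path)
            lines = result[0] or []

            page_text = '\n'.join(line[1][0] for line in lines)
            all_text.append(f"--- Page {i+1} ---\n{page_text}")

    return '\n\n'.join(all_text)
```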
