Image OCR

Most builders first reach for this skill when wiring ingestion or document features into a product. It is a tool-selection and pipeline-integration reference spanning Tesseract, EasyOCR, cloud Vision/Textract, and vision LLMs.

Where it fits

Example use

Spike OCR on sample invoice photos before committing to a billing automation feature.

Example use

Wire EasyOCR or Textract into an upload endpoint that returns JSON fields for your app.

Example use

Operate: Iteration & experiments

Batch-OCR community screenshots or scanned guides into markdown for SEO and support search.

Example use

Tune preprocessing when production OCR confidence drops on new camera formats or languages.

How it compares

A reference skill for multi-engine OCR pipelines, not a single hosted MCP server. The natural comparison is the ad-hoc "just use Tesseract" approach, which comes with no preprocessing guidance.

Common Questions / FAQ

Who is image-ocr for?

Solo developers and agent builders who need to extract text from images, scans, invoices, or screenshots and want a structured tool comparison.

When should I use image-ocr?

Use it when extracting text from images while building integrations, when validating a prototype that reads user-uploaded documents, or when growing content operations that digitize scans or social screenshots.

Is image-ocr safe to install?

Cloud OCR paths may send images off-device. Check the Security Audits panel on this page and your data policy before enabling Vision or Textract in production.

SKILL.md

# Image OCR Expert

> Expert in extracting, processing, and structuring text from images using OCR tools and techniques.

## Description

This skill provides specialized knowledge for extracting text from images, including:
- Tool and library selection by use case (Tesseract, EasyOCR, PaddleOCR, cloud APIs)
- Image preprocessing to maximize OCR accuracy
- Post-processing and structuring of extracted text
- Handling handwriting, receipts, invoices, documents, screenshots
- Multilingual OCR and special character support
- Integration into Python/Node.js/cloud pipelines

**Triggers**: ocr, extract text from image, image to text, read text image, optical character recognition, tesseract, easyocr, paddleocr, textract, vision api, document extraction, screenshot text, invoice ocr, receipt ocr, handwriting recognition, image text extraction
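
To make the "post-processing and structuring" item in the list above concrete, here is a minimal sketch that turns raw OCR text into a small JSON-ready dict. The function name `structure_invoice_text`, the two regex patterns, and the output field names are assumptions chosen for illustration, not part of this skill; real invoices will need their own rules.

```python
import json
import re

# Hypothetical patterns for illustration only.
# \b keeps "Subtotal" from matching as "total".
TOTAL_RE = re.compile(
    r"\btotal\s*(?:due)?\s*[:\-]?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE
)
DATE_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")


def structure_invoice_text(raw_text: str) -> dict:
    """Pull a few fields out of raw OCR text; fields not found are None."""
    # OCR output often has ragged whitespace, so collapse it line by line.
    lines = [" ".join(line.split()) for line in raw_text.splitlines() if line.strip()]
    cleaned = "\n".join(lines)

    total = TOTAL_RE.search(cleaned)
    date = DATE_RE.search(cleaned)
    return {
        "invoice_date": date.group(1) if date else None,
        "total": float(total.group(1).replace(",", "")) if total else None,
        "line_count": len(lines),
    }


sample = "ACME  Corp\nDate: 2024-03-15\n\nSubtotal: $1,100.00\nTotal Due:   $1,234.50\n"
print(json.dumps(structure_invoice_text(sample), indent=2))
```

Returning `None` for missing fields, instead of raising, lets a caller decide whether to retry with a different engine or flag the document for manual review.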

---

## Tool Selection Guide

| Tool | Best For | Languages | Accuracy | Cost |
|------|----------|-----------|----------|------|
| **Tesseract** | Local, simple docs, print text | 100+ | Medium | Free |
| **EasyOCR** | Local, photos, multiple scripts | 80+ | High | Free |
| **PaddleOCR** | Local, CJK languages, tables | 80+ | Very High | Free |
| **Google Vision API** | Cloud, complex docs, handwriting | Broad (see Google's supported-language list) | Excellent | Pay-per-use |
| **AWS Textract** | Cloud, forms, tables, invoices | Limited | Excellent | Pay-per-use |
| **Azure Computer Vision** | Cloud, general OCR | 164 | Excellent | Pay-per-use |
| **Surya** | Local, multilingual PDFs | 90+ | High | Free |
| **Docling** | Local, PDFs, structured output | Many | High | Free |

### Decision Tree

```
Is accuracy critical and budget available?
├─ YES → Google Vision API or AWS Textract
└─ NO → Local solution
    ├─ CJK (Chinese/Japanese/Korean) or tables? → PaddleOCR
    ├─ General photos or multiple languages? → EasyOCR
    ├─ Simple printed English docs? → Tesseract
    └─ PDF documents with structure? → Docling or Surya
```
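
The decision tree above can be sketched as a small function. The `OcrJob` dataclass and its field names are hypothetical, introduced only for this example. Treating Tesseract as the fallback when no other local question matches is also an assumption, since the tree does not define a default.

```python
from dataclasses import dataclass


@dataclass
class OcrJob:
    """Hypothetical description of an OCR workload (field names are illustrative)."""
    accuracy_critical: bool = False
    has_budget: bool = False
    cjk_or_tables: bool = False
    photos_or_multilingual: bool = False
    structured_pdf: bool = False


def choose_ocr_tools(job: OcrJob) -> list[str]:
    """Walk the decision tree top to bottom and return the suggested tools."""
    if job.accuracy_critical and job.has_budget:
        return ["Google Vision API", "AWS Textract"]
    # Local branch: the first matching question wins.
    if job.cjk_or_tables:
        return ["PaddleOCR"]
    if job.photos_or_multilingual:
        return ["EasyOCR"]
    if job.structured_pdf:
        return ["Docling", "Surya"]
    # Assumed fallback: simple printed English documents.
    return ["Tesseract"]


print(choose_ocr_tools(OcrJob(cjk_or_tables=True)))
```

Note that accuracy without budget still lands in the local branch, which matches the tree's single combined first question.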

---

## Python Implementations

### Tesseract (pytesseract)

```python
import pytesseract
from PIL import Image
import cv2
import numpy as np

def extract_text_tesseract(image_path: str, lang: str = "eng") -> str:
    """Extract text using Tesseract. Best for clean printed documents."""
    image = Image.open(image_path)

    # Config: --psm 6 = assume uniform block of text
    config = "--psm 6 --oem 3"
    text = pytesseract.image_to_string(image, lang=lang, config=config)
    return text.strip()

def extract_with_confidence(image_path: str) -> list[dict]:
    """Extract text with bounding boxes and confidence scores."""
    image = Image.open(image_path)
    data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)

    results = []
    for i, word in enumerate(data["text"]):
        if word.strip() and int(data["conf"][i]) > 30:
            results.append({
                "text": word,
                "confidence": data["conf"][i],
                "bbox": {
                    "x": data["left"][i],
                    "y": data["top"][i],
                    "width": data["width"][i],
                    "height": data["height"][i],
                }
            })
    return results

# Install: pip install pytesseract pillow
# System: apt install tesseract-ocr (Linux) / brew install tesseract (Mac)
```
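
The word-level dicts returned by `extract_with_confidence` are often easier to use once merged into lines. The following is a minimal sketch of one way to do that; `group_words_into_lines` and its `y_tolerance` parameter are hypothetical names, and the 10-pixel default is an arbitrary assumption that would need tuning per image resolution.

```python
from statistics import mean


def group_words_into_lines(words: list[dict], y_tolerance: int = 10) -> list[dict]:
    """Merge word-level OCR results into text lines, top to bottom."""
    lines: list[dict] = []
    # Sort by vertical position so words on the same row end up adjacent.
    for word in sorted(words, key=lambda w: (w["bbox"]["y"], w["bbox"]["x"])):
        # Compare against the y of the first word in the current line.
        if lines and abs(word["bbox"]["y"] - lines[-1]["y"]) <= y_tolerance:
            lines[-1]["words"].append(word)
        else:
            lines.append({"y": word["bbox"]["y"], "words": [word]})

    result = []
    for line in lines:
        # Within a line, read left to right.
        ordered = sorted(line["words"], key=lambda w: w["bbox"]["x"])
        result.append({
            "text": " ".join(w["text"] for w in ordered),
            "mean_confidence": mean(float(w["confidence"]) for w in ordered),
        })
    return result


# Hand-written sample in the same shape as extract_with_confidence output.
sample_words = [
    {"text": "World", "confidence": 90, "bbox": {"x": 60, "y": 12, "width": 40, "height": 10}},
    {"text": "Hello", "confidence": 80, "bbox": {"x": 10, "y": 10, "width": 40, "height": 10}},
    {"text": "Total", "confidence": 70, "bbox": {"x": 10, "y": 50, "width": 40, "height": 10}},
]
for line in group_words_into_lines(sample_words):
    print(line)
```

A per-line mean confidence is also a cheap signal for spotting images that need better preprocessing before they reach downstream parsing.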

### EasyOCR

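```python
import easyocr


def extract_text_easyocr(
    image_path: str,
    languages: list[str] | None = None,
    detail: bool = False,
) -> str | list:
    """
    Extract text using EasyOCR. Best for photos and multiple languages.
    languages: ['en'], ['en', 'es'], ['ch_sim', 'en'], etc. Defaults to ['en'].
    """
    # Avoid a mutable default argument; fall back to English.
    reader = easyocr.Reader(languages or ["en"], gpu=False)  # gpu=True if CUDA available
    # The source listing is cut off after `results = reader`; the lines below
    # are an assumed completion using EasyOCR's readtext().
    results = reader.readtext(image_path, detail=1 if detail else 0)
    if detail:
        return results  # list of (bbox, text, confidence) entries
    return "\n".join(results)  # detail=0 yields plain strings
```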

What is this skill?

Tool selection matrix: Tesseract, EasyOCR, PaddleOCR, Google Vision, AWS Textract, Claude Vision

Preprocessing and post-processing guidance to improve OCR accuracy

Use-case coverage: receipts, invoices, handwriting, screenshots, multilingual text

Pipeline integration notes for Python, Node.js, and cloud deployments

Explicit trigger phrases for ocr, tesseract, easyocr, textract, document extraction

8 tools in selection guide table

Tesseract 100+ languages listed

EasyOCR 80+ languages listed

Compatible agents: Claude Code, Cursor, Codex, Windsurf

Adoption & trust: 1.2k installs on skills.sh; 1 GitHub star; 2/3 security scanners passed (skills.sh audits).

Who is it for?

Builders adding document upload, receipt parsing, screenshot automation, or vision-to-text steps in Python/Node agents and backends.

Skip if: you are extracting text from vector PDFs that already have a native text layer, where OCR only adds noise, or your team is forbidden from sending images to third-party APIs without a compliance review.

Journey fit

Spans multiple journey phases - primary shelf plus alternate fits below.
