
Image OCR
Pick and implement OCR pipelines to pull text from screenshots, scans, receipts, and PDFs into your app or agent workflows.
Overview
image-ocr is an agent skill most often used in Build (also Validate prototype, Grow content) that guides OCR tool choice, preprocessing, and text extraction from images and documents.
Install
npx skills add https://github.com/fearovex/claude-config --skill image-ocr
What is this skill?
- Tool selection matrix: Tesseract, EasyOCR, PaddleOCR, Google Vision, AWS Textract, Claude Vision
- Preprocessing and post-processing guidance to improve OCR accuracy
- Use-case coverage: receipts, invoices, handwriting, screenshots, multilingual text
- Pipeline integration notes for Python, Node.js, and cloud deployments
- Explicit trigger phrases for ocr, tesseract, easyocr, textract, document extraction
- 8 tools in selection guide table
- Tesseract 100+ languages listed
- EasyOCR 80+ scripts listed
Adoption & trust: 1.2k installs on skills.sh; 1 GitHub star; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You have screenshots, scans, or PDFs full of text but no clear stack or preprocessing steps to extract reliable structured text locally or via cloud OCR.
Who is it for?
Builders adding document upload, receipt parsing, screenshot automation, or vision-to-text steps in Python/Node agents and backends.
Skip if: Pure vector PDF text extraction where native text layers exist and OCR adds noise, or teams forbidden from sending images to third-party APIs without a compliance review.
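The "skip if" rule above can be sketched as a per-page check: only fall back to OCR when the native text layer is missing or too thin. This is a minimal sketch, assuming you have already pulled each page's native text with a PDF library of your choice (not shown). The function name `should_run_ocr` and the 50-character threshold are hypothetical, not part of the skill.

```python
def should_run_ocr(page_texts: list[str], min_chars_per_page: int = 50) -> list[bool]:
    """Return one flag per page: True when the native text layer looks too thin to trust.

    page_texts holds the text a PDF library extracted from each page (assumed input).
    min_chars_per_page is an illustrative threshold; tune it on your own documents.
    """
    flags = []
    for text in page_texts:
        # Count non-whitespace characters only, so blank padding does not count as text.
        visible_chars = len("".join(text.split()))
        flags.append(visible_chars < min_chars_per_page)
    return flags
```

Pages flagged `True` would go through the OCR pipeline; the rest can keep their native text untouched.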
When should I use this skill?
When extracting text from images, screenshots, scanned documents, or PDFs.
What do I get? / Deliverables
You leave with a justified OCR toolchain, preprocessing checklist, and integration pattern suited to your language, document type, and budget.
- OCR stack recommendation
- Preprocessing and structuring checklist
- Pipeline integration outline
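As an illustration of the "structuring" deliverable, the sketch below turns raw OCR text from a receipt into a JSON-ready dictionary. It assumes an English receipt with a line starting with "Total" and an ISO-style date; the function name `structure_receipt`, the field names, and both regular expressions are hypothetical choices, and real receipts will need looser patterns.

```python
import json
import re

# Illustrative patterns: a line that starts with "total" and ends with an amount,
# and a YYYY-MM-DD date anywhere in the text.
TOTAL_RE = re.compile(
    r"^[ \t]*total\b[^\d\n]*(\d+(?:[.,]\d{2})?)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)
DATE_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")


def structure_receipt(raw_text: str) -> dict:
    """Pull a total, a date, and the cleaned lines out of raw OCR text."""
    total = TOTAL_RE.search(raw_text)
    date = DATE_RE.search(raw_text)
    return {
        # Normalise a decimal comma to a dot before converting to float.
        "total": float(total.group(1).replace(",", ".")) if total else None,
        "date": date.group(1) if date else None,
        "lines": [ln.strip() for ln in raw_text.splitlines() if ln.strip()],
    }


if __name__ == "__main__":
    sample = "ACME STORE\n2024-03-05\nCoffee 3.50\nTOTAL: $12.50\n"
    print(json.dumps(structure_receipt(sample), indent=2))
```

Keeping the cleaned lines alongside the parsed fields lets a downstream step re-parse the text when a pattern misses.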
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Most builders first install OCR when wiring ingestion or document features into the product. The skill is a tool-selection and pipeline integration reference spanning Tesseract, EasyOCR, cloud Vision/Textract, and vision LLMs.
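The tool-selection logic in the skill's decision tree can be sketched as a small function. This is a rough encoding, assuming the branch order shown in SKILL.md; the `OcrNeeds` dataclass, its field names, and the choice to treat Tesseract as the fallback when no other branch matches are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OcrNeeds:
    """Hypothetical description of an OCR job; field names are illustrative."""
    accuracy_critical: bool = False
    has_budget: bool = False
    cjk_or_tables: bool = False
    photos_or_multilingual: bool = False
    structured_pdf: bool = False


def choose_ocr_tools(needs: OcrNeeds) -> list[str]:
    """Walk the decision tree top to bottom and return candidate tools."""
    # Cloud branch: only when accuracy is critical AND budget is available.
    if needs.accuracy_critical and needs.has_budget:
        return ["Google Vision API", "AWS Textract"]
    # Everything below is the local branch.
    if needs.cjk_or_tables:
        return ["PaddleOCR"]
    if needs.photos_or_multilingual:
        return ["EasyOCR"]
    if needs.structured_pdf:
        return ["Docling", "Surya"]
    # Assumption: simple printed English documents are the remaining case.
    return ["Tesseract"]
```

For example, `choose_ocr_tools(OcrNeeds(cjk_or_tables=True))` returns `["PaddleOCR"]`, while a job that needs high accuracy but has no budget stays on the local branch.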
Where it fits
Spike OCR on sample invoice photos before committing to a billing automation feature.
Wire EasyOCR or Textract into an upload endpoint that returns JSON fields for your app.
Batch-OCR community screenshots or scanned guides into markdown for SEO and support search.
Tune preprocessing when production OCR confidence drops on new camera formats or languages.
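For the last scenario, one way to notice a confidence drop is to average the per-word scores an OCR pass returns and compare the result with a threshold. The sketch below assumes word dictionaries shaped like the output of `extract_with_confidence` in SKILL.md (keys `text` and `confidence`); the function names and the threshold of 60 are hypothetical.

```python
from statistics import mean


def mean_confidence(words: list[dict]) -> float:
    """Average the per-word confidence values; 0.0 when nothing was recognised."""
    scores = [
        float(w["confidence"])
        for w in words
        if str(w["text"]).strip()  # ignore empty tokens
    ]
    return mean(scores) if scores else 0.0


def needs_retuning(words: list[dict], threshold: float = 60.0) -> bool:
    """Flag a document whose average confidence falls below the threshold."""
    return mean_confidence(words) < threshold
```

Logging this value per upload, grouped by camera model or language, gives a simple signal for when preprocessing needs another look.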
How it compares
Reference skill for multi-engine OCR pipelines rather than a single hosted MCP; the alternative it improves on is an ad-hoc "just use Tesseract" setup with no preprocessing guidance.
Common Questions / FAQ
Who is image-ocr for?
Solo developers and agent builders who need to extract text from images, scans, invoices, or screenshots and want a structured tool comparison.
When should I use image-ocr?
When extracting text from images in Build integrations; when validating a prototype that reads user-uploaded documents; when growing content ops that digitize scans or social screenshots.
Is image-ocr safe to install?
Cloud OCR paths may send images off-device—check the Security Audits panel on this page and your data policy before enabling Vision or Textract in production.
SKILL.md
# Image OCR Expert

> Expert in extracting, processing, and structuring text from images using OCR tools and techniques.

## Description

This skill provides specialized knowledge for extracting text from images, including:

- Tool and library selection by use case (Tesseract, EasyOCR, PaddleOCR, cloud APIs)
- Image preprocessing to maximize OCR accuracy
- Post-processing and structuring of extracted text
- Handling handwriting, receipts, invoices, documents, screenshots
- Multilingual OCR and special character support
- Integration into Python/Node.js/cloud pipelines

**Triggers**: ocr, extract text from image, image to text, read text image, optical character recognition, tesseract, easyocr, paddleocr, textract, vision api, document extraction, screenshot text, invoice ocr, receipt ocr, handwriting recognition, image text extraction

---

## Tool Selection Guide

| Tool | Best For | Languages | Accuracy | Cost |
|------|----------|-----------|----------|------|
| **Tesseract** | Local, simple docs, print text | 100+ | Medium | Free |
| **EasyOCR** | Local, photos, multiple scripts | 80+ | High | Free |
| **PaddleOCR** | Local, CJK languages, tables | 80+ | Very High | Free |
| **Google Vision API** | Cloud, complex docs, handwriting | All | Excellent | Pay-per-use |
| **AWS Textract** | Cloud, forms, tables, invoices | Limited | Excellent | Pay-per-use |
| **Azure Computer Vision** | Cloud, general OCR | 164 | Excellent | Pay-per-use |
| **Surya** | Local, multilingual PDFs | 90+ | High | Free |
| **Docling** | Local, PDFs, structured output | Many | High | Free |

### Decision Tree

```
Is accuracy critical and budget available?
├─ YES → Google Vision API or AWS Textract
└─ NO → Local solution
   ├─ CJK (Chinese/Japanese/Korean) or tables? → PaddleOCR
   ├─ General photos or multiple languages? → EasyOCR
   ├─ Simple printed English docs? → Tesseract
   └─ PDF documents with structure? → Docling or Surya
```

---

## Python Implementations

### Tesseract (pytesseract)

```python
import pytesseract
from PIL import Image
import cv2
import numpy as np

def extract_text_tesseract(image_path: str, lang: str = "eng") -> str:
    """Extract text using Tesseract. Best for clean printed documents."""
    image = Image.open(image_path)
    # Config: --psm 6 = assume uniform block of text
    config = "--psm 6 --oem 3"
    text = pytesseract.image_to_string(image, lang=lang, config=config)
    return text.strip()

def extract_with_confidence(image_path: str) -> list[dict]:
    """Extract text with bounding boxes and confidence scores."""
    image = Image.open(image_path)
    data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
    results = []
    for i, word in enumerate(data["text"]):
        if word.strip() and int(data["conf"][i]) > 30:
            results.append({
                "text": word,
                "confidence": data["conf"][i],
                "bbox": {
                    "x": data["left"][i],
                    "y": data["top"][i],
                    "width": data["width"][i],
                    "height": data["height"][i],
                }
            })
    return results

# Install: pip install pytesseract pillow
# System: apt install tesseract-ocr (Linux) / brew install tesseract (Mac)
```

### EasyOCR

```python
import easyocr
from pathlib import Path

def extract_text_easyocr(
    image_path: str,
    languages: list[str] = ["en"],
    detail: bool = False
) -> str | list:
    """
    Extract text using EasyOCR. Best for photos and multiple languages.
    languages: ['en'], ['en', 'es'], ['ch_sim', 'en'], etc.
    """
    reader = easyocr.Reader(languages, gpu=False)  # gpu=True if CUDA available
    results = reader