
OCR Document Processor
Recover text and structure from scanned PDFs, photos, receipts, and business cards when solo builders need searchable output without manual typing.
Overview
OCR Document Processor is an agent skill for the Build phase that extracts text and structure from scans, images, and scanned PDFs via scripted OCR and specialized receipt or business-card parsers.
Install
npx skills add https://github.com/dkyazzentwatwa/chatgpt-skills --skill ocr-document-processor
What is this skill?
- Plain OCR, structured extraction, and document-specific parsing paths in one workflow
- Python helpers: ocr_processor.py, business_card_scanner.py, receipt_scanner.py
- Exports to text, markdown, JSON, HTML, and searchable PDF
- Table extraction plus receipt and business card field parsing
- Preprocess skew, blur, and shadows; surface confidence caveats on low-quality sources
- 5-step OCR workflow from mode choice through specialized scanners
- 3 Python helpers: ocr_processor, business_card_scanner, receipt_scanner
- 4 structured export shapes: text, markdown, JSON, HTML
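As a rough illustration of the four structured export shapes listed above, the sketch below renders one block of OCR text as text, markdown, JSON, or HTML using only the standard library. The `export_ocr_text` function and its output layout are assumptions made for this example; the exporters in `ocr_processor.py` are not shown on this page and may behave differently.

```python
import html
import json


def export_ocr_text(text: str, fmt: str, title: str = "OCR Result") -> str:
    """Render OCR text in one of four shapes (hypothetical helper, not the skill's API)."""
    # Treat blank lines as paragraph breaks and collapse whitespace inside each paragraph.
    paragraphs = [" ".join(p.split()) for p in text.split("\n\n") if p.strip()]

    if fmt == "text":
        return "\n\n".join(paragraphs)
    if fmt == "markdown":
        return f"# {title}\n\n" + "\n\n".join(paragraphs)
    if fmt == "json":
        return json.dumps({"title": title, "paragraphs": paragraphs}, indent=2)
    if fmt == "html":
        # Escape OCR output so stray "<" or "&" characters cannot break the markup.
        body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
        return f"<h1>{html.escape(title)}</h1>\n{body}"
    raise ValueError(f"unsupported format: {fmt}")
```

Searchable PDF export is left out on purpose: it needs the page image plus a positioned text layer, which a plain string cannot carry.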
Adoption & trust: 4.4k installs on skills.sh; 60 GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You have photos or scanned pages with no selectable text and need markdown, JSON, or searchable PDFs without retyping every field.
Who is it for?
Solo builders automating intake from scans, receipts, business cards, or table-heavy paper that must become agent-readable text.
Skip if: Your PDFs are native digital files with embedded text. Use document-converter-suite for those by default instead of forcing OCR.
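One way to make that routing decision is to check whether the PDF already carries a usable text layer. The sketch below assumes you have already pulled per-page text with whatever PDF text extractor you use; `needs_ocr`, `route`, and both thresholds are hypothetical names and values chosen for illustration, not part of the skill.

```python
from typing import List


def needs_ocr(page_texts: List[str], min_chars: int = 20, min_text_ratio: float = 0.5) -> bool:
    """Return True when too few pages carry an embedded text layer.

    page_texts holds the text extracted from each page by a non-OCR PDF reader.
    Both thresholds are illustrative guesses and should be tuned on real files.
    """
    if not page_texts:
        return True  # nothing extractable at all, so treat the file as a scan
    pages_with_text = sum(1 for t in page_texts if len(t.strip()) >= min_chars)
    return pages_with_text / len(page_texts) < min_text_ratio


def route(page_texts: List[str]) -> str:
    """Pick the skill name to hand the document to."""
    return "ocr-document-processor" if needs_ocr(page_texts) else "document-converter-suite"
```

For example, `route(["", "", ""])` returns `"ocr-document-processor"`, while a file whose pages all yield a sentence or more of embedded text is routed to `"document-converter-suite"`.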
When should I use this skill?
Use it to extract text and structure from scans, images, and scanned PDFs: plain OCR, searchable PDFs, table extraction, receipt parsing, and business card parsing.
What do I get? / Deliverables
You get OCR output with preprocessing guidance, format choice, and confidence caveats, with digital PDFs explicitly routed to document-converter-suite when OCR is the wrong tool.
- OCR text or structured markdown/JSON/HTML
- Searchable PDF when export is requested
- Receipt or business card fields with stated confidence limits
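A confidence caveat can be derived from word-level OCR scores. The sketch below assumes input shaped like the dictionary that `pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT)` returns, with parallel `text` and `conf` lists where layout-only rows carry a confidence of -1. The `confidence_caveat` helper and the 60% threshold are assumptions for illustration, not the skill's own logic.

```python
from typing import Dict, List


def confidence_caveat(ocr_data: Dict[str, List], threshold: float = 60.0) -> Dict[str, object]:
    """Summarize word confidences and name the words a reader should double-check."""
    words = []
    for text, conf in zip(ocr_data["text"], ocr_data["conf"]):
        conf = float(conf)  # some versions report confidences as strings
        if conf < 0 or not str(text).strip():
            continue  # skip layout-only rows that carry no recognized word
        words.append((str(text), conf))

    if not words:
        return {"mean_conf": None, "low_conf_words": [], "caveat": "No text recognized."}

    mean_conf = sum(c for _, c in words) / len(words)
    low = [w for w, c in words if c < threshold]
    caveat = None
    if low:
        caveat = (
            f"{len(low)} of {len(words)} words fall below "
            f"{threshold:.0f}% confidence; verify before use."
        )
    return {"mean_conf": round(mean_conf, 1), "low_conf_words": low, "caveat": caveat}
```

Returning the weak words themselves, rather than only an average, lets the agent point at the exact fields it should not present as exact.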
Journey fit
Document OCR is wired into the product through Python helpers and preprocessing rules during implementation, not as a launch or growth activity. The skill centers on integrating OCR pipelines (core processor plus receipt and business-card scanners) into agent workflows rather than pure UI or backend API design.
How it compares
Use for image and scan recovery; use a document-converter skill for text-native PDFs and office formats.
Common Questions / FAQ
Who is ocr-document-processor for?
Solo and indie builders who pipe scanned documents, receipts, or cards through an agent and need structured extraction without a separate OCR SaaS for every task.
When should I use ocr-document-processor?
During Build integrations, when you need OCR on images or scanned PDFs, searchable PDF export, table extraction from scans, or receipt and business card parsing. Preprocess first when skew or blur is present.
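To give a feel for what receipt parsing involves once OCR text is available, here is a minimal sketch that pulls the total from recognized text. It is not the logic of `receipt_scanner.py`, which is not shown on this page; `parse_receipt_total`, the regular expression, and the "last total wins" rule are all assumptions for illustration.

```python
import re
from typing import Optional

# Match lines that start with "Total" or "Grand Total" and capture a two-decimal amount.
# Lines such as "Subtotal 3.50" do not match because they do not start with "total".
TOTAL_LINE = re.compile(
    r"^\s*(?:grand\s+)?total\b[^\d\-]*(\d{1,3}(?:,\d{3})*\.\d{2}|\d+\.\d{2})",
    re.IGNORECASE,
)


def parse_receipt_total(ocr_text: str) -> Optional[float]:
    """Return the last total-like amount found in OCR text, or None if there is none."""
    candidates = []
    for line in ocr_text.splitlines():
        match = TOTAL_LINE.match(line)
        if match:
            candidates.append(float(match.group(1).replace(",", "")))
    # Assumption: when several totals appear, the final one is the amount charged.
    return candidates[-1] if candidates else None
```

A pattern like this breaks as soon as OCR misreads a decimal point or a digit, which is why parsed amounts should be returned alongside a confidence caveat rather than treated as exact.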
Is ocr-document-processor safe to install?
It runs local Python scripts (OpenCV, Tesseract) on files you provide; review the Security Audits panel on this Prism page and inspect the repo scripts before executing in sensitive environments.
SKILL.md
# OCR Document Processor

Handle OCR-heavy inputs where text must be recovered from images or scanned pages.

## Use This For

- OCR on images and scanned PDFs
- Searchable PDF export
- Structured extraction to text, markdown, JSON, or HTML
- Table extraction from scanned material
- Receipt parsing and business card parsing

## Workflow

1. Decide whether plain OCR, structured extraction, or document-specific parsing is needed.
2. Preprocess noisy inputs before extraction when skew, blur, or shadows are present.
3. Use `scripts/ocr_processor.py` for core OCR tasks.
4. Use the focused helpers when the input is specialized:
   - `scripts/business_card_scanner.py`
   - `scripts/receipt_scanner.py`
5. Return confidence caveats when the source is low quality, rotated, handwritten, or multilingual.

## Guardrails

- Prefer explicit language selection when accuracy matters.
- Do not claim fields are exact when OCR confidence is weak.
- Route non-scanned digital PDFs to `document-converter-suite` instead of OCR by default.

#!/usr/bin/env python3
"""
Business Card Scanner - Extract contact info from cards.
"""

import argparse
import json
import re
import pytesseract
import cv2
import numpy as np
from PIL import Image


class BusinessCardScanner:
    """Scan business cards."""

    def __init__(self):
        """Initialize scanner."""
        self.raw_text = ""
        self.data = {}

    def scan(self, filepath: str) -> 'BusinessCardScanner':
        """Scan business card image."""
        img = cv2.imread(filepath)
        if img is None:
            # cv2.imread returns None instead of raising on an unreadable file
            raise FileNotFoundError(f"Could not read image: {filepath}")

        # Preprocess: grayscale, then Otsu binarization
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        _, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

        # OCR
        self.raw_text = pytesseract.image_to_string(thresh)

        # Extract data
        self.extract_contact_info()

        return self

    def extract_contact_info(self):
        """Extract contact information."""
        lines = [line.strip() for line in self.raw_text.split('\n') if line.strip()]

        # Extract email (character class is [A-Za-z]; a "|" inside brackets is a literal pipe)
        email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
        emails = re.findall(email_pattern, self.raw_text)
        self.data['email'] = emails[0] if emails else None

        # Extract phone: try each pattern in order and keep the first hit
        phone_patterns = [
            r'\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}',
            r'\+\d{1,3}[-.\s]?\d{1,4}[-.\s]?\d{1,4}[-.\s]?\d{1,9}'
        ]
        self.data['phone'] = None  # keep the key present even when nothing matches
        for pattern in phone_patterns:
            phones = re.findall(pattern, self.raw_text)
            if phones:
                self.data['phone'] = phones[0]
                break

        # Extract name (usually first line)
        self.data['name'] = lines[0] if lines else None

        # Extract company (heuristic: look for Inc, LLC, Ltd)
        company_keywords = ['Inc', 'LLC', 'Ltd', 'Corp', 'Company']
        self.data['company'] = None  # keep the key present even when nothing matches
        for line in lines:
            if any(kw in line for kw in company_keywords):
                self.data['company'] = line
                break

        # Extract website
        url_pattern = r'www\.[A-Za-z0-9.-]+\.[A-Za-z]{2,}'
        urls = re.findall(url_pattern, self.raw_text, re.IGNORECASE)
        self.data['website'] = urls[0] if urls else None

    def get_data(self) -> dict:
        """Get extracted data."""
        return self.data

    def to_json(self, output: str) -> str:
        """Export to JSON."""
        with open(output, 'w') as f:
            json.dump(self.data, f, indent=2)
        return output


def main():
    parser = argparse.ArgumentParser(description="Business Card Scanner")
    parser.add_argument("--input", "-i", required=True, help="Business card image")
    parser.add_argument("--output", "-o", required=True, help="Output JSON file")

    args = parser.parse_args()

    scanner = BusinessCardSca