OCR Document Processor

Name: OCR Document Processor
Author: dkyazzentwatwa

dkyazzentwatwa/chatgpt-skills

Recover text and structure from scanned PDFs, photos, receipts, and business cards when solo builders need searchable output without manual typing.

Overview

OCR Document Processor is an agent skill for the Build phase that extracts text and structure from scans, images, and scanned PDFs via scripted OCR and specialized receipt or business-card parsers.

Install

npx skills add https://github.com/dkyazzentwatwa/chatgpt-skills --skill ocr-document-processor

What is this skill?

Plain OCR, structured extraction, and document-specific parsing paths in one workflow
Python helpers: ocr_processor.py, business_card_scanner.py, receipt_scanner.py
Exports to text, markdown, JSON, HTML, and searchable PDF
Table extraction plus receipt and business card field parsing
Preprocess skew, blur, and shadows; surface confidence caveats on low-quality sources
5-step OCR workflow from mode choice through specialized scanners
3 Python helpers: ocr_processor, business_card_scanner, receipt_scanner
4 structured export shapes: text, markdown, JSON, HTML

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 4.4k installs on skills.sh; 60 GitHub stars; 3/3 security scanners passed (skills.sh audits).

What problem does it solve?

You have photos or scanned pages with no selectable text and need markdown, JSON, or searchable PDFs without retyping every field.

Who is it for?

Solo builders automating intake from scans, receipts, business cards, or table-heavy paper that must become agent-readable text.

Skip if: your PDFs are native digital files with embedded text. Use document-converter-suite for those by default instead of forcing OCR.

When should I use this skill?

Use it when you need to extract text and structure from scans, images, or scanned PDFs: plain OCR, searchable PDFs, table extraction, receipt parsing, or business card parsing.

What do I get? / Deliverables

You get OCR output with preprocessing guidance, format choice, and confidence caveats, with digital PDFs explicitly routed to document-converter-suite when OCR is the wrong tool.

OCR text or structured markdown/JSON/HTML
Searchable PDF when export is requested
Receipt or business card fields with stated confidence limits


Journey fit

Primary fit

Build: Integrations & version control

Document OCR is wired into the product through Python helpers and preprocessing rules during implementation, not as a launch or growth activity. The skill centers on integrating OCR pipelines (core processor plus receipt and business-card scanners) into agent workflows rather than pure UI or backend API design.

How it compares

Use for image and scan recovery; use a document-converter skill for text-native PDFs and office formats.

Common Questions / FAQ

Who is ocr-document-processor for?

Solo and indie builders who pipe scanned documents, receipts, or cards through an agent and need structured extraction without a separate OCR SaaS for every task.

When should I use ocr-document-processor?

During Build integrations, when you need OCR on images or scanned PDFs, searchable PDF export, table extraction from scans, or receipt and business card parsing. Add preprocessing when skew or blur is present.

Is ocr-document-processor safe to install?

It runs local Python scripts (OpenCV, Tesseract) on files you provide; review the Security Audits panel on this Prism page and inspect the repo scripts before executing in sensitive environments.

SKILL.md


# OCR Document Processor

Handle OCR-heavy inputs where text must be recovered from images or scanned pages.

## Use This For

- OCR on images and scanned PDFs
- Searchable PDF export
- Structured extraction to text, markdown, JSON, or HTML
- Table extraction from scanned material
- Receipt parsing and business card parsing

## Workflow

1. Decide whether plain OCR, structured extraction, or document-specific parsing is needed.
2. Preprocess noisy inputs before extraction when skew, blur, or shadows are present.
3. Use `scripts/ocr_processor.py` for core OCR tasks.
4. Use the focused helpers when the input is specialized:
   - `scripts/business_card_scanner.py`
   - `scripts/receipt_scanner.py`
5. Return confidence caveats when the source is low quality, rotated, handwritten, or multilingual.
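The mode decision in steps 1, 3, and 4 can be sketched as a small routing function. This is only an illustration: `choose_route`, the `doc_type` labels, the `has_text_layer` flag, and the extension set are hypothetical names, not part of the skill. Only the script paths and the `document-converter-suite` fallback come from the workflow above.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to whatever the scripts actually accept.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def choose_route(path: str, doc_type: str = "generic", has_text_layer: bool = False) -> str:
    """Return the script (or skill) an input should be routed to.

    `doc_type` and `has_text_layer` are hypothetical inputs the caller supplies,
    for example after inspecting the file or asking the user.
    """
    ext = Path(path).suffix.lower()

    # Digital PDFs with embedded text do not need OCR.
    if ext == ".pdf" and has_text_layer:
        return "document-converter-suite"

    if ext != ".pdf" and ext not in IMAGE_EXTS:
        raise ValueError(f"Unsupported input type: {ext or 'no extension'}")

    # Specialized inputs go to the focused helpers.
    if doc_type == "business_card":
        return "scripts/business_card_scanner.py"
    if doc_type == "receipt":
        return "scripts/receipt_scanner.py"

    # Everything else is a core OCR task.
    return "scripts/ocr_processor.py"
```

For example, `choose_route("card.jpg", doc_type="business_card")` selects the business card helper, while a scanned PDF with no text layer falls through to the core processor.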

## Guardrails

- Prefer explicit language selection when accuracy matters.
- Do not claim fields are exact when OCR confidence is weak.
- Route non-scanned digital PDFs to `document-converter-suite` instead of OCR by default.
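One way to act on the confidence guardrail is to summarize word-level scores before reporting any field as exact. The sketch below assumes parallel lists of words and confidences, such as the `text` and `conf` lists that pytesseract's `image_to_data(..., output_type=Output.DICT)` returns, where a confidence of -1 marks a non-word box. The function name `confidence_caveat` and the threshold of 60 are illustrative assumptions, not values taken from the skill.

```python
def confidence_caveat(words, confs, threshold=60.0):
    """Summarize word confidences and build a caveat for low-quality OCR.

    `words` and `confs` are parallel lists; confidences may be numbers or
    numeric strings. Entries with empty text or negative confidence are skipped.
    """
    scored = [
        (word, float(conf))
        for word, conf in zip(words, confs)
        if word.strip() and float(conf) >= 0
    ]
    if not scored:
        return {"mean_conf": None, "low_conf_words": [], "caveat": "No text was recognized."}

    mean_conf = sum(conf for _, conf in scored) / len(scored)
    low = [word for word, conf in scored if conf < threshold]

    if low:
        caveat = (
            f"{len(low)} of {len(scored)} words scored below {threshold:g}; "
            "verify them against the source."
        )
    else:
        caveat = "All words met the confidence threshold."

    return {"mean_conf": round(mean_conf, 1), "low_conf_words": low, "caveat": caveat}


# Synthetic data in the shape described above (not real OCR output).
report = confidence_caveat(["Total", "", "$12.50", "Tbx"], ["96", "-1", "91", "38"])
```

A single global threshold is a blunt instrument. Fields such as totals or phone numbers may deserve a stricter cutoff than body text.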


#!/usr/bin/env python3
"""
Business Card Scanner - Extract contact info from cards.
"""

import argparse
import json
import re

import pytesseract
import cv2
import numpy as np
from PIL import Image


class BusinessCardScanner:
    """Scan business cards."""

    def __init__(self):
        """Initialize scanner."""
        self.raw_text = ""
        self.data = {}

    def scan(self, filepath: str) -> 'BusinessCardScanner':
        """Scan business card image."""
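        img = cv2.imread(filepath)
        if img is None:
            # cv2.imread returns None instead of raising when the file is missing or unreadable
            raise FileNotFoundError(f"Could not read image: {filepath}")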

        # Preprocess
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        _, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

        # OCR
        self.raw_text = pytesseract.image_to_string(thresh)

        # Extract data
        self.extract_contact_info()

        return self

    def extract_contact_info(self):
        """Extract contact information."""
        lines = [line.strip() for line in self.raw_text.split('\n') if line.strip()]

        # Extract email
        # [A-Za-z] rather than [A-Z|a-z]: inside a character class, "|" is a literal pipe
        email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
        emails = re.findall(email_pattern, self.raw_text)
        self.data['email'] = emails[0] if emails else None

        # Extract phone
        phone_patterns = [
            r'\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}',
            r'\+\d{1,3}[-.\s]?\d{1,4}[-.\s]?\d{1,4}[-.\s]?\d{1,9}'
        ]
        self.data['phone'] = None  # default so the key exists when no pattern matches
        for pattern in phone_patterns:
            phones = re.findall(pattern, self.raw_text)
            if phones:
                self.data['phone'] = phones[0]
                break

        # Extract name (usually first line)
        self.data['name'] = lines[0] if lines else None

        # Extract company (heuristic: look for Inc, LLC, Ltd)
        company_keywords = ['Inc', 'LLC', 'Ltd', 'Corp', 'Company']
        self.data['company'] = None  # default so the key exists when no keyword is found
        for line in lines:
            if any(kw in line for kw in company_keywords):
                self.data['company'] = line
                break

        # Extract website
        url_pattern = r'www\.[A-Za-z0-9.-]+\.[A-Za-z]{2,}'
        urls = re.findall(url_pattern, self.raw_text, re.IGNORECASE)
        self.data['website'] = urls[0] if urls else None

    def get_data(self) -> dict:
        """Get extracted data."""
        return self.data

    def to_json(self, output: str) -> str:
        """Export to JSON."""
        with open(output, 'w') as f:
            json.dump(self.data, f, indent=2)
        return output


def main():
    parser = argparse.ArgumentParser(description="Business Card Scanner")

    parser.add_argument("--input", "-i", required=True, help="Business card image")
    parser.add_argument("--output", "-o", required=True, help="Output JSON file")

    args = parser.parse_args()

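    scanner = BusinessCardScanner()
    # NOTE: the source listing is cut off mid-line above. The remainder is an assumed
    # completion that uses only the methods and arguments defined in this file.
    scanner.scan(args.input).to_json(args.output)


if __name__ == "__main__":
    main()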
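The field heuristics in the scanner can be tried without an image or a Tesseract install by running the same kinds of patterns over a plain string. The following is a minimal sketch under that assumption. `extract_fields` and the sample card text are hypothetical, and the patterns are simplified versions of the ones in the listing (with `[A-Za-z]` for the top-level domain).

```python
import re

EMAIL_RE = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')
PHONE_RE = re.compile(r'\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}')
URL_RE = re.compile(r'www\.[A-Za-z0-9.-]+\.[A-Za-z]{2,}', re.IGNORECASE)


def extract_fields(raw_text: str) -> dict:
    """Apply the name, email, phone, and website heuristics to OCR text."""
    lines = [ln.strip() for ln in raw_text.split('\n') if ln.strip()]
    email = EMAIL_RE.search(raw_text)
    phone = PHONE_RE.search(raw_text)
    url = URL_RE.search(raw_text)
    return {
        'name': lines[0] if lines else None,  # heuristic: first non-empty line
        'email': email.group(0) if email else None,
        'phone': phone.group(0) if phone else None,
        'website': url.group(0) if url else None,
    }


# Made-up card text standing in for OCR output.
sample = "Jane Doe\nAcme Widgets LLC\njane.doe@example.com\n(555) 123-4567\nwww.example.com\n"
fields = extract_fields(sample)
```

Because the name is simply the first non-empty line, a card that leads with a logo or company name will be misread, which is one reason to report these fields with confidence caveats.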
