
PDF Processing Pro
Run production-grade PDF extract, form fill, table export, and batch jobs with validated Python scripts instead of fragile one-off snippets.
Overview
PDF Processing Pro is an agent skill for the Build phase that delivers production-ready PDF processing with forms, tables, OCR-oriented workflows, validation, and batch scripts.
Install
npx skills add https://github.com/davila7/claude-code-templates --skill pdf-processing-pro
What is this skill?
- Production scripts with error handling, validation, logging, and typed CLI (--help)
- Form analyze/fill workflow: analyze_form.py, fill_form.py with pre-fill validation
- Table extraction via extract_tables.py with CSV output
- pdfplumber quick-start for page text extraction
- Suited to high-volume PDF batches and complex form workflows
- Bundled CLI scripts include analyze_form.py, fill_form.py, and extract_tables.py with --help interfaces
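Table export to CSV, listed above, mostly comes down to serializing rows. pdfplumber's `Page.extract_tables()` returns each table as a list of rows whose cells are strings or `None`. The minimal sketch below turns such tables into one CSV string using only the standard `csv` module. `tables_to_csv` is a hypothetical helper, and the blank separator row between tables is an assumption, not necessarily how the bundled `extract_tables.py` lays out its output.

```python
import csv
import io
from typing import Optional

# One table in the shape pdfplumber's Page.extract_tables() returns:
# a list of rows, each row a list of cell strings (None for empty cells).
Table = list[list[Optional[str]]]


def tables_to_csv(tables: list[Table]) -> str:
    """Serialize extracted tables into a single CSV document (hypothetical helper).

    Empty cells (None) become empty strings, newlines inside a cell are
    flattened to spaces, and a blank row separates consecutive tables.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for index, table in enumerate(tables):
        if index > 0:
            writer.writerow([])  # blank separator row between tables
        for row in table:
            writer.writerow(
                ["" if cell is None else cell.replace("\n", " ") for cell in row]
            )
    return buffer.getvalue()
```

With pdfplumber installed, the input could be gathered as `[t for page in pdf.pages for t in page.extract_tables()]` inside a `with pdfplumber.open("report.pdf") as pdf:` block. Flattening newlines is a judgment call: wrapped cell text often comes back as multi-line strings, which the `csv` module would quote correctly but which some downstream tools handle poorly.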
Adoption & trust: 604 installs on skills.sh; 27.8k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You need reliable PDF form fill, table extract, or batch processing in production but only have fragile scripts that fail without useful errors.
Who is it for?
Indie SaaS or internal tools that ingest PDFs at volume and need repeatable agent-guided implementation.
Skip if: One-page PDF reads in a notebook, pure browser-only PDF viewing, or teams that require a fully managed document AI API with no Python runtime.
When should I use this skill?
Working with complex PDF workflows in production, processing large volumes of PDFs, or requiring robust error handling and validation.
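For large batches, the pattern the skill describes (error handling, logging, exit codes for automation) can be sketched as a loop that isolates per-file failures. This is a minimal illustration, not code from the skill: `run_batch` and its `process` callback are hypothetical names, and the callback stands in for whatever per-file work you run (text extraction, form fill, and so on).

```python
import logging
from pathlib import Path
from typing import Callable

logger = logging.getLogger("pdf_batch")


def run_batch(paths: list[Path], process: Callable[[Path], None]) -> int:
    """Process every file, log failures, and return a shell-style exit code.

    One bad PDF does not stop the batch: each failure is logged with its
    path and traceback, and the function returns 1 if anything failed, else 0.
    """
    failures: list[Path] = []
    for path in paths:
        try:
            process(path)
        except Exception:  # broad on purpose: parsers raise many error types
            logger.exception("failed to process %s", path)
            failures.append(path)
        else:
            logger.info("processed %s", path)
    if failures:
        logger.error("%d of %d files failed", len(failures), len(paths))
    return 1 if failures else 0
```

A scheduled job could then call `sys.exit(run_batch(sorted(Path("inbox").glob("*.pdf")), handle_one))`, where `handle_one` is your own per-file function, so the scheduler sees a nonzero exit code whenever any file failed. Catching `Exception` broadly is deliberate: a malformed PDF can surface many different parser errors, and the goal is to log and move on rather than abort the whole batch.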
What do I get? / Deliverables
You integrate validated CLI Python scripts and pdfplumber patterns that return structured outputs (JSON fields, CSV tables, filled PDFs) with explicit error reporting.
- Filled or analyzed PDF outputs and sidecar JSON/CSV artifacts
- Validated CLI invocations with logged errors for failed inputs
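For form fills, "validated" means rejecting bad submission data before any PDF is written, and reporting every problem at once rather than only the first. The sketch below shows that idea in plain Python. The schema layout (`type`, `required`, `options`) is a hypothetical stand-in, since the exact format `analyze_form.py` emits is not documented here.

```python
from typing import Any


def validate_form_data(data: dict[str, Any], schema: dict[str, dict[str, Any]]) -> list[str]:
    """Return human-readable problems; an empty list means the data is valid.

    `schema` uses a hypothetical layout:
    {"field": {"type": "text" | "checkbox" | "choice",
               "required": bool, "options": [...]}}
    """
    errors: list[str] = []
    # Fields the form does not have are usually typos in the submission.
    for name in data:
        if name not in schema:
            errors.append(f"unknown field: {name}")
    for name, spec in schema.items():
        if name not in data:
            if spec.get("required", False):
                errors.append(f"missing required field: {name}")
            continue
        value = data[name]
        kind = spec.get("type", "text")
        if kind == "text" and not isinstance(value, str):
            errors.append(f"{name}: expected text, got {type(value).__name__}")
        elif kind == "checkbox" and not isinstance(value, bool):
            errors.append(f"{name}: expected true/false, got {type(value).__name__}")
        elif kind == "choice" and value not in spec.get("options", []):
            errors.append(f"{name}: {value!r} is not one of {spec.get('options', [])}")
    return errors
```

Collecting all errors into a list suits the JSON sidecar style of output: the list can be written next to the input file or logged before the fill step is skipped.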
Journey fit
PDF pipelines are product features or back-office integrations assembled during Build, before you harden them in Ship. The Integrations category covers third-party document formats, OCR, and scripted tooling wired into your app or jobs.
How it compares
Bundled script toolkit with validation, not a single-purpose chat macro for casual PDF viewing.
Common Questions / FAQ
Who is PDF Processing Pro for?
Solo builders shipping document automation—forms, reports, or OCR pipelines—who want agent help wiring battle-tested Python scripts.
When should I use PDF Processing Pro?
Use it during Build (integrations) when implementing complex PDF workflows, large batch jobs, or production environments that need validation and logging—not for unrelated non-PDF tasks.
Is PDF Processing Pro safe to install?
Scripts run locally on your files; review the Security Audits panel on this page, scan dependencies (pdfplumber stack), and sandbox uploads from untrusted users before production.
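Sandboxing untrusted uploads pairs well with a cheap gate in front of the parser. The sketch below is a hypothetical helper, not part of the skill: it rejects empty or oversized files and files with no `%PDF-` header. It does not make a malicious PDF safe, so parsing should still happen in an isolated process or container. The 20 MiB default is an arbitrary assumption.

```python
from pathlib import Path

MAX_UPLOAD_BYTES = 20 * 1024 * 1024  # hypothetical 20 MiB cap; tune for your workload


def screen_upload(path: Path, max_bytes: int = MAX_UPLOAD_BYTES) -> list[str]:
    """Cheap checks to run before handing an untrusted file to a PDF parser.

    Returns reasons to reject. An empty list only means the file passed
    these heuristics, not that it is safe to parse.
    """
    problems: list[str] = []
    size = path.stat().st_size
    if size == 0:
        problems.append("file is empty")
    elif size > max_bytes:
        problems.append(f"file is {size} bytes, over the {max_bytes}-byte limit")
    with path.open("rb") as handle:
        head = handle.read(1024)
    # Readers commonly accept the "%PDF-" header anywhere in the first
    # 1024 bytes, so search that window instead of requiring offset 0.
    if b"%PDF-" not in head:
        problems.append("no %PDF- header in the first 1024 bytes")
    return problems
```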
SKILL.md
# PDF Processing Pro

Production-ready PDF processing toolkit with pre-built scripts, comprehensive error handling, and support for complex workflows.

## Quick start

### Extract text from PDF

```python
import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    # Text of the first page only (pages are 0-indexed)
    text = pdf.pages[0].extract_text()
    print(text)
```

### Analyze PDF form (using included script)

```bash
python scripts/analyze_form.py input.pdf --output fields.json
# Returns: JSON with all form fields, types, and positions
```

### Fill PDF form with validation

```bash
python scripts/fill_form.py input.pdf data.json output.pdf
# Validates all fields before filling, includes error reporting
```

### Extract tables from PDF

```bash
python scripts/extract_tables.py report.pdf --output tables.csv
# Extracts all tables with automatic column detection
```

## Features

### ✅ Production-ready scripts

All scripts include:

- **Error handling**: Graceful failures with detailed error messages
- **Validation**: Input validation and type checking
- **Logging**: Configurable logging with timestamps
- **Type hints**: Full type annotations for IDE support
- **CLI interface**: `--help` flag for all scripts
- **Exit codes**: Proper exit codes for automation

### ✅ Comprehensive workflows

- **PDF Forms**: Complete form processing pipeline
- **Table Extraction**: Advanced table detection and extraction
- **OCR Processing**: Scanned PDF text extraction
- **Batch Operations**: Process multiple PDFs efficiently
- **Validation**: Pre- and post-processing validation

## Advanced topics

### PDF Form Processing

For complete form workflows including:

- Field analysis and detection
- Dynamic form filling
- Validation rules
- Multi-page forms
- Checkbox and radio button handling

See [FORMS.md](FORMS.md)

### Table Extraction

For complex table extraction:

- Multi-page tables
- Merged cells
- Nested tables
- Custom table detection
- Export to CSV/Excel

See [TABLES.md](TABLES.md)

### OCR Processing

For scanned PDFs and image-based documents:

- Tesseract integration
- Language support
- Image preprocessing
- Confidence scoring
- Batch OCR

See [OCR.md](OCR.md)

## Included scripts

### Form processing

**analyze_form.py** - Extract form field information

```bash
python scripts/analyze_form.py input.pdf [--output fields.json] [--verbose]
```

**fill_form.py** - Fill PDF forms with data

```bash
python scripts/fill_form.py input.pdf data.json output.pdf [--validate]
```

**validate_form.py** - Validate form data before filling

```bash
python scripts/validate_form.py data.json schema.json
```

### Table extraction

**extract_tables.py** - Extract tables to CSV/Excel

```bash
python scripts/extract_tables.py input.pdf [--output tables.csv] [--format csv|excel]
```

### Text extraction

**extract_text.py** - Extract text with formatting preservation

```bash
python scripts/extract_text.py input.pdf [--output text.txt] [--preserve-formatting]
```

### Utilities

**merge_pdfs.py** - Merge multiple PDFs

```bash
python scripts/merge_pdfs.py file1.pdf file2.pdf file3.pdf --output merged.pdf
```

**split_pdf.py** - Split PDF into individual pages

```bash
python scripts/split_pdf.py input.pdf --output-dir pages/
```

**validate_pdf.py** - Validate PDF integrity

```bash
python scripts/validate_pdf.py input.pdf
```

## Common workflows

### Workflow 1: Process form submissions

```bash
# 1. Analyze form structure
python scripts/analyze_form.py template.pdf --output schema.json

# 2. Validate submission data
python scripts/validate_form.py submission.json schema.json

# 3. Fill form
python scripts/fill_form.py template.pdf submission.json completed.pdf

# 4. Validate output
python scripts/validate_pdf.p
```
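Workflow 1 relies on each script's exit code, so a driver only needs to run the steps in order and stop at the first failure. This minimal sketch wires up steps 1 to 3 exactly as those commands are written in the workflow (the truncated step 4 is left out). `form_submission_steps`, `run_pipeline`, and `run_subprocess` are hypothetical helpers, and the sketch assumes the scripts exit nonzero on failure, as the "Proper exit codes" feature suggests.

```python
import subprocess
import sys
from typing import Callable, Sequence

# A runner executes one command and returns its exit code.
Runner = Callable[[Sequence[str]], int]


def run_subprocess(cmd: Sequence[str]) -> int:
    """Run one CLI step and return its exit code without raising."""
    return subprocess.run(list(cmd), check=False).returncode


def form_submission_steps(template: str, submission: str, output: str,
                          schema: str = "schema.json") -> list[list[str]]:
    """Steps 1-3 of the form-submission workflow, using the documented CLI syntax."""
    py = sys.executable  # same interpreter (and virtualenv) as this driver
    return [
        [py, "scripts/analyze_form.py", template, "--output", schema],
        [py, "scripts/validate_form.py", submission, schema],
        [py, "scripts/fill_form.py", template, submission, output],
    ]


def run_pipeline(steps: list[list[str]], runner: Runner = run_subprocess) -> int:
    """Run steps in order; return the first nonzero exit code, or 0 if all pass."""
    for step in steps:
        code = runner(step)
        if code != 0:
            print(f"step failed with exit code {code}: {' '.join(step)}", file=sys.stderr)
            return code
    return 0
```

Usage would look like `sys.exit(run_pipeline(form_submission_steps("template.pdf", "submission.json", "completed.pdf")))`. Injecting `runner` keeps the control flow testable without the scripts on disk, and `sys.executable` avoids depending on whichever `python` happens to be first on `PATH`.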