
ClicheFactory Document Intelligence
Turn inbound PDFs, office files, email attachments, and images into structured JSON inside agent workflows without hand-writing parsers.
Overview
ClicheFactory is an MCP server for the Build phase that extracts structured JSON from documents and attachments via LLM-driven pipelines.
What is this MCP server?
- Extract structured JSON from PDF, image, DOCX, XLSX, CSV, and EML attachments
- PyPI package clichefactory-mcp (v0.1.8), runnable via uvx over the stdio transport
- Service mode with CLICHEFACTORY_API_KEY or local mode with LLM_API_KEY and model env vars
- Optional OCR/VLM overrides via OCR_MODEL_NAME and OCR_API_KEY
- DSPy pipeline support for document-intelligence style automation chains
- Env keys: CLICHEFACTORY_API_KEY, optional CLICHEFACTORY_API_URL, LLM_MODEL_NAME, LLM_API_KEY, OCR_MODEL_NAME, OCR_API_KEY
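The two modes above differ only in which environment variables are set, so a small pre-flight check can catch a misconfigured launch early. The following is a minimal sketch, not the server's own logic: the function name detect_mode is hypothetical, and the choice to prefer service mode when both sets of variables are present is an assumption.

```python
from typing import Mapping


def detect_mode(env: Mapping[str, str]) -> str:
    """Return "service" or "local" based on which variables are set.

    Assumption: service mode wins when both sets are present; the real
    server may resolve that case differently.
    """
    if env.get("CLICHEFACTORY_API_KEY"):
        return "service"
    if env.get("LLM_API_KEY") and env.get("LLM_MODEL_NAME"):
        return "local"
    raise ValueError(
        "Set CLICHEFACTORY_API_KEY, or both LLM_MODEL_NAME and LLM_API_KEY"
    )
```

Passing os.environ to detect_mode before starting the agent gives a clear error instead of a failed tool call later.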
What problem does it solve?
Agents cannot reliably turn PDFs, spreadsheets, and email files into typed records without bespoke OCR scripts and fragile prompts.
Who is it for?
Builders shipping document-heavy SaaS, ops automations, or agent tools that must ingest real-world files during implementation.
Skip if: You work in simple markdown-only repos, or your team only needs static file search without structured field extraction.
What do I get? / Deliverables
Your agent calls one MCP server that returns JSON from common attachment types, so you can plug the results into APIs, databases, or DSPy flows.
- Structured JSON extracted from supported file and attachment types
- Agent-callable ingestion step for SaaS upload and email workflows
- Configurable service or local LLM extraction path
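Before extracted JSON goes into an API or database, it usually helps to load it into a typed record. The sketch below shows one way to do that with the standard library. The Invoice fields and the sample payload are hypothetical; the actual JSON shape depends on the document and on how the extraction is configured.

```python
import json
from dataclasses import dataclass


@dataclass
class Invoice:
    # Hypothetical fields, chosen only for illustration.
    invoice_number: str
    total: float
    currency: str


def parse_invoice(raw: str) -> Invoice:
    """Parse a JSON string into an Invoice, rejecting missing fields."""
    data = json.loads(raw)
    required = ("invoice_number", "total", "currency")
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return Invoice(
        invoice_number=str(data["invoice_number"]),
        total=float(data["total"]),  # accepts numbers or numeric strings
        currency=str(data["currency"]),
    )


# Made-up payload standing in for an extraction result.
sample = '{"invoice_number": "INV-001", "total": "129.50", "currency": "EUR"}'
invoice = parse_invoice(sample)
```

Failing loudly on missing fields keeps partial extractions from reaching downstream systems unnoticed.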
Journey fit
Document extraction is a Build integration task: once you are implementing the product, it feeds backends, automations, and agent tools. The Integrations subphase covers MCP bridges that connect agents to external intelligence APIs and file-ingestion pipelines.
How it compares
ClicheFactory is a document-to-JSON extraction MCP server, not a general code-search or Git hosting integration.
Common Questions / FAQ
Who is ClicheFactory for?
ClicheFactory is for solo builders and small teams whose products depend on turning uploaded or emailed documents into structured data inside agent-assisted development.
When should I use ClicheFactory?
Use it while building ingestion features, support automations, or back-office agents that must parse PDFs, Office files, CSVs, images, or EML attachments.
How do I add ClicheFactory to my agent?
Configure a stdio MCP server entry that runs clichefactory-mcp via uvx. Set CLICHEFACTORY_API_KEY for service mode, or LLM_MODEL_NAME and LLM_API_KEY for local mode, and optionally set the OCR variables for scanned documents.
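As a rough illustration of that setup, the sketch below generates a client configuration for each mode. It assumes the "mcpServers" JSON layout that several MCP clients use; your client's config file name and schema may differ. The helper build_mcp_config, the server label "clichefactory", and the placeholder values are all hypothetical.

```python
import json


def build_mcp_config(env: dict) -> dict:
    """Build a stdio server entry that launches clichefactory-mcp via uvx.

    Assumption: the client reads an "mcpServers" mapping with
    command/args/env keys; check your client's documentation.
    """
    return {
        "mcpServers": {
            "clichefactory": {
                "command": "uvx",
                "args": ["clichefactory-mcp"],
                "env": dict(env),
            }
        }
    }


# Service mode: only the service key is needed.
service_config = build_mcp_config(
    {"CLICHEFACTORY_API_KEY": "<your-service-key>"}
)

# Local mode, with the optional OCR overrides for scanned documents.
local_config = build_mcp_config(
    {
        "LLM_MODEL_NAME": "<your-model-name>",
        "LLM_API_KEY": "<your-llm-key>",
        "OCR_MODEL_NAME": "<your-ocr-model>",
        "OCR_API_KEY": "<your-ocr-key>",
    }
)

print(json.dumps(service_config, indent=2))
```

Paste the printed JSON into your client's MCP settings and replace the placeholders with real values, ideally sourced from a secrets store rather than committed to the repo.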