
Pdfmux
Convert PDFs to Markdown through per-page routing and confidence scoring so RAG pipelines ingest cleaner text.
Overview
io.github.NameetP/pdfmux is an MCP server for the Build phase that routes each PDF page to a Markdown backend and attaches a confidence score, so the output is ready for RAG ingestion.
What is this MCP server?
- Per-page backend selection with confidence scoring for PDF-to-Markdown
- Runs fully local without paid APIs; optional GEMINI_API_KEY for low-confidence page fallback
- PyPI package pdfmux invoked as pdfmux serve over stdio MCP
- Registry version 1.6.4 with website at pdfmux.com
- Optimized for RAG ingestion workflows rather than one-shot viewing
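The idea of per-page backend selection with confidence scoring can be sketched as follows. This is an illustration only, not pdfmux's actual routing logic: the `PageResult` type, the `route_page` function, the 0.8 threshold, and the two toy backends are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class PageResult:
    backend: str       # name of the backend that produced the text
    markdown: str      # converted Markdown for one page
    confidence: float  # 0.0 (unusable) to 1.0 (clean)


# Hypothetical backend signature: raw page bytes in, (markdown, confidence) out.
Backend = Callable[[bytes], Tuple[str, float]]


def route_page(
    page: bytes,
    backends: Sequence[Tuple[str, Backend]],
    threshold: float = 0.8,
) -> PageResult:
    """Try backends in order; return the first result that clears the threshold.

    If none clears it, return the highest-confidence attempt so the caller
    can still see the score and decide what to do with the page.
    """
    best: Optional[PageResult] = None
    for name, convert in backends:
        markdown, confidence = convert(page)
        result = PageResult(name, markdown, confidence)
        if confidence >= threshold:
            return result
        if best is None or confidence > best.confidence:
            best = result
    if best is None:
        raise ValueError("no backends configured")
    return best


# Toy backends for demonstration: the fast one is unsure about table-like pages.
def fast_text(page: bytes) -> Tuple[str, float]:
    return page.decode(), (0.4 if b"|" in page else 0.95)


def table_aware(page: bytes) -> Tuple[str, float]:
    return page.decode(), 0.9


BACKENDS = [("fast_text", fast_text), ("table_aware", table_aware)]
```

With these toy backends, a plain-prose page stays on the cheap backend, while a page containing table-like content falls through to the second one.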
Community signal: 66 GitHub stars.
What problem does it solve?
When a single parser handles every layout in a batch of PDFs, the resulting Markdown is noisy, which degrades retrieval quality in RAG features built by solo developers.
Who is it for?
Indie builders piping manuals, specs, or reports into RAG or agent tools who need per-page quality signals.
Skip if: you only need to read a PDF once in a viewer and have no Markdown or embedding pipeline.
What do I get? / Deliverables
After you add pdfmux, agents can ingest PDFs as scored Markdown pages with local-first conversion and optional Gemini fallback.
- Markdown output per PDF page with backend and confidence metadata
- Local-first ingestion path without mandatory cloud keys
- Agent-callable PDF conversion over MCP stdio
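One way to use per-page confidence metadata is to gate what reaches the embedding step. The sketch below assumes each page arrives as a dict with `page`, `backend`, `confidence`, and `markdown` keys. That record shape, the `split_by_confidence` helper, and the 0.7 cutoff are assumptions made for illustration, not pdfmux's documented output format.

```python
from typing import Dict, List, Tuple

# Hypothetical per-page record: {"page": int, "backend": str,
# "confidence": float, "markdown": str}
Page = Dict[str, object]


def split_by_confidence(
    pages: List[Page], min_confidence: float = 0.7
) -> Tuple[List[Page], List[Page]]:
    """Separate pages that are safe to embed from pages that need review."""
    accepted = [p for p in pages if p["confidence"] >= min_confidence]
    flagged = [p for p in pages if p["confidence"] < min_confidence]
    return accepted, flagged


# Sample data, invented for the example.
sample_pages: List[Page] = [
    {"page": 1, "backend": "text", "confidence": 0.96, "markdown": "# Intro"},
    {"page": 2, "backend": "ocr", "confidence": 0.41, "markdown": "T4ble ..."},
    {"page": 3, "backend": "text", "confidence": 0.88, "markdown": "## Setup"},
]

accepted, flagged = split_by_confidence(sample_pages)
```

Pages in `flagged` could be re-run through a fallback backend or excluded from the index, depending on how much noise the pipeline tolerates.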
Journey fit
PDF ingestion for embeddings and agent knowledge bases happens while you build integrations and prepare data, not at launch or distribution. The Integrations subphase covers MCP servers that connect document ingestion to your app or agent stack.
How it compares
Pdfmux is a document-ingestion router exposed over MCP, not a general-purpose web browser or SQL database server.
Common Questions / FAQ
Who is Pdfmux for?
Builders and agent users preparing PDF corpora for embeddings, search, or chat over documentation.
When should I use Pdfmux?
Use it when converting mixed PDFs to Markdown for RAG and you want per-page backend choice and confidence scores.
How do I add Pdfmux to my agent?
Install the pdfmux package from PyPI, configure a stdio MCP server that runs pdfmux with the serve argument, optionally set GEMINI_API_KEY for low-confidence page fallback, then point Claude Code or Cursor at the server.
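As a rough sketch of the configuration step, the script below prints a server entry in the `mcpServers` JSON layout that several MCP clients accept. The `build_mcp_config` helper is a hypothetical name, and the sketch assumes the `pdfmux` executable is on your PATH after installation. The config file's name and location depend on the client, so check its documentation before pasting the output.

```python
import json
import os
from typing import Optional


def build_mcp_config(gemini_api_key: Optional[str] = None) -> dict:
    """Build a stdio MCP server entry that launches `pdfmux serve`."""
    server = {"command": "pdfmux", "args": ["serve"]}
    # GEMINI_API_KEY is optional; without it the server runs fully local.
    if gemini_api_key:
        server["env"] = {"GEMINI_API_KEY": gemini_api_key}
    return {"mcpServers": {"pdfmux": server}}


if __name__ == "__main__":
    # Reads the key from the environment if present, otherwise omits it.
    config = build_mcp_config(os.environ.get("GEMINI_API_KEY"))
    print(json.dumps(config, indent=2))
```

Leaving the key unset keeps the `env` block out of the generated entry entirely, which matches the local-only mode described above.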