
pdf4vllm
Feed vision-capable LLMs clean PDF content by auto-switching from broken text extraction to image mode when corruption is detected.
Overview
pdf4vllm is an MCP server for the Build phase that reads PDFs for vision LLMs and auto-switches to image mode when text extraction is corrupted.
What is this MCP server?
- PDF reader tailored for vision LLM consumption paths
- Automatic detection of text-layer corruption with fallback to image mode
- Distributed on PyPI as pdf4vllm-mcp (v1.0.2) and run over stdio by MCP hosts
- Keeps silently corrupted text out of agent context when OCR or text extractors fail
- Server version: 1.0.2
- PyPI identifier: pdf4vllm-mcp
- Transport: stdio
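The corruption check and image fallback listed above can be pictured with a small sketch. This is not pdf4vllm's actual detection logic, which this page does not describe; the function names, the character categories counted as debris, and the 0.15 threshold are all hypothetical choices made for illustration.

```python
import unicodedata

# Hypothetical threshold chosen for illustration; not taken from pdf4vllm.
SUSPECT_RATIO_THRESHOLD = 0.15


def suspect_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that look like extraction debris."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 1.0  # an empty text layer is treated as unusable
    suspect = 0
    for c in chars:
        category = unicodedata.category(c)
        # Count U+FFFD replacement characters plus control (Cc),
        # private-use (Co), and unassigned (Cn) code points.
        if c == "\ufffd" or category in {"Cc", "Co", "Cn"}:
            suspect += 1
    return suspect / len(chars)


def choose_mode(text: str, threshold: float = SUSPECT_RATIO_THRESHOLD) -> str:
    """Pick "text" when the layer looks clean, otherwise fall back to "image"."""
    return "image" if suspect_ratio(text) > threshold else "text"
```

A real detector would likely need more signals than this, since garbled font mappings can produce ordinary-looking letters in nonsense order, which a character-category count cannot catch.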
What problem does it solve?
Agents confidently summarize PDFs whose extracted text is actually corrupted, because the vision model never sees the pages it needs.
Who is it for?
Builders wiring document Q&A or coding agents that call vision models on messy real-world PDFs and scans.
Skip if: your pipeline only ingests clean HTML or Markdown, or your team has no vision-LLM inference path.
What do I get? / Deliverables
Once installed, the MCP server lets your agent read PDF content through corruption-aware routing, which returns text or image representations suitable for vision LLMs.
- Corruption-aware PDF representations for agent prompts
- Image-mode page payloads when text extraction fails quality checks
- Repeatable MCP tool calls for document batches in dev workflows
Journey fit
PDF ingestion for multimodal agents is core agent tooling that you add while building RAG, eval, or document workflows, not a launch distribution task. The server optimizes how agents read PDFs for vision LLMs, which makes it tooling around the agent pipeline itself.
How it compares
pdf4vllm is a vision-oriented PDF ingestion adapter for MCP, not a full vector database or a generic office PDF editor.
Common Questions / FAQ
Who is pdf4vllm MCP for?
Solo developers building agent or RAG flows that send PDFs to vision LLMs and need automatic fallback when text layers are broken.
When should I use pdf4vllm MCP?
During the Build phase, when you are wiring document tools into an agent and want corruption detection before pages reach your vision-LLM or multimodal stack.
How do I add pdf4vllm to my agent?
Install pdf4vllm-mcp from PyPI, add it as a stdio MCP server in Claude Code, Cursor, or another host, then call its PDF read tools from your agent workflow.
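As a rough sketch of the host-side step, the snippet below generates a stdio server entry in the `mcpServers` JSON layout that hosts such as Claude Code and Cursor read. Only the package name pdf4vllm-mcp comes from this page. The launch command (`uvx pdf4vllm-mcp`) and the server label `pdf4vllm` are assumptions, so check the package's own documentation for the real command before using it.

```python
import json


def build_config() -> dict:
    """Build a host config with one stdio MCP server entry."""
    # Assumption: the package can be launched by name through `uvx`.
    # The actual entry point may differ; this page does not state it.
    server_entry = {
        "command": "uvx",
        "args": ["pdf4vllm-mcp"],
    }
    # "pdf4vllm" is an arbitrary label chosen for this example.
    return {"mcpServers": {"pdf4vllm": server_entry}}


if __name__ == "__main__":
    # Print JSON that can be pasted into the host's MCP config file.
    print(json.dumps(build_config(), indent=2))
```

The printed JSON goes into the host's MCP configuration file. Where that file lives depends on the host.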