
Document Parser
Let your agent parse PDF, Word, and HTML into structured fields so specs, contracts, and legacy docs become machine-usable context inside Build workflows.
Overview
Document Parser is a Build-phase MCP server that parses PDF, Word, and HTML documents into structured data for agent and integration workflows.
What is this MCP server?
- Parses PDF, Word, and HTML into structured extractable data
- npm package: @agenson-horrowitz/document-parser-mcp v1.0.8, using stdio transport
- GitHub: agenson-tools/document-parser-mcp
- Suited for agent pipelines that must not treat binary docs as opaque blobs
- Server metadata version 1.0.8 (matching the npm package version) on MCP schema 2025-12-11
What problem does it solve?
Agents cannot reliably use PDFs and Word files until text and fields are extracted, and manual copy-paste does not scale for solo builders automating intake.
Who is it for?
Builders automating document intake—contracts, specs, help center exports—who need MCP-accessible parsing inside Claude Code or Cursor.
Skip if: you only need plain markdown notes with no binary documents, or you require specialized scientific or OCR pipelines beyond PDF, Word, and HTML.
What do I get? / Deliverables
After adding the stdio MCP server, your agent can request structured parses of supported document types and feed results into code, databases, or summaries.
- Structured fields and text extracted from PDF, Word, and HTML via MCP tools
- Agent-ready document payloads for downstream code or storage
- Repeatable stdio MCP configuration for document intake pipelines
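As a rough illustration of what "requesting a structured parse" looks like on the wire, the sketch below builds the JSON-RPC 2.0 messages that an MCP client writes to a stdio server, one JSON object per line. The `tools/list` and `tools/call` methods come from the MCP specification; the tool name `parse_document` and its `path` argument are hypothetical placeholders, since this listing does not name the server's actual tools. A real session also starts with an `initialize` handshake, which is omitted here, and most users will let their MCP client handle all of this.

```python
import json
from itertools import count
from typing import Optional

# Monotonic request ids, as JSON-RPC 2.0 expects each request to carry one.
_ids = count(1)


def jsonrpc_request(method: str, params: Optional[dict] = None) -> str:
    """Serialize one JSON-RPC 2.0 request as a single newline-terminated line."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    # json.dumps escapes newlines inside strings, so the line stays unbroken.
    return json.dumps(message) + "\n"


# Ask the server which tools it actually exposes before calling anything.
list_tools = jsonrpc_request("tools/list")

# Hypothetical tool name and arguments: replace them with whatever
# the tools/list response reports for this server.
call_parse = jsonrpc_request(
    "tools/call",
    {"name": "parse_document", "arguments": {"path": "contract.pdf"}},
)

if __name__ == "__main__":
    print(list_tools, end="")
    print(call_parse, end="")
```

Listing the tools first is the safer habit: it lets the agent discover the real tool names and input schemas instead of relying on guesses like the placeholder above.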
Journey fit
Ingesting external file formats into your product or agent context is classic integration work during Build, even when the source documents originated in Idea or Validate research. The Integrations category covers format bridges: PDF, DOCX, and HTML extraction into structured data for downstream code and prompts.
How it compares
Document Parser is a format-extraction MCP integration, not an agent memory server or a multi-agent output validator.
Common Questions / FAQ
Who is Document Parser for?
It is for indie developers and agent authors who must turn PDF, Word, and HTML files into structured data during build and integration tasks.
When should I use Document Parser?
Use it when an agent workflow needs tables, sections, or fields from uploaded or fetched documents instead of unstructured paste-ins.
How do I add Document Parser to my agent?
Install @agenson-horrowitz/document-parser-mcp v1.0.8 via npm, configure it as a stdio MCP server in your client, and call parse tools from your agent per the GitHub documentation.
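A minimal sketch of the configuration step, assuming your client reads the common `mcpServers` JSON layout (used by several MCP clients, including Cursor's `mcp.json`) and assuming the package can be launched with `npx`. Neither assumption is confirmed by this listing, and the server name `document-parser` is an arbitrary label, so check the GitHub documentation for the exact command.

```python
import json

# Package name and version as given in this listing.
PACKAGE = "@agenson-horrowitz/document-parser-mcp@1.0.8"


def build_mcp_config(server_name: str = "document-parser") -> dict:
    """Build a stdio server entry in the common `mcpServers` layout."""
    return {
        "mcpServers": {
            server_name: {
                # Assumption: the package ships an executable that npx can run.
                "command": "npx",
                "args": ["-y", PACKAGE],
            }
        }
    }


if __name__ == "__main__":
    # Paste the printed JSON into your client's MCP configuration file.
    print(json.dumps(build_mcp_config(), indent=2))
```

Pinning the version in the `args` entry keeps the intake pipeline repeatable, because the client launches the same server build on every start.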