Document Parser

Name: Document Parser
Author: agenson-tools

agenson-tools/document-parser-mcp

Let your agent parse PDF, Word, and HTML into structured fields so specs, contracts, and legacy docs become machine-usable context inside Build workflows.

Overview

Document Parser is a Build-phase MCP server that parses PDF, Word, and HTML documents into structured data for agent and integration workflows.

What is this MCP server?

Parses PDF, Word, and HTML into structured, extractable data
npm package: @agenson-horrowitz/document-parser-mcp v1.0.8
GitHub: agenson-tools/document-parser-mcp
Transport type: stdio
Server metadata version 1.0.8 on MCP schema 2025-12-11
3 document families explicitly listed: PDF, Word, HTML
Suited for agent pipelines that must not treat binary docs as opaque blobs

Compatible agents: Claude Code, Cursor, Codex, Windsurf

What problem does it solve?

Agents cannot reliably use PDFs and Word files until text and fields are extracted, and manual copy-paste does not scale for solo builders automating intake.

Who is it for?

Builders automating document intake—contracts, specs, help center exports—who need MCP-accessible parsing inside Claude Code or Cursor.

Skip if: you only need plain Markdown notes with no binary documents, or you require specialized scientific or OCR pipelines beyond PDF, Word, and HTML.

What do I get? / Deliverables

After adding the stdio MCP server, your agent can request structured parses of supported document types and feed results into code, databases, or summaries.

Structured fields and text extracted from PDF, Word, and HTML via MCP tools
Agent-ready document payloads for downstream code or storage
Repeatable stdio MCP configuration for document intake pipelines
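
To make "agent-ready payloads" concrete, here is a minimal Python sketch of pulling text out of an MCP tool result before handing it to downstream code or storage. It relies only on the general MCP result shape (a content list of typed items plus an optional isError flag). The sample payload and the helper name extract_text are illustrative assumptions; the listing does not document what this server actually returns.

```python
from typing import Any


def extract_text(result: dict[str, Any]) -> str:
    """Join the text items of an MCP tools/call result into one string.

    Hypothetical helper: it assumes the generic MCP result shape, not any
    field layout specific to Document Parser.
    """
    if result.get("isError"):
        raise RuntimeError("tool call reported an error")
    # Keep only items typed as text; skip images or other content types.
    parts = [
        item["text"]
        for item in result.get("content", [])
        if item.get("type") == "text"
    ]
    return "\n".join(parts)


# Made-up sample result, for illustration only.
sample_result = {
    "content": [
        {"type": "text", "text": "Section 1: Scope"},
        {"type": "image", "data": "", "mimeType": "image/png"},
        {"type": "text", "text": "Section 2: Terms"},
    ],
    "isError": False,
}

extracted = extract_text(sample_result)
print(extracted)
```

If the server returns JSON encoded inside a text item, the extracted string could be passed to json.loads next; whether it does is something to confirm against the GitHub documentation.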

Journey fit

Primary fit

Build: Integrations & version control

Ingesting external file formats into your product or agent context is classic integration work during Build, even when source documents originated in Idea or Validate research. Integrations covers format bridges—PDF, DOCX, and HTML extraction into structured data for downstream code and prompts.

How it compares

Document Parser is a format-extraction MCP integration, not an agent memory server or a multi-agent output validator.

Common Questions / FAQ

Who is Document Parser for?

It is for indie developers and agent authors who must turn PDF, Word, and HTML files into structured data during build and integration tasks.

When should I use Document Parser?

Use it when an agent workflow needs tables, sections, or fields from uploaded or fetched documents instead of unstructured paste-ins.
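
As a rough illustration of what such a request looks like on the wire, the Python sketch below builds the JSON-RPC 2.0 messages an MCP client sends over stdio: tools/list to discover what the server offers, then tools/call to invoke one tool. The tool name parse_document and the path argument are placeholders, since the listing does not name the server's tools; real names should come from the tools/list response. A real session also starts with the MCP initialize handshake, which is omitted here.

```python
import json
from typing import Any


def list_tools_request(request_id: int) -> dict[str, Any]:
    """Build a JSON-RPC request asking the server which tools it exposes."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}


def call_tool_request(
    request_id: int, tool_name: str, arguments: dict[str, Any]
) -> dict[str, Any]:
    """Build a JSON-RPC request that invokes one tool with named arguments."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def to_stdio_line(message: dict[str, Any]) -> str:
    """Serialize a message as one newline-terminated line for stdio transport."""
    return json.dumps(message) + "\n"


discover = list_tools_request(1)
# "parse_document" and "path" are hypothetical; check tools/list for real names.
parse = call_tool_request(2, "parse_document", {"path": "contract.pdf"})

print(to_stdio_line(discover), end="")
print(to_stdio_line(parse), end="")
```

In practice an MCP client such as Claude Code or Cursor builds these messages for you; the sketch only shows the shape of the exchange.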

How do I add Document Parser to my agent?

Install @agenson-horrowitz/document-parser-mcp v1.0.8 via npm, configure it as a stdio MCP server in your client, and call parse tools from your agent per the GitHub documentation.
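
The Python sketch below generates one possible stdio configuration entry. It assumes your client reads a JSON file with a top-level mcpServers map, a convention several MCP clients use, and that the package can be launched through npx. Neither assumption is confirmed by this listing. The server label document-parser is arbitrary.

```python
import json
from typing import Any


def build_stdio_config(package: str, version: str) -> dict[str, Any]:
    """Build an mcpServers entry that launches the package over stdio.

    Assumptions: the client uses the "mcpServers" JSON convention and the
    package is runnable via npx. Verify both before relying on this.
    """
    return {
        "mcpServers": {
            # "document-parser" is an arbitrary label chosen for this sketch.
            "document-parser": {
                "command": "npx",
                "args": ["-y", f"{package}@{version}"],
            }
        }
    }


config = build_stdio_config("@agenson-horrowitz/document-parser-mcp", "1.0.8")
print(json.dumps(config, indent=2))
```

Pinning the version in the args keeps the intake pipeline repeatable. Where the resulting JSON belongs depends on the client, so check its documentation for the expected file location.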
