
Markitdown
Convert PDFs, Office files, HTML pages, and HTTP sources into markdown text inside agent workflows using the MarkItDown Python API.
Overview
MarkItDown is an agent skill for the Build phase that documents how to convert local files, streams, and URLs to markdown using the MarkItDown Python library.
Install
npx skills add https://github.com/julianobarbosa/claude-code-skills --skill markitdown
What is this skill?
- convert() auto-detects file paths, URLs, and requests.Response objects
- Dedicated entry points: convert_local(), convert_stream() (binary streams such as io.BytesIO), and convert_url()
- convert_stream() accepts binary streams only; io.StringIO is not supported (v0.1.0+ note)
- Advanced path for custom converters, URI handling, and plugins per skill doc
- Python snippets agents can drop into ingestion scripts or agent tools
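The binary-stream requirement can be handled with a small wrapper when the content is already a Python string. The following is a minimal sketch using only the standard library; `to_binary_stream` is a hypothetical helper name, not part of MarkItDown, and the UTF-8 default is an assumption.

```python
import io
from typing import Union


def to_binary_stream(data: Union[str, bytes], encoding: str = "utf-8") -> io.BytesIO:
    """Wrap text or bytes in a BytesIO (hypothetical helper, not a MarkItDown API)."""
    if isinstance(data, str):
        # Text must be encoded first: convert_stream() does not accept io.StringIO.
        data = data.encode(encoding)
    return io.BytesIO(data)


stream = to_binary_stream("<h1>Hello</h1><p>World</p>")

# With markitdown installed, the stream could then be converted:
# from markitdown import MarkItDown
# result = MarkItDown().convert_stream(stream)
# print(result.text_content)
```

Encoding at the boundary keeps the rest of an ingestion script free to work with `str` values.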
Adoption & trust: 356 installs on skills.sh; 76 GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
Your agent pipeline needs text from PDFs and web docs but you keep reimplementing brittle one-off parsers instead of a unified conversion API.
Who is it for?
Indie devs building doc ingestion, RAG prep scripts, or support automation that must normalize attachments to markdown.
Skip if: Pure front-end teams with no Python runtime, or workflows that only need trivial plain-text files without PDF/Office/HTML conversion.
When should I use this skill?
Implementing or debugging MarkItDown conversions from files, byte streams, or HTTP URLs in Python.
What do I get? / Deliverables
Working Python calls that return markdown via result.text_content from paths, byte streams, or HTTPS sources, ready for chunking or insertion into prompts.
- Python ingestion snippet using MarkItDown
- Markdown text output from target documents
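As an illustration of the chunking step mentioned above, here is a minimal sketch that greedily packs blank-line-separated paragraphs into size-limited chunks. `chunk_markdown` and the `max_chars` default are hypothetical choices for this example; they are not part of MarkItDown or this skill.

```python
from typing import List


def chunk_markdown(text: str, max_chars: int = 1000) -> List[str]:
    """Greedily pack paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole rather than split.
    """
    chunks: List[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue  # skip empty segments left by runs of blank lines
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


# Stand-in for result.text_content from a MarkItDown conversion.
sample = "# Title\n\nFirst paragraph.\n\nSecond paragraph."
chunks = chunk_markdown(sample, max_chars=30)
```

Splitting on paragraph boundaries keeps headings attached to the text that follows them whenever the size limit allows.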
Journey fit
Document ingestion supports building features, RAG pipelines, and doc tooling, so the canonical placement is Build, where libraries get wired into the product. The Integrations subphase fits URI handlers, the convert/convert_url/convert_stream entry points, and plugin-style converters.
How it compares
A Python library integration skill, not an MCP server wrapper or a hosted SaaS converter.
Common Questions / FAQ
Who is markitdown for?
Solo builders and small teams using Python plus AI agents who need reliable document-to-markdown conversion in repos or automation scripts.
When should I use markitdown?
During Build integrations when wiring upload handlers, CLI importers, or agent tools that must read PDFs, HTML, or remote document URLs.
Is markitdown safe to install?
The skill describes code that may read local files and fetch URLs; review the Security Audits panel on this Prism page and sandbox network/file access in your agent.
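One way to restrict network and file access, sketched under assumptions: check every source against an allowlist before handing it to the converter. The scheme and host sets and the names `is_allowed_source` and `convert_if_allowed` are hypothetical; only `convert_url()` comes from the skill documentation.

```python
from urllib.parse import urlparse

# Hypothetical policy: HTTPS only, and only hosts you explicitly trust.
ALLOWED_SCHEMES = {"https"}
ALLOWED_HOSTS = {"example.com", "docs.example.com"}


def is_allowed_source(url: str) -> bool:
    """Return True only for URLs whose scheme and host are both allowlisted."""
    parsed = urlparse(url)
    # file:// and data: URIs have no hostname, so they are rejected here.
    return parsed.scheme in ALLOWED_SCHEMES and parsed.hostname in ALLOWED_HOSTS


def convert_if_allowed(md, url: str):
    """Call md.convert_url(url) only when the URL passes the allowlist.

    `md` is expected to be a MarkItDown instance (or any object with convert_url).
    """
    if not is_allowed_source(url):
        raise ValueError(f"Source not allowed: {url}")
    return md.convert_url(url)
```

A check like this complements, rather than replaces, sandboxing at the process or container level.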
SKILL.md
# MarkItDown Advanced Features

Advanced functionality for custom converters, URI handling, and plugins.

## Conversion Methods

MarkItDown provides multiple conversion entry points.

### convert()

Universal method that auto-detects source type.

```python
import requests
from markitdown import MarkItDown

md = MarkItDown()

# File path
result = md.convert("document.pdf")

# URL
result = md.convert("https://example.com/page.html")

# HTTP Response object
response = requests.get("https://example.com/doc.pdf")
result = md.convert(response)
```

### convert_local()

For local file paths only.

```python
from markitdown import MarkItDown

md = MarkItDown()
result = md.convert_local("./documents/report.pdf")
print(result.text_content)
```

### convert_stream()

For binary file-like objects.

```python
import io

import requests
from markitdown import MarkItDown

md = MarkItDown()

# From bytes
with open("document.pdf", "rb") as f:
    content = f.read()
stream = io.BytesIO(content)
result = md.convert_stream(stream)

# From HTTP response
response = requests.get("https://example.com/doc.pdf")
stream = io.BytesIO(response.content)
result = md.convert_stream(stream)
```

> **Note:** `convert_stream()` requires binary streams only (v0.1.0+).
> Text streams (`io.StringIO`) are not supported.

### convert_url()

For HTTP/HTTPS URLs.

```python
from markitdown import MarkItDown

md = MarkItDown()
result = md.convert_url("https://example.com/document.pdf")
print(result.text_content)
```

### convert_uri()

For any URI scheme (http, https, file, data).

```python
from markitdown import MarkItDown

md = MarkItDown()

# HTTP URL
result = md.convert_uri("https://example.com/page.html")

# Local file URI
result = md.convert_uri("file:///path/to/document.pdf")

# Data URI
result = md.convert_uri("data:text/plain;base64,SGVsbG8gV29ybGQ=")
```

### convert_response()

For `requests.Response` objects.
```python
import requests
from markitdown import MarkItDown

md = MarkItDown()
response = requests.get("https://example.com/report.pdf")
result = md.convert_response(response)
print(result.text_content)
```

## Result Object

The `DocumentConverterResult` contains conversion output.

```python
from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("document.pdf")

# Main content
print(result.text_content)

# Alias for text_content
print(result.markdown)

# Document title (if available)
if result.title:
    print(f"Title: {result.title}")
```

## Custom Converters

Create custom converters for unsupported formats.

### Basic Custom Converter

```python
from typing import BinaryIO

from markitdown import (
    DocumentConverter,
    DocumentConverterResult,
    MarkItDown,
    StreamInfo,
)


class MyFormatConverter(DocumentConverter):
    """Converter for .myformat files."""

    def accepts(
        self, file_stream: BinaryIO, stream_info: StreamInfo, **kwargs
    ) -> bool:
        """Check if this converter handles the file."""
        # Check by extension
        if stream_info.extension:
            return stream_info.extension.lower() == ".myformat"
        # Check by MIME type (StreamInfo names this attribute `mimetype`)
        if stream_info.mimetype:
            return stream_info.mimetype == "application/x-myformat"
        return False

    def convert(
        self, file_stream: BinaryIO, stream_info: StreamInfo, **kwargs
    ) -> DocumentConverterResult:
        """Convert the file to markdown."""
        content = file_stream.read().decode("utf-8")
        # Process content...
        markdown = f"# My Format\n\n{content}"
        # In v0.1.0+ the constructor takes `markdown`; `text_content` is a read alias
        return DocumentConverterResult(markdown=markdown, title="My Document")


# Register the converter
md = MarkItDown()
md.register_converter(MyFormatConverter())

# Use it
result = md.convert("document.myformat")
```

### Converter Priority

Converters are matched in priority order (lower = higher priority