Markitdown

Name: Markitdown
Author: julianobarbosa

julianobarbosa/claude-code-skills

Convert PDFs, Office files, HTML pages, and HTTP sources into markdown text inside agent workflows using the MarkItDown Python API.

Overview

The markitdown skill is a Build-phase agent skill that documents how to convert local files, streams, and URLs to markdown with the MarkItDown Python library.

Install

npx skills add https://github.com/julianobarbosa/claude-code-skills --skill markitdown

What is this skill?

convert() auto-detects file paths, URLs, and requests.Response objects
Dedicated entry points: convert_local(), convert_stream() (binary streams such as BytesIO), and convert_url()
convert_stream() accepts binary streams only; io.StringIO is not supported (v0.1.0+)
Advanced coverage of custom converters, URI handling, and plugins
Python snippets agents can drop into ingestion scripts or agent tools

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 356 installs on skills.sh; 76 GitHub stars; 2/3 security scanners passed (skills.sh audits).

What problem does it solve?

Your agent pipeline needs text from PDFs and web docs, but you keep reimplementing brittle one-off parsers instead of using a unified conversion API.

Who is it for?

Indie devs building doc ingestion, RAG prep scripts, or support automation that must normalize attachments to markdown.

Skip if: you are a pure front-end team with no Python runtime, or your workflow only handles plain-text files and never needs PDF, Office, or HTML conversion.

When should I use this skill?

Implementing or debugging MarkItDown conversions from files, byte streams, or HTTP URLs in Python.

What do I get? / Deliverables

Working Python calls that return result.text_content from local paths, byte streams, or HTTP(S) sources, ready for chunking or for inclusion in prompts.

Python ingestion snippet using MarkItDown
Markdown text output from target documents
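To make "ready for chunking" concrete, here is a minimal stdlib-only sketch, assuming heading-based chunking suits your retrieval setup. `chunk_markdown` and `max_chars` are hypothetical names, not part of MarkItDown; in a real pipeline the input would be `result.text_content`.

```python
import re
from typing import List


def chunk_markdown(markdown: str, max_chars: int = 2000) -> List[str]:
    """Split markdown into heading-led sections, then cap each chunk at max_chars.

    Hypothetical helper for RAG prep; not part of the MarkItDown API.
    """
    # Zero-width split before every ATX heading line (#, ##, ...), keeping the heading
    sections = re.split(r"(?m)^(?=#{1,6} )", markdown)
    chunks: List[str] = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        # Naive fallback: slice oversized sections into fixed-size pieces
        for start in range(0, len(section), max_chars):
            chunks.append(section[start:start + max_chars])
    return chunks


text = "# Title\n\nIntro paragraph.\n\n## Details\n\nMore text here."
print(chunk_markdown(text))
```

Heading-aware splitting keeps each chunk topically coherent; the fixed-size fallback is deliberately simple and may cut mid-sentence.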

Recommended Skills

Python Performance Optimization (wshobson/agents)

Python Performance Optimization is an agent skill that walks solo builders through profiling and fixing slow Python code… (25.6k installs · 36.5k stars)

Python Testing Patterns (wshobson/agents)

Python Testing Patterns is an advanced agent reference for pytest-heavy solo backends and automation scripts. It goes pa… (22.5k installs · 36.5k stars)

Python Design Patterns (wshobson/agents)

Python Design Patterns is an agent skill that teaches maintainable structure for solo and indie builders writing Python:… (12.7k installs · 36.5k stars)

Python Executor (qu-skills/skills)

The python-executor skill packages procedural knowledge for executing Python inside inference.sh’s safe, CPU-only sandbo… (12.4k installs · 512 stars)

Async Python Patterns (wshobson/agents)

async-python-patterns is an agent skill that teaches asyncio through detailed, runnable examples: async context managers,… (11.4k installs · 36.5k stars)

Uv Package Manager (wshobson/agents)

UV Package Manager is an advanced reference skill for Astral’s uv toolchain, aimed at solo Python builders who want one … (9.6k installs · 36.5k stars)

Journey fit

Primary fit

Build: Integrations & version control

Document ingestion supports feature work, RAG pipelines, and doc tooling, so its canonical placement is Build, where libraries get wired into the product. The Integrations subphase fits URI handlers, the convert/convert_url/convert_stream entry points, and plugin-style converters.

How it compares

A Python library integration skill, not an MCP server wrapper or a hosted SaaS converter.

Common Questions / FAQ

Who is markitdown for?

Solo builders and small teams using Python plus AI agents who need reliable document-to-markdown conversion in repos or automation scripts.

When should I use markitdown?

During Build integrations when wiring upload handlers, CLI importers, or agent tools that must read PDFs, HTML, or remote document URLs.

Is markitdown safe to install?

The skill describes code that may read local files and fetch URLs. Review the Security Audits panel on this Prism page, and sandbox your agent's network and file access.
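One way to restrict network and file access, sketched under assumptions: gate every source string through an allowlist before handing it to a MarkItDown entry point. `is_allowed_source` and `ALLOWED_HOSTS` are hypothetical names with placeholder hosts; this is not a MarkItDown feature.

```python
from urllib.parse import urlparse

# Hypothetical policy: the only hosts the agent tool may fetch from
ALLOWED_HOSTS = {"example.com", "docs.example.com"}


def is_allowed_source(source: str) -> bool:
    """Return True only for https URLs on an allowlisted host.

    Local paths and file:, data:, or plain http: URIs are rejected, so the
    converter never reads local files or unencrypted endpoints.
    """
    parsed = urlparse(source)
    if parsed.scheme != "https":
        return False
    # urlparse lowercases hostname; None (no host) becomes ""
    return (parsed.hostname or "") in ALLOWED_HOSTS


for candidate in ("https://example.com/doc.pdf", "file:///etc/passwd"):
    print(candidate, is_allowed_source(candidate))
```

Rejecting by default keeps the policy easy to audit: anything outside the allowlist never reaches convert() or convert_uri().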

SKILL.md


# MarkItDown Advanced Features

Advanced functionality for custom converters, URI handling, and plugins.

## Conversion Methods

MarkItDown provides multiple conversion entry points.

### convert()

Universal method that auto-detects the source type.

```python
import requests
from markitdown import MarkItDown

md = MarkItDown()

# File path
result = md.convert("document.pdf")

# URL
result = md.convert("https://example.com/page.html")

# HTTP Response object (fail fast on 4xx/5xx instead of converting an error page)
response = requests.get("https://example.com/doc.pdf", timeout=30)
response.raise_for_status()
result = md.convert(response)
```
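The auto-detection can be pictured as a small dispatch on the source's type. The sketch below is a hypothetical, stdlib-only approximation (`classify_source` is an illustrative name, not MarkItDown code): it assumes URI-looking strings route to convert_uri(), other strings and Path objects to convert_local(), and binary file-like objects to convert_stream(). requests.Response handling is omitted to avoid a third-party dependency.

```python
import io
from pathlib import Path
from typing import IO, Union


def classify_source(source: Union[str, Path, IO]) -> str:
    """Name the dedicated entry point a source would plausibly map to (illustrative)."""
    if isinstance(source, Path):
        return "convert_local"
    if isinstance(source, str):
        # URI-looking strings go to convert_uri; anything else is treated as a path
        if source.startswith(("http:", "https:", "file:", "data:")):
            return "convert_uri"
        return "convert_local"
    if isinstance(source, io.TextIOBase):
        # Mirrors the documented rule: text streams such as io.StringIO are rejected
        raise TypeError("text streams are not supported; pass a binary stream")
    if hasattr(source, "read"):
        return "convert_stream"
    raise TypeError(f"unsupported source type: {type(source).__name__}")


print(classify_source("report.pdf"))  # convert_local
print(classify_source("https://example.com/page.html"))  # convert_uri
```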

### convert_local()

For local file paths only.

```python
from markitdown import MarkItDown

md = MarkItDown()
result = md.convert_local("./documents/report.pdf")
print(result.text_content)
```

### convert_stream()

For binary file-like objects.

```python
import io

import requests
from markitdown import MarkItDown

md = MarkItDown()

# From bytes read in binary mode ("rb")
with open("document.pdf", "rb") as f:
    content = f.read()

stream = io.BytesIO(content)
result = md.convert_stream(stream)

# From HTTP response bytes (use response.content, not response.text)
response = requests.get("https://example.com/doc.pdf", timeout=30)
response.raise_for_status()
stream = io.BytesIO(response.content)
result = md.convert_stream(stream)
```

> **Note:** `convert_stream()` requires binary streams only (v0.1.0+).
> Text streams (`io.StringIO`) are not supported.
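If your code sometimes holds text rather than bytes, a small adapter can normalize inputs before calling `convert_stream()`. This is a minimal sketch; `ensure_binary_stream` is a hypothetical helper, and it assumes UTF-8 is the right encoding for any text it re-encodes.

```python
import io
from typing import IO, BinaryIO, Union


def ensure_binary_stream(data: Union[bytes, str, IO], encoding: str = "utf-8") -> BinaryIO:
    """Return a binary stream suitable for convert_stream() (hypothetical helper)."""
    if isinstance(data, bytes):
        return io.BytesIO(data)
    if isinstance(data, str):
        return io.BytesIO(data.encode(encoding))
    if isinstance(data, io.TextIOBase):
        # Text streams are not accepted, so read from the current position and re-encode
        return io.BytesIO(data.read().encode(encoding))
    # Assume anything else is already a binary file-like object
    return data


stream = ensure_binary_stream(io.StringIO("<h1>Hello</h1>"))
print(stream.read())  # b'<h1>Hello</h1>'
```

Pass a different `encoding` if the downstream converter should see something other than UTF-8.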

### convert_url()

For HTTP/HTTPS URLs.

```python
from markitdown import MarkItDown

md = MarkItDown()
result = md.convert_url("https://example.com/document.pdf")
print(result.text_content)
```

### convert_uri()

For URIs using the http, https, file, or data scheme.

```python
from markitdown import MarkItDown

md = MarkItDown()

# HTTP URL
result = md.convert_uri("https://example.com/page.html")

# Local file URI
result = md.convert_uri("file:///path/to/document.pdf")

# Data URI
result = md.convert_uri("data:text/plain;base64,SGVsbG8gV29ybGQ=")
```
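A data URI carries its payload inline. As a stdlib-only illustration of what the example above encodes (not how MarkItDown parses it internally), this hypothetical `decode_data_uri` helper splits out the media type and the payload bytes following RFC 2397.

```python
import base64
from typing import Tuple
from urllib.parse import unquote_to_bytes


def decode_data_uri(uri: str) -> Tuple[str, bytes]:
    """Split a data: URI into (media type, payload bytes)."""
    if not uri.startswith("data:"):
        raise ValueError("not a data: URI")
    # Metadata comes before the first comma; the payload follows it
    header, sep, payload = uri[len("data:"):].partition(",")
    if not sep:
        raise ValueError("malformed data: URI (missing comma)")
    params = header.split(";")
    media_type = params[0] or "text/plain"  # RFC 2397 default media type
    if "base64" in params[1:]:
        return media_type, base64.b64decode(payload)
    # Non-base64 payloads are percent-encoded
    return media_type, unquote_to_bytes(payload)


print(decode_data_uri("data:text/plain;base64,SGVsbG8gV29ybGQ="))  # ('text/plain', b'Hello World')
```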

### convert_response()

For `requests.Response` objects.

```python
from markitdown import MarkItDown
import requests

md = MarkItDown()
response = requests.get("https://example.com/report.pdf")
result = md.convert_response(response)
print(result.text_content)
```

## Result Object

A `DocumentConverterResult` object holds the conversion output.

```python
from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("document.pdf")

# Main content
print(result.text_content)

# Same string; in v0.1.0+ `markdown` is the primary attribute and text_content aliases it
print(result.markdown)

# Document title (if available)
if result.title:
    print(f"Title: {result.title}")
```
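Downstream code usually reads only `text_content` (or `markdown`) and `title`, so a small fake can stand in for a real conversion in unit tests. `FakeConverterResult` and `summarize` are hypothetical names; the fake mimics just the attributes shown above and is not MarkItDown's class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeConverterResult:
    """Hypothetical test double exposing the attributes read above."""

    markdown: str
    title: Optional[str] = None

    @property
    def text_content(self) -> str:
        # Mirrors the alias: both names return the same string
        return self.markdown


def summarize(result) -> str:
    """Hypothetical consumer: the title if present, else the first non-empty line."""
    if result.title:
        return result.title
    for line in result.text_content.splitlines():
        if line.strip():
            return line.strip()
    return ""


print(summarize(FakeConverterResult("Revenue grew.", title="Q3 Report")))  # Q3 Report
```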

## Custom Converters

Create custom converters for unsupported formats.

### Basic Custom Converter

```python
from typing import Any, BinaryIO

# Public exports; the private markitdown._base_converter module may change between releases
from markitdown import (
    DocumentConverter,
    DocumentConverterResult,
    MarkItDown,
    StreamInfo,
)

class MyFormatConverter(DocumentConverter):
    """Converter for .myformat files."""

    def accepts(
        self,
        file_stream: BinaryIO,
        stream_info: StreamInfo,
        **kwargs: Any,
    ) -> bool:
        """Check if this converter handles the file (without consuming the stream)."""
        # Either hint is sufficient: a matching extension or a matching MIME type
        extension = (stream_info.extension or "").lower()
        mimetype = (stream_info.mimetype or "").lower()
        return extension == ".myformat" or mimetype.startswith("application/x-myformat")

    def convert(
        self,
        file_stream: BinaryIO,
        stream_info: StreamInfo,
        **kwargs: Any,
    ) -> DocumentConverterResult:
        """Convert the file to markdown."""
        # Honor a declared charset if present, else assume UTF-8
        content = file_stream.read().decode(stream_info.charset or "utf-8")

        # Process content...
        markdown = f"# My Format\n\n{content}"

        # v0.1.0+: the body is passed as `markdown`; text_content is a read-only alias
        return DocumentConverterResult(markdown=markdown, title="My Document")

# Register the converter
md = MarkItDown()
md.register_converter(MyFormatConverter())

# Use it
result = md.convert("document.myformat")
```

### Converter Priority

Converters are matched in priority order (a lower value means a higher priority).
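A toy dispatcher makes the priority rule concrete. This is a hypothetical sketch, not MarkItDown's implementation: it assumes a format-specific converter registers with a low value (0.0 here) and a catch-all with a higher one (10.0 here); those numbers are illustrative. Ties fall back to registration order in this sketch; MarkItDown's own tie-breaking may differ.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(order=True)
class Registration:
    # Sorting compares (priority, seq) only; name and accepts are excluded
    priority: float
    seq: int
    name: str = field(compare=False)
    accepts: Callable[[str], bool] = field(compare=False)


class MiniRegistry:
    """Toy dispatcher (hypothetical): tries converters in ascending priority value."""

    def __init__(self) -> None:
        self._entries: List[Registration] = []

    def register(self, name: str, accepts: Callable[[str], bool], priority: float) -> None:
        # seq breaks ties, so equal priorities keep registration order here
        self._entries.append(Registration(priority, len(self._entries), name, accepts))

    def pick(self, extension: str) -> str:
        # Lowest priority value is tried first; the first converter that accepts wins
        for entry in sorted(self._entries):
            if entry.accepts(extension):
                return entry.name
        raise LookupError(f"no converter accepts {extension!r}")


registry = MiniRegistry()
registry.register("generic_text", lambda ext: True, priority=10.0)  # catch-all
registry.register("myformat", lambda ext: ext == ".myformat", priority=0.0)  # specific
print(registry.pick(".myformat"))  # myformat
print(registry.pick(".txt"))  # generic_text
```

Registering the catch-all first does not matter: sorting by priority value lets the specific converter win for `.myformat` files.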
