Neo4j Document Import Skill

Name: Neo4j Document Import Skill
Author: neo4j-contrib

neo4j-contrib/neo4j-skills

Turn PDFs and text corpora into a queryable Neo4j knowledge graph with chunking, extraction, and loader choices your agent can execute end to end.

Overview

neo4j-document-import-skill is an agent skill for the Build phase that guides importing unstructured documents into Neo4j as a knowledge graph via chunking, LLM extraction, and standard loaders.

Install

npx skills add https://github.com/neo4j-contrib/neo4j-skills --skill neo4j-document-import-skill

What is this skill?

Covers PDF/text chunking through entity extraction via SimpleKGPipeline and related Neo4j graphrag patterns
Compares five chunking strategies (fixed-size, sentence/paragraph, semantic, n-gram, structural) with neo4j-graphrag and
Documents no-code LLM Graph Builder path plus apoc.load.json and LangChain/LlamaIndex document loaders
Extended reference (overflow from SKILL.md) covering chunking-strategy and entity-resolution detail for when imports get noisy
Install via npx skills add from neo4j-contrib/neo4j-skills repo
Five chunking strategies documented in the strategy comparison table
Supports SimpleKGPipeline, apoc.load.json, and LangChain/LlamaIndex loaders

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 1 install on skills.sh; 80 GitHub stars; 2 of 3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).

What problem does it solve?

You have piles of PDFs and text but no repeatable agent playbook to chunk, extract entities, and load them into Neo4j without re-researching every tool path.

Who is it for?

Solo builders shipping RAG or graph-augmented features on Neo4j who need structured import steps instead of one-off notebook experiments.

Skip if: Teams that only need simple CRUD on Postgres, have no Neo4j instance, or want a finished production ETL without reviewing draft skill guidance.

When should I use this skill?

You need to import unstructured documents into Neo4j as a knowledge graph and must choose chunking, extraction, and load mechanisms.

What do I get? / Deliverables

Your agent picks a chunking strategy, extraction pipeline, and Neo4j load path aligned to your corpus, with extended reference for entity resolution when quality slips.

Chosen chunking and extraction pipeline configuration for the corpus
Step-by-step import runbook aligned to Neo4j loaders in use
Notes on entity resolution when extended reference is loaded

Recommended Skills

Supabase Postgres Best Practices (supabase/agent-skills)

Supabase Postgres Best Practices is an MIT-licensed reference skill from Supabase that packages performance and reliabil… · 217k installs · 2.2k stars

Lark Base (larksuite/cli)

Lark CLI skill for Feishu multidimensional tables, including schema, records, and analysis-oriented query patterns. · 210k installs · 13.7k stars

Convex Migration Helper (get-convex/agent-skills)

Convex Migration Helper is an agent skill from the Convex toolkit that walks solo builders through safe schema and data … · 61.9k installs · 31 stars

Neon Postgres (neondatabase/agent-skills)

neon-postgres guides coding agents through any Neon Serverless Postgres task: creating projects, choosing connection str… · 38.3k installs · 68 stars

Firebase Firestore Standard (firebase/agent-skills)

firebase-firestore-standard is a comprehensive Firestore agent skill for solo builders who need Cloud Firestore Standard… · 36.7k installs · 345 stars

Postgresql Table Design (wshobson/agents)

PostgreSQL Table Design is an agent skill that walks solo builders through designing or reviewing Postgres-specific sche… · 18.5k installs · 36.5k stars

Journey fit

Primary fit

Build: Integrations & version control

Document-to-graph ingestion is core product integration work once you are building backends and data layers, not early ideation. The skill wires external document pipelines (LLM extraction, LangChain/LlamaIndex, APOC JSON) into Neo4j—classic integrations subphase work.

Also useful

Build: Backend, data & payments

Also useful

Build: Agent skills & templates

How it compares

Procedural import orchestration for Neo4j graphs—not a hosted MCP server or a generic vector-only embedding tutorial.

Common Questions / FAQ

Who is neo4j-document-import-skill for?

Indie developers and small teams using Claude Code, Cursor, or Codex who already chose Neo4j and want the agent to execute document-to-graph imports with known chunking and loader options.

When should I use neo4j-document-import-skill?

During Build integrations when ingesting PDFs or text, when comparing fixed-size versus semantic chunking for long docs, or when wiring LangChain/LlamaIndex loaders into your graph schema.

Is neo4j-document-import-skill safe to install?

Review the Security Audits panel on this Prism page and your agent’s network/filesystem permissions before pointing it at proprietary documents or production Neo4j credentials.

SKILL.md

SKILL.md - Neo4j Document Import Skill

> **Status: Draft / WIP**

# neo4j-document-import-skill

Guides agents through importing unstructured documents into Neo4j as a knowledge graph: PDF/text chunking, LLM entity extraction (SimpleKGPipeline), LLM Graph Builder (no-code), apoc.load.json, and LangChain/LlamaIndex document loaders.

**Install:**
```bash
npx skills add https://github.com/neo4j-contrib/neo4j-skills --skill neo4j-document-import-skill
```

Or paste this link into your coding assistant:
https://github.com/neo4j-contrib/neo4j-skills/tree/main/neo4j-document-import-skill


# KG Construction — Extended Reference

Overflow from `SKILL.md`: load this reference when detailed chunking-strategy or entity-resolution configuration is needed.

---

## Chunking Strategy Comparison

| Strategy | How it splits | Best for | Class / implementation |
|---|---|---|---|
| Fixed-size | Character count, with an optional `approximate` mode that avoids cutting words at chunk edges | Dense technical docs; most use cases | neo4j-graphrag `FixedSizeSplitter(chunk_size, chunk_overlap)` |
| Sentence/paragraph | Natural language boundaries (`\n\n`, `.`) | Narrative text, news articles, course content | LangChain `CharacterTextSplitter(separator="\n\n")` |
| Semantic | Embedding similarity between adjacent sentences | Long-form documents with topic shifts | LangChain `SemanticChunker` (in `langchain_experimental`; requires an embedder) |
| N-gram | Overlapping windows of n words | Short snippets, keyword-dense text | Custom — not built into neo4j-graphrag |
| Structural | By section/heading/method (doc-specific) | API docs, legal contracts, structured PDFs | Custom — parse structure then chunk |

**Rule**: Start with `FixedSizeSplitter(chunk_size=512, chunk_overlap=50)`. Switch to paragraph-based when sentences must not break (courses, articles). Switch to semantic chunking only when topic coherence within chunks is critical and embedder calls during ingestion are affordable.
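The `chunk_size`/`chunk_overlap` arithmetic can be made concrete with a minimal pure-Python sketch of fixed-size character chunking with overlap. This is not `FixedSizeSplitter`'s implementation. The function name `fixed_size_chunks` is hypothetical, and the sketch cuts at exact character offsets, with no word-boundary adjustment.

```python
def fixed_size_chunks(text: str, chunk_size: int = 512, chunk_overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size characters; neighbours share chunk_overlap characters.

    Illustrative sketch only (hypothetical helper, exact character cuts).
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end of the text
            break
    return chunks


text = "".join(chr(ord("a") + i % 26) for i in range(1000))
chunks = fixed_size_chunks(text, chunk_size=512, chunk_overlap=50)
print([len(c) for c in chunks])
```

With the 512/50 starting point, each window advances 462 characters, so a 1,000-character text yields three chunks: two full windows and a 76-character tail. The last 50 characters of each chunk reappear at the start of the next one.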

**Combination pattern** (the course content model from a GraphAcademy course):
```
Course → Module → Lesson → Paragraph
```
Split doc into structural units (Module/Lesson), then chunk each Lesson into Paragraphs (`\n\n`). Store both levels; query at Paragraph for vector search, traverse to Lesson for context. Pattern:
```python
from langchain_text_splitters import CharacterTextSplitter
splitter = CharacterTextSplitter(separator="\n\n", chunk_size=1500, chunk_overlap=200)
# lesson_docs: list of LangChain Documents, one per Lesson from the structural split
paragraphs = splitter.split_documents(lesson_docs)
```
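The structural half of this pattern can be sketched in plain Python. The sketch makes several assumptions:

- The course file uses a hypothetical Markdown-like layout in which `# ` starts a Module, `## ` starts a Lesson, and blank lines separate paragraphs.
- `split_course` and `paragraph_rows` are illustrative names, not part of any library.
- Each lesson is split on blank lines directly rather than through `CharacterTextSplitter`, so short paragraphs are not merged up to `chunk_size`.

The flattened rows are shaped so that a single parameterized `UNWIND` query could `MERGE` the Course → Module → Lesson → Paragraph hierarchy.

```python
from dataclasses import dataclass, field


@dataclass
class Lesson:
    title: str
    paragraphs: list[str] = field(default_factory=list)


@dataclass
class Module:
    title: str
    lessons: list[Lesson] = field(default_factory=list)


def split_course(markdown: str) -> list[Module]:
    """Split a course document into Module -> Lesson -> Paragraph (hypothetical heading format)."""
    modules: list[Module] = []
    buffer: list[str] = []  # body lines of the lesson currently being read

    def flush() -> None:
        # Attach buffered lines to the current lesson as blank-line-separated paragraphs.
        # Text that appears before any lesson heading is discarded.
        if modules and modules[-1].lessons:
            body = "\n".join(buffer)
            modules[-1].lessons[-1].paragraphs.extend(
                p.strip() for p in body.split("\n\n") if p.strip()
            )
        buffer.clear()

    for line in markdown.splitlines():
        if line.startswith("## "):
            flush()
            if not modules:  # lesson before any module heading: use a placeholder module
                modules.append(Module("Untitled"))
            modules[-1].lessons.append(Lesson(line[3:].strip()))
        elif line.startswith("# "):
            flush()
            modules.append(Module(line[2:].strip()))
        else:
            buffer.append(line)
    flush()
    return modules


def paragraph_rows(course: str, modules: list[Module]) -> list[dict]:
    """Flatten to one row per Paragraph, carrying its Course/Module/Lesson keys."""
    return [
        {"course": course, "module": m.title, "lesson": lesson.title, "index": i, "text": p}
        for m in modules
        for lesson in m.lessons
        for i, p in enumerate(lesson.paragraphs)
    ]


doc = (
    "# Intro\n"
    "## What is a graph\n"
    "Nodes are entities.\n"
    "\n"
    "Relationships connect nodes.\n"
    "## Cypher basics\n"
    "MATCH finds patterns.\n"
)
modules = split_course(doc)
rows = paragraph_rows("Neo4j Fundamentals", modules)
for row in rows:
    print(row["module"], "/", row["lesson"], "#", row["index"], ":", row["text"])
```

Because every row carries its Lesson key, vector search can return Paragraph nodes while the traversal to the parent Lesson supplies the surrounding context.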

LangChain `CharacterTextSplitter` behavior:
1. Split by `separator` (paragraph breaks)
2. Combine paragraphs up to `chunk_size` chars
3. If single paragraph > `chunk_size`: keep as-is (no mid-paragraph cut)
4. Carry the trailing paragraph(s) of chunk N into the start of chunk N+1, but only while their combined length stays ≤ `chunk_overlap` chars
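The four steps can be traced with a simplified re-implementation of the paragraph-merge logic. It is written for illustration and is not LangChain's source. The function name `merge_paragraphs` is hypothetical, and lengths are plain character counts, which is the splitter's default length function.

```python
def merge_paragraphs(text: str, chunk_size: int, chunk_overlap: int, separator: str = "\n\n") -> list[str]:
    """Simplified paragraph merge modelled on the four steps above (illustrative, not LangChain's code)."""
    paragraphs = [p for p in text.split(separator) if p]  # step 1: split on the separator
    sep_len = len(separator)
    chunks: list[str] = []
    current: list[str] = []  # paragraphs in the chunk being built
    total = 0                # always equals len(separator.join(current))
    for p in paragraphs:
        if current and total + sep_len + len(p) > chunk_size:
            # step 2: adding p would exceed chunk_size, so emit the current chunk
            chunks.append(separator.join(current))
            # step 4: drop leading paragraphs until the remainder is <= chunk_overlap
            # and still leaves room for p; whatever survives becomes the overlap
            while current and (total > chunk_overlap or total + sep_len + len(p) > chunk_size):
                total -= len(current[0]) + (sep_len if len(current) > 1 else 0)
                current.pop(0)
        # step 3: p is added whole, even if it alone is longer than chunk_size
        total += len(p) + (sep_len if current else 0)
        current.append(p)
    if current:
        chunks.append(separator.join(current))
    return chunks


a, b, c = "a" * 10, "b" * 10, "c" * 10
text = "\n\n".join([a, b, c])
print(merge_paragraphs(text, chunk_size=25, chunk_overlap=12))     # b is repeated as overlap
print(merge_paragraphs(text, chunk_size=25, chunk_overlap=5))      # b is too long to carry over
print(merge_paragraphs("x" * 40, chunk_size=25, chunk_overlap=5))  # oversized paragraph kept whole
```

In the example, `chunk_overlap=12` lets the 10-character `b` paragraph open the second chunk. With `chunk_overlap=5`, nothing fits in the overlap budget, so the second chunk starts cleanly at `c`.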

---

## Entity Resolver — Full Config

Resolvers merge duplicate entities after bulk ingest. All of them call APOC's `apoc.refactor.mergeNodes` internally, so the APOC plugin must be available on the database.

### Class Hierarchy

```
EntityResolver (base)
  ├── SinglePropertyExactMatchResolver  — exact name match
  ├── BasePropertySimilarityResolver (abstract)
  │     ├── FuzzyMatchResolver          — Levenshtein; pip install rapidfuzz
  │     └── SpaCySemanticMatchResolver  — cosine; pip install neo4j-graphrag[nlp]
  └── (custom subclass)
```

### SinglePropertyExactMatchResolver

```python
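import asyncio

from neo4j import GraphDatabase
from neo4j_graphrag.experimental.components.resolver import SinglePropertyExactMatchResolver

# Placeholder connection details: point these at your own instance
driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))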

resolver = SinglePropertyExactMatchResolver(
    driver=driver,
    filter_query="WHERE n:Organization OR n:Person",   # optional: narrow scope
    resolve_property="name",   # default: "name"
    neo4j_database="neo4j",    # optional
)
stats = asyncio.run(resolver.run())
# stats.number_of_nodes_to_resolve, stats.number_of_created_nodes
```
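Conceptually, exact-match resolution groups entities that share a label and an identical `resolve_property` value, then merges each group of two or more nodes. This describes the idea, not the resolver's source. The sketch below shows only that grouping step on in-memory records. The record shape and the function name `exact_match_groups` are hypothetical, and the library performs the actual merge inside the database.

```python
from collections import defaultdict


def exact_match_groups(nodes: list[dict], resolve_property: str = "name") -> list[list[str]]:
    """Return lists of node ids that share a label and an identical property value (illustrative)."""
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for node in nodes:
        value = node.get(resolve_property)
        if value is None:
            continue  # nodes without the property cannot be matched
        for label in node["labels"]:
            groups[(label, value)].append(node["id"])
    # Only groups with at least two members are merge candidates.
    return [ids for ids in groups.values() if len(ids) > 1]


nodes = [
    {"id": "1", "labels": ["Organization"], "name": "Neo4j"},
    {"id": "2", "labels": ["Organization"], "name": "Neo4j"},
    {"id": "3", "labels": ["Person"], "name": "Neo4j"},
    {"id": "4", "labels": ["Organization"], "name": "Neo4j, Inc."},
]
print(exact_match_groups(nodes))
```

Only nodes 1 and 2 form a group. Node 3 has a different label, and `"Neo4j, Inc."` is not an exact match. Near-duplicates like that one are what the similarity-based resolvers target.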

### FuzzyMatchResolver

```python
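import asyncio

from neo4j import GraphDatabase
from neo4j_graphrag.experimental.components.resolver import FuzzyMatchResolver

# Placeholder connection details: point these at your own instance
driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))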

resolver = FuzzyMatchResolver(
    driver=driver,
    resolve_properties=["name"],   # list of properties to concatenate + compare
    similarity_threshold=0.9,    # Levenshtein similarity 0–1; lower = more aggressive merging
    filter_query="WHERE n:Organization",
)
stats = asyncio.run(resolver.run())
```
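To build intuition for the similarity threshold (0 to 1, where lower merges more aggressively), the sketch below does two things. It computes a normalized Levenshtein similarity, `1 - distance / max(len)`, and it lists the name pairs that clear a given threshold. The scoring formula is an assumption for illustration. The resolver relies on rapidfuzz (per the install note above), whose scorer may not match this formula exactly, so tune the threshold against your own data. The names `levenshtein`, `similarity`, and `candidate_pairs` are hypothetical helpers.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free when equal)
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))


def candidate_pairs(names: list[str], threshold: float) -> list[tuple[str, str]]:
    # Compare every pair once and keep the pairs at or above the threshold.
    return [(a, b) for a, b in combinations(names, 2) if similarity(a, b) >= threshold]


names = ["Neo4j Inc", "Neo4j Inc.", "Neo4j", "Neon"]
print(candidate_pairs(names, threshold=0.85))
print(candidate_pairs(names, threshold=0.5))
```

At 0.85, only the punctuation variant pairs up. At 0.5, `"Neo4j"` and `"Neon"` also clear the bar, which is the kind of false merge an overly low threshold produces.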
