
Neo4j Vector Index Skill
Create Neo4j vector indexes, ingest embeddings, and run hybrid vector-plus-graph retrieval for RAG or similarity search.
Install
npx skills add https://github.com/neo4j-contrib/neo4j-skills --skill neo4j-vector-index-skill
What is this skill?
- CREATE VECTOR INDEX with dimensions and similarity function; wait for ONLINE via SHOW VECTOR INDEXES
- Python batch ingestion with UNWIND and db.create.setNodeVectorProperty
- In-Cypher ai.text.embed() [2025.12+] and ai.text.embedBatch(); notes that genai.vector.encode() is deprecated
- Vector SEARCH clause [2026.01+] with db.index.vector.queryNodes() fallback on 5.x+
- Hybrid semantic, lexical, and structural retrieval plus chunking strategies (fixed-size, sentence, semantic)
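The hybrid retrieval in the last bullet is specified in SKILL.md as weighted reciprocal rank fusion (WRRF): each source is ranked on its own, a node earns `sourceWeight / (rrfConstant + sourceRank)` from every source that returns it, and the contributions are summed. The skill does this in Cypher; the sketch below only illustrates the arithmetic in plain Python. The function name `wrrf`, the sample ids and scores, and the default constant of 60 are assumptions for illustration, not values taken from the skill.

```python
from collections import defaultdict


def wrrf(sources, weights, rrf_constant=60, final_k=5):
    """Fuse ranked hit lists with weighted reciprocal rank fusion.

    sources: dict of source name -> list of (stable_id, raw_score)
    weights: dict of source name -> source weight
    rrf_constant: assumed default of 60; tune for your data
    """
    totals = defaultdict(float)
    for name, hits in sources.items():
        # Rank inside each source: score DESC, stable id ASC for ties.
        ranked = sorted(hits, key=lambda hit: (-hit[1], hit[0]))
        for rank, (stable_id, _raw_score) in enumerate(ranked, start=1):
            # Raw scores are never compared across sources; only ranks are.
            totals[stable_id] += weights[name] / (rrf_constant + rank)
    # Final order: fused score DESC, stable id ASC, then limit.
    fused = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return fused[:final_k]


# Hypothetical hits from a vector index and a fulltext index.
sources = {
    "vector": [("c1", 0.91), ("c2", 0.88), ("c3", 0.70)],
    "fulltext": [("c2", 7.4), ("c4", 5.1)],
}
weights = {"vector": 1.0, "fulltext": 1.0}
result = wrrf(sources, weights, final_k=3)
print([stable_id for stable_id, _ in result])
```

Here `c2` ranks first because both sources return it, even though it is not the top vector hit. Each source list should hold more candidates than `final_k`, so that fusion happens before the final limit is applied.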
Adoption & trust: 1 install on skills.sh; 80 GitHub stars; 2/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).
Journey fit
Vector index setup and Cypher query patterns are backend data-layer work done during product build, before launch-scale traffic. Backend is the canonical shelf because the skill focuses on indexes, ingestion loops, and search procedures, not frontend UI or growth analytics.
Common Questions / FAQ
Is Neo4j Vector Index Skill safe to install?
skills.sh reports 2 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.
SKILL.md
# neo4j-vector-index-skill

Skill for creating and querying vector indexes in Neo4j for semantic or structural similarity search.

**Covers:**
- Creating vector indexes: `CREATE VECTOR INDEX` with dimensions and similarity function
- Waiting for index `ONLINE` status; `SHOW VECTOR INDEXES`
- Embedding ingestion: Python batch loop with `UNWIND`, `db.create.setNodeVectorProperty`
- In-Cypher embedding with `ai.text.embed()` [2025.12]; replaces deprecated `genai.vector.encode()`
- Batch embedding procedure `ai.text.embedBatch()` for large datasets
- Vector search: `SEARCH` clause [2026.01+] and `db.index.vector.queryNodes()` procedure fallback
- Combining vector search with graph traversal (hybrid retrieval)
- Hybrid search, including semantic + lexical + structural sources
- Vector indexes over embeddings already written by GDS algorithms
- Chunking strategy before ingestion (fixed-size, sentence, semantic)
- Similarity function guidance: cosine vs euclidean; match your model's training loss
- Common errors: wrong dimensions, index not ONLINE, provider null returns

**Version / compatibility:**
- `SEARCH` clause requires Neo4j 2026.01+; `db.index.vector.queryNodes` available 5.x+
- `ai.text.embed()` requires Neo4j 2025.12+ and CYPHER 25; `genai.vector.encode()` is deprecated
- Vector type is native in CYPHER 25; stored as `LIST<FLOAT>` in older versions

**Not covered:**
- Full `ai.text.*` plugin reference (completion, chat, structured output) → `neo4j-genai-plugin-skill`
- GraphRAG pipelines with `neo4j-graphrag` → `neo4j-graphrag-skill`
- Fulltext-only / keyword-only search → `neo4j-cypher-skill`
- Computing GDS node embedding algorithms (FastRP, GraphSAGE) → `neo4j-gds-skill`

**Install:**

```bash
npx skills add https://github.com/neo4j-contrib/neo4j-skills --skill neo4j-vector-index-skill
```

Or paste this link into your coding assistant: https://github.com/neo4j-contrib/neo4j-skills/tree/main/neo4j-vector-index-skill

# Hybrid Search

Hybrid search is useful when one retrieval signal is not enough:
- Semantic vector search finds paraphrases; misses exact names, acronyms, codes, and domain terms.
- Lexical fulltext search finds exact words; misses related concepts that do not share words.
- Structural search uses graph topology, paths, communities, or GDS node embeddings; captures relationships text does not contain.

Combining ranked sources improves recall and can boost results that are supported by more than one signal. The common pattern is vector + fulltext, but the same query shape works for any two or more ranked/scored sources: several vector indexes, title/body fulltext indexes, GDS-written structural embeddings, graph-derived candidate scores, or external retrieval scores.

Use when the user asks for custom Cypher hybrid search, WRRF/RRF, vector + fulltext, semantic + lexical + structural search, multiple vector indexes, or combining two+ ranked/scored retrieval sources.

## When NOT to Use

- `neo4j-graphrag` package `HybridRetriever` / `HybridCypherRetriever` -> use `neo4j-graphrag-skill`
- Fulltext-only / keyword-only search -> use `neo4j-cypher-skill`
- Single vector search -> use main `neo4j-vector-index-skill`

## Rules

- Run each source independently.
- Rank each source by `score DESC, stable_id ASC`.
- Do not compare raw scores from different sources.
- Compute `contribution = sourceWeight / (rrfConstant + sourceRank)`.
- Sum contributions per node.
- Order final rows by `wrrf DESC, stable_id ASC`.
- Use `sourceK > finalK`; combine before final limiting.
- Use stable unique property for tie breaks. If no stable key exists, add one before production use.
- Keep `LIMIT $sourceK` inside `SEARCH`; Cypher rejects a `LET` alias there.
- For structural vector sources, compute/write GDS embeddings first, then create a vector index over that property.

## Index Setup

Vector index:

```cypher
CYPHER 25
CREATE VECTOR INDEX chunk_embedding IF NOT EXISTS
FOR (c:Chunk) ON (c.embedding)
OPTIONS { indexConfig: { `