Neo4j Spark Skill

Name: Neo4j Spark Skill
Author: neo4j-contrib

neo4j-contrib/neo4j-skills

Wire Apache Spark or Databricks jobs to read and write Neo4j with correct connector options, partitioning, and Delta-to-graph ingestion patterns.

Install

npx skills add https://github.com/neo4j-contrib/neo4j-skills --skill neo4j-spark-skill

What is this skill?

SparkSession setup with org.neo4j:neo4j-connector-apache-spark Maven coordinates
Read paths: label scan, Cypher query, relationship scan; write paths with CREATE/MERGE and node.keys
Partition and batch tuning (partitions, batch.size, schema.flatten.limit)
Databricks cluster install, secrets, and Unity Catalog notes
Delta Lake → Neo4j ingestion pipeline pattern with PySpark and Scala examples

Adoption & trust: 1 install on skills.sh; 80 GitHub stars; 3/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).

Journey fit

Primary fit

Build: Integrations & version control

Spark–Neo4j integration is build-time systems work connecting analytics pipelines to the graph store. Connector setup, DataFrame read/write modes, and cloud cluster install belong in integrations rather than generic backend CRUD.

Common Questions / FAQ

Is Neo4j Spark Skill safe to install?

skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.

SKILL.md

# neo4j-spark-skill

Skill for reading and writing Neo4j data using the Neo4j Connector for Apache Spark, including Databricks, EMR, and standalone Spark environments.

**Covers:**
- SparkSession setup with Maven artifact `org.neo4j:neo4j-connector-apache-spark`
- DataFrame reads: label scan, Cypher query, relationship scan
- DataFrame writes: node CREATE/MERGE, relationship write with source/target mapping
- `node.keys` for Overwrite (MERGE) mode
- Partition and batch tuning (`partitions`, `batch.size`, `schema.flatten.limit`)
- Databricks cluster installation, secrets management, Unity Catalog notes
- Delta Lake → Neo4j ingestion pipeline pattern
- PySpark and Scala code examples
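
The keyed write path listed above can be sketched roughly as follows. This is a minimal illustration, assuming a hypothetical `:Person` label keyed on a hypothetical `id` property. The helper names `node_write_options` and `write_nodes` are illustrative and not part of the connector, and authentication options are left out.

```python
def node_write_options(url, labels, keys, batch_size=5000):
    """Build options for a keyed node write (hypothetical helper)."""
    if not keys:
        # Overwrite mode merges on node.keys, so at least one key is needed.
        raise ValueError("node.keys must name at least one property")
    return {
        "url": url,
        "labels": labels,
        "node.keys": ",".join(keys),
        "batch.size": str(batch_size),
    }


def write_nodes(df, options):
    """Write a Spark DataFrame as nodes using Overwrite (MERGE) mode."""
    (df.write.format("org.neo4j.spark.DataSource")
        .mode("Overwrite")
        .options(**options)
        .save())


opts = node_write_options("neo4j://localhost:7687", ":Person", ["id"])
print(opts)
```

For a Delta-to-graph pipeline, the DataFrame passed to `write_nodes` would typically come from something like `spark.read.format("delta").load(...)`, with the table path supplied by your own environment.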

**Version / Compatibility:**
- Connector: `5.4.2_for_spark_3` (Scala 2.12 or 2.13)
- Spark: 3.3, 3.4, 3.5
- Databricks Runtime: 12.2, 13.3, 14.3 LTS
- Neo4j: 4.4, 5.x, 2025.x

**Not covered:**
- Cypher query authoring → `neo4j-cypher-skill`
- Neo4j Python bolt driver → `neo4j-driver-python-skill`
- GDS graph algorithms → `neo4j-gds-skill`
- Spring Boot + Neo4j → `neo4j-spring-data-skill`

**Install:**
```bash
npx skills add https://github.com/neo4j-contrib/neo4j-skills --skill neo4j-spark-skill
```

Or paste this link into your coding assistant:
https://github.com/neo4j-contrib/neo4j-skills/tree/main/neo4j-spark-skill


# Neo4j Spark Connector — Read Options Reference

Full option reference for `.read.format("org.neo4j.spark.DataSource")`.

## Core Read Options (mutually exclusive — pick one)

| Option | Value | Description |
|--------|-------|-------------|
| `labels` | `:Label` or `:Label1:Label2` | Read nodes with the given label(s). Multiple labels are combined with AND. |
| `query` | Cypher string | Custom MATCH ... RETURN query. Aliases become column names. |
| `relationship` | `REL_TYPE` | Read relationships of given type. Requires source/target label options. |
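
Because the three core options are mutually exclusive, it can help to check that exactly one is set before building a reader. The following is a small sketch of such a guard; `core_read_option` is a hypothetical helper, not a connector API.

```python
def core_read_option(labels=None, query=None, relationship=None):
    """Return the single core read option as a dict, or raise if 0 or >1 are set."""
    candidates = {
        "labels": labels,
        "query": query,
        "relationship": relationship,
    }
    chosen = {name: value for name, value in candidates.items()
              if value is not None}
    if len(chosen) != 1:
        raise ValueError(
            "set exactly one of labels, query, relationship; got "
            + (", ".join(sorted(chosen)) or "none")
        )
    return chosen


print(core_read_option(labels=":Person:Employee"))
```

The returned dict can be passed to `spark.read.format("org.neo4j.spark.DataSource").options(**...)` together with the connection options.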

## Label Read Sub-Options

| Option | Default | Description |
|--------|---------|-------------|
| `node.keys` | — | Comma-separated property names to include as match keys |

## Relationship Read Sub-Options

| Option | Required | Description |
|--------|----------|-------------|
| `relationship.source.labels` | Yes | Colon-prefixed labels of source node `:Label` |
| `relationship.target.labels` | Yes | Colon-prefixed labels of target node `:Label` |
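
A relationship scan needs the type plus both label options. The sketch below builds that option set and normalizes labels to the colon-prefixed form the table describes. `relationship_read_options` is a hypothetical helper, and the `ACTED_IN`, `Person`, and `Movie` names are illustrative only.

```python
def _colon_prefixed(labels):
    """Normalize 'Person', ':Person' or 'Person:Actor' to ':Person[:Actor]'."""
    parts = [part for part in labels.split(":") if part]
    if not parts:
        raise ValueError("at least one label is required")
    return "".join(f":{part}" for part in parts)


def relationship_read_options(rel_type, source_labels, target_labels):
    """Build the options for a relationship scan (hypothetical helper)."""
    return {
        "relationship": rel_type,
        "relationship.source.labels": _colon_prefixed(source_labels),
        "relationship.target.labels": _colon_prefixed(target_labels),
    }


print(relationship_read_options("ACTED_IN", "Person", ":Movie"))
```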

## Query Read Sub-Options

| Option | Description |
|--------|-------------|
| `query.count` | Cypher count query used for partition planning (e.g. `MATCH (n:Person) RETURN count(n) AS count`). Avoids a full count scan. |
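
A partitioned `query` read pairs the main query with a count query. The sketch below shows one way to assemble those options. `query_read_options` is a hypothetical helper, and aliasing the count as `count` is an assumption about what the connector expects, so check it against the connector documentation for your version.

```python
def query_read_options(query, count_query=None, partitions=1):
    """Build options for a Cypher query read (hypothetical helper)."""
    if partitions < 1:
        raise ValueError("partitions must be >= 1")
    options = {"query": query, "partitions": str(partitions)}
    if count_query is not None:
        # Supplying the count lets the row total be planned up front.
        options["query.count"] = count_query
    return options


opts = query_read_options(
    "MATCH (n:Person) RETURN n.name AS name",
    count_query="MATCH (n:Person) RETURN count(n) AS count",
    partitions=4,
)
print(opts)
```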

## Partition and Performance Options

| Option | Default | Description |
|--------|---------|-------------|
| `partitions` | `1` | Number of Spark partitions. Connector uses SKIP/LIMIT internally. |
| `batch.size` | `5000` | Rows per partition batch. |
| `schema.flatten.limit` | `10` | Rows sampled for schema inference (no APOC). Increase for heterogeneous nodes. |
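
To see what SKIP/LIMIT partitioning implies, the arithmetic can be sketched as below. This only illustrates how a row count might be divided into ranges; it is not the connector's actual implementation, and `skip_limit_ranges` is a hypothetical name.

```python
import math


def skip_limit_ranges(total_rows, partitions):
    """Split total_rows into (skip, limit) pairs, one per partition."""
    if total_rows < 0 or partitions < 1:
        raise ValueError("total_rows must be >= 0 and partitions >= 1")
    if total_rows == 0:
        return []
    size = math.ceil(total_rows / partitions)
    # The last range is trimmed so the limits add up to total_rows.
    return [(skip, min(size, total_rows - skip))
            for skip in range(0, total_rows, size)]


for skip, limit in skip_limit_ranges(10, 3):
    print(f"SKIP {skip} LIMIT {limit}")
```

Each later range has to skip past all earlier rows, which is one reason very high partition counts do not always pay off on large scans.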

## Output Columns

**Label scan result columns:**
- `<id>` — internal Neo4j element ID
- `<labels>` — array of node labels
- One column per node property

**Relationship scan result columns:**
- `<rel.id>` — internal relationship ID
- `<rel.type>` — relationship type string
- `<source.id>`, `<source.labels>`, `source.<prop>` — source node fields
- `<target.id>`, `<target.labels>`, `target.<prop>` — target node fields
- Relationship property columns at top level
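
Several of these column names contain dots or angle brackets, and Spark treats an unquoted dot in a column reference as struct field access. A small quoting helper, sketched below, wraps names in backticks before they are passed to `select`. `quote_column` is a hypothetical helper, and `source.name` assumes a `name` property on the source node.

```python
def quote_column(name):
    """Wrap a column name in backticks, escaping any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def quoted(names):
    """Quote a list of column names for use in DataFrame.select."""
    return [quote_column(name) for name in names]


columns = quoted(["<rel.id>", "<rel.type>", "source.name"])
print(columns)
# Usage with an existing relationship DataFrame `df`:
#   df.select(*columns)
```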

## Schema Inference Notes

- Without APOC: samples `schema.flatten.limit` rows to infer types
- With APOC: uses `apoc.meta.nodeTypeProperties` — more accurate
- Map/list properties: flattened into dot-notation columns (e.g. `address.city`)
- Use `query` mode with explicit RETURN types when inference is unreliable
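
One way to make RETURN types explicit, as the last note suggests, is to wrap each property in a Cypher conversion function. The sketch below generates such a RETURN clause. `typed_return` and the `name` / `born` properties are hypothetical, and the mapping covers only a few scalar types.

```python
# Cypher scalar conversion functions keyed by a simple type name.
CASTS = {
    "string": "toString",
    "integer": "toInteger",
    "float": "toFloat",
    "boolean": "toBoolean",
}


def typed_return(variable, columns):
    """Build a RETURN clause that casts each property to a declared type."""
    parts = []
    for prop, kind in columns.items():
        if kind not in CASTS:
            raise ValueError(f"unsupported type: {kind}")
        parts.append(f"{CASTS[kind]}({variable}.{prop}) AS {prop}")
    return "RETURN " + ", ".join(parts)


query = "MATCH (p:Person) " + typed_return(
    "p", {"name": "string", "born": "integer"})
print(query)
```

The resulting string can be passed as the `query` option, so the inferred schema follows the casts instead of sampled values.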

## Examples

### Multi-label AND filter

```python
df = (spark.read.format("org.neo4j.spark.DataSource")
    .option("labels", ":Person:Employee")
    .load())
```

### Cypher with explicit column types

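```python
# The query below is truncated in the source; the elided part is left as "...".
df = (spark.read.format("org.neo4j.spark.DataSource")
    .option("query", """
        MATCH (p:Person)-[r:ACTED_IN]
        ...
    """)
    .load())
```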
