Synapse Migration

Name: Synapse Migration
Author: microsoft

microsoft/skills-for-fabric

85 installs
886 repo stars
Updated July 23, 2026

synapse-migration is an agent skill for porting Azure Synapse Analytics Spark workloads to Microsoft Fabric via API-driven migration.

About

The synapse-migration skill ports Azure Synapse Analytics Spark workloads to Microsoft Fabric through API-driven migration workflows, without requiring the UI-based Migration Assistant. It translates mssparkutils calls to notebookutils (including the env → runtime namespace change), replaces Linked Services with Fabric Data Connections and OneLake Shortcuts, and migrates Spark Pools, Lake Databases, Notebooks, and Spark Job Definitions in ordered phases. Authentication uses a distinct token audience for each plane: the Synapse ARM management plane, the Synapse data plane, and the Fabric control plane (api.fabric.microsoft.com), with az rest patterns documented per phase. Resource files cover Spark pool to environment mapping, lake database to lakehouse conversion, external Hive Metastore (HMS) migration, connector refactoring for Kusto and Cosmos DB, library compatibility with Fabric Runtime 1.3, validation testing, security and governance, and migration report generation with portal links. Context loading guidance instructs agents to read only the phase-specific resource file instead of loading all references upfront. Triggers include "migrate from synapse", "synapse to fabric", "mssparkutils to notebookutils", and "port synapse notebooks".

Ports Synapse Spark workloads to Fabric via REST-driven phases.
Maps mssparkutils to notebookutils with namespace migration guidance.
Replaces Linked Services with Data Connections and OneLake Shortcuts.
Covers pools, lake databases, notebooks, and Spark Job Definitions.
Loads phase-specific resource files instead of all docs upfront.

Synapse Migration by the numbers

85 all-time installs (skills.sh)
Ranked #593 of 1,041 Cloud & Infrastructure skills by installs in the Skillselion catalog
Security screen: LOW risk (skills.sh audit)
Data as of Jul 28, 2026 (Skillselion catalog sync)

At a glance

synapse-migration capabilities & compatibility

Capabilities: phased synapse to fabric migration orchestration · mssparkutils to notebookutils code refactoring · linked service to data connection and shortcut migration · spark pool, lakehouse, notebook, and sjd migration · validation testing and migration report generation
Works with: azure · databricks
Use cases: data analysis · devops

From the docs

What synapse-migration says it does

Port Azure Synapse Analytics Spark workloads to Microsoft Fabric.

SKILL.md

Linked Services have no direct REST API equivalent in Fabric

SKILL.md

npx skills add https://github.com/microsoft/skills-for-fabric --skill synapse-migration


Installs	85
repo stars	★ 886
Security audit	3 / 3 scanners passed
Last updated	July 23, 2026
Repository	microsoft/skills-for-fabric

How do I migrate Synapse Spark notebooks, pools, and linked services to Fabric programmatically?

Port Azure Synapse Analytics Spark workloads to Fabric by replacing mssparkutils, linked services, and Spark items via REST APIs.

Who is it for?

Teams moving Synapse Spark estates to Fabric who need scripted pool, lakehouse, and notebook migration.

Skip if: greenfield Fabric authoring, Dedicated SQL pool-only migrations, or Synapse SQL analytics without Spark.

When should I use this skill?

User asks to migrate Synapse to Fabric, replace mssparkutils, or port Synapse notebooks and Spark jobs.

What you get

Migrated Fabric items with refactored notebookutils code, replaced connectivity, validation results, and a migration report.

Files

SKILL.md (Markdown, GitHub)

Update Check — ONCE PER SESSION (mandatory)

The first time this skill is used in a session, run the check-updates skill before proceeding.

- GitHub Copilot CLI / VS Code: invoke the check-updates skill.

- Claude Code / Cowork / Cursor / Windsurf / Codex: compare the local and remote package.json versions.

- Skip if the check was already performed earlier in this session.

CRITICAL NOTES

1. To find workspace details (including its ID) from a workspace name: list all workspaces, then use JMESPath filtering

2. To find item details (including its ID) from workspace ID, item type, and item name: list all items of that type in that workspace, then use JMESPath filtering

3. mssparkutils and notebookutils share the same API surface in most cases — the namespace is the primary change

4. Linked Services have no direct REST API equivalent in Fabric — they are replaced by Data Connections (for external sources) and OneLake Shortcuts (for storage mounts)
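Notes 1 and 2 describe a list-then-filter lookup. With `az rest` this is usually a JMESPath `--query` such as `value[?displayName=='Sales'].id | [0]`. The same filter in plain Python is sketched below, assuming the list responses carry a `value` array of objects with `id`, `displayName`, and (for items) `type`. `find_id` is a hypothetical helper and the response data is illustrative; the sketch also ignores pagination, which a real client following continuation tokens would need to handle.

```python
from typing import Any, Optional


def find_id(list_response: dict[str, Any], display_name: str,
            item_type: Optional[str] = None) -> Optional[str]:
    """Return the id of the first entry whose displayName matches.

    Python equivalent of the JMESPath filter
    value[?displayName=='<name>'].id | [0], plus an optional type check
    for item listings.
    """
    for entry in list_response.get("value", []):
        if entry.get("displayName") != display_name:
            continue
        if item_type is not None and entry.get("type") != item_type:
            continue
        return entry.get("id")
    return None


# Illustrative response shapes (ids shortened for readability)
workspaces = {"value": [
    {"id": "ws-1", "displayName": "Sales", "type": "Workspace"},
    {"id": "ws-2", "displayName": "Finance", "type": "Workspace"},
]}
items = {"value": [
    {"id": "nb-1", "displayName": "silver_transform", "type": "Notebook"},
    {"id": "lh-1", "displayName": "silver_transform", "type": "Lakehouse"},
]}

print(find_id(workspaces, "Finance"))                              # ws-2
print(find_id(items, "silver_transform", item_type="Lakehouse"))   # lh-1
```

Filtering by type matters because item names are only unique per type, so a notebook and a lakehouse can share a display name.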

Synapse Analytics → Microsoft Fabric Migration

Prerequisite Knowledge

These companion documents provide general Fabric REST patterns. Do NOT read them upfront — reference only when a specific phase requires a pattern not already covered in this skill's resource files:

COMMON-CORE.md — General Fabric REST API patterns, authentication & token audiences, item discovery via JMESPath
COMMON-CLI.md — az rest / az login CLI patterns, authentication recipes
SPARK-AUTHORING-CORE.md — Notebook/lakehouse creation (already covered in spark-item-migration.md and lake-database-migration.md)
SQLDW-AUTHORING-CORE.md — Fabric Warehouse T-SQL (delegate to sqldw-authoring-cli skill)

Auth, API endpoints, and item payloads are fully documented in this skill's own files. The common docs above are fallback references only.

---

Topic	Reference
Migration Orchestrator	migration-orchestrator.md
API-Driven Migration Workflow	§ API-Driven Migration Workflow
Migration Workload Map	§ Migration Workload Map
Spark Pool → Environment Migration	spark-pool-migration.md
Lake Database → Lakehouse Migration	lake-database-migration.md
External Hive Metastore → Lakehouse Migration	external-hms-migration.md
Notebook & SJD Migration	spark-item-migration.md
Library Compatibility (Synapse vs. Fabric RT 1.3)	library-compatibility.md
Connector Refactoring (Kusto, Cosmos DB, ADLS OAuth)	connector-refactoring.md
`mssparkutils` → `notebookutils` API Mapping	utility-api-mapping.md
Linked Services → Data Connections / Shortcuts	connectivity-migration.md
Before/After Code Patterns (incl. Catalog API gaps)	code-patterns.md
Migration Report (with Fabric portal links)	migration-report.md
Migration Troubleshooting Guide	migration-gotchas.md
Validation & Testing	validation-testing.md
Security & Governance (Production Readiness)	security-governance.md
T-SQL & Spark Configuration Differences	§ T-SQL & Spark Configuration Differences
Capacity Sizing Reference	§ Capacity Sizing Reference
Must / Prefer / Avoid	§ Must / Prefer / Avoid
Feature Parity Reference	§ Feature Parity Reference
Migration Gotchas — Quick Reference	§ Migration Gotchas + migration-gotchas.md
Post-Migration: What's Next	§ Post-Migration: What's Next

Context Loading Guide

IMPORTANT — Load only what you need. Do NOT read all resource files upfront. Load the specific file for the phase you are executing:

When	Read This File	Lines
User asks to migrate a workspace (full orchestration)	migration-orchestrator.md	~1264
Phase 0: Spark Pools → Environments	spark-pool-migration.md	~290
Phase 1: Databases → Lakehouses (built-in HMS)	lake-database-migration.md	~574
Phase 1: Databases → Lakehouses (external HMS)	external-hms-migration.md	~388
Phase 2–3: Notebooks & SJDs	spark-item-migration.md	~326
Code refactoring (mssparkutils, connectors)	utility-api-mapping.md + connector-refactoring.md + code-patterns.md	~588
Post-migration validation	validation-testing.md	~487
Troubleshooting failures	migration-gotchas.md	~225
Production security setup	security-governance.md	~926
Library version gaps	library-compatibility.md	~106
Generating migration report	migration-report.md	~360
Capacity sizing & SKU planning	capacity-sizing.md	~85
Feature parity matrix	feature-parity.md	~65

---

API-Driven Migration Workflow

This skill supports programmatic migration of Synapse Spark items via REST APIs (no UI-based Migration Assistant required).

Authentication

Target	Token Audience
Synapse ARM (management plane)	`https://management.azure.com`
Synapse Data Plane	`https://dev.azuresynapse.net`
Fabric REST API	`https://api.fabric.microsoft.com`

Use the token-acquisition recipe in COMMON-CLI § Authentication Recipes with the audiences above.
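A minimal sketch of acquiring one token per audience with the Azure CLI, using the same `az account get-access-token --resource ... --query accessToken -o tsv` form this skill uses for its shortcut call. The target keys (`synapse_arm`, `synapse_data`, `fabric`) are illustrative names, not part of any API, and the sketch assumes `az login` has already been run.

```python
import shutil
import subprocess

# Token audiences from the Authentication table above
AUDIENCES = {
    "synapse_arm": "https://management.azure.com",
    "synapse_data": "https://dev.azuresynapse.net",
    "fabric": "https://api.fabric.microsoft.com",
}

# Resolve the CLI executable once (on Windows this finds az.cmd via PATHEXT)
_AZ = shutil.which("az") or "az"


def token_command(target: str) -> list[str]:
    """Build the az CLI command that prints a bearer token for one target."""
    return [
        _AZ, "account", "get-access-token",
        "--resource", AUDIENCES[target],
        "--query", "accessToken",
        "-o", "tsv",
    ]


def get_token(target: str) -> str:
    """Run the command; requires the Azure CLI and a prior `az login`."""
    result = subprocess.run(token_command(target), capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()
```

Usage: `headers = {"Authorization": f"Bearer {get_token('fabric')}"}`. Keeping one token per audience avoids sending, for example, a Synapse data-plane token to the Fabric API, where a token issued for a different audience is rejected.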

Migration Phases (Execute in Order)

Phase	Synapse Source	Fabric Target	Resource
Phase 0	Spark Pool	Environment	spark-pool-migration.md
Phase 1	Lake Database (built-in HMS)	Lakehouse	lake-database-migration.md
Phase 1	External Hive Metastore	Lakehouse	external-hms-migration.md
Phase 1b	Ad-hoc `abfss://` storage paths	OneLake Shortcuts	migration-orchestrator.md (migrate-and-modernize only)
Phase 2	Notebooks	Notebook	spark-item-migration.md
Phase 3	Spark Job Definitions	SJD	spark-item-migration.md
Final	Validation & Testing	—	validation-testing.md
Optional	Security & Governance	—	security-governance.md

Phase order matters: Environments (Phase 0) must exist before notebooks/SJDs can bind to them. Lakehouses (Phase 1) must exist before notebooks can bind to them (Phase 2).
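The ordering constraint can be expressed as a dependency graph and resolved with Python's standard-library `graphlib` (3.9+). The phase keys and edges below are a sketch derived from the phase table and the sentence above. Two edges are assumptions: Phase 1b depends only on Phase 1 (shortcuts live inside a lakehouse), and SJDs are treated like notebooks in needing both an environment and a lakehouse.

```python
from graphlib import TopologicalSorter

# Each phase maps to the set of phases that must finish before it starts.
PHASE_DEPENDENCIES = {
    "phase0_environments": set(),
    "phase1_lakehouses": set(),
    "phase1b_shortcuts": {"phase1_lakehouses"},
    "phase2_notebooks": {"phase0_environments", "phase1_lakehouses"},
    "phase3_sjds": {"phase0_environments", "phase1_lakehouses"},
    "final_validation": {"phase1b_shortcuts", "phase2_notebooks", "phase3_sjds"},
}


def execution_order(deps: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(PHASE_DEPENDENCIES)
print(order)
```

The relative order of independent phases (for example Phase 0 and Phase 1) is not fixed by the graph, which is a reminder that those two can run in parallel.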

For the full execution flow with sub-steps, decision points, lift-and-shift vs. modernize paths, and error recovery, see migration-orchestrator.md.

REST API Quick Reference

All Synapse and Fabric API endpoints with request/response examples are in migration-orchestrator.md (Steps 2a–2e). For token audiences, see the Authentication table above.

API docs: Synapse ARM · Synapse Data Plane · Fabric Items · Fabric Shortcuts · Fabric Connections · Fabric Environments

---

Migration Workload Map

Use this table to determine the correct Fabric target for each Synapse component:

Synapse Component	Fabric Target	Notes
Spark Pool (notebooks, jobs)	Fabric Spark (Lakehouse / Notebooks / SJD)	Starter Pool replaces on-demand pools for most workloads
Dedicated SQL Pool	Fabric Warehouse	T-SQL surface area differences apply — see § T-SQL & Spark Configuration Differences. Procedural migration guide not yet available — separate migration track. For T-SQL authoring, delegate to `sqldw-authoring-cli`.
Serverless SQL Pool	Lakehouse SQL Endpoint	Read-only Delta/Parquet queries; no DDL required
Synapse Pipelines	Fabric Data Pipelines	Activity types, triggers, and expressions are broadly compatible. Pipeline migration resource not yet available — separate migration track.
Synapse Link for Cosmos DB / SQL	Fabric Mirroring	Native mirroring replaces the Synapse Link connector pattern. Not covered by this skill.
Linked Services	Data Connections (external) / OneLake Shortcuts (storage)	See connectivity-migration.md
Integration Datasets	Fabric Pipeline source/sink config	Dataset definitions are inlined into pipeline activities in Fabric. Not covered by this skill.
Managed Virtual Networks	Fabric Managed Private Endpoints	Configure in Fabric workspace settings (network security)
Synapse Studio	Fabric workspace	All artifact types live in a single workspace with Git integration

Decision Tree: Which Fabric Spark Workload?

Synapse Spark workload
├── Interactive notebook with data exploration → Fabric Notebook (attached to Lakehouse)
├── Scheduled/production job → Spark Job Definition (SJD)
├── T-SQL over files/Delta → Lakehouse SQL Endpoint (no migration needed — just point to OneLake)
└── Real-time ingest → Fabric Eventstream + Lakehouse
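The decision tree reads naturally as a small classification function. This is a sketch: the workload flags (`interactive`, `scheduled`, `tsql_only`, `streaming`) are hypothetical attributes you would derive from your own Synapse inventory, and the checks run in the same top-to-bottom order as the tree.

```python
from dataclasses import dataclass


@dataclass
class SparkWorkload:
    name: str
    interactive: bool = False   # exploratory notebook use
    scheduled: bool = False     # scheduled or production job
    tsql_only: bool = False     # only T-SQL over files or Delta
    streaming: bool = False     # real-time ingest


def fabric_target(w: SparkWorkload) -> str:
    """Map a workload to a Fabric target following the decision tree order."""
    if w.interactive:
        return "Fabric Notebook (attached to Lakehouse)"
    if w.scheduled:
        return "Spark Job Definition (SJD)"
    if w.tsql_only:
        return "Lakehouse SQL Endpoint (no migration needed)"
    if w.streaming:
        return "Fabric Eventstream + Lakehouse"
    return "unclassified: review manually"


print(fabric_target(SparkWorkload("nightly_etl", scheduled=True)))
```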

---

T-SQL & Spark Configuration Differences

For detailed T-SQL surface area gaps (PolyBase → COPY INTO, distribution hints, result set caching) and Spark configuration mappings (pools, %%configure, runtime versions), see feature-parity.md.

Key actions: Remove DISTRIBUTION = HASH(col) hints, replace CREATE EXTERNAL TABLE with COPY INTO, replace spark.read.synapsesql() with OneLake shortcuts or JDBC. Delegate T-SQL authoring to sqldw-authoring-cli.

---

Capacity Sizing Reference

For Synapse pool → Fabric SKU mapping tables, sizing decision guide, and cost model comparison, see capacity-sizing.md.

Quick guide: Dev/test = F8–F16 with Starter Pool; standard production = F32–F64; enterprise = F128+. Use Fabric Trial (free F64, 60 days) for migration validation.

---

Must / Prefer / Avoid

MUST DO

Replace all `mssparkutils` imports with `notebookutils` — see utility-api-mapping.md for the complete namespace table
Replace all Linked Services with Fabric Data Connections (for external databases/services) or OneLake Shortcuts (for ADLS Gen2 / Blob storage mounts) — see connectivity-migration.md
Replace `spark.read.synapsesql()` with Lakehouse shortcut reads or JDBC connections to the Fabric Warehouse SQL endpoint
Re-test all notebooks after migration against the target Fabric Runtime version — Spark minor version differences can surface deprecated API warnings
Externalize all workspace/item IDs — never hardcode; use pipeline parameters or Variable Libraries
Replace pool-level library installs with Fabric Environments attached at the workspace or notebook level
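Because the namespace is usually the only change (critical note 3), the mechanical part of the first MUST DO item can be scripted. A minimal sketch, assuming notebook cell source is available as plain text: it drops the Synapse import line (Fabric pre-instantiates `notebookutils`) and renames `mssparkutils.` calls, but deliberately leaves `mssparkutils.env.*` calls untouched and reports them, because they move to `notebookutils.runtime.context` with a different call shape. `rewrite_cell` is a hypothetical helper, not part of any library; review its output against utility-api-mapping.md.

```python
import re

# Synapse notebooks often start with this import; Fabric does not need it.
_IMPORT_LINE = re.compile(
    r"^[ \t]*from notebookutils import mssparkutils[ \t]*\r?\n?", re.MULTILINE
)
# Any mssparkutils.<member> except mssparkutils.env.<member>
_NAMESPACE = re.compile(r"\bmssparkutils\.(?!env\.)")
# env calls that need a manual mapping to notebookutils.runtime.context
_ENV_CALL = re.compile(r"\bmssparkutils\.env\.\w+")


def rewrite_cell(source: str) -> tuple[str, list[str]]:
    """Return (rewritten source, env calls that still need manual mapping)."""
    rewritten = _IMPORT_LINE.sub("", source)
    rewritten = _NAMESPACE.sub("notebookutils.", rewritten)
    manual = sorted(set(_ENV_CALL.findall(rewritten)))
    return rewritten, manual


cell = (
    "from notebookutils import mssparkutils\n"
    'files = mssparkutils.fs.ls("Files/incoming/")\n'
    "ws = mssparkutils.env.getWorkspaceName()\n"
)
new_cell, todo = rewrite_cell(cell)
print(new_cell)
print(todo)  # ['mssparkutils.env.getWorkspaceName']
```

Leaving the env calls in their original form keeps them easy to find with a text search after the automated pass.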

PREFER

OneLake Shortcuts over full data copies — mount existing ADLS Gen2 containers as shortcuts rather than re-ingesting data during migration
Fabric Starter Pool for dev/test migrations — eliminates pool warm-up wait time inherent in Synapse on-demand pools
Lakehouse SQL Endpoint as a drop-in for Serverless SQL Pool reads — point existing consumers at the endpoint with minimal query changes
Medallion architecture for migrated data — align with Bronze/Silver/Gold patterns (see e2e-medallion-architecture skill)
Incremental migration — migrate and validate workload by workload rather than performing a big-bang cutover
Parameterized notebooks to allow environment promotion (dev → test → prod) without code changes

AVOID

Do not copy-paste PolyBase `CREATE EXTERNAL TABLE` DDL into Fabric Warehouse — rewrite as COPY INTO or use Lakehouse for external data access
Do not assume Synapse Linked Service connection strings are reusable — credentials and endpoints must be reconfigured as Fabric Data Connections
Do not install libraries in notebook cells (%pip install at runtime) for production workloads — use Fabric Environments for reproducible, versioned library management
Do not migrate Dedicated SQL Pool distribution hints (HASH, ROUND_ROBIN, REPLICATE) verbatim — remove them; Fabric Warehouse handles distribution automatically
Do not use `wasb://` or `abfss://container@storageaccount.dfs.core.windows.net/` paths as primary data paths — migrate data access to OneLake abfss://workspace@onelake.dfs.fabric.microsoft.com/ paths

---

Examples

See code-patterns.md for full before/after examples. Key quick references:

`mssparkutils.env` → `notebookutils.runtime`

# Synapse
workspace = mssparkutils.env.getWorkspaceName()

# Fabric
workspace = notebookutils.runtime.context["currentWorkspaceName"]

Linked Service credential → Key Vault secret

# Synapse
conn = mssparkutils.credentials.getConnectionStringOrCreds("MyLinkedService")

# Fabric
conn = notebookutils.credentials.getSecret("https://myvault.vault.azure.net/", "my-secret")

Dedicated SQL Pool DDL → Fabric Warehouse DDL

-- Synapse Dedicated SQL Pool (distribution and index hints must be removed for Fabric)
CREATE TABLE dbo.Fact (...) WITH (DISTRIBUTION = HASH(id), CLUSTERED COLUMNSTORE INDEX);

-- Fabric Warehouse
CREATE TABLE dbo.Fact (...);

---

Feature Parity Reference

Full Synapse → Fabric feature matrix (28 features), T-SQL surface area gaps, and Spark configuration differences are in feature-parity.md.

Key gaps (⚠️/❌): spark.read.synapsesql() replaced by JDBC/shortcuts · Linked Services redesigned as Data Connections/Shortcuts · External HMS partial (migrate as shortcuts) · mssparkutils.env renamed to notebookutils.runtime · Result set caching ❌ · Workload management ❌ · PolyBase → COPY INTO

---

Migration Gotchas — Quick Reference

The full troubleshooting guide with code examples and multi-option resolutions is in migration-gotchas.md. This summary surfaces the key issues for quick scanning during migration:

#	Flag ID	Issue	Severity	Blocks?	Resolution Summary
G1	`SYNAPSESQL_NO_EQUIVALENT`	`spark.read.synapsesql()` has no Fabric equivalent	High	Yes	Replace with OneLake shortcut read, Warehouse JDBC, or Data Pipeline
G2	`LIBRARY_VERSION_CONFLICT`	Custom library version conflicts with Fabric Runtime	Medium	Maybe	Pin compatible version in Environment, or find Fabric-native alternative
G3	`DELTA_PROTOCOL_MISMATCH`	Delta protocol version incompatibility	High	Yes	Rewrite table with matching protocol (`delta.minReaderVersion`/`minWriterVersion`)
G4	`SECURITY_MODEL_INCOMPATIBLE`	Synapse managed identity / IP firewall not portable	Medium	Yes	Reconfigure as Workspace Identity + Fabric Managed Private Endpoints
G5	`GPU_POOL_UNSUPPORTED`	GPU-accelerated Spark pools not available in Fabric	High	Yes	Migration blocker — keep workload in Synapse or use Azure ML
G6	`DOTNET_SPARK_UNSUPPORTED`	.NET for Spark (C#/F# SJDs) not supported	High	Yes	Migration blocker — rewrite in PySpark or keep in Synapse
G7	`NULLABLE_POOL_REFERENCE`	`bigDataPool`/`targetBigDataPool` field is `null` (not missing) — causes `NoneType` crash	Medium	No	Use `(x.get("bigDataPool") or {}).get(...)` pattern
G8	`SESSION_CONFIG_IGNORED`	Some `%%configure` keys silently ignored in Fabric	Low	No	Remove unsupported keys; use Environment for pool-level config
G9	`SHORTCUT_CONNECTION_FAILED`	ADLS shortcut creation fails (connection/permission)	High	Partial	Verify connection credential type (Key > WorkspaceIdentity > OAuth2) and RBAC
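Two of these checks are easy to automate during inventory. In the sketch below, `pool_reference` applies the G7 null-safe pattern to an exported Synapse item definition (the `properties.bigDataPool.referenceName` / `properties.targetBigDataPool.referenceName` shape is an assumption about the exported JSON), and `scan_code` flags G1 and G8 candidates with simple substring rules. Both helpers are hypothetical and will miss indirect usages, so treat hits as review prompts rather than verdicts.

```python
from typing import Any, Optional


def pool_reference(item: dict[str, Any]) -> Optional[str]:
    """Return the Spark pool name for a notebook or SJD, tolerating null fields (G7)."""
    props = item.get("properties") or {}
    # Notebooks use bigDataPool, SJDs use targetBigDataPool; either may be null.
    ref = props.get("bigDataPool") or props.get("targetBigDataPool") or {}
    return ref.get("referenceName")


# Substring -> flag ID from the table above (heuristic, not exhaustive)
_CODE_RULES = {
    "synapsesql(": "SYNAPSESQL_NO_EQUIVALENT",
    "%%configure": "SESSION_CONFIG_IGNORED",
}


def scan_code(source: str) -> list[str]:
    """Return the sorted flag IDs whose trigger substring appears in the source."""
    return sorted({flag for needle, flag in _CODE_RULES.items() if needle in source})


notebook = {"name": "nb1", "properties": {"bigDataPool": None}}
sjd = {"name": "job1", "properties": {
    "targetBigDataPool": {"referenceName": "etlpool", "type": "BigDataPoolReference"}}}
print(pool_reference(notebook))  # None
print(pool_reference(sjd))       # etlpool
print(scan_code('df = spark.read.synapsesql("db.dbo.sales")'))  # ['SYNAPSESQL_NO_EQUIVALENT']
```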

---

Post-Migration: What's Next

After completing Phases 0–3 and validation, hand off to these companion skills for ongoing operations:

Agentic Exploration Workflow

Once data has landed in Fabric Lakehouses, use this sequence to validate and explore:

1. Discover → List schemas, tables, and row counts via Lakehouse SQL Endpoint (sqldw-consumption-cli)
2. Sample → SELECT TOP 5 on migrated tables to verify data integrity
3. Validate → Run validation checks from validation-testing.md (V1–V6)
4. Explore → Write Spark or T-SQL queries against migrated data using spark-consumption-cli or sqldw-consumption-cli
5. Build → Create Gold-layer aggregations with e2e-medallion-architecture (Bronze → Silver → Gold)
6. Consume → Build semantic models and reports with semantic-model-authoring

Companion Skill Cross-References

Post-Migration Task	Skill	When to Use
Interactive Lakehouse SQL queries	`sqldw-consumption-cli`	Exploring migrated data via SQL Endpoint
Interactive PySpark exploration	`spark-consumption-cli`	Ad-hoc Spark queries on migrated Lakehouses
Notebook & SJD authoring (new)	`spark-authoring-cli`	Creating new Spark items post-migration
Medallion architecture build-out	`e2e-medallion-architecture`	Structuring Bronze/Silver/Gold after lift-and-shift
Warehouse performance monitoring	`sqldw-operations-cli`	Diagnosing slow queries on Fabric Warehouse
Semantic model creation	`semantic-model-authoring`	Building Power BI models over migrated data
Report consumption & DAX	`semantic-model-consumption`	Querying existing semantic models
KQL analytics	`eventhouse-authoring-cli` / `eventhouse-consumption-cli`	If migrating real-time workloads to Eventhouse

Variable Library for Environment Promotion

After migration, avoid hardcoded workspace/item IDs by centralizing configuration in a Variable Library item:

# Read config from Variable Library — works in notebooks
lib = notebookutils.variableLibrary.getLibrary("MigrationConfig")
lakehouse_name = lib.lakehouse_name
workspace_id = lib.workspace_id

# ❌ WRONG — there is no two-argument get(library, variable) form
# notebookutils.variableLibrary.get("MigrationConfig", "lakehouse_name")

Use Value Sets (valueSets/dev.json, valueSets/prod.json) to promote across environments without code changes
Boolean values are returned as strings — compare with .lower() == "true", not bool()
In Data Pipelines, reference via @pipeline().libraryVariables.<name> (not @variables())
Full Variable Library patterns → see common/notebook-authoring/context-and-params.md § Variable Library
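The boolean caveat matters because Python's `bool()` is true for any non-empty string, so `bool("false")` evaluates to `True`. A small parser, sketched under the assumption that the library returns the literal strings `"true"` / `"false"` (any case, possibly padded); `as_bool` is a hypothetical helper name.

```python
def as_bool(value: object) -> bool:
    """Convert a Variable Library value to bool without the bool("false") trap."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text == "true":
        return True
    if text == "false":
        return False
    raise ValueError(f"not a boolean value: {value!r}")


print(bool("false"))     # True  (the trap)
print(as_bool("False"))  # False
```

Raising on unexpected values, rather than defaulting to `False`, surfaces a mistyped value set entry instead of silently changing behavior.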

Capacity Sizing Reference

Use this table to estimate the Fabric capacity SKU needed based on current Synapse Spark pool configuration. This is a planning reference only — the migration workflow operates against whatever capacity is already assigned to the target workspace.

Note: Fabric capacity is shared across all workload types (Spark, Warehouse, Power BI, Pipelines) in the workspace. If other workloads run on the same capacity, size up accordingly.

Synapse Spark Pool → Fabric Capacity Mapping

Synapse Pool Config	Total vCores	Typical Workload	Recommended Fabric SKU	Fabric CUs	Spark vCores (2 per CU)
3 Small nodes (4 vCore / 32 GB)	12	Dev/test, small datasets (<1 GB)	F8 (dev) or F16	8 / 16	16 / 32
3–5 Medium nodes (8 vCore / 64 GB)	24–40	Standard analytics, medium datasets	F32	32	64
5–10 Medium nodes (8 vCore / 64 GB)	40–80	Production ETL, multiple concurrent jobs	F64	64	128
3–10 Large nodes (16 vCore / 128 GB)	48–160	Heavy ETL, large datasets (10+ GB)	F64 or F128	64 / 128	128 / 256
10–20 Large nodes (16 vCore / 128 GB)	160–320	Enterprise workloads, many concurrent jobs	F128 or F256	128 / 256	256 / 512
XL/XXL nodes or 20+ nodes	200+	Large-scale data engineering	F256+	256+	512+

Fabric Capacity SKU Quick Reference

SKU	Capacity Units (CUs)	Base Spark vCores (2 per CU)	Max Concurrent Spark Jobs	Burst	Typical Use
F2	2	4	1	Smoothed	Sandbox/POC only
F4	4	8	1	Smoothed	Individual developer
F8	8	16	1	Smoothed	Dev/test
F16	16	32	1–2	Smoothed	Small team dev
F32	32	64	2–4	Smoothed	Small production
F64	64	128	4–8	Smoothed	Standard production
F128	128	256	8–16	Smoothed	Enterprise production
F256	256	512	16–32	Smoothed	Large enterprise
F512+	512+	1024+	32+	Smoothed	Enterprise-scale data engineering

One Fabric CU corresponds to two Spark vCores, and Spark can temporarily burst above the base vCore count. Concurrent job limits are approximate and depend on node sizes selected in Custom Pools and current burst utilization. Fabric uses burst and smoothing: short spikes can exceed the CU baseline, but sustained usage is throttled to the SKU limit.

Sizing Decision Guide

Factor	How to Assess	Impact on SKU
Peak concurrent notebooks/SJDs	Count max parallel jobs during Synapse peak hours	More concurrency → larger SKU
Largest single-job resource need	Check Synapse executor/driver memory configs	Large executors → need enough CUs to allocate them
Data volume per job	Measure typical input dataset sizes	>10 GB per job → F64+; >100 GB → F128+
Shared capacity with other workloads	Will Warehouse / Power BI / Pipelines share this capacity?	Shared → size up 1–2 tiers
Burst vs. sustained	Is Spark usage spiky (batch ETL) or continuous?	Spiky → can use smaller SKU with burst; sustained → size for peak
Dev vs. production	Dev can use Starter Pool on F8; prod needs Custom Pool	Dev = F8–F16; Prod = F32+
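For scripted planning across many pools, the Synapse Spark Pool → Fabric Capacity mapping table can be encoded as a lookup. This sketch reproduces the table rows as written. Where rows overlap (5 Medium nodes, 10 Large nodes), the first, smaller recommendation wins, and configurations the table does not cover return a prompt to size manually. `recommend_sku` is a hypothetical helper; apply the "shared capacity → size up 1–2 tiers" factor from the decision guide separately.

```python
# (node size, min nodes, max nodes, recommendation), copied from the mapping table
SIZING_ROWS = [
    ("Small", 3, 3, "F8 (dev) or F16"),
    ("Medium", 3, 5, "F32"),
    ("Medium", 5, 10, "F64"),
    ("Large", 3, 10, "F64 or F128"),
    ("Large", 10, 20, "F128 or F256"),
]


def recommend_sku(node_size: str, node_count: int) -> str:
    """Look up the table's SKU recommendation for one Synapse Spark pool."""
    if node_size in ("XLarge", "XXLarge") or node_count > 20:
        return "F256+"
    for size, lo, hi, sku in SIZING_ROWS:
        if size == node_size and lo <= node_count <= hi:
            return sku  # first matching row wins on overlaps
    return "not covered by the table: size manually"


print(recommend_sku("Medium", 8))  # F64
```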

Cost Model Comparison

Aspect	Synapse Spark	Fabric Spark
Billing unit	Per-node, per-minute (when pool is active)	Per-capacity, per-hour (always-on or paused)
Idle cost	Zero (auto-pause after timeout)	Capacity cost continues unless paused/deallocated
Scale model	Node count autoscale (min–max per pool)	Capacity SKU (fixed CUs, burst smoothing)
Pause/resume	Auto-pause per pool (minutes granularity)	Capacity pause/resume (via Portal or REST API)
Reservation pricing	Azure Reserved Instances (1yr/3yr)	Fabric capacity reservations (1yr)
Trial	N/A	Fabric Trial capacity (F64 equivalent, 60 days)

Cost tip: For dev/test migrations, use a Fabric Trial capacity (free F64 for 60 days) or F8 with pause/resume to minimize cost during the migration validation period. Scale up for production.

Synapse → Fabric Code Patterns

Before/after examples for common Synapse Analytics → Microsoft Fabric migration scenarios.

---

Spark Notebook: Import and Session Setup

# BEFORE — Synapse notebook header
from notebookutils import mssparkutils
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# AFTER — Fabric notebook header (nothing to import or initialize)
# spark, sc, and notebookutils are pre-instantiated in every Fabric notebook
# No imports required

---

Reading Data: ADLS Path → OneLake Path

# BEFORE — Synapse: read from ADLS Gen2 via linked service auth
df = spark.read.format("delta") \
    .load("abfss://silver@mystorageaccount.dfs.core.windows.net/customers/")

# AFTER — Fabric: read from OneLake (after creating a shortcut or writing data to Lakehouse)
df = spark.read.format("delta") \
    .load("abfss://MyWorkspace@onelake.dfs.fabric.microsoft.com/SilverLakehouse.Lakehouse/Tables/customers")

# OR use relative path (when notebook has Lakehouse attached as default)
df = spark.read.format("delta").load("Tables/customers")
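When many notebooks hard-code ADLS Gen2 paths, the rewrite is mechanical once shortcuts exist. A sketch that assumes each `(storage account, container)` pair has been mounted as a shortcut directly under the Lakehouse `Files/` folder; the shortcut names in `SHORTCUTS` and both helper names are hypothetical. It also builds the `https://` OneLake form used by Warehouse `COPY INTO`.

```python
from urllib.parse import urlparse

ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"

# Hypothetical mapping: (storage account, container) -> shortcut name under Files/
SHORTCUTS = {("mystorageaccount", "silver"): "silver_adls"}


def to_onelake(adls_path: str, workspace: str, lakehouse: str) -> str:
    """Rewrite abfss://<container>@<account>.dfs.core.windows.net/<path> to OneLake."""
    parsed = urlparse(adls_path)
    container = parsed.username                     # the part before '@'
    account = (parsed.hostname or "").split(".")[0]
    shortcut = SHORTCUTS[(account, container)]      # KeyError if no shortcut exists
    rel = parsed.path.lstrip("/")
    return f"abfss://{workspace}@{ONELAKE_HOST}/{lakehouse}.Lakehouse/Files/{shortcut}/{rel}"


def onelake_https(workspace: str, lakehouse: str, rel: str) -> str:
    """https:// form of a OneLake path, as used by COPY INTO."""
    return f"https://{ONELAKE_HOST}/{workspace}/{lakehouse}.Lakehouse/{rel.lstrip('/')}"


print(to_onelake("abfss://silver@mystorageaccount.dfs.core.windows.net/customers/",
                 "MyWorkspace", "SilverLakehouse"))
```

The `KeyError` on an unmapped container is intentional: it fails fast on paths whose storage has not been mounted yet.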

---

Writing Data to Delta Lake

# BEFORE — Synapse: write to ADLS Gen2
df.write.format("delta") \
    .mode("overwrite") \
    .save("abfss://gold@mystorageaccount.dfs.core.windows.net/summary/")

# AFTER — Fabric: write to Lakehouse Tables (managed Delta)
df.write.format("delta") \
    .mode("overwrite") \
    .saveAsTable("gold_summary")  # Writes to attached Lakehouse Tables/gold_summary

# Or by relative path (requires a default Lakehouse attached to the notebook)
df.write.format("delta") \
    .mode("overwrite") \
    .save("Tables/gold_summary")

---

Credentials: Linked Service → Key Vault Secret

# BEFORE — Synapse: read connection string from Key Vault Linked Service
conn_str = mssparkutils.credentials.getConnectionStringOrCreds("AzureSQL_LinkedService")

jdbc_url = f"jdbc:sqlserver://myserver.database.windows.net;databaseName=mydb;password={conn_str}"

# AFTER — Fabric: no Linked Service; pick one of two auth options

# Option A (SQL auth): read the password from Key Vault directly,
# then pass it with .option("user", "<sql-user>").option("password", password)
password = notebookutils.credentials.getSecret(
    "https://mykeyvault.vault.azure.net/",
    "sql-password"
)

# Option B (Entra ID): pass an access token instead of a password
token = notebookutils.credentials.getToken("https://database.windows.net/")
jdbc_url = "jdbc:sqlserver://myserver.database.windows.net;databaseName=mydb;encrypt=true"

df = spark.read.format("jdbc") \
    .option("url", jdbc_url) \
    .option("accessToken", token) \
    .option("dbtable", "dbo.Customers") \
    .load()

---

Environment Context

# BEFORE — Synapse: read job/workspace context
workspace = mssparkutils.env.getWorkspaceName()
job_id = mssparkutils.env.getJobId()

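# AFTER — Fabric: read from runtime context dict
# (key names can vary by runtime version; print(ctx) to list them)
ctx = notebookutils.runtime.context
workspace = ctx["currentWorkspaceName"]
job_id = ctx.get("jobId")  # verify this key exists in your runtime
workspace_id = ctx["currentWorkspaceId"]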

---

Child Notebook Execution

# BEFORE — Synapse
result = mssparkutils.notebook.run(
    "silver_transform",
    600,  # timeout in seconds
    {"input_table": "bronze_orders", "batch_date": "2024-01-01"}  # parameters
)

# AFTER — Fabric (identical API; positional arguments avoid keyword-name differences)
result = notebookutils.notebook.run(
    "silver_transform",
    600,  # timeout in seconds
    {"input_table": "bronze_orders", "batch_date": "2024-01-01"}  # parameters
)

---

Dedicated SQL Pool DDL → Fabric Warehouse

-- BEFORE — Synapse Dedicated SQL Pool
CREATE TABLE dbo.FactSales (
    SaleID INT NOT NULL,
    CustomerID INT,
    SaleDate DATE,
    Amount DECIMAL(18,2)
)
WITH (
    DISTRIBUTION = HASH(CustomerID),
    CLUSTERED COLUMNSTORE INDEX
);

-- AFTER — Fabric Warehouse (remove distribution hints; auto-managed)
CREATE TABLE dbo.FactSales (
    SaleID INT NOT NULL,
    CustomerID INT,
    SaleDate DATE,
    Amount DECIMAL(18,2)
);
-- Note: Fabric Warehouse uses Delta-backed storage with automatic distribution
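Stripping the hints is mechanical for the simple `CREATE TABLE ... WITH (...)` shape shown above. A regex sketch (the helper name is hypothetical) that removes the whole trailing `WITH (...)` options clause, matching the AFTER example. It does not handle CTAS (`CREATE TABLE ... WITH (...) AS SELECT`) or options containing semicolons, so review the output before running it.

```python
import re

# ") WITH ( <options> );" at the end of a CREATE TABLE statement.
# The lazy match stops at the first ")" that is followed by ";",
# so nested parentheses such as HASH(CustomerID) are skipped over.
_WITH_OPTIONS = re.compile(r"\)\s*WITH\s*\([^;]*?\)\s*;", re.IGNORECASE)


def strip_table_options(ddl: str) -> str:
    """Remove a trailing WITH (...) table-options clause from CREATE TABLE DDL."""
    return _WITH_OPTIONS.sub(");", ddl)


ddl = """CREATE TABLE dbo.FactSales (
    SaleID INT NOT NULL,
    Amount DECIMAL(18,2)
)
WITH (
    DISTRIBUTION = HASH(CustomerID),
    CLUSTERED COLUMNSTORE INDEX
);"""
print(strip_table_options(ddl))
```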

---

Bulk Load: PolyBase → COPY INTO

-- BEFORE — Synapse: PolyBase external table + INSERT
CREATE EXTERNAL DATA SOURCE adls_source
    WITH (TYPE = HADOOP, LOCATION = 'abfss://raw@mystorageaccount.dfs.core.windows.net/');

CREATE EXTERNAL TABLE dbo.ext_StagingOrders (...)
    WITH (DATA_SOURCE = adls_source, LOCATION = '/orders/2024/', FILE_FORMAT = CsvFormat);

INSERT INTO dbo.FactOrders SELECT * FROM dbo.ext_StagingOrders;

-- AFTER — Fabric Warehouse: COPY INTO from OneLake
COPY INTO dbo.FactOrders
FROM 'https://onelake.dfs.fabric.microsoft.com/<workspace>/<lakehouse>.Lakehouse/Files/orders/2024/'
WITH (
    FILE_TYPE = 'CSV',
    FIRSTROW = 2,
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
);

---

File System Operations

# BEFORE — Synapse
files = mssparkutils.fs.ls("abfss://raw@mystorageaccount.dfs.core.windows.net/incoming/")
for f in files:
    mssparkutils.fs.cp(f.path, f"abfss://archive@mystorageaccount.dfs.core.windows.net/{f.name}")

# AFTER — Fabric
files = notebookutils.fs.ls("Files/incoming/")
for f in files:
    notebookutils.fs.cp(f.path, f"Files/archive/{f.name}")

---

Spark Catalog API — Unsupported Methods

Several spark.catalog methods are not supported in Fabric and will throw AnalysisException. Replace with Spark SQL equivalents.

Safe methods (no change needed): spark.catalog.createTable(), tableExists(), listTables(), listColumns(), dropTempView(), cacheTable() all work normally in Fabric. Only database-level and function-level methods require refactoring.

Database Methods

# ❌ BEFORE — Synapse: list databases
dbs = spark.catalog.listDatabases()
for db in dbs:
    print(db.name)

# ✅ AFTER — Fabric: use Spark SQL
spark.sql("SHOW DATABASES").show()
# or collect as list
dbs = [row.namespace for row in spark.sql("SHOW DATABASES").collect()]

# ❌ BEFORE — Synapse: get current database
current = spark.catalog.currentDatabase()

# ✅ AFTER — Fabric: use Spark SQL
current = spark.sql("SELECT CURRENT_DATABASE()").first()[0]  # index avoids version-specific column names

# ❌ BEFORE — Synapse: describe a database
db_info = spark.catalog.getDatabase("sales_db")
print(db_info.locationUri)

# ✅ AFTER — Fabric: use DESCRIBE DATABASE
spark.sql("DESCRIBE DATABASE sales_db").show()
# For extended info:
spark.sql("DESCRIBE DATABASE EXTENDED sales_db").show()

Function Methods

# ❌ BEFORE — Synapse: list functions
funcs = spark.catalog.listFunctions()

# ✅ AFTER — Fabric: NOT SUPPORTED — remove or replace
# If listing built-in functions is needed:
spark.sql("SHOW FUNCTIONS").show()

# ❌ BEFORE — Synapse: register function
spark.catalog.registerFunction("double_it", lambda x: x * 2)

# ✅ AFTER — Fabric: use spark.udf.register()
from pyspark.sql.types import IntegerType
spark.udf.register("double_it", lambda x: x * 2, IntegerType())

# ❌ BEFORE — Synapse: check if function exists
if spark.catalog.functionExists("double_it"):
    df = spark.sql("SELECT double_it(value) FROM t")

# ✅ AFTER — Fabric: NOT SUPPORTED — remove check or use try/except
# Option A: call it and handle the failure if it is not registered
from pyspark.sql.utils import AnalysisException
try:
    df = spark.sql("SELECT double_it(value) FROM t")
except AnalysisException:
    df = None  # function not registered; handle as appropriate

# Option B: search SHOW USER FUNCTIONS output
func_exists = spark.sql("SHOW USER FUNCTIONS").filter("function = 'double_it'").count() > 0

Quick Reference Table

Synapse `spark.catalog` Method	Fabric Replacement	Notes
`listDatabases()`	`spark.sql("SHOW DATABASES")`	Returns DataFrame
`currentDatabase()`	`spark.sql("SELECT CURRENT_DATABASE()")`	Returns single-row DataFrame
`getDatabase(name)`	`spark.sql(f"DESCRIBE DATABASE {name}")`	Returns metadata DataFrame
`setCurrentDatabase(name)`	`spark.sql(f"USE {name}")`	`USE` works in both Synapse and Fabric
`listFunctions()`	`spark.sql("SHOW FUNCTIONS")`	Returns DataFrame of function names
`registerFunction(name, fn)`	`spark.udf.register(name, fn, returnType)`	Must specify return type
`functionExists(name)`	`spark.sql("SHOW USER FUNCTIONS").filter(...)`	Manual check

---

Spark Configuration (`%%configure`)

# BEFORE — Synapse: configure Spark session via magic
%%configure
{
    "conf": {
        "spark.executor.memory": "8g",
        "spark.executor.cores": 4,
        "spark.sql.shuffle.partitions": 200
    }
}

# AFTER — Fabric: same magic-cell syntax, but some keys may be silently ignored (see G8); move pool-level settings into an Environment
%%configure
{
    "conf": {
        "spark.executor.memory": "8g",
        "spark.executor.cores": 4,
        "spark.sql.shuffle.partitions": 200
    }
}

Synapse Connectivity Migration — Linked Services → Fabric Data Connections & OneLake Shortcuts

Reference for migrating Synapse Analytics connectivity patterns to Microsoft Fabric.

---

Decision Guide: What Replaces a Linked Service?

Synapse Linked Service Type	Fabric Replacement	When to Use
Azure Data Lake Storage Gen2	OneLake Shortcut (ADLS Gen2 shortcut)	Primary pattern — mount existing storage as a Lakehouse shortcut; no data copy
Azure Blob Storage	OneLake Shortcut (Azure Blob shortcut)	Same as ADLS — shortcut avoids re-ingestion
Azure SQL Database	Fabric Data Connection (SQL auth or Entra ID)	Fabric notebooks and pipelines connect via JDBC or Copy activity
Azure SQL Managed Instance	Fabric Data Connection (SQL)	JDBC in notebooks; Copy activity in pipelines
Azure Synapse Analytics (SQL Pool)	Not needed post-migration	Source becomes Fabric Warehouse
Azure Cosmos DB	Fabric Data Connection (Cosmos DB connector)	Use Spark connector with notebookutils credential
Azure Event Hubs / Service Bus	Fabric Eventstream or Data Connection	Eventstream for real-time; connection for batch
REST / HTTP	Web activity in Fabric Pipelines	For pipeline REST calls
Key Vault (for secrets)	`notebookutils.credentials.getSecret(keyVaultUrl, name)`	Reads the secret directly from Key Vault; no connection needed
On-premises SQL (via IR)	On-premises Data Gateway + Fabric Data Connection	Install the gateway on the on-premises network and register it under Connections and Gateways in Fabric
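To apply the decision guide across a whole workspace, the exported linked service definitions can be mapped mechanically and anything unmapped flagged for review. A minimal sketch: the `"type"` strings below (`AzureBlobFS`, `AzureSqlDatabase`, `CosmosDb`, and so on) are assumed values from exported Synapse linked service JSON and should be checked against your own export, `SqlServer` is assumed to mean an on-premises server behind a self-hosted IR, and `plan_replacements` is a hypothetical name.

```python
# Assumed Synapse linked service "type" values -> Fabric replacement (from the decision guide).
REPLACEMENTS = {
    "AzureBlobFS": "OneLake Shortcut (ADLS Gen2)",
    "AzureBlobStorage": "OneLake Shortcut (Azure Blob)",
    "AzureSqlDatabase": "Fabric Data Connection (SQL)",
    "AzureSqlMI": "Fabric Data Connection (SQL)",
    "AzureSqlDW": "Not needed post-migration (Fabric Warehouse)",
    "CosmosDb": "Fabric Data Connection (Cosmos DB)",
    "AzureKeyVault": "notebookutils.credentials.getSecret(vaultUrl, name)",
    "SqlServer": "On-premises Data Gateway + Fabric Data Connection",
    "RestService": "Web activity in Fabric Pipelines",
    "HttpServer": "Web activity in Fabric Pipelines",
}


def plan_replacements(linked_services):
    """Map each linked service definition ({"name", "properties": {"type"}}) to a replacement."""
    plan = {}
    for ls in linked_services:
        ls_type = ls.get("properties", {}).get("type", "")
        # Unknown types are flagged rather than guessed.
        plan[ls["name"]] = REPLACEMENTS.get(
            ls_type, f"REVIEW: no mapping for {ls_type or 'unknown type'}"
        )
    return plan
```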

---

OneLake Shortcut: Replacing ADLS Gen2 Linked Services

This is the most common migration — Synapse often uses ADLS Gen2 linked services to access raw data.

In Fabric Portal (UI)

1. Open the target Lakehouse → Files section
2. Select New shortcut → Azure Data Lake Storage Gen2
3. Provide the ADLS Gen2 URL and authentication (Organizational account / Service Principal / SAS)
4. The shortcut appears under Files/ and is accessible via abfss:// OneLake paths

Via REST API

# Create a OneLake Shortcut to ADLS Gen2
WORKSPACE_ID="<workspace_id>"
LAKEHOUSE_ID="<lakehouse_id>"
TOKEN=$(az account get-access-token --resource https://api.fabric.microsoft.com --query accessToken -o tsv)

az rest --method POST \
  --url "https://api.fabric.microsoft.com/v1/workspaces/${WORKSPACE_ID}/items/${LAKEHOUSE_ID}/shortcuts" \
  --headers "Authorization=Bearer ${TOKEN}" "Content-Type=application/json" \
  --body '{
    "name": "raw_data",
    "path": "Files",
    "target": {
      "type": "AdlsGen2",
      "adlsGen2": {
        "location": "https://<storageaccount>.dfs.core.windows.net",
        "subpath": "/<container>/<folder>",
        "connectionId": "<connection-id>"
      }
    }
  }'

Accessing Shortcut Data in Notebooks

# Synapse — direct ADLS path via Linked Service auth
df = spark.read.parquet("abfss://container@storageaccount.dfs.core.windows.net/path/")

# Fabric — OneLake path after shortcut creation
df = spark.read.parquet("abfss://workspacename@onelake.dfs.fabric.microsoft.com/lakehouse.Lakehouse/Files/raw_data/path/")

# Or using relative path (within notebook's attached Lakehouse)
df = spark.read.parquet("Files/raw_data/path/")
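When notebooks hard-code Synapse-era abfss:// URIs, each one has to be rewritten to point at the shortcut. A minimal sketch of that rewrite, assuming the shortcut was created as in the REST example (target `subpath` of `/<container>/<folder>`, placed under `Files/`); `rewrite_to_shortcut` is a hypothetical helper. It returns the relative form; prefix `abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>.Lakehouse/` for the absolute form.

```python
from urllib.parse import urlparse


def rewrite_to_shortcut(adls_uri, storage_account, shortcut_subpath, shortcut_name, base="Files"):
    """Rewrite an ADLS Gen2 abfss:// URI into a Lakehouse-relative shortcut path."""
    parsed = urlparse(adls_uri)
    container, _, host = parsed.netloc.partition("@")
    if host != f"{storage_account}.dfs.core.windows.net":
        raise ValueError(f"URI is not on storage account {storage_account}: {adls_uri}")
    full_path = f"/{container}{parsed.path}"      # e.g. /container/folder/path/
    prefix = shortcut_subpath.rstrip("/")          # e.g. /container/folder
    if full_path != prefix and not full_path.startswith(prefix + "/"):
        raise ValueError(f"{adls_uri} is outside shortcut subpath {shortcut_subpath}")
    remainder = full_path[len(prefix):]            # e.g. /path/ (or "" for the root)
    return f"{base}/{shortcut_name}{remainder}"
```

For example, `rewrite_to_shortcut("abfss://container@storageaccount.dfs.core.windows.net/folder/path/", "storageaccount", "/container/folder", "raw_data")` yields `Files/raw_data/path/`.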

---

Fabric Data Connection: Replacing External Database Linked Services

For Synapse linked services pointing to external databases (Azure SQL, Cosmos DB, etc.), create a Fabric Data Connection.

Creating a Data Connection (Portal)

1. Navigate to Fabric workspace → New → Connection
2. Choose the connector type (SQL Server, Azure SQL, etc.)
3. Configure host, database, credentials (Entra ID or username/password from Key Vault)
4. Save and reference by name from notebooks or pipelines

Using a Data Connection in a Notebook

# Read from external SQL via notebookutils connection token
conn_token = notebookutils.connection.getConnectionToken("AzureSQL_MyDB")

jdbc_url = "jdbc:sqlserver://<server>.database.windows.net;databaseName=<db>;encrypt=true"
df = spark.read.format("jdbc") \
    .option("url", jdbc_url) \
    .option("dbtable", "dbo.MyTable") \
    .option("accessToken", conn_token) \
    .load()

---

Key Vault Secret Migration

Synapse Linked Services for Key Vault are replaced by direct notebookutils.credentials.getSecret() calls.

# Synapse — Key Vault Linked Service (NOT available in Fabric)
secret = mssparkutils.credentials.getSecret("mykeyvault", "my-secret", "MyKeyVaultLinkedService")

# Fabric — direct Key Vault call using Entra ID token
secret = notebookutils.credentials.getSecret(
    "https://mykeyvault.vault.azure.net/",
    "my-secret"
)

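The identity that runs the notebook (for interactive sessions, the signed-in user's Entra ID) must be allowed to read secrets: the Key Vault Secrets User role if the vault uses Azure RBAC, or Get permission on secrets if it uses access policies.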

---

Integration Runtime → On-Premises Data Gateway

Synapse IR Type	Fabric Equivalent
Azure Integration Runtime (cloud-to-cloud)	Not needed — Fabric Pipelines use managed compute
Self-hosted IR (on-premises connectivity)	On-premises Data Gateway — install on on-prem network, register in Fabric
Azure-SSIS IR	Not directly supported — migrate SSIS packages to Fabric Pipelines or Azure Data Factory

Configure the On-premises Data Gateway in Fabric Admin Portal → Connections and Gateways.

---

Pipeline Connectivity: Synapse Dataset → Fabric Pipeline Source/Sink

In Synapse Pipelines, datasets define the connection + path. In Fabric Pipelines, the connection and path are inlined into the Copy activity.

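// Synapse Pipeline Copy Activity (connection + path come from a referenced dataset)
{
  "type": "Copy",
  "inputs": [ { "referenceName": "MyADLSDataset", "type": "DatasetReference" } ],
  "typeProperties": {
    "source": { "type": "DelimitedTextSource", "storeSettings": { "type": "AzureBlobFSReadSettings" } }
  }
}

// Fabric Pipeline Copy Activity (connection + path inlined in datasetSettings)
{
  "source": {
    "type": "DelimitedTextSource",
    "storeSettings": { "type": "AzureBlobFSReadSettings", "recursive": true },
    "datasetSettings": {
      "type": "DelimitedText",
      "typeProperties": {
        "location": {
          "type": "AzureBlobFSLocation",
          "fileSystem": "container",
          "folderPath": "path/to/data"
        }
      },
      "externalReferences": { "connection": "<data-connection-id>" }
    }
  }
}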

Connector-Specific Refactoring — Kusto, Cosmos DB, Token Library, ADLS OAuth

Detailed before/after patterns for migrating Synapse connector code to Fabric. These go beyond the general linked-service replacement covered in connectivity-migration.md.

Pre-check: Run the pre-refactoring audit to find affected notebooks. The search patterns below map to specific connectors.

---

Azure Data Explorer (Kusto) Connector

Search pattern: kusto.spark.synapse or spark.synapse.linkedService.*DataExplorer

Reading from Kusto

# ❌ BEFORE — Synapse: Kusto via linked service
kustoDF = (spark.read
    .format("com.microsoft.kusto.spark.synapse.datasource")
    .option("spark.synapse.linkedService", "AzureDataExplorer1")
    .option("kustoCluster", "https://mycluster.kusto.windows.net")
    .option("kustoDatabase", "mydb")
    .option("kustoQuery", "MyTable | take 100")
    .load())

# ✅ AFTER — Fabric: Kusto via access token
kustoDF = (spark.read
    .format("com.microsoft.kusto.spark.synapse.datasource")
    .option("accessToken", notebookutils.credentials.getToken("https://mycluster.kusto.windows.net"))
    .option("kustoCluster", "https://mycluster.kusto.windows.net")
    .option("kustoDatabase", "mydb")
    .option("kustoQuery", "MyTable | take 100")
    .load())

Writing to Kusto

# ❌ BEFORE — Synapse
(df.write
    .format("com.microsoft.kusto.spark.synapse.datasource")
    .option("spark.synapse.linkedService", "AzureDataExplorer1")
    .option("kustoCluster", "https://mycluster.kusto.windows.net")
    .option("kustoDatabase", "mydb")
    .option("kustoTable", "MyTargetTable")
    .option("tableCreateOptions", "CreateIfNotExist")
    .mode("Append")
    .save())

# ✅ AFTER — Fabric
(df.write
    .format("com.microsoft.kusto.spark.synapse.datasource")
    .option("accessToken", notebookutils.credentials.getToken("https://mycluster.kusto.windows.net"))
    .option("kustoCluster", "https://mycluster.kusto.windows.net")
    .option("kustoDatabase", "mydb")
    .option("kustoTable", "MyTargetTable")
    .option("tableCreateOptions", "CreateIfNotExist")
    .mode("Append")
    .save())

Changes:
1. Remove .option("spark.synapse.linkedService", "...")
2. Add .option("accessToken", notebookutils.credentials.getToken("<cluster_url>"))
3. All other options (kustoCluster, kustoDatabase, kustoQuery, kustoTable) remain unchanged
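When many Kusto reads and writes share the same options, the two changes can be applied to an options dictionary in one place. A minimal sketch; `kusto_options_for_fabric` is a hypothetical helper, and `get_token` is any callable that takes the cluster URL and returns a token (inside a Fabric notebook, `notebookutils.credentials.getToken`).

```python
def kusto_options_for_fabric(synapse_options, get_token):
    """Return a copy of Synapse Kusto connector options rewritten for Fabric."""
    # 1. Drop the linked service option (not available in Fabric).
    options = {k: v for k, v in synapse_options.items() if k != "spark.synapse.linkedService"}
    # 2. Add an access token whose audience is the cluster URL itself.
    options["accessToken"] = get_token(options["kustoCluster"])
    # 3. All other options are passed through unchanged.
    return options
```

Usage: `spark.read.format("com.microsoft.kusto.spark.synapse.datasource").options(**kusto_options_for_fabric(opts, notebookutils.credentials.getToken)).load()`.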

---

Cosmos DB Connector (OLTP)

Search pattern: cosmos.oltp or spark.synapse.linkedService.*Cosmos or getSecretWithLS.*cosmos

Reading from Cosmos DB

# ❌ BEFORE — Synapse: Cosmos DB via linked service
cosmosDF = (spark.read
    .format("cosmos.oltp")
    .option("spark.synapse.linkedService", "CosmosDbLS")
    .option("spark.cosmos.container", "mycontainer")
    .option("spark.cosmos.read.inferSchema.enabled", "true")
    .load())

# ✅ AFTER — Fabric: Cosmos DB via Key Vault secret
cosmos_key = notebookutils.credentials.getSecret(
    "https://mykeyvault.vault.azure.net/", "cosmos-account-key"
)

cosmosDF = (spark.read
    .format("cosmos.oltp")
    .option("spark.cosmos.accountEndpoint", "https://mycosmosaccount.documents.azure.com:443/")
    .option("spark.cosmos.accountKey", cosmos_key)
    .option("spark.cosmos.database", "mydb")
    .option("spark.cosmos.container", "mycontainer")
    .option("spark.cosmos.read.inferSchema.enabled", "true")
    .load())

Writing to Cosmos DB

# ❌ BEFORE — Synapse
(df.write
    .format("cosmos.oltp")
    .option("spark.synapse.linkedService", "CosmosDbLS")
    .option("spark.cosmos.container", "mycontainer")
    .option("spark.cosmos.write.strategy", "ItemOverwrite")
    .mode("Append")
    .save())

# ✅ AFTER — Fabric
cosmos_key = notebookutils.credentials.getSecret(
    "https://mykeyvault.vault.azure.net/", "cosmos-account-key"
)

(df.write
    .format("cosmos.oltp")
    .option("spark.cosmos.accountEndpoint", "https://mycosmosaccount.documents.azure.com:443/")
    .option("spark.cosmos.accountKey", cosmos_key)
    .option("spark.cosmos.database", "mydb")
    .option("spark.cosmos.container", "mycontainer")
    .option("spark.cosmos.write.strategy", "ItemOverwrite")
    .mode("Append")
    .save())

Changes:
1. Remove .option("spark.synapse.linkedService", "...")
2. Add .option("spark.cosmos.accountEndpoint", "...")
3. Retrieve the account key from Key Vault: notebookutils.credentials.getSecret(vaultUrl, secretName)
4. Add .option("spark.cosmos.accountKey", cosmos_key)
5. Add .option("spark.cosmos.database", "...") (the linked service resolved this implicitly; it must now be explicit)
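Because the same four settings now repeat on every Cosmos DB read and write, it can help to build them once. A minimal sketch; `cosmos_options` is a hypothetical helper, and the key is expected to come from `notebookutils.credentials.getSecret` as shown above.

```python
def cosmos_options(endpoint, account_key, database, container, extra=None):
    """Build the explicit cosmos.oltp options that replace spark.synapse.linkedService."""
    options = {
        "spark.cosmos.accountEndpoint": endpoint,
        "spark.cosmos.accountKey": account_key,
        "spark.cosmos.database": database,       # previously resolved by the linked service
        "spark.cosmos.container": container,
    }
    options.update(extra or {})                  # e.g. read/write strategy settings
    return options
```

Usage: `spark.read.format("cosmos.oltp").options(**cosmos_options(endpoint, cosmos_key, "mydb", "mycontainer")).load()`.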

Cosmos DB analytics connector (azure-cosmos-analytics-spark): This JAR is missing from Fabric Runtime 1.3. If your SJDs use com.azure.cosmos.spark with the analytics store, upload the JAR to your Fabric Environment. See library-compatibility.md.

---

Cosmos DB Connector — Spark Config Style

Some notebooks set Cosmos DB connection at the Spark config level rather than per-read/write:

# ❌ BEFORE — Synapse: Spark config with linked service
spark.conf.set("spark.cosmos.linkedService", "CosmosDbLS")
spark.conf.set("spark.cosmos.container", "events")

df = spark.read.format("cosmos.oltp").load()

# ✅ AFTER — Fabric: Spark config with direct credentials
cosmos_key = notebookutils.credentials.getSecret(
    "https://mykeyvault.vault.azure.net/", "cosmos-account-key"
)
spark.conf.set("spark.cosmos.accountEndpoint", "https://mycosmosaccount.documents.azure.com:443/")
spark.conf.set("spark.cosmos.accountKey", cosmos_key)
spark.conf.set("spark.cosmos.database", "mydb")
spark.conf.set("spark.cosmos.container", "events")

df = spark.read.format("cosmos.oltp").load()

---

ADLS Gen2 OAuth — LinkedServiceBasedTokenProvider → ClientCredsTokenProvider

Search pattern: LinkedServiceBasedTokenProvider or spark.storage.synapse or getPropertiesAsMap

This is the most common pattern for ADLS Gen2 access using a service principal through a Synapse linked service.

Python

# ❌ BEFORE — Synapse: OAuth via linked service token provider
spark.conf.set("spark.storage.synapse.linkedServiceName", "MyADLSLinkedService")
spark.conf.set(
    "fs.azure.account.oauth.provider.type.mystorageaccount.dfs.core.windows.net",
    "com.microsoft.azure.synapse.tokenlibrary.LinkedServiceBasedTokenProvider"
)

df = spark.read.parquet("abfss://container@mystorageaccount.dfs.core.windows.net/data/")

# ✅ AFTER — Fabric: OAuth via standard ClientCredsTokenProvider
storage_account = "mystorageaccount"
client_id = notebookutils.credentials.getSecret("https://mykeyvault.vault.azure.net/", "sp-client-id")
client_secret = notebookutils.credentials.getSecret("https://mykeyvault.vault.azure.net/", "sp-client-secret")
tenant_id = notebookutils.credentials.getSecret("https://mykeyvault.vault.azure.net/", "tenant-id")

spark.conf.set(f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net", client_id)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net", client_secret)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net",
               f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")

df = spark.read.parquet(f"abfss://container@{storage_account}.dfs.core.windows.net/data/")

Scala

// ❌ BEFORE — Synapse (Scala): linked service token provider
val linked_service_cfg = "MyADLSLinkedService"
val conexion = TokenLibrary.getPropertiesAsMap(linked_service_cfg)
val my_account = conexion("Endpoint").toString.substring(8)

spark.conf.set(s"fs.azure.account.oauth.provider.type.${my_account}.dfs.core.windows.net",
  "com.microsoft.azure.synapse.tokenlibrary.LinkedServiceBasedTokenProvider")
spark.conf.set("spark.storage.synapse.linkedServiceName", linked_service_cfg)

// ✅ AFTER — Fabric (Scala): standard OAuth
val storageAccount = "mystorageaccount"
val clientId = notebookutils.credentials.getSecret("https://mykeyvault.vault.azure.net/", "sp-client-id")
val clientSecret = notebookutils.credentials.getSecret("https://mykeyvault.vault.azure.net/", "sp-client-secret")
val tenantId = notebookutils.credentials.getSecret("https://mykeyvault.vault.azure.net/", "tenant-id")

spark.conf.set(s"fs.azure.account.auth.type.${storageAccount}.dfs.core.windows.net", "OAuth")
spark.conf.set(s"fs.azure.account.oauth.provider.type.${storageAccount}.dfs.core.windows.net",
  "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(s"fs.azure.account.oauth2.client.id.${storageAccount}.dfs.core.windows.net", clientId)
spark.conf.set(s"fs.azure.account.oauth2.client.secret.${storageAccount}.dfs.core.windows.net", clientSecret)
spark.conf.set(s"fs.azure.account.oauth2.client.endpoint.${storageAccount}.dfs.core.windows.net",
  s"https://login.microsoftonline.com/${tenantId}/oauth2/token")

Changes:
1. Remove spark.storage.synapse.linkedServiceName (not supported in Fabric)
2. Remove TokenLibrary.getPropertiesAsMap() (not available in Fabric)
3. Replace LinkedServiceBasedTokenProvider with ClientCredsTokenProvider
4. Set auth.type to OAuth and configure client.id, client.secret, and client.endpoint per storage account
5. Store credentials in Key Vault; retrieve them via notebookutils.credentials.getSecret()
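The five per-account settings are identical apart from the account name and credentials, so they can be generated rather than typed out per notebook. A minimal sketch; `abfs_oauth_conf` is a hypothetical helper, and the credentials are assumed to come from Key Vault as in the Python example above.

```python
def abfs_oauth_conf(storage_account, client_id, client_secret, tenant_id):
    """Return the fs.azure.* settings for service-principal OAuth on one ADLS Gen2 account."""
    suffix = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": client_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }
```

Usage: `for key, value in abfs_oauth_conf("mystorageaccount", client_id, client_secret, tenant_id).items(): spark.conf.set(key, value)`.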

Preferred alternative: If the ADLS Gen2 data only needs to be read, create a OneLake Shortcut instead. This eliminates the OAuth configuration entirely, because the shortcut handles authentication. See connectivity-migration.md.

---

Token Library (Synapse-only)

Search pattern: TokenLibrary or getPropertiesAsMap

Synapse's TokenLibrary provides three capabilities, each replaced differently:

Token Acquisition

# ❌ BEFORE - Synapse: get an access token for a resource audience
token = TokenLibrary.getAccessToken("https://database.windows.net/")

# ✅ AFTER — Fabric: use notebookutils
token = notebookutils.credentials.getToken("https://database.windows.net/")

Linked Service Property Extraction

// ❌ BEFORE — Synapse: extract linked service properties
val props = TokenLibrary.getPropertiesAsMap("MyLinkedService")
val endpoint = props("Endpoint")
val accountName = endpoint.toString.substring(8)

// ✅ AFTER — Fabric: no linked services — hardcode or parameterize
val accountName = "mystorageaccount"  // or read from notebook parameters
// If dynamic, store in Key Vault:
// val accountName = notebookutils.credentials.getSecret("https://myvault.vault.azure.net/", "storage-account-name")

Secret Retrieval via Linked Service

# ❌ BEFORE — Synapse: get secret via Key Vault linked service
secret = mssparkutils.credentials.getSecretWithLS("MyKeyVaultLS", "my-secret-name")
# or
secret = TokenLibrary.getSecret("MyKeyVaultLinkedService", "my-secret-name")

# ✅ AFTER — Fabric: reference Key Vault URL directly (no linked service)
secret = notebookutils.credentials.getSecret(
    "https://mykeyvault.vault.azure.net/",
    "my-secret-name"
)

Key change: In Fabric, getSecret() takes the full Key Vault URL as the first parameter, not a linked service name. The identity that runs the notebook must be allowed to read secrets: Get permission on secrets in an access policy, or the Key Vault Secrets User role if the vault uses Azure RBAC.

---

`spark.read.synapsesql()` — Synapse SQL Connector

Search pattern: synapsesql

This Synapse-specific connector reads from Dedicated SQL Pool. It has no Fabric equivalent.

# ❌ BEFORE — Synapse: read from Dedicated SQL Pool
df = spark.read.synapsesql("mypool.dbo.FactSales")

# ✅ AFTER (Option A) — Fabric: read from migrated Lakehouse Delta table
df = spark.read.format("delta").load("Tables/FactSales")

# ✅ AFTER (Option B) — Fabric: read from Fabric Warehouse via JDBC
token = notebookutils.credentials.getToken("https://database.windows.net/")
jdbc_url = "jdbc:sqlserver://mywarehouse-endpoint.datawarehouse.fabric.microsoft.com:1433;database=mywarehouse"

df = (spark.read
    .format("jdbc")
    .option("url", jdbc_url)
    .option("accessToken", token)
    .option("dbtable", "dbo.FactSales")
    .load())

Decision guide:

Data migrated to Lakehouse (most common): Use Option A — direct Delta read, fastest
Data in Fabric Warehouse: Use Option B — JDBC with Entra ID token

---

Connector Refactoring Checklist

Search Pattern	Connector	What Changes
`spark.synapse.linkedService.*DataExplorer`	Kusto/ADX	Replace linked service with `accessToken` via `getToken()`
`spark.synapse.linkedService.*Cosmos`	Cosmos DB	Replace linked service with `accountEndpoint` + `accountKey` from Key Vault
`cosmos.oltp` + `getSecretWithLS`	Cosmos DB	Replace `getSecretWithLS()` with `getSecret(vaultUrl, name)`
`LinkedServiceBasedTokenProvider`	ADLS Gen2 OAuth	Replace with `ClientCredsTokenProvider` + SP creds from Key Vault
`spark.storage.synapse.linkedServiceName`	ADLS Gen2	Remove entirely — not supported in Fabric
`TokenLibrary.getPropertiesAsMap`	Any linked service	Remove; hardcode or parameterize values
`TokenLibrary.getSecret`	Key Vault	Replace with `notebookutils.credentials.getSecret(vaultUrl, name)`
`getSecretWithLS`	Key Vault	Replace with `getSecret(vaultUrl, name)` — use full vault URL
`synapsesql`	Dedicated SQL Pool	Replace with Delta read or JDBC with `accessToken`
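The checklist's search patterns can drive the pre-refactoring audit directly. A minimal sketch that scans notebook JSON line by line; the pattern set is taken from the table, `scan_source` and `scan_notebook` are hypothetical names, and reading cells from either top-level `cells` (.ipynb) or `properties.cells` (assumed shape of an exported Synapse notebook) is an assumption to verify against your export.

```python
import json
import re

# Regular expressions from the Connector Refactoring Checklist.
PATTERNS = {
    "Kusto/ADX": r"spark\.synapse\.linkedService.*DataExplorer",
    "Cosmos DB": r"spark\.synapse\.linkedService.*Cosmos",
    "ADLS Gen2 OAuth": r"LinkedServiceBasedTokenProvider",
    "ADLS Gen2 linked service": r"spark\.storage\.synapse\.linkedServiceName",
    "TokenLibrary properties": r"TokenLibrary\.getPropertiesAsMap",
    "TokenLibrary secret": r"TokenLibrary\.getSecret",
    "Key Vault via linked service": r"getSecretWithLS",
    "Dedicated SQL Pool": r"synapsesql",
}


def scan_source(source):
    """Return (line_number, connector) for every line matching a checklist pattern."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for connector, pattern in PATTERNS.items():
            if re.search(pattern, line):
                findings.append((lineno, connector))
    return findings


def scan_notebook(notebook_json):
    """Return (cell_index, line_number, connector) for code cells in a notebook JSON string."""
    nb = json.loads(notebook_json)
    cells = nb.get("cells") or nb.get("properties", {}).get("cells", [])
    findings = []
    for idx, cell in enumerate(cells):
        if cell.get("cell_type") != "code":
            continue  # markdown cells cannot break at runtime
        src = cell.get("source", "")
        if isinstance(src, list):
            src = "".join(src)
        findings.extend((idx, lineno, conn) for lineno, conn in scan_source(src))
    return findings
```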

DMTS note: DMTS Connections (Data Management Trusted Service) are supported in Fabric notebooks only — not yet in Spark Job Definitions. If your SJD code uses DMTS, refactor to direct endpoint authentication.

External Hive Metastore → Fabric Lakehouse Migration

Migrate Synapse workspaces that use an external Hive Metastore (backed by Azure SQL Database or Azure Database for MySQL) to Fabric Lakehouses.

When to use this guide: Your Synapse Spark pools are configured with spark.hadoop.javax.jdo.option.ConnectionURL pointing to an external database. If your workspace uses the built-in HMS (no external connection configured), use lake-database-migration.md instead.

Deprecation notice: External HMS support in Synapse is deprecated after Spark 3.4. Fabric does not support connecting to an external Hive Metastore — all metadata must be migrated into the Fabric Lakehouse catalog.

Auth tokens needed (see COMMON-CLI.md § Authentication Recipes for commands):

- Synapse ARM audience: https://management.azure.com (for detection)

- HMS database: JDBC credentials (SQL auth or Entra ID) to query the external HMS

- Fabric audience: https://api.fabric.microsoft.com

---

Migration Workflow

External HMS Migration:
├── Step 0: Detect external HMS configuration
├── Step 1: Connect to HMS database and inventory databases & tables
├── Step 2: Select databases to migrate and choose mapping mode
├── Step 3: Create Fabric Lakehouse(es)
├── Step 4: Create schemas (if using schema mapping)
├── Step 5: Create OneLake shortcuts (Delta → Tables/, non-Delta → Files/)
├── Step 6: Handle non-Delta tables — convert to Delta or retain original format
├── Step 7: Validate
└── Step 8: (Optional) Validate before proceeding to Phase 2

---

Step 0: Detect External HMS Configuration

Read the Spark pool configuration from the ARM API to determine if an external HMS is configured.

Endpoint:

GET https://management.azure.com/subscriptions/{subId}/resourceGroups/{rg}/providers/Microsoft.Synapse/workspaces/{ws}/bigDataPools/{poolName}?api-version=2021-06-01

In the response, check properties.sparkConfigProperties.content for these keys:

Spark Config Key	Expected for external HMS?	Meaning
`spark.hadoop.javax.jdo.option.ConnectionURL`	Yes	External HMS — this guide applies
`spark.hadoop.javax.jdo.option.ConnectionDriverName`	Yes	JDBC driver (e.g., `com.microsoft.sqlserver.jdbc.SQLServerDriver` for Azure SQL, `org.mariadb.jdbc.Driver` for MySQL)
`spark.hadoop.javax.jdo.option.ConnectionUserName`	Yes	HMS database username
`spark.hadoop.javax.jdo.option.ConnectionPassword`	Yes	HMS database password (may reference Key Vault)
`spark.sql.hive.metastore.version`	Optional	HMS version (e.g., `3.1.0`)
`spark.sql.hive.metastore.jars`	Optional	Path to HMS client JARs

If none of these keys are present: The workspace uses the built-in HMS → use lake-database-migration.md instead.

Extract Connection Details

# Parse the JDBC connection details from the pool config
# spark_config: dict of key -> value parsed from properties.sparkConfigProperties.content
# Azure SQL DB example: jdbc:sqlserver://myserver.database.windows.net:1433;database=hive_metastore;...
# MySQL example: jdbc:mysql://myserver.mysql.database.azure.com:3306/hive_metastore?...

connection_url = spark_config["spark.hadoop.javax.jdo.option.ConnectionURL"]
driver = spark_config["spark.hadoop.javax.jdo.option.ConnectionDriverName"]
username = spark_config["spark.hadoop.javax.jdo.option.ConnectionUserName"]
# password may be a Key Vault reference — resolve before connecting
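The `spark_config` dictionary has to be built from the ARM response first. A minimal sketch, assuming `properties.sparkConfigProperties.content` holds spark-defaults style text (one `key value` pair per line, `#` for comments); `parse_spark_config` and `detect_external_hms` are hypothetical names, and a password value may still be a reference that needs resolving.

```python
HMS_KEYS = (
    "spark.hadoop.javax.jdo.option.ConnectionURL",
    "spark.hadoop.javax.jdo.option.ConnectionDriverName",
    "spark.hadoop.javax.jdo.option.ConnectionUserName",
    "spark.hadoop.javax.jdo.option.ConnectionPassword",
    "spark.sql.hive.metastore.version",
    "spark.sql.hive.metastore.jars",
)


def parse_spark_config(content):
    """Parse spark-defaults style text into a dict."""
    config = {}
    for raw_line in content.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)  # key, then everything after the first whitespace run
        config[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return config


def detect_external_hms(pool):
    """Return HMS settings from a bigDataPools GET response, or None for the built-in HMS."""
    spark_props = pool.get("properties", {}).get("sparkConfigProperties") or {}
    config = parse_spark_config(spark_props.get("content") or "")
    if "spark.hadoop.javax.jdo.option.ConnectionURL" not in config:
        return None  # built-in HMS: use lake-database-migration.md instead
    return {key: config[key] for key in HMS_KEYS if key in config}
```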

---

Step 1: Inventory — Query the HMS Database

Connect to the external HMS database via JDBC and extract metadata. The Hive Metastore uses a standard schema (same structure for both Azure SQL DB and MySQL).

Namespace: The external HMS has a flat namespace — Database → Table. There are no inner schemas. This simplifies mapping to Fabric.

JDBC Connection Troubleshooting

If the JDBC connection to the external HMS database fails, check these common causes:

Error	Cause	Fix
`Login failed for user` / `Access denied for user`	Wrong credentials or expired password	Verify username/password; check if Key Vault secret has rotated
`Cannot open database ... requested by the login`	Database name is wrong or database has been deleted	Verify the database name in the JDBC URL
`Connection timed out` / `No route to host`	Firewall blocks access from the machine running the migration	Add client IP to Azure SQL / MySQL firewall rules; check VNet/Private Endpoint settings
`SSL handshake failed` / `certificate verify failed`	TLS configuration mismatch	Add `encrypt=true;trustServerCertificate=true` (Azure SQL) or `useSSL=true&requireSSL=false` (MySQL) to JDBC URL
`com.microsoft.sqlserver.jdbc.SQLServerException: TCP/IP connection ... has failed`	SQL Server is paused or stopped (serverless)	Resume the Azure SQL DB in the Portal
`Communications link failure`	Network-level connectivity issue (DNS, proxy, VPN)	Test connectivity with `Test-NetConnection -ComputerName {server} -Port {port}`
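`Test-NetConnection` is PowerShell-only; a portable way to separate network problems from credential or TLS problems is a plain TCP connect to the host and port taken from the JDBC URL. A minimal sketch; it only proves TCP reachability (not login or TLS), and the default ports (1433 for SQL Server, 3306 for MySQL/MariaDB) and helper names are assumptions.

```python
import re
import socket

DEFAULT_PORTS = {"sqlserver": 1433, "mysql": 3306, "mariadb": 3306}


def jdbc_host_port(jdbc_url):
    """Extract (host, port) from a jdbc:sqlserver:// or jdbc:mysql:// style URL."""
    match = re.match(r"jdbc:(\w+)://([^:;/?]+)(?::(\d+))?", jdbc_url)
    if not match:
        raise ValueError(f"Unrecognized JDBC URL: {jdbc_url}")
    scheme, host, port = match.groups()
    if port:
        return host, int(port)
    if scheme not in DEFAULT_PORTS:
        raise ValueError(f"No default port known for jdbc:{scheme}")
    return host, DEFAULT_PORTS[scheme]


def can_connect(host, port, timeout=5.0):
    """True if a TCP connection succeeds (firewall/DNS/VNet check only)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```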

1a. List Databases

SELECT
    d.DB_ID,
    d.NAME           AS database_name,
    d.DB_LOCATION_URI AS location,
    d.OWNER_NAME     AS owner
FROM DBS d
ORDER BY d.NAME;

1b. List Tables with Storage Info

SELECT
    d.NAME           AS database_name,
    t.TBL_NAME       AS table_name,
    t.TBL_TYPE       AS table_type,       -- MANAGED_TABLE or EXTERNAL_TABLE
    s.LOCATION       AS data_location,
    s.INPUT_FORMAT   AS input_format,
    s.OUTPUT_FORMAT  AS output_format,
    tp_provider.PARAM_VALUE AS spark_provider,  -- 'delta', 'parquet', etc.
    tp_delta.PARAM_VALUE    AS is_delta          -- non-null if Delta table
FROM TBLS t
JOIN DBS d    ON t.DB_ID = d.DB_ID
JOIN SDS s    ON t.SD_ID = s.SD_ID
LEFT JOIN TABLE_PARAMS tp_provider
    ON t.TBL_ID = tp_provider.TBL_ID AND tp_provider.PARAM_KEY = 'spark.sql.sources.provider'
LEFT JOIN TABLE_PARAMS tp_delta
    ON t.TBL_ID = tp_delta.TBL_ID AND tp_delta.PARAM_KEY = 'delta.lastCommitTimestamp'
ORDER BY d.NAME, t.TBL_NAME;

Detecting table format:

Signal	Priority	Format
`spark.sql.sources.provider` = `'delta'`	Preferred	Delta
`delta.lastCommitTimestamp` is not null	Fallback	Delta
`INPUT_FORMAT` contains `parquet` and no delta markers	—	Parquet
`INPUT_FORMAT` contains `orc`	—	ORC
`INPUT_FORMAT` contains `Text` or `csv`	—	CSV/Text
`spark.sql.sources.provider` = `'parquet'`	—	Parquet
`spark.sql.sources.provider` = `'orc'`	—	ORC
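The detection rules can be applied to each row returned by query 1b in the order the table lists them. A minimal sketch; `detect_format` is a hypothetical name, rows are assumed to be dicts keyed by the 1b column aliases (`spark_provider`, `is_delta`, `input_format`), and anything that matches no rule is returned as `Unknown` for manual review.

```python
def detect_format(row):
    """Classify one inventory row from query 1b."""
    provider = (row.get("spark_provider") or "").lower()
    input_format = (row.get("input_format") or "").lower()
    if provider == "delta" or row.get("is_delta") is not None:
        return "Delta"      # provider first, then the delta.lastCommitTimestamp fallback
    if "parquet" in input_format:
        return "Parquet"
    if "orc" in input_format:
        return "ORC"
    if "text" in input_format or "csv" in input_format:
        return "CSV/Text"
    if provider == "parquet":
        return "Parquet"
    if provider == "orc":
        return "ORC"
    return "Unknown"
```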

1c. List Columns

SELECT
    d.NAME           AS database_name,
    t.TBL_NAME       AS table_name,
    c.COLUMN_NAME,
    c.TYPE_NAME,
    c.INTEGER_IDX    AS ordinal_position
FROM COLUMNS_V2 c
JOIN SDS s    ON c.CD_ID = s.CD_ID
JOIN TBLS t   ON t.SD_ID = s.SD_ID
JOIN DBS d    ON t.DB_ID = d.DB_ID
ORDER BY d.NAME, t.TBL_NAME, c.INTEGER_IDX;

1d. List Partition Keys

SELECT
    d.NAME           AS database_name,
    t.TBL_NAME       AS table_name,
    pk.PKEY_NAME     AS partition_column,
    pk.PKEY_TYPE     AS partition_type,
    pk.INTEGER_IDX   AS ordinal_position
FROM PARTITION_KEYS pk
JOIN TBLS t   ON pk.TBL_ID = t.TBL_ID
JOIN DBS d    ON t.DB_ID = d.DB_ID
ORDER BY d.NAME, t.TBL_NAME, pk.INTEGER_IDX;

1e. List Partitions (for non-Delta tables)

SELECT
    d.NAME           AS database_name,
    t.TBL_NAME       AS table_name,
    p.PART_ID,
    s.LOCATION       AS partition_location,
    p.CREATE_TIME
FROM PARTITIONS p
JOIN TBLS t   ON p.TBL_ID = t.TBL_ID
JOIN DBS d    ON t.DB_ID = d.DB_ID
JOIN SDS s    ON p.SD_ID = s.SD_ID
ORDER BY d.NAME, t.TBL_NAME, p.CREATE_TIME;

Delta tables: Skip partition enumeration — Delta handles partitions internally via _delta_log. Only query partitions for non-Delta tables that need MSCK REPAIR TABLE after migration.

1f. Summary Output

After running the queries, produce an inventory summary:

External HMS Inventory:
  HMS Database: jdbc:sqlserver://myserver.database.windows.net;database=hive_metastore
  Total databases: 5
  Total tables: 142

  Database: sales (37 tables)
    Delta tables: 30 (24 managed, 6 external)
    Parquet tables: 5 (all external)
    ORC tables: 2 (all managed)
    Partitioned tables: 8

  Database: marketing (22 tables)
    Delta tables: 22 (all managed)
    Partitioned tables: 3

  Database: staging (45 tables)
    ...
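A summary in roughly this shape can be generated from the classified inventory rows. A minimal sketch; `summarize` is a hypothetical name, each row is assumed to carry `database_name`, `table_type`, a `format` (for example from the detection rules above) and a `partitioned` flag (from query 1d), and databases with no tables do not appear because they are counted from table rows only.

```python
from collections import Counter, defaultdict


def summarize(tables):
    """Build inventory summary lines grouped by HMS database."""
    per_db = defaultdict(list)
    for table in tables:
        per_db[table["database_name"]].append(table)
    lines = [
        f"  Total databases: {len(per_db)}",
        f"  Total tables: {sum(len(rows) for rows in per_db.values())}",
    ]
    for db in sorted(per_db):
        rows = per_db[db]
        lines.append(f"  Database: {db} ({len(rows)} tables)")
        for fmt, count in sorted(Counter(r["format"] for r in rows).items()):
            managed = sum(1 for r in rows if r["format"] == fmt and r["table_type"] == "MANAGED_TABLE")
            lines.append(f"    {fmt} tables: {count} ({managed} managed, {count - managed} external)")
        partitioned = sum(1 for r in rows if r["partitioned"])
        if partitioned:
            lines.append(f"    Partitioned tables: {partitioned}")
    return "\n".join(lines)
```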

---

Step 2: Select Databases and Choose Mapping Mode

Database Selection

If the external HMS is shared with other platforms (HDInsight, Databricks), not all databases may be Synapse-owned. Ask the user which databases to migrate:

{
  "databasesToMigrate": ["sales", "marketing", "staging"],
  "databasesToSkip": ["hdinsight_etl", "databricks_ml"]
}

If the HMS is Synapse-only (being decommissioned), migrate all databases.

Mapping Mode

Choose one of two modes:

Mode A: Schemas in One Lakehouse (Default)

All selected HMS databases → schemas within one target Lakehouse.

HMS Database	Fabric Target
`sales`	Schema `sales` in target Lakehouse
`marketing`	Schema `marketing` in target Lakehouse
`staging`	Schema `staging` in target Lakehouse
`default`	Schema `dbo` (Lakehouse default)

Advantages: Fewer items to manage; cross-schema queries via 2-part names; single SQL endpoint.

Disadvantages: Less isolation; shared OneLake path; harder to assign per-database permissions.

No schema collision risk: The external HMS has a flat namespace (Database → Table), so each database name maps directly to a Fabric schema name with no composite naming needed.

Mode B: Separate Lakehouses

Each selected HMS database → its own Fabric Lakehouse.

HMS Database	Fabric Target
`sales`	Lakehouse `sales`
`marketing`	Lakehouse `marketing`
`staging`	Lakehouse `staging`

Advantages: Strong isolation; independent security (OneLake RBAC per Lakehouse); independent SQL endpoints.

Disadvantages: More items to manage; cross-database queries require 3-part names.

Recommended when: The HMS is shared with other platforms and you want clear isolation for migrated databases, or different databases are owned by different teams.

Emit Mapping Report

Before creating anything, show the user the planned mapping:

External HMS Migration Plan:
  Source: jdbc:sqlserver://myserver.database.windows.net;database=hive_metastore
  Mode: A (schemas in one Lakehouse)
  Target Lakehouse: MigratedData_Lakehouse

  HMS sales.customers         → MigratedData_Lakehouse.sales.customers (Delta, shortcut)
  HMS sales.orders            → MigratedData_Lakehouse.sales.orders (Delta, shortcut)
  HMS sales.legacy_archive    → MigratedData_Lakehouse.sales.legacy_archive (Parquet, Files/)
  HMS marketing.campaigns     → MigratedData_Lakehouse.marketing.campaigns (Delta, shortcut)
  HMS staging.raw_events      → MigratedData_Lakehouse.staging.raw_events (Delta, shortcut)
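The mapping report can be produced from the same inventory rows, which keeps the plan and the later shortcut creation consistent. A minimal sketch; `plan_mapping` is a hypothetical name, Mode B is assumed to place tables in each Lakehouse's default `dbo` schema, and the target Lakehouse name is just the example above.

```python
def plan_mapping(tables, mode="A", target_lakehouse="MigratedData_Lakehouse"):
    """Return report lines mapping HMS db.table to its Fabric target."""
    lines = []
    for t in tables:
        db, name, fmt = t["database_name"], t["table_name"], t["format"]
        placement = "shortcut" if fmt == "Delta" else "Files/"
        if mode == "A":
            schema = "dbo" if db == "default" else db   # 'default' maps to dbo
            target = f"{target_lakehouse}.{schema}.{name}"
        elif mode == "B":
            target = f"{db}.dbo.{name}"                 # one Lakehouse per HMS database
        else:
            raise ValueError("mode must be 'A' or 'B'")
        source = f"{db}.{name}"
        lines.append(f"  HMS {source:<27} → {target} ({fmt}, {placement})")
    return lines
```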

---

Step 3: Create Fabric Lakehouse(es)

Endpoint: POST https://api.fabric.microsoft.com/v1/workspaces/{workspaceId}/items

{
  "displayName": "{lakehouseName}",
  "type": "Lakehouse",
  "description": "Migrated from external Hive Metastore",
  "creationPayload": {
    "enableSchemas": true
  }
}

Mode A: Create one Lakehouse with enableSchemas: true
Mode B: Create one Lakehouse per selected database (still use enableSchemas: true)

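Returns HTTP 201 (item created) or HTTP 202 (long-running operation). On 202, poll the URL in the Location header until status == "Succeeded", then GET {Location}/result to retrieve the created item. Capture its id for subsequent steps.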
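Polling a long-running operation is the same for every create call in this workflow, so a small helper avoids repeating it. A minimal sketch, assuming the Location URL is a Fabric operation URL whose final result is served at the same URL plus `/result`, and that `Failed` is the terminal error status; `session` can be a `requests.Session` or anything with a compatible `get(url, headers=...)`, and `wait_for_lro` is a hypothetical name.

```python
import time


def wait_for_lro(session, location, token, poll_seconds=5, timeout_seconds=600):
    """Poll a long-running operation and return the JSON of its result."""
    headers = {"Authorization": f"Bearer {token}"}
    deadline = time.monotonic() + timeout_seconds
    while True:
        resp = session.get(location, headers=headers)
        status = resp.json().get("status")
        if status == "Succeeded":
            # The created item (including its id) is read from the /result endpoint.
            return session.get(location.rstrip("/") + "/result", headers=headers).json()
        if status == "Failed":
            raise RuntimeError(f"Operation failed: {resp.json()}")
        if time.monotonic() > deadline:
            raise TimeoutError(f"Operation still '{status}' after {timeout_seconds}s")
        # Honor the server's Retry-After hint when present.
        time.sleep(float(resp.headers.get("Retry-After", poll_seconds)))
```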

---

Step 4: Create Schemas (Mode A Only)

For each selected HMS database, create the corresponding Fabric schema:

CREATE SCHEMA IF NOT EXISTS sales;
CREATE SCHEMA IF NOT EXISTS marketing;
CREATE SCHEMA IF NOT EXISTS staging;
-- 'default' database maps to the built-in 'dbo' schema — no creation needed

Execute via:

SQL endpoint: Connect to the Lakehouse SQL endpoint
Fabric notebook: Run in a notebook cell attached to the Lakehouse
Livy session: POST /v1/workspaces/{workspaceId}/lakehouses/{lakehouseId}/livyapi/versions/2023-12-01/sessions
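From a notebook attached to the target Lakehouse, the schema statements can be generated from the list of selected HMS databases rather than typed by hand. A minimal sketch; `schema_statements` is a hypothetical name, names are backtick-quoted for Spark SQL, and `default` is skipped because it maps to `dbo`.

```python
def schema_statements(hms_databases):
    """CREATE SCHEMA statements for Mode A, skipping 'default' and duplicates."""
    seen, statements = set(), []
    for db in hms_databases:
        if db == "default" or db in seen:
            continue  # 'default' maps to the built-in dbo schema
        seen.add(db)
        statements.append(f"CREATE SCHEMA IF NOT EXISTS `{db}`")
    return statements
```

Usage in a notebook cell: `for stmt in schema_statements(["sales", "marketing", "staging", "default"]): spark.sql(stmt)`.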

---

Step 5: Create OneLake Shortcuts

For each table in the selected databases, create a shortcut based on format and type.

Shortcut Target Decision

Table Format	Shortcut Location	Catalog Registration
Delta	`Tables/{schema}/{tableName}`	Auto-registers in Lakehouse catalog
Parquet / ORC / CSV / Avro	`Files/{schema}/{tableName}`	Not auto-registered — handle in Step 6

Parse Storage Location

The SDS.LOCATION (or data_location from Step 1b) provides the ADLS Gen2 path. Parse it into shortcut parameters:

Managed tables (typical warehouse directory):

abfss://{container}@{storage}.dfs.core.windows.net/synapse/workspaces/{workspace}/warehouse/{database}.db/{tableName}

location: https://{storage}.dfs.core.windows.net
subpath: /{container}/synapse/workspaces/{workspace}/warehouse/{database}.db/{tableName}

Note: External HMS managed tables may use .db suffix in the warehouse directory (e.g., sales.db/customers), unlike built-in HMS which omits it. Check the actual SDS.LOCATION value.

External tables (arbitrary ADLS path):

abfss://{container}@{storage}.dfs.core.windows.net/{custom/path/to/table}

location: https://{storage}.dfs.core.windows.net
subpath: /{container}/{custom/path/to/table}
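Both cases reduce to the same split of an abfss:// URI, so the `location`/`subpath` pair can be derived from `data_location` directly. A minimal sketch; `shortcut_target` is a hypothetical name, and it deliberately rejects non-ABFS locations (for example `wasbs://`), which would need separate handling.

```python
from urllib.parse import urlparse


def shortcut_target(location_uri):
    """Split an abfss:// table location into shortcut 'location' and 'subpath' values."""
    parsed = urlparse(location_uri)
    if parsed.scheme not in ("abfss", "abfs"):
        raise ValueError(f"Not an ABFS URI: {location_uri}")
    container, sep, host = parsed.netloc.partition("@")
    if not sep:
        raise ValueError(f"Missing container in: {location_uri}")
    return {
        "location": f"https://{host}",
        "subpath": f"/{container}{parsed.path.rstrip('/')}",  # drop any trailing slash
    }
```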

Create Shortcut API

Endpoint: POST https://api.fabric.microsoft.com/v1/workspaces/{workspaceId}/items/{lakehouseId}/shortcuts

Note: The endpoint uses /items/{lakehouseId}/shortcuts, NOT /lakehouses/{lakehouseId}/shortcuts (which returns 404).

{
  "name": "{tableName}",
  "path": "Tables/{schemaName}",
  "target": {
    "type": "AdlsGen2",
    "adlsGen2": {
      "location": "https://{storageAccount}.dfs.core.windows.net",
      "subpath": "/{container}/{path/to/table}",
      "connectionId": "{connectionId}"
    }
  }
}

`connectionId` is required. See lake-database-migration.md § Step 4b for how to discover or create the ADLS connection.

For non-Delta tables, change "path" to "Files/{schemaName}".
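The Delta versus non-Delta choice of `path` can be folded into one payload builder. The sketch below is hypothetical (`build_shortcut_payload` is not part of any SDK). It mirrors the JSON body shown above and leaves the HTTP call to the caller.

```python
def build_shortcut_payload(table_name, schema_name, table_format, location, subpath, connection_id):
    """Delta tables go under Tables/{schema}; every other format goes under Files/{schema}."""
    folder = "Tables" if table_format.lower() == "delta" else "Files"
    return {
        "name": table_name,
        "path": f"{folder}/{schema_name}",
        "target": {
            "type": "AdlsGen2",
            "adlsGen2": {
                "location": location,
                "subpath": subpath,
                "connectionId": connection_id,  # required by the Shortcuts API
            },
        },
    }
```

The result would then be sent with something like `requests.post(f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}/items/{lakehouse_id}/shortcuts", headers=fab_headers, json=payload)`, where `fab_headers` is assumed to carry a Fabric bearer token.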

Authentication for Shortcuts

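The identity behind the shortcut's connection (the workspace identity for a WorkspaceIdentity connection, or the signed-in user for an OAuth2 connection) must have Storage Blob Data Reader on the target ADLS Gen2 storage account. Fabric reads the data with the connection's credential, not with the token of the user calling the API.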

---

Step 6: Handle Non-Delta Tables

For non-Delta tables (Parquet, ORC, CSV, Avro), the shortcut is created under Files/ but the table is not auto-registered in the Lakehouse catalog. Choose one option:

Option A: Convert to Delta (Recommended)

# Read from the shortcut path
df = spark.read.format("{originalFormat}").load("Files/{schemaName}/{tableName}")

# Write as Delta to the Tables section
df.write.format("delta").mode("overwrite").saveAsTable("{schemaName}.{tableName}")

For partitioned tables (identified in Step 1d):

df = spark.read.format("parquet").load("Files/{schemaName}/{tableName}")
df.write.format("delta") \
    .partitionBy("{partitionCol}") \
    .mode("overwrite") \
    .saveAsTable("{schemaName}.{tableName}")

Advantages: Full Lakehouse catalog registration, SQL endpoint queries, Power BI Direct Lake, V-Order, ACID transactions.

Option B: Retain Original Format

Keep tables in their legacy format under Files/. Register them in the catalog manually:

CREATE TABLE IF NOT EXISTS {schemaName}.{tableName}
USING {format}
LOCATION 'Files/{schemaName}/{tableName}';

For Hive-style partitioned tables, recover partition metadata:

CREATE TABLE IF NOT EXISTS {schemaName}.{tableName}
USING PARQUET
PARTITIONED BY ({partitionCols})
LOCATION 'Files/{schemaName}/{tableName}';

MSCK REPAIR TABLE {schemaName}.{tableName};

Why `MSCK REPAIR TABLE`? Non-Delta tables rely on HMS partition metadata for partition pruning. Without this step, queries scan all files instead of pruning to relevant partitions.
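When many tables take Option B, the registration statements can be generated per table. The sketch below is a hypothetical helper (`register_non_delta_ddl`). It assumes partition column names were already collected in Step 1d. It only emits `MSCK REPAIR TABLE` for partitioned tables, matching the text above.

```python
def register_non_delta_ddl(schema, table, fmt, partition_cols=None):
    """Spark SQL statements that register a Files/ shortcut as a catalog table (Option B)."""
    ddl = f"CREATE TABLE IF NOT EXISTS {schema}.{table}\nUSING {fmt.upper()}\n"
    if partition_cols:
        ddl += f"PARTITIONED BY ({', '.join(partition_cols)})\n"
    ddl += f"LOCATION 'Files/{schema}/{table}'"
    statements = [ddl]
    if partition_cols:
        # Recover Hive-style partition metadata so queries can prune partitions
        statements.append(f"MSCK REPAIR TABLE {schema}.{table}")
    return statements
```

In a Fabric notebook attached to the Lakehouse, each returned statement could be passed to `spark.sql(...)` in order.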

Comparison

Capability	Delta (Option A)	Original Format (Option B)
Lakehouse catalog registration	Yes	Yes (after `CREATE TABLE`)
SQL endpoint queries	Yes	No (the SQL endpoint exposes Delta tables only)
Power BI Direct Lake	Yes	No — requires Delta
V-Order optimization	Yes	No
ACID / time travel	Yes	No
Partition pruning	Automatic	Requires `MSCK REPAIR TABLE`
Data duplication	Yes (new copy)	No (zero-copy via shortcut)

---

Step 7: Validate

After migration, verify:

1. Lakehouse catalog: Check that tables appear in the Lakehouse Explorer UI under Tables
2. SQL endpoint: Query migrated tables via the SQL endpoint to confirm schema and data
3. Row counts: Compare row counts between HMS (via JDBC or Spark SQL on Synapse) and Fabric

    -- On Synapse (or via JDBC to HMS + Spark)
    SELECT COUNT(*) FROM sales.customers;

    -- On Fabric
    SELECT COUNT(*) FROM sales.customers;

4. Shortcut health: Verify shortcuts are accessible

    notebookutils.fs.ls("Tables/sales/")
    notebookutils.fs.ls("Files/sales/")  # if non-Delta tables exist

5. Partition coverage (for Option B non-Delta tables): Verify partition counts match

    SHOW PARTITIONS {schemaName}.{tableName};
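Once counts are collected on both sides, the comparison itself is mechanical. The sketch below assumes the counts are already gathered into dictionaries keyed by `schema.table`, for example with `spark.sql(f"SELECT COUNT(*) FROM {name}").collect()[0][0]` on each platform. The function name `compare_row_counts` is hypothetical. Tables that are missing on either side count as mismatches.

```python
def compare_row_counts(source, target):
    """Return {table: (source_count, target_count)} for tables whose counts differ or are missing."""
    mismatches = {}
    for table in sorted(set(source) | set(target)):
        src, tgt = source.get(table), target.get(table)
        if src != tgt:
            mismatches[table] = (src, tgt)
    return mismatches
```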

---

Step 8: (Optional) Validate Before Proceeding to Phase 2

Run these checks before migrating Notebooks. Notebooks rely on Lakehouses being healthy — missing shortcuts or unregistered tables cause immediate runtime failures.

Check	How	Pass Criteria
Shortcut health	`notebookutils.fs.ls("Tables/{schema}/")` and `Files/{schema}/`	All shortcuts resolve; no `PathNotFound` errors
Row counts	Compare `SELECT COUNT(*)` on HMS source (via JDBC/Spark) vs. Fabric for each table	Counts match
Schema comparison	Compare column names and types from HMS `COLUMNS_V2` vs. Fabric `DESCRIBE TABLE`	Exact match
Non-Delta registration	`SHOW TABLES IN {schema}`	All Option B tables appear in catalog
Partition coverage	`SHOW PARTITIONS {schema}.{table}` for Option B partitioned tables	Partition count matches HMS `PARTITIONS` table

Do not proceed to Phase 2 until all shortcuts are healthy and row counts match. A notebook that reads from a missing or broken shortcut fails at runtime, and one that reads an incompletely migrated table can silently produce wrong results.

See validation-testing.md → V2: Data Validation for detailed scripts.
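The schema comparison check can be scripted once both column lists are available. The sketch below assumes the HMS side comes from `COLUMNS_V2` (column name and type) and the Fabric side from the first two columns of `DESCRIBE TABLE`. The function name `compare_schemas` is hypothetical. Names and types are compared case-insensitively, and spaces inside types are ignored. This is a deliberate relaxation of "exact match", because Hive stores lowercase identifiers.

```python
def compare_schemas(hms_cols, fabric_cols):
    """Compare [(name, type), ...] lists and return human-readable differences."""
    def normalize(cols):
        return {name.lower(): dtype.lower().replace(" ", "") for name, dtype in cols}

    hms, fab = normalize(hms_cols), normalize(fabric_cols)
    diffs = []
    for name in sorted(set(hms) | set(fab)):
        if name not in fab:
            diffs.append(f"missing in Fabric: {name}")
        elif name not in hms:
            diffs.append(f"extra in Fabric: {name}")
        elif hms[name] != fab[name]:
            diffs.append(f"type mismatch on {name}: {hms[name]} vs {fab[name]}")
    return diffs
```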

---

Limitations and Considerations

Limitation	Impact	Mitigation
External HMS is deprecated after Spark 3.4	No new development; should migrate sooner rather than later	This guide helps you migrate off it
Fabric cannot connect to an external HMS	Must copy metadata into Lakehouse catalog — no live connection	This guide extracts and recreates all metadata
HMS functions (UDFs) are not migrated	Custom functions stored in HMS are not extracted by these queries	Recreate manually via `spark.udf.register()` or `CREATE FUNCTION`
HMS views are not migrated	Views stored in HMS are not extracted	Extract view definitions (see query below), then recreate as Spark SQL views in Fabric
Shared HMS — other platforms still using it	Migrating databases to Fabric doesn't remove them from the external HMS	Coordinate with other platform teams; HMS remains untouched
Large catalogs (10K+ tables)	JDBC queries scale well, but shortcut creation is sequential (one API call per table)	Batch shortcut creation; consider parallel requests (respect API rate limits)
Managed table data in Synapse storage	Workspace-internal storage requires Fabric identity to have read access	Grant Storage Blob Data Reader before creating shortcuts
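For large catalogs, the "consider parallel requests" mitigation can be implemented with a small bounded thread pool. The sketch below assumes a hypothetical `create_fn(payload)` that posts one shortcut payload and returns the HTTP status code. Keep `max_workers` small and pair this with `Retry-After` handling, since the Fabric API rate limit still applies.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def create_shortcuts_in_batches(payloads, create_fn, max_workers=4):
    """Call create_fn(payload) for every payload with at most max_workers requests in flight.

    Returns {"{path}/{name}": status_code}.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(create_fn, p): f"{p['path']}/{p['name']}" for p in payloads}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```

Results are keyed by `path/name` rather than by name alone, because the same table name can appear in several schemas.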

Extracting HMS view definitions:

SELECT
    d.NAME AS database_name,
    t.TBL_NAME AS view_name,
    t.VIEW_ORIGINAL_TEXT AS view_sql
FROM TBLS t
JOIN DBS d ON t.DB_ID = d.DB_ID
WHERE t.TBL_TYPE = 'VIRTUAL_VIEW'
  AND t.VIEW_ORIGINAL_TEXT IS NOT NULL
ORDER BY d.NAME, t.TBL_NAME;

The view SQL is stored in the VIEW_ORIGINAL_TEXT column of the metastore's TBLS table (Spark writes the same text to VIEW_EXPANDED_TEXT). Recreate in Fabric: CREATE OR REPLACE VIEW {schema}.{view_name} AS {view_sql}.
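The rows returned by the HMS view query can be turned into Fabric DDL in a loop. The sketch below assumes each row is a `(database_name, view_name, view_sql)` tuple. The optional `schema_map` (database to Fabric schema) is a hypothetical input for non-trivial name mappings. It does not rewrite table references inside the view body, so views that point at renamed schemas still need a manual edit.

```python
def view_ddl(rows, schema_map=None):
    """Build CREATE OR REPLACE VIEW statements from (database, view, sql) rows."""
    schema_map = schema_map or {}
    statements = []
    for database, view, sql in rows:
        # Default mapping: database name = schema name, with 'default' landing in dbo
        schema = schema_map.get(database, "dbo" if database == "default" else database)
        statements.append(f"CREATE OR REPLACE VIEW {schema}.{view} AS {sql.strip().rstrip(';')}")
    return statements
```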

Feature Parity Reference

Quick-reference summary of Synapse Spark features (plus a few Dedicated SQL Pool features in the last rows) and their Fabric equivalents.

Synapse → Fabric Feature Matrix

Synapse Feature	Fabric Equivalent	Parity	Notes
Spark Pool (on-demand)	Starter Pool	✅ Full	Auto-provisioned, no config needed
Spark Pool (custom)	Custom Pool / Environment	✅ Full	Node family + size + autoscale via Environment
Pool-level libraries	Environment (libraries section)	✅ Full	PyPI, Conda, custom .whl/.jar
`mssparkutils.*`	`notebookutils.*`	✅ Full	Namespace change only — see utility-api-mapping.md
`mssparkutils.env`	`notebookutils.runtime`	⚠️ Renamed	`.env.getWorkspaceName()` → `.runtime.context["workspaceName"]`
Linked Services	Data Connections / Shortcuts	⚠️ Redesigned	No 1:1 mapping — see connectivity-migration.md
`spark.read.synapsesql()`	JDBC / OneLake shortcut	⚠️ Replaced	Connector not available in Fabric
Lake Database (built-in HMS)	Lakehouse (managed Delta)	✅ Full	Tables → shortcuts, schemas supported
External Hive Metastore	Lakehouse (via shortcuts)	⚠️ Partial	HMS not natively supported — migrate tables as shortcuts
Notebook `%%configure`	`%%configure`	✅ Full	Identical syntax
`spark.conf.set()`	`spark.conf.set()`	✅ Full	Identical
Spark SQL (DDL/DML)	Spark SQL	✅ Full	`CREATE SCHEMA`, `CREATE TABLE`, etc.
Notebook parameters	Notebook parameters	✅ Full	Same `parameters` cell mechanism
Spark Job Definitions	Spark Job Definitions	✅ Full	Same concept, different deployment API
Delta Lake read/write	Delta Lake read/write	✅ Full	Native format in Fabric
Notebook scheduling	Job Scheduler / Pipelines	✅ Full	REST API or Pipeline activity
Git integration	Git integration	✅ Full	Workspace-level Git sync
`TokenLibrary` (OAuth)	Workspace Identity / `notebookutils.credentials`	⚠️ Replaced	See connector-refactoring.md
Catalog API (`spark.catalog.*`)	Catalog API	⚠️ Partial	`tableExists()`, `listTables()`, `listColumns()`, `cacheTable()`, `dropTempView()` work; database-level methods (`listDatabases()`, `currentDatabase()`, `getDatabase()`) and function-level methods need Spark SQL replacements — see code-patterns.md
Managed VNet / Private Endpoints	Managed Private Endpoints	⚠️ Partial	Capacity-level config, portal only
Result set caching	Not available	❌ Missing	Rely on query plan caching
Workload management (classifiers)	Not available	❌ Missing	Use capacity management
PolyBase external tables	`COPY INTO` / Lakehouse shortcuts	⚠️ Replaced	Rewrite required
`DISTRIBUTION = HASH(col)`	Auto-distributed	⚠️ Removed	Remove hints — Fabric handles distribution

Legend: ✅ Full parity — ⚠️ Partial / renamed / replaced — ❌ Not available
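Because most `mssparkutils` rows are a pure namespace change, notebook source can be rewritten mechanically. The sketch below is a regex-based assumption, not the official migration tooling, and `port_mssparkutils` is a hypothetical name. It swaps the namespace and returns any `.env.*` calls for manual review instead of guessing their `runtime.context` replacement. Import lines such as `from notebookutils import mssparkutils` are left untouched.

```python
import re

# (?<![\w.]) skips names such as notebookutils.mssparkutils or my_mssparkutils
_NAMESPACE = re.compile(r"(?<![\w.])mssparkutils\.")
_ENV_CALL = re.compile(r"(?<![\w.])mssparkutils\.env\.\w+")


def port_mssparkutils(source):
    """Return (rewritten_source, env_calls_needing_manual_review)."""
    env_calls = sorted(set(_ENV_CALL.findall(source)))
    return _NAMESPACE.sub("notebookutils.", source), env_calls
```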

T-SQL Surface Area Gaps

Fabric Warehouse supports a broad T-SQL surface, but some Dedicated SQL Pool features differ:

Synapse Dedicated SQL Pool Feature	Fabric Warehouse Equivalent	Action Required
`CREATE EXTERNAL TABLE` (PolyBase)	`COPY INTO` or Lakehouse SQL Endpoint	Rewrite ingestion; use `COPY INTO` for bulk load from ADLS/OneLake
`DISTRIBUTION = HASH(col)`	Not applicable — Fabric auto-distributes	Remove distribution hints from DDL
`CLUSTERED COLUMNSTORE INDEX` (default)	Delta Lake (Lakehouse) or Fabric Warehouse DCI	Warehouse tables use Delta-backed storage automatically
Result set caching	Not available	Remove cache hints; rely on query plan caching
Workload management (classifiers)	Not available	Use workspace capacity management
`sp_rename`	Supported	No change needed
`MERGE` statement	Supported	No change needed
Temp tables (`#temp`)	Supported	No change needed
Window functions	Supported	No change needed

Delegate to `sqldw-authoring-cli` for all T-SQL DDL/DML authoring tasks after mapping the workload.

Spark Configuration Differences

Synapse Spark Concept	Fabric Spark Equivalent	Notes
Spark Pool definition (node type, autoscale min/max)	Custom Pool or Starter Pool	Starter Pool (auto-provisioned, no config needed) covers most dev workloads; Custom Pools for production SLAs
`%%configure` magic cell (session-level config)	`%%configure` magic — identical syntax	Supported in Fabric notebooks
`spark.conf.set(...)`	`spark.conf.set(...)` — identical	No change needed
Environment-scoped libraries (pool packages)	Fabric Environment attached to workspace/notebook	Replace pool-level library installs with a Fabric Environment item
Synapse-specific Spark versions	Fabric Runtime versions (1.1 = Spark 3.3, 1.2 = Spark 3.4, 1.3 = Spark 3.5)	Align runtime version; test deprecated API calls
`spark.read.synapsesql(...)` connector	Not available — use `notebookutils` + Lakehouse shortcuts or Warehouse JDBC	Replace with OneLake reads or SQL endpoint queries

Synapse Lake Database → Fabric Lakehouse Migration

Migrate Synapse Lake Databases and Hive Metastore metadata to Fabric Lakehouses via REST APIs.

Hive Metastore Coverage: Synapse's built-in Hive Metastore (HMS) and Lake Databases share the same underlying catalog. Databases, tables, views, and partitions created via Lake Database designer, Spark SQL (CREATE DATABASE, CREATE TABLE), or notebook code (df.write.saveAsTable(...)) are all stored in the built-in HMS and are all visible through the Lake Database REST API used in this guide. This means HMS migration for managed/built-in metastores is fully covered here — no separate HMS export/import notebooks are needed.

External Hive Metastore (Azure SQL DB / MySQL-backed): If your Synapse workspace uses an external HMS, see external-hms-migration.md for the complete migration guide using JDBC queries. Fabric does not support connecting to an external HMS — all metadata must be migrated into the Fabric Lakehouse catalog. External HMS support in Synapse is deprecated after Spark 3.4.

Prerequisite: Authenticate to both Synapse and Fabric APIs before starting (see COMMON-CLI.md § Authentication Recipes).

- Synapse data-plane audience: https://dev.azuresynapse.net

- Fabric audience: https://api.fabric.microsoft.com

---

Phase 1 Overview

Lake Databases must be migrated before Notebooks and Spark Job Definitions so that Fabric Lakehouses exist for notebook lakehouse binding (Phase 2, Step 4).

Phase 1: Lake Database Migration
├── Step 1: Inventory — list databases & tables from Synapse
├── Step 2: Choose mapping mode — schemas vs. separate Lakehouses
├── Step 3: Create Fabric Lakehouse(es)
├── Step 4: Create schemas (if using schema mapping mode)
├── Step 4b: Discover or create ADLS connection (probe-test candidates)
├── Step 5: Create OneLake shortcuts (Delta → Tables/, non-Delta → Files/)
├── Step 6: (Optional) Convert non-Delta tables to Delta
├── Step 7: Validate — verify catalog registration and data accessibility
└── Step 8: (Optional) Validate before proceeding to Phase 2

---

Step 1: Inventory Synapse Lake Databases

List All Databases

Endpoint: GET {endpoint}/databases?api-version=2021-04-01

GET https://{workspaceName}.dev.azuresynapse.net/databases?api-version=2021-04-01

Response contains items[] — each item has:

name — database name
type — always "DATABASE"
properties.source.location — ADLS Gen2 base path (e.g., abfss://container@storage.dfs.core.windows.net/dbname)
properties.source.provider — typically "ADLS"
properties.origin.type — "SPARK" (Spark-native) or "SQLOD" (Serverless SQL-originated)

Filter: Migrate databases where origin.type == "SPARK" or source.provider == "ADLS", but always skip SQLOD-origin databases: these are Serverless SQL Pool views, not Spark Lake Databases.
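The filter can be applied directly to the `GET /databases` response body. The sketch below assumes the JSON has already been fetched with a Synapse data-plane token. The function name `databases_to_migrate` is hypothetical. SQLOD-origin databases are excluded even when their provider is ADLS.

```python
def databases_to_migrate(response):
    """Return names of Spark Lake Databases from a GET /databases response body."""
    selected = []
    for item in response.get("items", []):
        props = item.get("properties") or {}
        origin = (props.get("origin") or {}).get("type")
        provider = (props.get("source") or {}).get("provider")
        if origin == "SQLOD":
            continue  # Serverless SQL Pool views, not Spark Lake Databases
        if origin == "SPARK" or provider == "ADLS":
            selected.append(item["name"])
    return selected
```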

Get Database Details

Endpoint: GET {endpoint}/databases/{databaseName}?api-version=2021-04-01

Returns DatabaseEntity with properties.source.location (the warehouse directory path for managed tables).

List Schemas in a Database

Endpoint: GET {endpoint}/databases/{databaseName}/schemas?api-version=2021-04-01

Returns items[] of schema objects. If only default/dbo exists, the database has no custom schemas.

List Tables in a Database

Endpoint: GET {endpoint}/databases/{databaseName}/tables?api-version=2021-04-01

Each table item contains:

name — table name
properties.tableType — "MANAGED" or absent (external)
properties.namespace.databaseName — parent database
properties.namespace.schemaName — parent schema (may be null for default)
properties.storageDescriptor.columns[] — column definitions with name and originDataTypeName.typeName
properties.storageDescriptor.format.formatType — "delta", "parquet", "csv", "orc", "avro", "textfile", etc.
properties.storageDescriptor.source.location — data file path in ADLS Gen2
properties.partitioning — partition columns (if Hive-style partitioned)
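Later steps only need a few of these fields per table, so it helps to flatten each item once. The sketch below is hypothetical (`summarize_table`). It treats a null `schemaName` as `dbo` (an assumption based on "may be null for default") and keeps `partitioning` raw, because its exact shape is not described here.

```python
def summarize_table(item):
    """Flatten one table item from GET /databases/{db}/tables into the fields later steps use."""
    props = item.get("properties") or {}
    namespace = props.get("namespace") or {}
    storage = props.get("storageDescriptor") or {}
    fmt = ((storage.get("format") or {}).get("formatType") or "").lower()
    return {
        "name": item["name"],
        "database": namespace.get("databaseName"),
        "schema": namespace.get("schemaName") or "dbo",  # assumption: null means default/dbo
        "managed": props.get("tableType") == "MANAGED",  # absent tableType means external
        "format": fmt,
        "is_delta": fmt == "delta",
        "location": (storage.get("source") or {}).get("location"),
        "partitioning": props.get("partitioning"),  # kept as returned by the API
    }
```

Applied as `[summarize_table(t) for t in tables_response["items"]]`, this yields one record per table for the shortcut and conversion steps.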

List Views

Endpoint: GET {endpoint}/databases/{databaseName}/VIEWs?api-version=2021-04-01

Returns view definitions. Extract the SQL for manual recreation in Fabric.

---

Step 2: Choose Mapping Mode

Ask the user which mapping mode to use. The choice can be made at the workspace level (all databases follow the same pattern) or per database (hybrid mode).

Mode A: Schemas (Default)

All Lake Databases → schemas within one target Lakehouse.

Synapse	Fabric
Database `sales`	Schema `sales` in target Lakehouse
Database `marketing`	Schema `marketing` in target Lakehouse
Database `default`	Schema `dbo` (Lakehouse default)

Advantages: Fewer items to manage; cross-schema queries via 2-part names; single SQL endpoint; aligns with Spark Migration Assistant behavior.

Disadvantages: Less isolation; shared OneLake path; harder to assign per-database permissions.

HMS databases (created via Spark SQL CREATE DATABASE) are ideal candidates for Mode A because they only have a default/dbo schema — no collision risk, simple 1:1 mapping of database name → Fabric schema name.

Mode B: Separate Lakehouses

Each Lake Database → its own Fabric Lakehouse.

Synapse	Fabric
Database `sales`	Lakehouse `sales`
Database `marketing`	Lakehouse `marketing`

Advantages: Strong isolation; independent security (OneLake RBAC per Lakehouse); independent SQL endpoints.

Disadvantages: More items to manage; cross-database queries require 3-part names; more shortcuts to create.

Mode C: Hybrid (Per-Database Assignment)

Let the user assign each database individually — some go into a shared Lakehouse as schemas, others get their own dedicated Lakehouse. This is the most flexible option for workspaces with mixed ownership or security requirements.

Example:

Synapse Database	Target Lakehouse	Maps As	Reason
`sales`	`Sales_Lakehouse` (dedicated)	Entire Lakehouse	Team-owned, needs own security boundary
`marketing`	`Marketing_Lakehouse` (dedicated)	Entire Lakehouse	Separate team, separate SQL endpoint
`staging_raw`	`ETL_Lakehouse` (shared)	Schema `staging_raw`	Shared ETL pipeline, same team
`staging_curated`	`ETL_Lakehouse` (shared)	Schema `staging_curated`	Shared ETL pipeline, same team
`default`	`ETL_Lakehouse` (shared)	Schema `dbo`	HMS default database, ETL workloads

When to use Hybrid:

Different databases are owned by different teams and need independent access control
Some databases are tightly related (ETL stages, same domain) and benefit from consolidation
Some databases need dedicated SQL endpoints for separate downstream consumers (Power BI, APIs)
Mix of HMS databases (simple, no schemas) and Lake Database designer databases (may have inner schemas)

User input format — ask the user to provide an assignment map:

{
  "databaseAssignments": [
    { "database": "sales",           "lakehouse": "Sales_Lakehouse",     "mode": "dedicated" },
    { "database": "marketing",       "lakehouse": "Marketing_Lakehouse", "mode": "dedicated" },
    { "database": "staging_raw",     "lakehouse": "ETL_Lakehouse",       "mode": "schema" },
    { "database": "staging_curated", "lakehouse": "ETL_Lakehouse",       "mode": "schema" },
    { "database": "default",         "lakehouse": "ETL_Lakehouse",       "mode": "schema" }
  ]
}

"mode": "dedicated" — create a dedicated Lakehouse for this database; tables go under dbo schema (or inner schemas if they exist)
"mode": "schema" — map this database as a schema within the shared Lakehouse; the schema name defaults to the database name

Schema collision handling (for databases assigned as "schema" to the same Lakehouse): Apply the same composite naming rules as Mode A — see below.

Schema Collision Handling (Mode A and Mode C Schema Assignments)

When Synapse databases contain inner schemas beyond default/dbo, the three-part name (Database.Schema.Table) must be flattened to the two-part Fabric name (Schema.Table).

Naming rules:

Synapse Source	Databases Have Inner Schemas?	Fabric Schema Name
`Database1.dbo.Table1`	No custom schemas in any database	`Database1`
`Database1.SchemaA.Table1`	Custom schemas exist	`Database1_SchemaA`
`Database1.dbo.Table4`	Custom schemas exist	`Database1` (drop `dbo`, use database name only)
`Database2.SchemaA.Table5`	Custom schemas in multiple databases	`Database2_SchemaA` (no collision)

Auto-detection logic:

1. List all databases → for each, list schemas
2. If ALL databases have only default/dbo schema:
   → Simple mode: database name = Fabric schema name
3. If ONLY ONE database is being migrated and it has custom schemas:
   → Pass-through mode: inner schema names map 1:1 to Fabric schemas
4. Otherwise (custom schemas exist and more than one database is being migrated):
   → Composite mode: {database}_{schema} (except dbo → {database})
5. User can always override to Mode B or Mode C

Note: In Mode C, collision detection only applies to databases assigned to the same shared Lakehouse. Databases with "mode": "dedicated" are independent — their inner schemas map directly to Fabric schemas with no collision risk.
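The detection rules translate into a small naming function. The sketch below is hypothetical (`fabric_schema_names`). It checks simple mode first, then single-database pass-through, then composite mode. For Mode C, it would be called once per shared Lakehouse, with only the databases assigned to that Lakehouse.

```python
DBO_NAMES = ("default", "dbo")


def fabric_schema_names(db_schemas):
    """Map (database, inner_schema) -> Fabric schema name for databases sharing one Lakehouse."""
    def is_dbo(schema):
        return schema.lower() in DBO_NAMES

    def base_name(db):
        return "dbo" if db.lower() == "default" else db  # 'default' database lands in dbo

    any_custom = any(not is_dbo(s) for schemas in db_schemas.values() for s in schemas)
    single_db = len(db_schemas) == 1
    mapping = {}
    for db, schemas in db_schemas.items():
        for schema in schemas or ["dbo"]:
            if not any_custom:
                mapping[(db, schema)] = base_name(db)  # simple mode
            elif single_db:
                mapping[(db, schema)] = "dbo" if is_dbo(schema) else schema  # pass-through mode
            else:
                mapping[(db, schema)] = base_name(db) if is_dbo(schema) else f"{db}_{schema}"  # composite
    return mapping
```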

Emit a mapping report before creating anything so the user can review name translations:

Mapping Report:
  Mode C (Hybrid): 2 dedicated Lakehouses + 1 shared Lakehouse

  ETL_Lakehouse (shared):
    Synapse staging_raw.dbo.RawOrders         → ETL_Lakehouse.staging_raw.RawOrders
    Synapse staging_curated.dbo.CleanOrders   → ETL_Lakehouse.staging_curated.CleanOrders
    Synapse default.dbo.TempData              → ETL_Lakehouse.dbo.TempData

  Sales_Lakehouse (dedicated):
    Synapse sales.dbo.FactSales               → Sales_Lakehouse.dbo.FactSales
    Synapse sales.dbo.DimCustomer             → Sales_Lakehouse.dbo.DimCustomer

  Marketing_Lakehouse (dedicated):
    Synapse marketing.dbo.Campaigns           → Marketing_Lakehouse.dbo.Campaigns
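A report like this can be rendered from the `databaseAssignments` map before anything is created. The sketch below is hypothetical (`mapping_report`). It simplifies by assuming each database's tables sit in its dbo schema and are supplied as `tables_by_db`. It also rejects a Lakehouse that is assigned both dedicated and schema databases, since that assignment map is contradictory.

```python
def mapping_report(assignments, tables_by_db):
    """Render a Mode C mapping report from databaseAssignments and {database: [table, ...]}."""
    modes, lines_by_lakehouse = {}, {}
    for a in assignments:
        db, lakehouse, mode = a["database"], a["lakehouse"], a["mode"]
        if modes.setdefault(lakehouse, mode) != mode:
            raise ValueError(f"{lakehouse} is assigned both 'dedicated' and 'schema' databases")
        if mode == "dedicated":
            schema = "dbo"
        else:
            schema = "dbo" if db == "default" else db
        entries = lines_by_lakehouse.setdefault(lakehouse, [])
        for table in tables_by_db.get(db, []):
            entries.append(f"  Synapse {db}.dbo.{table} -> {lakehouse}.{schema}.{table}")
    report = []
    for lakehouse, entries in lines_by_lakehouse.items():
        label = "shared" if modes[lakehouse] == "schema" else "dedicated"
        report.append(f"{lakehouse} ({label}):")
        report.extend(entries)
    return "\n".join(report)
```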

---

Step 3: Create Fabric Lakehouse(es)

Create Lakehouse

Endpoint: POST /v1/workspaces/{workspaceId}/items

{
  "displayName": "{lakehouseName}",
  "type": "Lakehouse",
  "description": "Migrated from Synapse Lake Database",
  "creationPayload": {
    "enableSchemas": true
  }
}

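Mode A: Create one Lakehouse with enableSchemas: true
Mode B: Create one Lakehouse per database (still use enableSchemas: true if database had inner schemas)
Mode C: Create one Lakehouse per shared group (enableSchemas: true) plus one per "dedicated" database (same rule as Mode B)

Returns HTTP 201 with the new item, or HTTP 202 for a long-running operation (LRO). For 202, poll the URL in the Location header until status == "Succeeded", then read the created item from the operation result. In both cases the item's id is the lakehouse ID needed for shortcuts and notebook binding.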
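The polling loop can be kept independent of the HTTP client, which also makes it easy to test. The sketch below assumes a caller-supplied `get_status(url)` that returns `(json_body, retry_after_seconds_or_None)`. One possible wrapper uses `requests.get(url, headers=fab_headers)` and reads `Retry-After` from the response headers. The function name `wait_for_operation` and its parameters are hypothetical. In Fabric's LRO pattern, the created item is then typically fetched from the operation's `/result` URL, so check the Fabric long-running operations documentation for the exact shape.

```python
import time


def wait_for_operation(get_status, poll_url, timeout_s=300, default_wait_s=5, sleep=time.sleep):
    """Poll a long-running operation until it reports Succeeded or Failed."""
    deadline = time.monotonic() + timeout_s
    while True:
        body, retry_after = get_status(poll_url)
        status = body.get("status")
        if status == "Succeeded":
            return body
        if status == "Failed":
            raise RuntimeError(f"Operation failed: {body.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Operation still '{status}' after {timeout_s}s")
        # Honor the server's Retry-After hint when present
        sleep(retry_after or default_wait_s)
```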

Handling name collisions (409):

If a Lakehouse with the same name already exists (HTTP 409), reuse it instead of failing:

import requests

resp = requests.post(f"{FABRIC_BASE}/workspaces/{ws_id}/items", headers=fab_headers, json=payload)
if resp.status_code == 409:
    # Lakehouse already exists: look it up by name and reuse it
    items = requests.get(f"{FABRIC_BASE}/workspaces/{ws_id}/items",
                         params={"type": "Lakehouse"}, headers=fab_headers).json()
    existing = next((i for i in items.get("value", []) if i["displayName"] == lakehouse_name), None)
    if existing is None:
        raise RuntimeError(f"409 for {lakehouse_name}, but no Lakehouse with that name was found")
    lakehouse_id = existing["id"]
    print(f"  Reusing existing Lakehouse: {lakehouse_name} (id={lakehouse_id})")
else:
    resp.raise_for_status()  # 201: item in the body; 202: poll the Location header first

Capture Lakehouse Details

After creation, record these for later phases:

lakehouseId — needed for OneLake shortcuts (Step 5) and notebook binding (Phase 2)
workspaceId — needed for notebook metadata.dependencies.lakehouse
displayName — needed for notebook lakehouse binding

---

Step 4: Create Schemas (Mode A and Mode C Schema Assignments)

For each Synapse database (and its inner schemas), create the corresponding Fabric schema.

Schemas cannot be created via REST API — they require Spark SQL or the SQL endpoint:

CREATE SCHEMA IF NOT EXISTS Database1;
CREATE SCHEMA IF NOT EXISTS Database1_staging;
CREATE SCHEMA IF NOT EXISTS Database2;
CREATE SCHEMA IF NOT EXISTS Database2_staging;

Execute via:

SQL endpoint: Connect to the Lakehouse SQL endpoint and run T-SQL
Fabric notebook: Run in a notebook cell attached to the Lakehouse
Fabric REST API: Execute via Livy session (POST /v1/workspaces/{workspaceId}/lakehouses/{lakehouseId}/livySessions)

---

Step 4b: Discover or Create ADLS Connection

Shortcuts require a connectionId — a Fabric Connection object that holds credentials for accessing the ADLS Gen2 storage. The connection's credential (not the caller's Fabric token) is what Fabric uses to read data from storage.

Connection Discovery Strategy

1. List all connections: GET /v1/connections (Fabric API)
2. Filter by storage account: Match connections where connectionDetails.type == "AzureDataLakeStorage" and connectionDetails.path contains the target storage hostname
3. Filter by container: Parse the container from the connection path. Skip connections locked to a different container than the one containing Synapse data
4. Score and rank candidates:

Criteria	Score	Reason
Root path (no container lock)	+10	Can access any container
Matches target container	+5	Covers the Synapse data
OAuth2 credential	+2	More likely to have current RBAC
WorkspaceIdentity credential	+0	May lack RBAC, be disabled, or be blocked by policy

5. Probe-test each candidate (highest score first): Create a temporary shortcut (_probe_{tableName}), check the response, then delete the probe. Use the first connection that succeeds.
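Steps 2 to 4 can be expressed as a scoring function over the `GET /v1/connections` results. The sketch below makes three assumptions: `connectionDetails.path` is an `https://{storage}.dfs.core.windows.net[/container]` URL, the credential type is readable at `credentialDetails.credentialType`, and root paths and container matches score 10 or 5, not both. The names `score_connection` and `rank_connections` are hypothetical. The host is compared exactly rather than by substring, so `otheracct` does not match `acct`.

```python
from urllib.parse import urlparse


def score_connection(conn, storage_host, container):
    """Return a rank score, or None if the connection cannot reach the target container."""
    details = conn.get("connectionDetails") or {}
    if details.get("type") != "AzureDataLakeStorage":
        return None
    parsed = urlparse(details.get("path", ""))
    if parsed.netloc.lower() != storage_host.lower():
        return None  # different storage account
    conn_container = parsed.path.strip("/").split("/")[0]
    if conn_container and conn_container != container:
        return None  # locked to another container
    score = 10 if not conn_container else 5  # root path vs. matching container
    if (conn.get("credentialDetails") or {}).get("credentialType") == "OAuth2":
        score += 2
    return score


def rank_connections(connections, storage_host, container):
    """Return candidate connection IDs, highest score first (ties keep list order)."""
    scored = []
    for conn in connections:
        score = score_connection(conn, storage_host, container)
        if score is not None:
            scored.append((score, conn["id"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [conn_id for _, conn_id in scored]
```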

Probe-Test Logic

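# Probe each ranked candidate connection with a temporary shortcut (highest score first).
# Assumes ranked_candidate_ids, ws_id, lh_id, table_name, location, subpath and fab_headers exist.
import requests

shortcuts_url = f"https://api.fabric.microsoft.com/v1/workspaces/{ws_id}/items/{lh_id}/shortcuts"
probe_name = f"_probe_{table_name}"
selected_connection_id = None

for candidate_id in ranked_candidate_ids:
    # Remove a stale probe from an earlier run so a 409 cannot mask a bad credential
    requests.delete(f"{shortcuts_url}/Tables/{probe_name}", headers=fab_headers)
    probe_payload = {
        "name": probe_name,
        "path": "Tables",
        "target": {"type": "AdlsGen2", "adlsGen2": {
            "location": location, "subpath": subpath, "connectionId": candidate_id
        }},
    }
    resp = requests.post(shortcuts_url, headers=fab_headers, json=probe_payload)
    if resp.status_code in (200, 201):
        # Connection works: delete the probe and use this connection
        requests.delete(f"{shortcuts_url}/Tables/{probe_name}", headers=fab_headers)
        selected_connection_id = candidate_id
        break
    if resp.status_code in (400, 403):
        continue  # credential issue: try the next candidate
    resp.raise_for_status()  # any other status (e.g. 429) needs separate handling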

Creating a New Connection

If no existing connection passes the probe, attempt programmatic creation in this order:

Attempt 1: Key-based connection (via ARM `listKeys`)

Use the ARM token to retrieve the storage account key, then create the connection:

# Step 1: Get storage account key via ARM
arm_url = (
    f"https://management.azure.com/subscriptions/{subscription_id}"
    f"/resourceGroups/{resource_group}"
    f"/providers/Microsoft.Storage/storageAccounts/{storage_account_name}"
    f"/listKeys?api-version=2023-05-01"
)
key_resp = requests.post(arm_url, headers=arm_headers)

if key_resp.status_code == 200:
    storage_key = key_resp.json()["keys"][0]["value"]

    # Step 2: Create connection with Key credential
    create_conn = {
        "connectivityType": "ShareableCloud",
        "displayName": f"{storage_account_name}_{container}_migration",
        "connectionDetails": {
            "type": "AzureDataLakeStorage",
            "creationMethod": "AzureDataLakeStorage",
            "parameters": [
                {"dataType": "Text", "name": "server", "value": f"https://{storage_account_name}.dfs.core.windows.net"},
                {"dataType": "Text", "name": "path", "value": f"/{container}"}
            ]
        },
        "privacyLevel": "Organizational",
        "credentialDetails": {
            "singleSignOnType": "None",
            "connectionEncryption": "NotEncrypted",
            "skipTestConnection": False,
            "credentials": {
                "credentialType": "Key",
                "key": storage_key
            }
        }
    }
    conn_resp = requests.post("https://api.fabric.microsoft.com/v1/connections",
                              headers=fab_headers, json=create_conn)
    if conn_resp.status_code == 201:
        selected_connection_id = conn_resp.json()["id"]

Requires: The caller must have Microsoft.Storage/storageAccounts/listKeys/action on the storage account (typically the Storage Account Key Operator Service Role or Contributor role).

Attempt 2: WorkspaceIdentity connection

If the Key approach fails (403 on listKeys), fall back to WorkspaceIdentity:

create_conn = {
    "connectivityType": "ShareableCloud",
    "displayName": f"{storage_account_name}_{container}_migration",
    "connectionDetails": {
        "type": "AzureDataLakeStorage",
        "creationMethod": "AzureDataLakeStorage",
        "parameters": [
            {"dataType": "Text", "name": "server", "value": f"https://{storage_account_name}.dfs.core.windows.net"},
            {"dataType": "Text", "name": "path", "value": f"/{container}"}
        ]
    },
    "privacyLevel": "Organizational",
    "credentialDetails": {
        "singleSignOnType": "None",
        "connectionEncryption": "NotEncrypted",
        "skipTestConnection": False,
        "credentials": {"credentialType": "WorkspaceIdentity"}
    }
}

Requires: The workspace's managed identity must have Storage Blob Data Reader RBAC on the storage account.

Why OAuth2 cannot be created via API

OAuth2 connections require interactive browser consent (authorization code grant flow). The Fabric Connections API explicitly rejects credentialType: "OAuth2" with "CredentialType input is not supported for this API".
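The Key and WorkspaceIdentity request bodies above differ only in `credentials`, so one builder can serve both attempts and refuse OAuth2 up front. The sketch below is hypothetical (`adls_connection_payload`) and reproduces the payload fields exactly as shown above. It does not call the API.

```python
def adls_connection_payload(storage_account, container, credentials):
    """Build a Fabric ADLS Gen2 connection request body (Key or WorkspaceIdentity only)."""
    if credentials.get("credentialType") not in ("Key", "WorkspaceIdentity"):
        # The Connections API rejects OAuth2; those connections are created in the portal
        raise ValueError(f"Unsupported credentialType: {credentials.get('credentialType')}")
    return {
        "connectivityType": "ShareableCloud",
        "displayName": f"{storage_account}_{container}_migration",
        "connectionDetails": {
            "type": "AzureDataLakeStorage",
            "creationMethod": "AzureDataLakeStorage",
            "parameters": [
                {"dataType": "Text", "name": "server", "value": f"https://{storage_account}.dfs.core.windows.net"},
                {"dataType": "Text", "name": "path", "value": f"/{container}"},
            ],
        },
        "privacyLevel": "Organizational",
        "credentialDetails": {
            "singleSignOnType": "None",
            "connectionEncryption": "NotEncrypted",
            "skipTestConnection": False,
            "credentials": credentials,
        },
    }
```

Attempt 1 would pass `{"credentialType": "Key", "key": storage_key}`, and Attempt 2 would pass `{"credentialType": "WorkspaceIdentity"}`.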

Fallback: Manual Portal creation

If both programmatic approaches fail, display the following manual-setup guidance:

No ADLS connection could be created automatically.
Please create one manually:

1. Open Fabric Portal → Settings → Manage connections and gateways
   https://app.fabric.microsoft.com/connections
   (MSIT: https://msit.powerbi.com/connections)
2. Click '+ New' → Cloud → Azure Data Lake Storage Gen2
3. Server: https://{storageAccount}.dfs.core.windows.net
   Path: /{container}
   Authentication: OAuth2 (sign in with credentials that have Storage Blob Data Reader)
4. Re-run the migration script — it will discover and probe-test the new connection.

Documentation: https://learn.microsoft.com/fabric/data-engineering/lakehouse-shortcuts#create-a-shortcut

Common Connection Errors

Error	Cause	Fix
400 `"Stored Credential Operation - PowerBIEntityNotFound"`	Connection's OAuth token expired or was revoked	Re-authenticate the connection in Fabric Portal, or create a new one
400 `"Stored Credential"` (any variant)	Connection credential is invalid, expired, or was rotated	Re-authenticate the connection or create a new one with fresh credentials
403 on shortcut creation	Connection's identity lacks `Storage Blob Data Reader` on the storage account	Grant RBAC to the connection's identity
429 `Retry-After: {N}`	Fabric API rate limit — too many shortcut calls in quick succession	Wait `Retry-After` seconds, then retry the same request
Connection reset / `ConnectionError`	Network-level timeout during shortcut creation (large batch)	Retry with exponential backoff; check network connectivity
400 `"Required property 'connectionId' not found"`	Missing `connectionId` in the shortcut payload	Always include `connectionId` in the `adlsGen2` target
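The 429 and connection-reset rows both call for a retry wrapper around each shortcut request. The sketch below assumes a zero-argument `send()` callable, for example `lambda: requests.post(url, headers=fab_headers, json=payload)`, that returns an object with `.status_code` and `.headers`. The names `call_with_retry`, `retry_on` and `base_delay_s` are hypothetical. With `requests`, pass `retry_on=(requests.ConnectionError,)`, because that class does not inherit from Python's built-in `ConnectionError`.

```python
import time


def call_with_retry(send, max_attempts=5, base_delay_s=2.0, retry_on=(ConnectionError,), sleep=time.sleep):
    """Call send() until it returns a non-429 response or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            resp = send()
        except retry_on:
            if attempt == max_attempts:
                raise
            sleep(base_delay_s * 2 ** (attempt - 1))  # exponential backoff: 2s, 4s, 8s, ...
            continue
        if resp.status_code == 429 and attempt < max_attempts:
            # Wait the server-specified cooldown; fall back to the base delay if the header is absent
            sleep(float(resp.headers.get("Retry-After", base_delay_s)))
            continue
        return resp
```

Non-429 errors (400, 403) are returned unchanged so the caller can decide whether to abort.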

---

Step 5: Create OneLake Shortcuts

Shortcut Target Decision

Table Format	Table Type	Shortcut Location	Result
Delta	Managed	`Tables/{schema}/{tableName}`	Auto-registers in Lakehouse catalog
Delta	External	`Tables/{schema}/{tableName}`	Auto-registers in Lakehouse catalog
Parquet	Managed or External	`Files/{schema}/{tableName}`	Accessible via Spark; not auto-registered
CSV	Managed or External	`Files/{schema}/{tableName}`	Accessible via Spark; not auto-registered
JSON	Managed or External	`Files/{schema}/{tableName}`	Accessible via Spark; not auto-registered
ORC	Managed or External	`Files/{schema}/{tableName}`	Accessible via Spark; not auto-registered
Avro	Managed or External	`Files/{schema}/{tableName}`	Accessible via Spark; not auto-registered

Create Shortcut API

Endpoint: POST /v1/workspaces/{workspaceId}/items/{lakehouseId}/shortcuts

Note: The endpoint uses /items/{lakehouseId}/shortcuts, NOT /lakehouses/{lakehouseId}/shortcuts (which returns 404).

{
  "name": "{tableName}",
  "path": "Tables/{schemaName}",
  "target": {
    "type": "AdlsGen2",
    "adlsGen2": {
      "location": "https://{storageAccount}.dfs.core.windows.net",
      "subpath": "/{container}/{path/to/table}",
      "connectionId": "{connectionId}"
    }
  }
}

`connectionId` is required. Without it, the API returns 400 "Required property 'connectionId' not found". See Step 4b above for how to discover or create the connection.

Shortcut Target Path by Table Type

Managed tables: The data path is under the Synapse workspace warehouse directory:

abfss://{container}@{storage}.dfs.core.windows.net/synapse/workspaces/{workspace}/warehouse/{database}/{tableName}

Split this into:

location: https://{storage}.dfs.core.windows.net
subpath: /{container}/synapse/workspaces/{workspace}/warehouse/{database}/{tableName}

External tables: Use the storageDescriptor.source.location path directly:

abfss://{container}@{storage}.dfs.core.windows.net/{custom/path/to/table}

Split into:

location: https://{storage}.dfs.core.windows.net
subpath: /{container}/{custom/path/to/table}

Authentication for Shortcuts

Shortcut creation involves two separate authorization checks:

1. Fabric API authorization (your token): Must have Contributor/Admin role on the Fabric workspace to call the Shortcuts API
2. Storage data access (connection's credential): The connection's identity must have Storage Blob Data Reader (or higher) on the target ADLS Gen2 storage account/container

The connection credential — not the API caller's token — is what Fabric uses to read data from storage at runtime. See Step 4b for connection discovery.

Shortcut Granularity Strategy

Choose between per-table shortcuts or per-database shortcuts based on the table format:

Scenario	Strategy	Shortcuts Created	Catalog Registration
All Delta tables	Per-table under `Tables/{schema}/`	One per table	Automatic
All non-Delta tables	Per-database under `Files/{schema}/`	One per database (with tables)	Requires `CREATE TABLE USING {format}`
Mixed formats	Per-table for Delta → `Tables/`; per-database for non-Delta → `Files/`	Hybrid	Automatic for Delta; manual for non-Delta
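The strategy table can be written as a function over the formats found in one database. `choose_shortcut_strategy` is a hypothetical helper. It only inspects formats. It does not check the second per-database condition, which is that tables share a common warehouse directory.

```python
def choose_shortcut_strategy(formats):
    """Return 'per-table', 'per-database', 'hybrid', or 'none' for one database."""
    normalized = {f.lower() for f in formats}
    if not normalized:
        return "none"            # no tables, nothing to shortcut
    if normalized == {"delta"}:
        return "per-table"       # Tables/{schema}/, auto-registered
    if "delta" not in normalized:
        return "per-database"    # Files/{schema}/, needs CREATE TABLE USING {format}
    return "hybrid"              # Delta per-table, everything else per-database
```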

Per-database shortcut (non-Delta): Instead of creating one shortcut per table (111 individual shortcuts in one migration, for example), create one shortcut per database that points to its warehouse directory. Each table then appears as a subfolder.

{
  "name": "{databaseName}",
  "path": "Files/{schemaName}",
  "target": {
    "type": "AdlsGen2",
    "adlsGen2": {
      "location": "https://{storageAccount}.dfs.core.windows.net",
      "subpath": "/{container}/synapse/workspaces/{workspace}/warehouse/{database}",
      "connectionId": "{connectionId}"
    }
  }
}

This creates Files/{schema}/{databaseName}/ containing all table subfolders. Access via:

df = spark.read.parquet("Files/{schema}/{databaseName}/{tableName}")

When to use per-database shortcuts: When all tables in the database are non-Delta (parquet, CSV, etc.) and the tables share a common warehouse directory path. This reduces the number of shortcuts from N (tables) to M (databases with tables).
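The sketch below builds one per-database request body for each database that actually contains tables. This reduces N shortcuts to M. Two choices are assumptions of this example: the inventory shape (database name mapped to a list of tables), and using the database name as the schema name unless `schema_of` overrides it.

```python
def per_database_shortcuts(inventory, storage_account, container, workspace,
                           connection_id, schema_of=None):
    """Build one Files/ shortcut body per database that has at least one table."""
    schema_of = schema_of or {}
    bodies = []
    for database, tables in sorted(inventory.items()):
        if not tables:
            continue  # databases without tables get no shortcut
        schema = schema_of.get(database, database)  # assumption: schema == database
        bodies.append({
            "name": database,
            "path": f"Files/{schema}",
            "target": {
                "type": "AdlsGen2",
                "adlsGen2": {
                    "location": f"https://{storage_account}.dfs.core.windows.net",
                    "subpath": f"/{container}/synapse/workspaces/{workspace}/warehouse/{database}",
                    "connectionId": connection_id,
                },
            },
        })
    return bodies
```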

Shortcut Creation Error Cascade

When creating shortcuts in bulk (many tables across many databases), certain errors indicate that all remaining shortcuts will also fail. Abort early to avoid wasting API calls:

Error	Action	Rationale
403 (permission denied)	Abort all remaining shortcuts	The connection lacks `Storage Blob Data Reader` on the storage account — every subsequent shortcut to the same account will also fail
400 "Stored Credential"	Abort all remaining shortcuts	The connection's credential is invalid — no shortcut using this connection will succeed
ConnectionError / reset	Abort all remaining shortcuts	Network-level failure — likely a transient outage affecting all requests
429 (rate limit)	Wait `Retry-After` seconds, then retry	Transient — the same request will succeed after the cooldown period
409 (conflict)	Continue — treat as success	Shortcut already exists (idempotent re-run)
404 (not found)	Continue — skip this table	The source path doesn't exist; other tables may still be valid

Implementation pattern:

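import time
import requests

# Assumes create_shortcut(table, connection_id) returns a requests.Response
# and db_inventory maps database name -> list of tables (both defined earlier).
abort_shortcuts = False
skipped = []  # tables never attempted because of an early abort

for db_name, tables in db_inventory.items():
    if abort_shortcuts:
        skipped.extend(tables)  # record remaining tables as skipped
        continue
    for i, table in enumerate(tables):
        try:
            resp = create_shortcut(table, connection_id)
            if resp.status_code == 429:
                # Transient throttling: wait for Retry-After, then retry once
                time.sleep(int(resp.headers.get("Retry-After", 30)))
                resp = create_shortcut(table, connection_id)
        except (requests.exceptions.ConnectionError, ConnectionError):
            print("Network failure. Aborting remaining shortcuts.")
            abort_shortcuts = True
        else:
            if resp.status_code == 403:
                print("403: connection lacks storage access. Aborting remaining shortcuts.")
                abort_shortcuts = True
            elif resp.status_code == 400 and "Stored Credential" in resp.text:
                print("400: connection credential invalid. Aborting remaining shortcuts.")
                abort_shortcuts = True
            elif resp.status_code not in (200, 201, 409):
                # 404 and other errors: log and continue with the next table
                print(f"{resp.status_code} for {db_name}.{table}: skipping")
        if abort_shortcuts:
            skipped.extend(tables[i + 1:])
            break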

Why abort on 403/400? These are not transient — they indicate a systemic permission or credential issue that affects every shortcut using the same connection. Continuing wastes API quota and produces N identical error messages. Fix the root cause, then re-run.
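The decision table can also be factored into a pure function. This keeps the loop small and makes the abort rules easy to unit-test. `shortcut_action` is a hypothetical name. Unlisted errors map to "continue", which follows the pattern's log-and-continue default.

```python
def shortcut_action(status_code=None, body="", connection_error=False):
    """Map one shortcut API outcome to 'continue', 'retry', or 'abort'."""
    if connection_error:
        return "abort"      # network-level failure
    if status_code in (200, 201, 409):
        return "continue"   # created, or already exists (idempotent re-run)
    if status_code == 403:
        return "abort"      # connection lacks Storage Blob Data Reader
    if status_code == 400 and "Stored Credential" in body:
        return "abort"      # connection credential is invalid
    if status_code == 429:
        return "retry"      # wait Retry-After seconds first
    return "continue"       # 404 and other errors: log and skip this table
```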

---

Step 6: Handle Non-Delta Tables

For non-Delta tables (Parquet, CSV, JSON, ORC, Avro), the shortcut is created under Files/ (Step 5) and data is accessible via Spark. However, these tables are not auto-registered in the Lakehouse catalog. Choose one of the two options below.

Option A: Convert to Delta (Recommended)

Converts the data to Delta format for full catalog registration, SQL endpoint access, and Power BI Direct Lake support.

# Read from the shortcut path
df = spark.read.format("{originalFormat}").load("Files/{schemaName}/{tableName}")

# Write as Delta to the Tables section
df.write.format("delta").mode("overwrite").saveAsTable("{schemaName}.{tableName}")

For partitioned tables, preserve partition columns:

df = spark.read.format("parquet").load("Files/{schemaName}/{tableName}")
df.write.format("delta") \
    .partitionBy("{partitionCol}") \
    .mode("overwrite") \
    .saveAsTable("{schemaName}.{tableName}")

Note: This creates a physical copy of the data in Delta format. The original shortcut under Files/ remains as a reference.

Advantages: Full Lakehouse catalog registration, SQL endpoint queries, Power BI Direct Lake, V-Order optimization, ACID transactions, time travel.

Option B: Retain Original Format

Keep tables in their legacy format (Parquet, ORC, etc.) under Files/. This avoids data duplication and preserves the original file layout.

Register in the catalog — create an external table definition so Spark SQL and the SQL endpoint can query the data without converting it:

-- Register a non-Delta table pointing to the shortcut path
CREATE TABLE IF NOT EXISTS {schemaName}.{tableName}
USING {format}
LOCATION 'Files/{schemaName}/{tableName}';

For example:

-- Parquet table
CREATE TABLE IF NOT EXISTS sales.historical_orders
USING PARQUET
LOCATION 'Files/sales/historical_orders';

-- ORC table
CREATE TABLE IF NOT EXISTS analytics.legacy_events
USING ORC
LOCATION 'Files/analytics/legacy_events';

For Hive-style partitioned tables (year=2024/month=01/ directory structure), you must also recover partition metadata after creating the table:

-- Register the partitioned table
CREATE TABLE IF NOT EXISTS {schemaName}.{tableName}
USING PARQUET
PARTITIONED BY ({partitionCols})
LOCATION 'Files/{schemaName}/{tableName}';

-- Recover partitions from directory structure
MSCK REPAIR TABLE {schemaName}.{tableName};

For example:

CREATE TABLE IF NOT EXISTS sales.transactions
USING PARQUET
PARTITIONED BY (year INT, month INT)
LOCATION 'Files/sales/transactions';

MSCK REPAIR TABLE sales.transactions;

Why `MSCK REPAIR TABLE`? Non-Delta tables rely on HMS partition metadata for partition pruning. Unlike Delta (where _delta_log is self-describing), Hive-style partitioned tables need the catalog to know which partitions exist. Without this step, queries read all files instead of pruning to the relevant partitions — causing full scans and poor performance.

After initial registration, if new partitions are added to the source data (e.g., Synapse continues writing year=2025/month=05/), re-run MSCK REPAIR TABLE to pick up the new partitions.
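Registering many tables by hand is error-prone. The sketch below generates the statements for one table so they can be run with `spark.sql()`, which executes one statement per call. `option_b_statements` is hypothetical, and `partition_cols` is assumed to be a list of `(name, type)` pairs taken from the Synapse metadata. If you used a per-database shortcut, pass the actual path as `location`, for example `Files/{schema}/{databaseName}/{tableName}`.

```python
def option_b_statements(schema, table, fmt, partition_cols=None, location=None):
    """Return [CREATE TABLE ...] plus MSCK REPAIR TABLE when the table is partitioned."""
    full_name = f"{schema}.{table}"
    location = location or f"Files/{schema}/{table}"
    ddl = f"CREATE TABLE IF NOT EXISTS {full_name}\nUSING {fmt.upper()}\n"
    if partition_cols:
        cols = ", ".join(f"{name} {dtype}" for name, dtype in partition_cols)
        ddl += f"PARTITIONED BY ({cols})\n"
    ddl += f"LOCATION '{location}'"
    statements = [ddl]
    if partition_cols:
        # Hive-style partitions must be recovered into the catalog for pruning
        statements.append(f"MSCK REPAIR TABLE {full_name}")
    return statements
```

Usage in a Fabric notebook: `for stmt in option_b_statements("sales", "transactions", "parquet", [("year", "INT"), ("month", "INT")]): spark.sql(stmt)`.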

Limitations of retaining original format:

Capability	Delta (Option A)	Original Format (Option B)
Lakehouse Explorer UI (Tables section)	Yes	Yes (after `CREATE TABLE`)
SQL endpoint queries	Yes	Yes (after `CREATE TABLE`)
Power BI Direct Lake	Yes	No — requires Delta
V-Order optimization	Yes	No
ACID transactions / time travel	Yes	No
Partition pruning	Automatic	Requires `MSCK REPAIR TABLE`
Data duplication	Yes (new copy)	No (zero-copy via shortcut)

Decision Guide

Non-Delta table in Synapse:
├── Consumed by Power BI Direct Lake?
│   └── YES → Option A (convert to Delta)
├── Need ACID / time travel / merge operations?
│   └── YES → Option A (convert to Delta)
├── Large table, want to avoid data duplication?
│   └── YES → Option B (retain original format)
├── Read-only / archival data?
│   └── YES → Option B (retain original format)
└── Default recommendation
    └── Option A (convert to Delta) — Fabric is Delta-first
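The decision guide reads top to bottom, so the first matching branch wins. A minimal sketch with hypothetical flag names:

```python
def choose_option(direct_lake=False, needs_acid=False, large=False, read_only=False):
    """Return 'A' (convert to Delta) or 'B' (retain original format)."""
    if direct_lake or needs_acid:
        return "A"   # Direct Lake, ACID, time travel, and MERGE all need Delta
    if large or read_only:
        return "B"   # avoid duplicating large or archival data
    return "A"       # default: Fabric is Delta-first
```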

---

Step 7: Validate

After migration, verify:

1. Lakehouse catalog: Check that Delta tables appear in the Lakehouse Explorer UI under Tables.
2. SQL endpoint: Query migrated tables via the SQL endpoint to confirm schema and data.
3. Row counts: Compare row counts between Synapse and Fabric for each table.
4. Shortcut health: Verify shortcuts are accessible (notebookutils.fs.ls("Tables/{schema}/")).
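The row-count check can be sketched as a comparison of per-table counts collected on each side, for example with `spark.sql(f"SELECT COUNT(*) FROM {table}").collect()[0][0]`. The function name and the relative `tolerance` parameter are assumptions. Keep `tolerance=0.0` unless a table is still receiving writes.

```python
def compare_row_counts(synapse_counts, fabric_counts, tolerance=0.0):
    """Return {table: (synapse, fabric)} for tables whose counts differ beyond tolerance."""
    mismatches = {}
    for table, expected in synapse_counts.items():
        actual = fabric_counts.get(table)
        if actual is None:
            mismatches[table] = (expected, None)  # table missing in Fabric
            continue
        if abs(actual - expected) > expected * tolerance:
            mismatches[table] = (expected, actual)
    return mismatches
```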

---

Object Type Reference

Full Inventory of Synapse Lake Database Object Types

Object Type	API Artifact Type	Fabric Support	Migration Action
Delta table (managed)	`TABLE` (formatType: delta, tableType: MANAGED)	Lakehouse Tables (shortcut)	Shortcut → auto-registers
Delta table (external)	`TABLE` (formatType: delta)	Lakehouse Tables (shortcut)	Shortcut → auto-registers
Parquet table (managed)	`TABLE` (formatType: parquet, tableType: MANAGED)	Lakehouse Files (shortcut)	Shortcut under Files/; Option A (Delta conversion) or Option B (retain + `CREATE TABLE` + `MSCK REPAIR TABLE`)
Parquet table (external)	`TABLE` (formatType: parquet)	Lakehouse Files (shortcut)	Shortcut under Files/; Option A or Option B
CSV table	`TABLE` (formatType: csv/textfile)	Lakehouse Files (shortcut)	Shortcut under Files/; Option A (recommended) or Option B
JSON table	`TABLE` (formatType: json)	Lakehouse Files (shortcut)	Shortcut under Files/; Option A (recommended) or Option B
ORC table	`TABLE` (formatType: orc)	Lakehouse Files (shortcut)	Shortcut under Files/; Option A or Option B
Avro table	`TABLE` (formatType: avro)	Lakehouse Files (shortcut)	Shortcut under Files/; Option A or Option B
View	`VIEW`	Not directly migratable	Extract SQL; recreate as Spark SQL `CREATE VIEW` or SQL endpoint view
Schema	`SCHEMA`	Lakehouse Schema	`CREATE SCHEMA IF NOT EXISTS {name}`
Function (UDF)	`FUNCTION`	Not migratable via API	Recreate manually via `spark.udf.register()` or `CREATE FUNCTION`
Partition Info	`PARTITIONINFO`	Preserved via shortcut	Delta: automatic. Non-Delta: directory structure preserved
Relationship	`RELATIONSHIP`	No Fabric equivalent	Document for reference only

Decision Tree

For each table in Synapse Lake Database:
├── Is format Delta?
│   ├── YES → Create shortcut under Tables/{schema}/ → auto-registers in catalog ✅
│   └── NO (Parquet/CSV/JSON/ORC/Avro)
│       ├── Option A: Convert to Delta → Shortcut under Files/ → Spark read → write as Delta to Tables/
│       └── Option B: Retain format → Shortcut under Files/ → CREATE TABLE USING {format} → MSCK REPAIR TABLE (if partitioned)
│
├── Is table Managed?
│   ├── YES → Shortcut target = Synapse warehouse directory path
│   │         (ensure the connection's identity has Storage Blob Data Reader on Synapse primary storage)
│   └── NO (External) → Shortcut target = original ADLS Gen2 path
│
└── Is it a View/Function/Relationship?
    ├── View → Extract SQL, recreate in Fabric
    ├── Function → Recreate via spark.udf.register()
    └── Relationship → Document only (no Fabric equivalent)
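The object-type table can be sketched as a lookup. This is useful when iterating over the artifacts returned by the Lake Database API. The function name and action strings are illustrative. `format_type` corresponds to the `formatType` property shown above.

```python
def migration_action(artifact_type, format_type=None):
    """Map a Synapse Lake Database artifact type to its Fabric migration action."""
    artifact_type = artifact_type.upper()
    if artifact_type == "TABLE":
        if (format_type or "").lower() == "delta":
            return "shortcut under Tables/ (auto-registers)"
        return "shortcut under Files/, then Option A or Option B"
    return {
        "VIEW": "extract SQL and recreate as a view",
        "SCHEMA": "CREATE SCHEMA IF NOT EXISTS",
        "FUNCTION": "recreate manually (spark.udf.register or CREATE FUNCTION)",
        "PARTITIONINFO": "preserved via shortcut",
        "RELATIONSHIP": "document only",
    }.get(artifact_type, "unknown: review manually")
```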

---

Step 8: (Optional) Validate Before Proceeding to Phase 2

Run these checks before migrating Notebooks. Notebooks rely on Lakehouses being healthy — missing shortcuts or unregistered tables cause immediate runtime failures.

Check	How	Pass Criteria
Shortcut health	`notebookutils.fs.ls("Tables/{schema}/")` and `Files/{schema}/`	All shortcuts resolve; no `PathNotFound` errors
Row counts	Compare `SELECT COUNT(*)` on Synapse vs. Fabric for each table	Counts match (or are within acceptable tolerance for streaming tables)
Schema comparison	Compare column names, types, and order between Synapse and Fabric	Exact match
Non-Delta registration	`SHOW TABLES IN {schema}`	All Option B tables appear in catalog
Partition coverage	`SHOW PARTITIONS {schema}.{table}` for Option B partitioned tables	Partition count matches Synapse

Do not proceed to Phase 2 until all shortcuts are healthy and row counts match. A notebook that reads from a missing or broken shortcut will fail silently or produce wrong results.
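The schema comparison check can be sketched as a diff of ordered `(name, type)` pairs. On the Fabric side these can be collected with `[(f.name, f.dataType.simpleString()) for f in spark.table(name).schema.fields]`. How you export the Synapse side is up to you. `compare_schemas` is a hypothetical helper.

```python
def compare_schemas(synapse_cols, fabric_cols):
    """Compare ordered (name, type) lists and describe every difference."""
    diffs = []
    for i, (s, f) in enumerate(zip(synapse_cols, fabric_cols)):
        if s != f:
            diffs.append(f"column {i}: Synapse {s} vs Fabric {f}")
    if len(synapse_cols) != len(fabric_cols):
        diffs.append(f"column count: Synapse {len(synapse_cols)} vs Fabric {len(fabric_cols)}")
    return diffs  # empty list means an exact match
```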

See validation-testing.md → V2: Data Validation for detailed scripts.

Library Compatibility: Synapse Spark 3.5 vs. Fabric Runtime 1.3

Last validated: April 2026 against Fabric Runtime 1.3 (Spark 3.5, Delta 3.2). Library versions change with runtime updates — re-verify after Fabric Runtime upgrades.

Identify and resolve library gaps before running migrated notebooks to prevent ImportError, ClassNotFoundException, and silent behavioral differences.

---

Quick Compatibility Check

Run this workflow to identify which gaps actually affect your code:

1. Export Synapse library list        →  pip freeze (in a Synapse notebook cell)
2. Export custom pool libraries       →  Synapse Studio → Manage → Spark Pools → {pool} → Packages
3. Search notebooks for imports       →  grep -r "import\|from .* import" across all .py / .ipynb files
4. Cross-reference against gap tables →  Only libraries that appear in BOTH your code AND the tables below need action
5. Pre-install in Fabric Environment  →  Add to environment.yml or upload as custom library before running notebooks
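Step 3 can be done with `grep`, but a parser avoids false hits in comments and strings. The sketch below uses Python's `ast` module to list imported module names from `.py` source and from the code cells of `.ipynb` files. The function names are hypothetical. Module names do not always match pip package names (`google.auth` ships in `google-auth`), so map them before step 4.

```python
import ast
import json


def imported_modules(source):
    """Return the dotted module names imported by a piece of Python source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module)  # skip relative imports such as "from . import x"
    return modules


def notebook_imports(ipynb_text):
    """Collect imports from the code cells of a .ipynb document."""
    modules = set()
    for cell in json.loads(ipynb_text).get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = "".join(cell.get("source", []))  # source may be a list or a string
        # Drop magics and shell escapes (%%configure, %pip, !ls), which are not Python
        lines = [ln for ln in source.splitlines() if not ln.lstrip().startswith(("%", "!"))]
        try:
            modules |= imported_modules("\n".join(lines))
        except SyntaxError:
            pass  # non-Python cell body, e.g. the SQL under a %%sql magic
    return modules
```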

Reference manifests: For a full line-by-line comparison of every built-in library, see the microsoft/synapse-spark-runtime GitHub repo. Compare Fabric/Runtime 1.3 vs Synapse/spark3.5 release notes.

---

Python Libraries Missing from Fabric Runtime 1.3

40 Python libraries present in Synapse Spark 3.5 are absent from Fabric Runtime 1.3.

Category	Libraries	Action
CUDA / GPU (10 libs)	`libcublas`, `libcufft`, `libcufile`, `libcurand`, `libcusolver`, `libcusparse`, `libnpp`, `libnvfatbin`, `libnvjitlink`, `libnvjpeg`	Migration blocker — Fabric does not support GPU pools. Refactor to CPU-based alternatives or keep on Synapse.
HTTP / API clients	`httpx`, `httpcore`, `h11`, `google-auth`, `jmespath`	Install via Environment: `pip install httpx google-auth jmespath`
ML / Interpretability	`interpret`, `interpret-core`	Install via Environment: `pip install interpret`
Data serialization	`marshmallow`, `jsonpickle`, `frozendict`, `fixedint`	Install via Environment if needed: `pip install marshmallow jsonpickle`
Logging / Telemetry	`fluent-logger`, `humanfriendly`, `library-metadata-cooker`, `impulse-python-handler`	`fluent-logger`: install if used. Others are Synapse-internal — likely not needed in user code.
Jupyter internals	`jupyter-client`, `jupyter-core`, `jupyter-ui-poll`, `jupyterlab-widgets`, `ipython-pygments-lexers`	Fabric manages Jupyter infrastructure internally. Generally not needed in user code.
System / C libraries	`libgcc`, `libstdcxx`, `libgrpc`, `libabseil`, `libexpat`, `libnsl`, `libzlib`	Low-level system libs. Usually not imported directly. Only install if you have C extensions that depend on them.
File / concurrency	`filelock`, `fsspec`, `knack`	Install via Environment if used: `pip install filelock fsspec`

---

Java/Scala Libraries Missing from Fabric Runtime 1.3

Library	Synapse Version	Action
`azure-cosmos-analytics-spark`	2.2.5	Install as a custom JAR in the Fabric Environment if your Spark jobs use the Cosmos DB analytics connector.
`junit-jupiter-params`	5.5.2	Test-only library. Not needed in production notebooks.
`junit-platform-commons`	1.5.2	Test-only library. Not needed in production notebooks.

---

R Libraries

Near-identical. Only 1 gap:

Library	Synapse	Fabric	Action
`lightgbm`	4.6.0	Not included	Install via Environment if needed
`FabricTelemetry`	Not included	1.0.2	Fabric-internal — no action

---

Notable Version Differences (Python)

68 Python libraries exist on both platforms but with different versions. Most differences are minor, but 17 involve major version jumps that can cause behavioral changes or breakage. The most notable are:

Library	Fabric Version	Synapse Version	Risk	Impact
`xgboost` (`libxgboost`)	2.0.3	3.0.1	High	XGBoost API changes between v2 and v3. Test all model training/prediction code.
`flask`	2.2.5	3.0.3	High	Flask 3.x has breaking changes. If serving Flask APIs from notebooks, test thoroughly.
`libprotobuf`	3.20.3	4.25.3	High	Protobuf 4.x has breaking changes for custom `.proto` definitions.
`libpq`	12.17	17.4	Medium	PostgreSQL client library. Major version jump — test DB connections.
`libgcc-ng` / `libstdcxx-ng`	11.2.0	15.2.0	Medium	GCC runtime. May affect C extension compatibility.
`lxml`	4.9.3	5.3.0	Medium	Minor API changes. Test XML parsing workflows.
`markupsafe`	2.1.3	3.0.2	Low	MarkupSafe 3.x drops support for Python 3.7 and 3.8; the API is otherwise compatible.

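Direction: Synapse generally ships newer versions of system-level libraries (GCC, protobuf, libpq), while Fabric ships newer versions of many data/ML libraries. The exceptions matter most: every library in the table above, including XGBoost and Flask, is newer in Synapse. If you need a specific version, pin it in your Fabric Environment configuration.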
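To triage the 68 differing versions, the sketch below flags libraries whose leading version number differs. `major_jumps` is hypothetical. A major-number change is only a heuristic, because minor releases can also break code.

```python
import re


def major(version):
    """Leading integer of a version string such as '2.0.3' or '12.17'."""
    match = re.match(r"\d+", version)
    if match is None:
        raise ValueError(f"unrecognised version: {version}")
    return int(match.group())


def major_jumps(pairs):
    """pairs maps library -> (fabric_version, synapse_version); keep major-version changes."""
    return {
        lib: versions
        for lib, versions in pairs.items()
        if major(versions[0]) != major(versions[1])
    }
```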

Version Pinning Example

If a notebook depends on XGBoost 3.x behavior (available in Synapse but not the default in Fabric):

# environment.yml — pin in your Fabric Environment
dependencies:
  - pip:
    - xgboost==3.0.1  # Fabric ships 2.0.3; pin to match Synapse version

---

Pre-Migration Audit Script

Run this in a Synapse notebook cell to generate a dependency diff:

import re
import subprocess
import sys

# Get installed packages from the interpreter this notebook runs on
result = subprocess.run([sys.executable, "-m", "pip", "freeze"], capture_output=True, text=True)
synapse_pkgs = dict(
    line.split("==", 1) for line in result.stdout.splitlines() if "==" in line
)

# Known Fabric RT 1.3 missing packages (from gap tables above)
fabric_missing = {
    "httpx", "httpcore", "h11", "google-auth", "jmespath",
    "interpret", "interpret-core",
    "marshmallow", "jsonpickle", "frozendict", "fixedint",
    "fluent-logger", "humanfriendly",
    "filelock", "fsspec", "knack"
}

def normalize(name):
    # pip treats case, "-", "_" and "." as equivalent in package names
    return re.sub(r"[-_.]+", "-", name).lower()

# Check which missing packages are actually installed in this Synapse pool
gaps = {pkg: ver for pkg, ver in synapse_pkgs.items() if normalize(pkg) in fabric_missing}
if gaps:
    print("⚠ Libraries to pre-install in Fabric Environment:")
    for pkg, ver in sorted(gaps.items()):
        print(f"  {pkg}=={ver}")
else:
    print("✅ No missing-library gaps detected for this pool.")

Then search notebooks for actual usage:

# Search all .py and .ipynb files under the current directory for imports of gap libraries
grep -rnE --include='*.py' --include='*.ipynb' "(import|from) (httpx|google\.auth|interpret|marshmallow|jsonpickle|fsspec|filelock)" .

---

Resolution Workflow

For each gap library found in your code:
├── GPU library (libcu*, libnv*, libnpp)
│   └── MIGRATION BLOCKER — refactor to CPU or keep on Synapse
├── Installable via pip/conda
│   └── Add to Fabric Environment environment.yml → publish
├── Custom JAR (azure-cosmos-analytics-spark)
│   └── Upload JAR to Fabric Environment custom libraries → publish
└── Version difference (e.g., xgboost 2.x vs 3.x)
    └── Pin specific version in environment.yml OR test with Fabric default

After resolving all gaps, the Fabric Environment from Phase 0 should include all required libraries before running Phase 2/3 notebooks and SJDs.
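The sketch below sorts the `gaps` dictionary produced by the audit script into the resolution buckets above. It then renders the pip entries in the same `environment.yml` shape as the pinning example. The helper names are hypothetical. The GPU names come from the CUDA row of the Python gap table.

```python
GPU_LIBS = ("libcublas", "libcufft", "libcufile", "libcurand", "libcusolver",
            "libcusparse", "libnpp", "libnvfatbin", "libnvjitlink", "libnvjpeg")
CUSTOM_JARS = {"azure-cosmos-analytics-spark"}


def plan_gaps(gaps):
    """Split {package: version} into migration blockers, JAR uploads, and pip pins."""
    plan = {"blockers": [], "jars": [], "pip": []}
    for pkg, version in sorted(gaps.items()):
        name = pkg.lower()
        if name.startswith(GPU_LIBS):
            plan["blockers"].append(pkg)           # no GPU pools in Fabric
        elif name in CUSTOM_JARS:
            plan["jars"].append(pkg)               # upload as a custom library
        else:
            plan["pip"].append(f"{pkg}=={version}")
    return plan


def environment_yml(pip_pins):
    """Render pip pins as the dependencies/pip section of environment.yml."""
    lines = ["dependencies:", "  - pip:"] + [f"    - {pin}" for pin in pip_pins]
    return "\n".join(lines) + "\n"
```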

Synapse `mssparkutils` → Fabric `notebookutils` API Mapping

Side-by-side reference for porting mssparkutils calls to notebookutils in Microsoft Fabric.

Most mssparkutils APIs have identical signatures in notebookutils. The primary change is the import/namespace. Differences are explicitly noted.

---

Import Change

# Synapse (remove this)
from notebookutils import mssparkutils  # or: import mssparkutils

# Fabric (use this — no import needed; notebookutils is pre-instantiated)
# notebookutils is available globally in Fabric notebooks

---

File System (`fs`)

`mssparkutils`	`notebookutils`	Notes
`mssparkutils.fs.ls(path)`	`notebookutils.fs.ls(path)`	Returns list of `FileInfo` objects
`mssparkutils.fs.cp(src, dest, recurse=False)`	`notebookutils.fs.cp(src, dest, recurse=False)`	Identical
`mssparkutils.fs.mv(src, dest, recurse=False)`	`notebookutils.fs.mv(src, dest, recurse=False)`	Identical
`mssparkutils.fs.rm(path, recurse=False)`	`notebookutils.fs.rm(path, recurse=False)`	Identical
`mssparkutils.fs.mkdirs(path)`	`notebookutils.fs.mkdirs(path)`	Identical
`mssparkutils.fs.put(path, content, overwrite=False)`	`notebookutils.fs.put(path, content, overwrite=False)`	Identical
`mssparkutils.fs.head(path, maxBytes=65536)`	`notebookutils.fs.head(path, maxBytes=65536)`	Identical
`mssparkutils.fs.append(path, content, createFileIfNotExists)`	`notebookutils.fs.append(path, content, createFileIfNotExists)`	Identical
`mssparkutils.fs.help()`	`notebookutils.fs.help()`	Identical

Path Format

# Synapse: ADLS Gen2 path
mssparkutils.fs.ls("abfss://container@storageaccount.dfs.core.windows.net/path")

# Fabric: OneLake path
notebookutils.fs.ls("abfss://workspacename@onelake.dfs.fabric.microsoft.com/lakehouse.Lakehouse/Files/path")

# Fabric: also works with relative Lakehouse path (within attached Lakehouse)
notebookutils.fs.ls("Files/path")
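The sketch below builds the absolute OneLake form of a Lakehouse-relative path. It follows the `abfss://{workspace}@onelake.dfs.fabric.microsoft.com/{lakehouse}.Lakehouse/...` pattern shown above. The requirement that the path start with `Files` or `Tables` is a check this sketch adds; it is not a OneLake rule. The function name is hypothetical.

```python
def onelake_path(workspace, lakehouse, relative_path):
    """Turn 'Files/...' or 'Tables/...' into an absolute OneLake abfss path."""
    relative_path = relative_path.lstrip("/")
    if relative_path.split("/")[0] not in ("Files", "Tables"):
        raise ValueError("relative_path should start with Files or Tables")
    return (f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/"
            f"{lakehouse}.Lakehouse/{relative_path}")
```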

---

Credentials (`credentials`)

`mssparkutils`	`notebookutils`	Notes
`mssparkutils.credentials.getToken(audience)`	`notebookutils.credentials.getToken(audience)`	Identical
`mssparkutils.credentials.getSecret(keyVaultUrl, secretName)`	`notebookutils.credentials.getSecret(keyVaultUrl, secretName)`	Identical
`mssparkutils.credentials.getConnectionStringOrCreds(linkedServiceName)`	Not available in Fabric	Linked Services do not exist in Fabric — replace with Data Connection or Key Vault secret

# Synapse — read from Linked Service (NOT available in Fabric)
conn_str = mssparkutils.credentials.getConnectionStringOrCreds("MyLinkedService")

# Fabric — read secret from Key Vault
secret = notebookutils.credentials.getSecret(
    "https://mykeyvault.vault.azure.net/",
    "my-connection-string"
)

---

Notebook (`notebook`)

`mssparkutils`	`notebookutils`	Notes
`mssparkutils.notebook.run(name, timeout, args)`	`notebookutils.notebook.run(name, timeout, args)`	Identical
`mssparkutils.notebook.exit(value)`	`notebookutils.notebook.exit(value)`	Identical
`mssparkutils.notebook.help()`	`notebookutils.notebook.help()`	Identical

# Run a child notebook and receive its output value
result = notebookutils.notebook.run(
    "child_notebook_name",
    300,                       # timeout in seconds
    {"input_param": "value"}   # parameters passed to the child notebook
)

---

Runtime (`runtime` / `env`)

Namespace change: Synapse used mssparkutils.env; Fabric uses notebookutils.runtime.

`mssparkutils`	`notebookutils`	Notes
`mssparkutils.env.getJobId()`	`notebookutils.runtime.context["jobId"]`	Context dict replaces individual env getters
`mssparkutils.env.getWorkspaceName()`	`notebookutils.runtime.context["workspaceName"]`
`mssparkutils.env.getUserId()`	`notebookutils.runtime.context["userId"]`
`mssparkutils.env.getUserName()`	`notebookutils.runtime.context["userName"]`
`mssparkutils.env.getNotebookPath()`	`notebookutils.runtime.context["notebookPath"]`

# Synapse
workspace = mssparkutils.env.getWorkspaceName()

# Fabric
ctx = notebookutils.runtime.context
workspace = ctx["workspaceName"]
notebook_path = ctx["notebookPath"]
job_id = ctx["jobId"]
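The sketch below is a source-level rewriter for the mappings in this file. It drops the Synapse import line, converts the `env` getters to `runtime.context` lookups, and renames the remaining `mssparkutils.` calls. It also flags `getConnectionStringOrCreds`, which has no Fabric equivalent. The rewriter is regex-based and line-oriented, so aliased imports and multi-line calls need manual review. The context key names are taken from the table above. Print `notebookutils.runtime.context` in your Fabric runtime to confirm them before relying on the output. The function name is hypothetical.

```python
import re

# env getter -> runtime.context key, as listed in the mapping table above
ENV_TO_CONTEXT = {
    "getJobId": "jobId",
    "getWorkspaceName": "workspaceName",
    "getUserId": "userId",
    "getUserName": "userName",
    "getNotebookPath": "notebookPath",
}

IMPORT_LINE = re.compile(
    r"^\s*(from\s+notebookutils\s+import\s+mssparkutils|import\s+mssparkutils)\s*(#.*)?$"
)


def port_mssparkutils(source):
    """Rewrite mssparkutils calls to notebookutils; return (new_source, warnings)."""
    warnings, out_lines = [], []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if IMPORT_LINE.match(line):
            continue  # notebookutils is pre-instantiated in Fabric notebooks
        if "getConnectionStringOrCreds" in line:
            warnings.append(f"line {lineno}: Linked Service lookup has no Fabric equivalent")
        # env getters move to the runtime.context dict; unknown getters stay untouched
        line = re.sub(
            r"mssparkutils\.env\.(\w+)\(\)",
            lambda m: (f'notebookutils.runtime.context["{ENV_TO_CONTEXT[m.group(1)]}"]'
                       if m.group(1) in ENV_TO_CONTEXT else m.group(0)),
            line,
        )
        if "mssparkutils.env." in line:
            warnings.append(f"line {lineno}: unmapped mssparkutils.env call")
        # Every other namespace (fs, credentials, notebook) keeps its name
        line = re.sub(r"\bmssparkutils\.(?!env\.)", "notebookutils.", line)
        out_lines.append(line)
    return "\n".join(out_lines), warnings
```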

---

Lakehouse (`lakehouse`)

Fabric-only — no equivalent in Synapse.

# Get the default Lakehouse attached to the notebook
lh = notebookutils.lakehouse.get()
print(lh.id, lh.name, lh.workspaceId)

# List all Lakehouses in the workspace
all_lakehouses = notebookutils.lakehouse.list()

# Get a specific Lakehouse
lh = notebookutils.lakehouse.get(name="my_lakehouse")

---

Connections (`connection`)

Fabric-only — replaces Synapse Linked Services for external data source access.

# Get a connection by name (configured in Fabric Data Connections)
conn = notebookutils.connection.get("MyConnectionName")

# Use the connection token for downstream API calls
token = notebookutils.connection.getConnectionToken("MyConnectionName")

See connectivity-migration.md for how to migrate Synapse Linked Services to Fabric Data Connections.


FAQ

What replaces Synapse Linked Services in Fabric?

Data Connections for external sources and OneLake Shortcuts for storage mounts; there is no direct linked service REST equivalent.

How should agents load synapse-migration references?

Read only the resource file for the active migration phase instead of loading all references upfront.



About

Synapse Migration by the numbers

synapse-migration capabilities & compatibility

What synapse-migration says it does


How do I migrate Synapse Spark notebooks, pools, and linked services to Fabric programmatically?

Who is it for?

When should I use this skill?

What you get

Files

Synapse Analytics → Microsoft Fabric Migration

Prerequisite Knowledge

Table of Contents

Context Loading Guide

API-Driven Migration Workflow

Authentication

Migration Phases (Execute in Order)

REST API Quick Reference

Migration Workload Map

Decision Tree: Which Fabric Spark Workload?

T-SQL & Spark Configuration Differences

Capacity Sizing Reference

Must / Prefer / Avoid

MUST DO

PREFER

AVOID

Examples

Feature Parity Reference

Migration Gotchas — Quick Reference

Post-Migration: What's Next

Agentic Exploration Workflow

Companion Skill Cross-References

Variable Library for Environment Promotion

Capacity Sizing Reference

Synapse Spark Pool → Fabric Capacity Mapping

Fabric Capacity SKU Quick Reference

Sizing Decision Guide

Cost Model Comparison

Synapse → Fabric Code Patterns

Spark Notebook: Import and Session Setup

Reading Data: ADLS Path → OneLake Path

Writing Data to Delta Lake

Credentials: Linked Service → Key Vault Secret

Environment Context

Child Notebook Execution

Dedicated SQL Pool DDL → Fabric Warehouse

Bulk Load: PolyBase → COPY INTO

File System Operations

Spark Catalog API — Unsupported Methods

Database Methods

Function Methods

Quick Reference Table

Spark Configuration (%%configure)

Synapse Connectivity Migration — Linked Services → Fabric Data Connections & OneLake Shortcuts

Decision Guide: What Replaces a Linked Service?

OneLake Shortcut: Replacing ADLS Gen2 Linked Services

In Fabric Portal (UI)

Via REST API

Accessing Shortcut Data in Notebooks

Fabric Data Connection: Replacing External Database Linked Services

Creating a Data Connection (Portal)

Using a Data Connection in a Notebook

Key Vault Secret Migration

Integration Runtime → On-Premises Data Gateway

Pipeline Connectivity: Synapse Dataset → Fabric Pipeline Source/Sink

Connector-Specific Refactoring — Kusto, Cosmos DB, Token Library, ADLS OAuth

Azure Data Explorer (Kusto) Connector

Reading from Kusto

Writing to Kusto

Cosmos DB Connector (OLTP)

Reading from Cosmos DB

Writing to Cosmos DB

Cosmos DB Connector — Spark Config Style

ADLS Gen2 OAuth — LinkedServiceBasedTokenProvider → ClientCredsTokenProvider

Python

Scala

Token Library (Synapse-only)

Token Acquisition

Linked Service Property Extraction

Secret Retrieval via Linked Service

Spark Configuration (`%%configure`)

`spark.read.synapsesql()` — Synapse SQL Connector

Attempt 1: Key-based connection (via ARM `listKeys`)