
E2e Medallion Architecture
Design and implement Bronze/Silver/Gold lakehouse layers in Microsoft Fabric with PySpark, Delta Lake, and orchestrated pipelines.
Overview
e2e-medallion-architecture is an agent skill for the Build phase that implements Bronze/Silver/Gold Medallion lakehouse pipelines in Microsoft Fabric using PySpark, Delta Lake, and Fabric Pipelines.
Install
npx skills add https://github.com/microsoft/skills-for-fabric --skill e2e-medallion-architecture
What is this skill?
- End-to-end Medallion (Bronze/Silver/Gold) patterns on Microsoft Fabric lakehouses
- PySpark and Delta Lake implementations with per-layer Spark tuning guidance
- Fabric Pipelines and notebooks to orchestrate Bronze→Silver→Gold flows with data quality enforcement
- Mandatory check-updates run once per session, before the skill is first used
- Workspace and item discovery via list + JMESPath filtering when resolving Fabric IDs by name
Adoption & trust: 64 installs on skills.sh; 427 GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You need a governed analytics lakehouse on Fabric but only have fragmented notebooks without layered quality rules or orchestration from raw ingest to gold metrics.
Who is it for?
Solo builders or small teams standardizing Fabric lakehouse analytics with Delta tables and pipeline-driven layer promotions.
Skip if: Teams not on Microsoft Fabric, simple one-table ETL, or frontend-only products with no lakehouse analytics path.
When should I use this skill?
Use it when the user asks for a medallion architecture, Bronze/Silver/Gold layers, an e2e lakehouse pipeline, a multi-layer lakehouse, or Fabric ingestion-to-analytics with data quality.
What do I get? / Deliverables
You get a multi-workspace medallion design with layer-specific lakehouses, Spark jobs, quality enforcement, and pipeline orchestration from Bronze through Gold.
- Medallion layer workspace layout
- Bronze/Silver/Gold notebook and pipeline flow
- Per-layer Spark configuration guidance
Journey fit
End-to-end medallion lakehouses are built while shaping the data platform that powers analytics, so this skill belongs to the Build phase under backend data engineering. Layered ingestion, transformation, and gold semantic models are backend data architecture, not frontend or launch distribution work.
How it compares
Fabric-native medallion implementation skill—not a generic dbt-only warehouse guide or a BI dashboard-only skill.
Common Questions / FAQ
Who is e2e-medallion-architecture for?
Indie and solo builders implementing analytics platforms on Microsoft Fabric who want agent-guided Bronze/Silver/Gold lakehouse setup with PySpark and pipelines.
When should I use e2e-medallion-architecture?
During Build backend data work when designing lakehouse layers, ingestion-to-gold flows, Spark configs per tier, or Fabric pipeline orchestration for medallion patterns.
Is e2e-medallion-architecture safe to install?
Review the Security Audits panel on this page; Fabric notebooks and pipelines can touch production data—run in dev workspaces, validate IAM and secrets handling before promoting jobs.
Workflow Chain
Requires first: check updates
SKILL.md
> **Update Check: ONCE PER SESSION (mandatory)**
> The first time this skill is used in a session, run the **check-updates** skill before proceeding.
> - **GitHub Copilot CLI / VS Code**: invoke the `check-updates` skill.
> - **Claude Code / Cowork / Cursor / Windsurf / Codex**: compare the local and remote `package.json` versions.
> - Skip if the check was already performed earlier in this session.

> **CRITICAL NOTES**
> 1. To find workspace details (including its ID) from a workspace name: list all workspaces, then filter the result with JMESPath.
> 2. To find item details (including its ID) from a workspace ID, item type, and item name: list all items of that type in that workspace, then filter the result with JMESPath.

# End-to-End Medallion Architecture

## Prerequisite Knowledge

Read these companion documents; they contain the foundational context this skill depends on:

- [COMMON-CORE.md](../../common/COMMON-CORE.md): Fabric REST API patterns, authentication, token audiences, item discovery
- [COMMON-CLI.md](../../common/COMMON-CLI.md): `az rest`, `az login`, token acquisition, Fabric REST via CLI
- [SPARK-AUTHORING-CORE.md](../../common/SPARK-AUTHORING-CORE.md): Notebook deployment, lakehouse creation, job execution
- [notebook-api-operations.md](../spark-authoring-cli/resources/notebook-api-operations.md): **Required for notebook creation**. Covers `.ipynb` structure requirements, cell format, and the `getDefinition`/`updateDefinition` workflow

For Spark-specific optimization details, see [data-engineering-patterns.md](../spark-authoring-cli/resources/data-engineering-patterns.md).
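The list-then-filter lookup described in the critical notes can be sketched in plain Python. This is a minimal illustration, not the skill's own implementation: the `{"value": [...]}` response shape with `id`, `displayName`, and `type` fields is an assumption about the list responses, and the sample payloads, GUIDs, and the `resolve_id` helper are hypothetical. The comment shows the JMESPath expression the Python filter mirrors.

```python
from typing import Any, Dict, Optional

# Hypothetical sample payloads standing in for "list workspaces" and
# "list items" responses. The {"value": [...]} shape is an assumption.
SAMPLE_WORKSPACES: Dict[str, Any] = {
    "value": [
        {"id": "11111111-1111-1111-1111-111111111111", "displayName": "medallion-dev"},
        {"id": "22222222-2222-2222-2222-222222222222", "displayName": "medallion-prod"},
    ]
}
SAMPLE_ITEMS: Dict[str, Any] = {
    "value": [
        {"id": "aaaaaaaa-0000-0000-0000-000000000001", "displayName": "lh_bronze", "type": "Lakehouse"},
        {"id": "aaaaaaaa-0000-0000-0000-000000000002", "displayName": "lh_silver", "type": "Lakehouse"},
    ]
}


def resolve_id(list_response: Dict[str, Any], display_name: str,
               item_type: Optional[str] = None) -> str:
    """Return the ID of the single entry whose displayName matches.

    Mirrors the JMESPath filter: value[?displayName=='<name>'].id
    """
    matches = [
        entry["id"]
        for entry in list_response.get("value", [])
        if entry.get("displayName") == display_name
        and (item_type is None or entry.get("type") == item_type)
    ]
    # Fail loudly on zero or multiple matches instead of guessing.
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one match for {display_name!r}, found {len(matches)}"
        )
    return matches[0]


workspace_id = resolve_id(SAMPLE_WORKSPACES, "medallion-dev")
bronze_id = resolve_id(SAMPLE_ITEMS, "lh_bronze", item_type="Lakehouse")
```

Raising on zero or multiple matches keeps a mistyped or ambiguous name from silently resolving to the wrong workspace or item.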
---

## Architecture Overview

**Medallion Architecture** is a data lakehouse pattern with three progressive layers:

| Layer | Purpose | Optimization Profile | Use Case |
|-------|---------|---------------------|----------|
| **Bronze** (Raw) | Land raw data exactly as received | Write-optimized, append-only, partitioned by ingestion date | Audit trail, reprocessing, lineage |
| **Silver** (Cleaned) | Deduplicated, validated, conformed data | Balanced read/write, partitioned by business date | Feature engineering, operational reporting |
| **Gold** (Aggregated) | Pre-calculated metrics for analytics | Read-optimized (ZORDER, compaction), partitioned by month/year | Power BI reports, dashboards, ad-hoc analytics via SQL endpoint |

- **Bronze**: Schema-on-read. Flexible schema; Delta time travel supports audit and rollback
- **Silver**: Schema enforcement. Reject non-conforming writes; handle schema evolution with `mergeSchema` when sources change
- **Gold**: Strict schema governance. Curated, business-approved datasets only

---

## Must/Prefer/Avoid

### MUST DO

- Create a **separate lakehouse** for each medallion layer (Bronze, Silver, Gold)
- Add **metadata columns** in Bronze: ingestion timestamp, source file, batch ID
- Apply **data quality rules** in the Bronze-to-Silver transformation (deduplication, null handling, range validation)
- Use **Delta Lake format** for all medallion layer tables
- Use **partition-aware overwrite** in Silver/Gold writes to avoid reprocessing unchanged data
- Include **validation steps** after each layer (row counts, schema checks, anomaly detection)
- Follow the **