
Data Engineering Patterns Fabric Databricks
Look up production-ready Microsoft Fabric, Databricks, and PySpark patterns while designing lakehouse pipelines and governance.
Install
npx skills add https://github.com/aradotso/data-skills --skill data-engineering-patterns-fabric-databricks
What is this skill?
- 600+ field-tested patterns across Microsoft Fabric and Azure Databricks
- 12-book style organization: Fabric pipelines, lakehouse, warehouse, Power BI; Databricks clusters, Delta, streaming, Uni
- Covers Delta Lake optimization, Auto Loader, Structured Streaming, Photon, and cost architecture
- Explicit triggers for lakehouse architecture, governance, and production best practices
Adoption & trust: 1 install on skills.sh; 1 GitHub star; trending (+100% hot-view momentum).
Journey fit
Primary fit
The Build phase is the canonical fit because the skill is a pattern library for implementing data platforms, not day-two ad copy or launch distribution. The Backend subphase fits because the patterns cover pipeline architecture, Delta Lake, orchestration, and warehouse layers rather than frontend or agent tooling.
SKILL.md
# Data Engineering Patterns - Fabric & Databricks

> Skill by [ara.so](https://ara.so) - Data Skills collection.

This skill provides access to 600+ field-tested data engineering patterns for Microsoft Fabric, Azure Databricks, and PySpark. These patterns cover everything from pipeline design and Delta Lake optimization to Unity Catalog governance and cost architecture.

## What This Project Provides

A comprehensive collection of patterns organized into 12 books covering:

**Microsoft Fabric (250 patterns):**
- Pipelines and Data Factory
- Lakehouse and PySpark
- Warehouse and SQL
- Power BI in Fabric
- Architecture Patterns

**Azure Databricks (350 patterns):**
- Clusters and Compute
- Delta Lake
- Workflows and Orchestration
- Structured Streaming and Auto Loader
- Unity Catalog
- Databricks SQL and Photon
- Platform and Cost Architecture

**PySpark:**
- 88 concepts for production Spark across both platforms

## Installation

Clone the repository to access all pattern PDFs:

```bash
git clone https://github.com/ssanjaychandra123/data-engineering-patterns.git
cd data-engineering-patterns
```

## Repository Structure

```
data-engineering-patterns/
├── Fabric Patterns/
│   ├── Fabric Engineering Patterns Book I - Pipelines and Data Factory.pdf
│   ├── Fabric Engineering Patterns Book II - Lakehouse and PySpark.pdf
│   ├── Fabric Engineering Patterns Book III - Warehouse and SQL.pdf
│   ├── Fabric Engineering Patterns Book IV - Power BI in Fabric.pdf
│   └── Fabric Engineering Patterns Book V - Architecture Patterns.pdf
├── Databricks Patterns/
│   ├── Azure Databricks Engineering Patterns Book I - Clusters and Compute.pdf
│   ├── Azure Databricks Engineering Patterns Book II - Delta Lake.pdf
│   ├── Azure Databricks Engineering Patterns Book III - Workflows and Orchestration.pdf
│   ├── Azure Databricks Engineering Patterns Book IV - Structured Streaming and Auto Loader.pdf
│   ├── Azure Databricks Engineering Patterns Book V - Unity Catalog.pdf
│   ├── Azure Databricks Engineering Patterns Book VI - Databricks SQL and Photon.pdf
│   └── Azure Databricks Engineering Patterns Book VII - Platform and Cost Architecture.pdf
└── PySpark/
    └── The PySpark Handbook for Fabric and Databricks.pdf
```

## Key Pattern Categories

### Microsoft Fabric Patterns

#### Pipeline and Data Factory Patterns

Common patterns include:
- Incremental data loading strategies
- Pipeline retry and error handling
- Parameter-driven pipeline design
- Activity dependencies and control flow
- Copy activity optimization
- Metadata-driven frameworks

Example incremental load pattern in a Fabric pipeline:

```python
# Notebook activity in a Fabric pipeline.
# `spark` and `mssparkutils` are provided by the Fabric notebook runtime.
from pyspark.sql import functions as F

# Get pipeline parameters
watermark = spark.conf.get("pipeline.watermark")
table_name = spark.conf.get("pipeline.tableName")

# Read only the rows modified after the last watermark
df = (
    spark.read.format("delta")
    .load(f"abfss://source@storage.dfs.core.windows.net/{table_name}")
    .filter(F.col("modified_date") > F.lit(watermark))
)

# Append to the target table, allowing additive schema changes
(
    df.write.format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .save(f"Tables/{table_name}")
)

# Return the new watermark; keep the previous one if no rows were loaded
max_modified = df.agg(F.max("modified_date")).collect()[0][0]
new_watermark = max_modified if max_modified is not None else watermark
mssparkutils.notebook.exit(str(new_watermark))
```

#### Lakehouse and PySpark