Fabric Lakehouse

Name: Fabric Lakehouse
Author: github

github/awesome-copilot

8.7k installs
37.1k repo stars
Updated July 28, 2026
About

Use this skill to get context about Fabric Lakehouse and its features for software systems and AI-powered functions. It offers descriptions of Lakehouse data components, organization with schemas and shortcuts, access control, and code examples. This skill supports users in designing, building, and optimizing Lakehouse solutions using best practices.

Skill metadata: name fabric-lakehouse, author tedvilutis, version 1.0.

Fabric Lakehouse by the numbers

8,676 all-time installs (skills.sh)
+24 installs in the week ending Jul 28, 2026 (Skillselion tracking)
Ranked #69 of 1,041 Cloud & Infrastructure skills by installs in the Skillselion catalog
Security screen: MEDIUM risk (skills.sh audit)
Data as of Jul 28, 2026 (Skillselion catalog sync)

npx skills add https://github.com/github/awesome-copilot --skill fabric-lakehouse

Security audit	3 / 3 scanners passed

Who is it for?

Developers who need the Fabric Lakehouse patterns described in the skill documentation.

Skip if: the docs are empty or the task is outside the skill's documented scope.

What you get

Actionable workflows and conventions from SKILL.md for fabric-lakehouse.

Data Factory pipeline definitions
ETL/ELT activity configurations

By the numbers

References 180+ Data Factory connectors for external data sources
Covers 8+ pipeline activity types including Copy, Notebook, and Dataflow

Files

SKILL.md (Markdown)

When to Use This Skill

Use this skill when you need to:

Generate a document or explanation that includes definition and context about Fabric Lakehouse and its capabilities.
Design, build, and optimize Lakehouse solutions using best practices.
Understand the core concepts and components of a Lakehouse in Microsoft Fabric.
Learn how to manage tabular and non-tabular data within a Lakehouse.

Fabric Lakehouse

Core Concepts

What is a Lakehouse?

A Lakehouse in Microsoft Fabric is an item that gives users a place to store both tabular data (tables) and non-tabular data (files). It combines the flexibility of a data lake with the management capabilities of a data warehouse. It provides:

Unified storage in OneLake for structured and unstructured data
Delta Lake format for ACID transactions, versioning, and time travel
SQL analytics endpoint for T-SQL queries
Semantic model for Power BI integration
Support for other table formats such as CSV and Parquet
Support for any file format
Tools for table optimization and data management
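
The versioning and time-travel support listed above can be exercised from a notebook. The following is a minimal sketch, not the skill author's code. It assumes an existing `spark` session (as Fabric notebooks provide) and a Delta table at a hypothetical path such as `Tables/silver_customers`. The helper names are illustrative; `versionAsOf` and `timestampAsOf` are standard Delta Lake reader options.

```python
def read_table_version(spark, table_path, version):
    """Read a Delta table as it was at a given commit version."""
    return (
        spark.read.format("delta")
        .option("versionAsOf", version)      # Delta time travel by version number
        .load(table_path)
    )


def read_table_at(spark, table_path, timestamp):
    """Read a Delta table as it was at a given timestamp string."""
    return (
        spark.read.format("delta")
        .option("timestampAsOf", timestamp)  # Delta time travel by timestamp
        .load(table_path)
    )
```

For example, `read_table_version(spark, "Tables/silver_customers", 0)` would return the table's first committed state, provided that version has not been removed by VACUUM.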

Key Components

Delta Tables: Managed tables with ACID compliance and schema enforcement
Files: Unstructured/semi-structured data in the Files section
SQL Endpoint: Auto-generated read-only SQL interface for querying
Shortcuts: Virtual links to external/internal data without copying
Fabric Materialized Views: Pre-computed tables for fast query performance

Tabular data in a Lakehouse

Tabular data is stored as tables under the "Tables" folder. The main table format in a Lakehouse is Delta. A Lakehouse can also store tabular data in other formats such as CSV or Parquet, but these formats are available only for Spark querying. Tables can be internal, when the data is stored under the "Tables" folder, or external, when only a reference to the table is stored under the "Tables" folder and the data itself lives in the referenced location. These references are created with shortcuts, which can be internal (pointing to another location in Fabric) or external (pointing to data stored outside of Fabric).

Schemas for tables in a Lakehouse

When creating a lakehouse, users can choose to enable schemas. Schemas are used to organize Lakehouse tables. Schemas are implemented as folders under the "Tables" folder and store tables inside of those folders. The default schema is "dbo" and it can't be deleted or renamed. All other schemas are optional and can be created, renamed, or deleted. Users can reference a schema located in another lakehouse using a Schema Shortcut, thereby referencing all tables in the destination schema with a single shortcut.
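
Because schemas are described above as folders under "Tables", the location and qualified name of a table follow directly from its schema. The sketch below only illustrates that layout; the helper names `table_folder` and `qualified_name` are hypothetical and not part of any Fabric API.

```python
DEFAULT_SCHEMA = "dbo"  # the default schema named in the text above


def table_folder(table, schema=DEFAULT_SCHEMA):
    """Return the relative folder of a table in a schema-enabled lakehouse."""
    for part in (schema, table):
        if not part or "/" in part:
            raise ValueError(f"invalid name: {part!r}")
    return f"Tables/{schema}/{table}"


def qualified_name(table, schema=DEFAULT_SCHEMA):
    """Return the schema-qualified table name, e.g. for use in Spark SQL."""
    return f"{schema}.{table}"


print(table_folder("customers"))        # Tables/dbo/customers
print(table_folder("orders", "sales"))  # Tables/sales/orders
print(qualified_name("orders", "sales"))  # sales.orders
```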

Files in a Lakehouse

Files are stored under the "Files" folder. Users can create folders and subfolders to organize their files. Any file format can be stored in a Lakehouse.

Fabric Materialized Views

Fabric Materialized Views are a set of pre-computed tables that are automatically updated on a schedule. They provide fast query performance for complex aggregations and joins. Materialized views are defined using PySpark or Spark SQL and stored in an associated Notebook.

Spark Views

Spark Views are logical tables defined by a SQL query. They do not store data but provide a virtual layer for querying. Views are defined using Spark SQL and stored in the Lakehouse next to Tables.

Security

Item access or control plane security

Users can have workspace roles (Admin, Member, Contributor, Viewer) that provide different levels of access to a Lakehouse and its contents. Users can also be granted access through the sharing capabilities of the Lakehouse.

Data access or OneLake Security

For data access, use the OneLake security model, which is based on Microsoft Entra ID (formerly Azure Active Directory) and role-based access control (RBAC). Lakehouse data is stored in OneLake, so access to data is controlled through OneLake permissions. In addition to object-level permissions, Lakehouse also supports column-level and row-level security for tables, allowing fine-grained control over who can see specific columns or rows in a table.

Lakehouse Shortcuts

Shortcuts create virtual links to data without copying it.

Types of Shortcuts

Internal: Link to other Fabric Lakehouses/tables, cross-workspace data sharing
ADLS Gen2: Link to ADLS Gen2 containers in Azure
Amazon S3: AWS S3 buckets, cross-cloud data access
Dataverse: Microsoft Dataverse, business application data
Google Cloud Storage: GCS buckets, cross-cloud data access

Performance Optimization

V-Order Optimization

To speed up reads from a semantic model, enable V-Order optimization on Delta tables. V-Order presorts data in a way that improves query performance for common access patterns.

Table Optimization

Tables can also be optimized using the OPTIMIZE command, which compacts small files into larger ones and can also apply Z-ordering to improve query performance on specific columns. Regular optimization helps maintain performance as data is ingested and updated over time. The VACUUM command can be used to clean up old files and free up storage space, especially after updates and deletes.
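
When maintenance runs over many tables, it can help to generate the OPTIMIZE and VACUUM statements rather than hand-write them. The following is a small sketch with hypothetical helper names (`optimize_sql`, `vacuum_sql`). It only builds the statement text, which could then be passed to `spark.sql(...)` in a notebook.

```python
def optimize_sql(table, zorder_by=()):
    """Build an OPTIMIZE statement, optionally with Z-ordering columns."""
    stmt = f"OPTIMIZE {table}"
    if zorder_by:
        stmt += f" ZORDER BY ({', '.join(zorder_by)})"
    return stmt


def vacuum_sql(table, retain_hours=None):
    """Build a VACUUM statement, optionally with an explicit retention window."""
    stmt = f"VACUUM {table}"
    if retain_hours is not None:
        if retain_hours < 0:
            raise ValueError("retain_hours must be non-negative")
        stmt += f" RETAIN {retain_hours} HOURS"
    return stmt


print(optimize_sql("silver_transactions", ["customer_id", "transaction_date"]))
print(vacuum_sql("silver_transactions", retain_hours=168))
```

Table and column names are interpolated as-is, so this sketch should only be fed trusted identifiers.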

Lineage

The Lakehouse item supports lineage, which allows users to track the origin and transformations of data. Lineage information is automatically captured for tables and files in Lakehouse, showing how data flows from source to destination. This helps with debugging, auditing, and understanding data dependencies.

PySpark Code Examples

See PySpark code for details.

Getting data into Lakehouse

See Get data for details.

Data Factory Integration

Microsoft Fabric includes Data Factory for ETL/ELT orchestration:

180+ connectors for data sources
Copy activity for data movement
Dataflow Gen2 for transformations
Notebook activity for Spark processing
Scheduling and triggers

Pipeline Activities

Activity	Description
Copy Data	Move data between sources and Lakehouse
Notebook	Execute Spark notebooks
Dataflow	Run Dataflow Gen2 transformations
Stored Procedure	Execute SQL procedures
ForEach	Loop over items
If Condition	Conditional branching
Get Metadata	Retrieve file/folder metadata
Lakehouse Maintenance	Optimize and vacuum Delta tables

Orchestration Patterns

Pipeline: Daily_ETL_Pipeline
├── Get Metadata (check for new files)
├── ForEach (process each file)
│   ├── Copy Data (bronze layer)
│   └── Notebook (silver transformation)
├── Notebook (gold aggregation)
└── Lakehouse Maintenance (optimize tables)
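
The control flow of the pattern above can be modeled in a few lines of plain Python. This is only an illustration of activity ordering, not a Data Factory pipeline definition. The function name and step labels are hypothetical, and the ForEach is assumed to run sequentially.

```python
def plan_daily_etl(new_files):
    """Return the ordered activity runs for the Daily_ETL_Pipeline pattern."""
    steps = ["Get Metadata"]                      # check for new files
    for name in new_files:                        # ForEach: one pass per file
        steps.append(f"Copy Data (bronze): {name}")
        steps.append(f"Notebook (silver): {name}")
    steps.append("Notebook (gold aggregation)")   # runs once after the loop
    steps.append("Lakehouse Maintenance")         # optimize tables last
    return steps


for step in plan_daily_etl(["orders_01.csv", "orders_02.csv"]):
    print(step)
```

The model shows that the gold aggregation and table maintenance run once per pipeline run, while the bronze copy and silver transformation run once per discovered file.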

---

Spark Configuration (Best Practices)

# Enable Fabric optimizations
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")
spark.conf.set("spark.microsoft.delta.optimizeWrite.enabled", "true")

Reading Data

# Read CSV file
df = spark.read.format("csv") \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .load("Files/bronze/data.csv")

# Read JSON file
df = spark.read.format("json").load("Files/bronze/data.json")

# Read Parquet file
df = spark.read.format("parquet").load("Files/bronze/data.parquet")

# Read Delta table
df = spark.read.table("my_delta_table")

# Query a table with Spark SQL (lakehouse-qualified table name)
df = spark.sql("SELECT * FROM lakehouse.my_table")

Writing Delta Tables

# Write DataFrame as managed Delta table
df.write.format("delta") \
    .mode("overwrite") \
    .saveAsTable("silver_customers")

# Write with partitioning
df.write.format("delta") \
    .mode("overwrite") \
    .partitionBy("year", "month") \
    .saveAsTable("silver_transactions")

# Append to existing table
df.write.format("delta") \
    .mode("append") \
    .saveAsTable("silver_events")

Delta Table Operations (CRUD)

# UPDATE
spark.sql("""
    UPDATE silver_customers
    SET status = 'active'
    WHERE last_login > '2024-01-01' -- Example date, adjust as needed
""")

# DELETE
spark.sql("""
    DELETE FROM silver_customers
    WHERE is_deleted = true
""")

# MERGE (Upsert)
spark.sql("""
    MERGE INTO silver_customers AS target
    USING staging_customers AS source
    ON target.customer_id = source.customer_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")

Schema Definition

from pyspark.sql.types import StructType, StructField, StringType, IntegerType, TimestampType, DecimalType

schema = StructType([
    StructField("id", IntegerType(), False),
    StructField("name", StringType(), True),
    StructField("email", StringType(), True),
    StructField("amount", DecimalType(18, 2), True),
    StructField("created_at", TimestampType(), True)
])

df = spark.read.format("csv") \
    .schema(schema) \
    .option("header", "true") \
    .load("Files/bronze/customers.csv")

SQL Magic in Notebooks

%%sql
-- Query Delta table directly
SELECT 
    customer_id,
    COUNT(*) as order_count,
    SUM(amount) as total_amount
FROM gold_orders
GROUP BY customer_id
ORDER BY total_amount DESC
LIMIT 10

V-Order Optimization

# Enable V-Order for read optimization
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")

Table Optimization

%%sql
-- Optimize table (compact small files)
OPTIMIZE silver_transactions

-- Optimize with Z-ordering on query columns
OPTIMIZE silver_transactions ZORDER BY (customer_id, transaction_date)

-- Vacuum old files (default 7 days retention)
VACUUM silver_transactions

-- Vacuum with custom retention
VACUUM silver_transactions RETAIN 168 HOURS

Incremental Load Pattern

from pyspark.sql.functions import col

# Get last processed watermark (None when silver_orders is still empty)
last_watermark = spark.sql("""
    SELECT MAX(processed_timestamp) as watermark 
    FROM silver_orders
""").collect()[0]["watermark"]

# Load only new records; on the first run, load everything
new_records = spark.read.table("bronze_orders")
if last_watermark is not None:
    new_records = new_records.filter(col("created_at") > last_watermark)

# Merge new records
new_records.createOrReplaceTempView("staging_orders")
spark.sql("""
    MERGE INTO silver_orders AS target
    USING staging_orders AS source
    ON target.order_id = source.order_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")

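The watermark-plus-merge logic of the incremental load pattern can be checked without a Spark session. The following is a plain-Python sketch over lists of dictionaries. The function name and sample rows are hypothetical, and timestamps are ISO-formatted strings so that string comparison matches chronological order.

```python
def incremental_merge(target, source, key, ts_field, watermark):
    """Upsert source rows newer than the watermark into target, keyed by `key`."""
    merged = {row[key]: row for row in target}
    for row in source:
        # A missing watermark (empty target) means every source row is new.
        if watermark is None or row[ts_field] > watermark:
            merged[row[key]] = row  # matched: update; not matched: insert
    return list(merged.values())


silver = [{"order_id": 1, "created_at": "2024-01-01", "amount": 10}]
bronze = [
    {"order_id": 1, "created_at": "2024-01-02", "amount": 12},  # newer: updates
    {"order_id": 2, "created_at": "2024-01-03", "amount": 5},   # newer: inserts
    {"order_id": 3, "created_at": "2023-12-31", "amount": 7},   # older: skipped
]
result = incremental_merge(silver, bronze, "order_id", "created_at", "2024-01-01")
print([(r["order_id"], r["amount"]) for r in result])  # [(1, 12), (2, 5)]
```

Rows at or before the watermark are never revisited, so late-arriving records with old timestamps would be missed. Whether that matters depends on the source system.
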
SCD Type 2 Pattern

# Close existing records
spark.sql("""
    UPDATE dim_customer
    SET is_current = false, end_date = current_timestamp()
    WHERE customer_id IN (SELECT customer_id FROM staging_customer)
    AND is_current = true
""")

# Insert new versions
spark.sql("""
    INSERT INTO dim_customer
    SELECT 
        customer_id,
        name,
        email,
        address,
        current_timestamp() as start_date,
        null as end_date,
        true as is_current
    FROM staging_customer
""")

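The two SCD Type 2 steps (close the current rows, then insert new versions) can be traced on in-memory rows. This is a plain-Python sketch with a hypothetical function name and sample data. It mirrors the SQL's behavior, including the fact that every staged customer gets a new version whether or not its attributes changed.

```python
def apply_scd2(dimension, staging, now):
    """Close current rows for staged customers, then append new current rows."""
    staged_ids = {row["customer_id"] for row in staging}
    result = []
    for row in dimension:
        if row["customer_id"] in staged_ids and row["is_current"]:
            # Step 1: close the existing current version.
            row = {**row, "is_current": False, "end_date": now}
        result.append(row)
    for row in staging:
        # Step 2: insert the staged row as the new current version.
        result.append({**row, "start_date": now, "end_date": None, "is_current": True})
    return result


dim = [{"customer_id": 1, "name": "Ada", "start_date": "2024-01-01",
        "end_date": None, "is_current": True}]
staged = [{"customer_id": 1, "name": "Ada L."}]
history = apply_scd2(dim, staged, now="2024-06-01")
print([(r["name"], r["is_current"], r["end_date"]) for r in history])
# [('Ada', False, '2024-06-01'), ('Ada L.', True, None)]
```

A natural extension is to compare staged attributes against the current row first and skip customers whose values are unchanged, which keeps the history free of no-op versions.
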
How it compares

Pick fabric-lakehouse for Microsoft Fabric Lakehouse pipeline orchestration; use general Python ETL skills when the data platform is not Fabric-based.
