
Amee Joshi Data Engineering Portfolio
Use a reference Azure data-engineering portfolio to copy Medallion, lakehouse, and ETL patterns when designing your own analytics platform.
Install
npx skills add https://github.com/aradotso/data-skills --skill amee-joshi-data-engineering-portfolio
What is this skill?
- Medallion Architecture Bronze–Silver–Gold reference implementations
- Azure stack: ADF, ADLS Gen2, Databricks, Synapse Analytics
- Delta Lake lakehouse patterns with dimensional modeling and SCD Type 1 & 2
- Metadata-driven ingestion frameworks and incremental ETL/ELT design
- Analytics-ready datasets with Power BI and Tableau reporting examples
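The difference between the SCD Type 1 and Type 2 patterns named in the list above can be sketched in plain Python. This is a minimal illustration on lists of dicts, not the portfolio's Delta Lake implementation: the function names `scd_type1` and `scd_type2` and the `city` attribute are hypothetical, while `effective_date`, `end_date`, and `is_current` follow the column names used in the SKILL.md example.

```python
from datetime import date

# Hypothetical helper columns that carry SCD Type 2 history.
SCD_COLUMNS = ("effective_date", "end_date", "is_current")


def scd_type1(dim_rows, key, incoming):
    """Type 1: overwrite the matching row in place, so no history is kept."""
    out = [{**row, **incoming} if row[key] == incoming[key] else row
           for row in dim_rows]
    if not any(row[key] == incoming[key] for row in dim_rows):
        out.append(dict(incoming))  # brand-new key: plain insert
    return out


def scd_type2(dim_rows, key, incoming, as_of):
    """Type 2: close the current version and append a new one, keeping history."""
    out = []
    needs_new_version = True
    for row in dim_rows:
        if row[key] == incoming[key] and row["is_current"]:
            tracked = {k: v for k, v in row.items() if k not in SCD_COLUMNS}
            if tracked == incoming:
                needs_new_version = False  # nothing changed, keep the row as is
                out.append(row)
            else:
                # Expire the old version instead of overwriting it.
                out.append({**row, "end_date": as_of, "is_current": False})
        else:
            out.append(row)
    if needs_new_version:
        out.append({**incoming, "effective_date": as_of,
                    "end_date": None, "is_current": True})
    return out


dim = [{"customer_id": 1, "city": "Pune", "effective_date": date(2023, 1, 1),
        "end_date": None, "is_current": True}]
history = scd_type2(dim, "customer_id",
                    {"customer_id": 1, "city": "Mumbai"}, date(2024, 6, 1))
```

After the call, `history` holds two rows for customer 1: the expired Pune version and the current Mumbai version. The same update under `scd_type1` would leave a single row with the city overwritten.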
Adoption & trust: 1 install on skills.sh; 1 GitHub star; 3/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).
Journey fit
Primary fit
Build/backend is the canonical shelf because the portfolio demonstrates implementation patterns for pipelines, modeling, and lakehouse storage. Backend covers data platform engineering rather than frontend or agent-tooling concerns.
Common Questions / FAQ
Is Amee Joshi Data Engineering Portfolio safe to install?
skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.
SKILL.md
# Amee Joshi Data Engineering Portfolio

> Skill by [ara.so](https://ara.so), part of the Data Skills collection.

This portfolio showcases production-grade data engineering patterns and architectures for building scalable, cloud-native data platforms. It demonstrates end-to-end solutions covering data ingestion, transformation, modeling, and analytics using Azure services, Databricks, SQL Server, and BI tools.

## What This Portfolio Demonstrates

This is a reference collection showing:

- **Medallion Architecture (Bronze-Silver-Gold)** implementations
- **Azure cloud data platforms** (ADF, ADLS Gen2, Databricks, Synapse Analytics)
- **Data lakehouse patterns** with Delta Lake
- **Dimensional modeling** (Star Schema, SCD Type 1 & 2)
- **Metadata-driven ingestion frameworks**
- **Analytics-ready datasets** for BI consumption
- **ETL/ELT pipeline design** with incremental loading
- **Power BI and Tableau** reporting solutions

## Key Portfolio Projects

### 1. Azure Databricks Retail Lakehouse

**Repository:** `azure-databricks-end-to-end-retail-lakehouse`
**Pattern:** Enterprise Medallion Architecture with Delta Lake

**Architecture:**

```
Bronze (Raw) → Silver (Cleansed) → Gold (Analytics-Ready)
```

**Key Implementation Concepts:**

```python
# Bronze Layer - Raw Ingestion
from pyspark.sql import SparkSession
from pyspark.sql.functions import current_timestamp, input_file_name
from delta.tables import DeltaTable

# `spark` is the session a Databricks notebook provides;
# `bronze_path` is assumed to be defined elsewhere.

# Ingest raw data with metadata
df_raw = (spark.read
    .format("parquet")
    .load(f"{bronze_path}/source_data/")
    .withColumn("ingestion_timestamp", current_timestamp())
    .withColumn("source_file", input_file_name())
)

# Write to Bronze Delta table
(df_raw.write
    .format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .save(f"{bronze_path}/retail_transactions")
)
```

```python
# Silver Layer - Data Quality & Transformation
from pyspark.sql.functions import col, when, trim, upper

# Read the Bronze table written above (assumes the same `bronze_path`)
df_bronze = spark.read.format("delta").load(f"{bronze_path}/retail_transactions")

# Cleanse and standardize
df_silver = (df_bronze
    .filter(col("transaction_id").isNotNull())
    .withColumn("customer_name", trim(upper(col("customer_name"))))
    # Clamp negative amounts to zero
    .withColumn("transaction_amount",
        when(col("transaction_amount") < 0, 0)
        .otherwise(col("transaction_amount")))
    .dropDuplicates(["transaction_id"])
    .select("transaction_id", "customer_id", "product_id",
            "transaction_amount", "transaction_date")
)

# Write with schema enforcement
(df_silver.write
    .format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "false")
    .save(f"{silver_path}/transactions")
)
```

```python
# Gold Layer - SCD Type 2 Dimension
def apply_scd_type2(target_table, source_df, key_columns, scd_columns):
    """
    Implements Slowly Changing Dimension Type 2
    """
    from delta.tables import DeltaTable
    from pyspark.sql.functions import lit, current_timestamp

    # Prepare source with SCD metadata
    source_prepared = (source_df
        .withColumn("effective_date", current_timestamp())
        .withColumn("end_date", lit(None).cast("timestamp"))
        .withColumn("is_current", lit(True))
    )

    # Read existing target
    target_delta = DeltaTable.forPath(spark, target_table)

    # Build the join condition on the business keys
    merge_condition = " AND ".join([f"target.{k} = source.{k}" for k in key_columns])

    # Perform SCD Type 2 merge
    (target_delta.alias("target")
        .merge(source_prepared.alias("source"), merge_condition)
        .whenMatchedUpdate(
            condition = "target.