
Retail Etl Pipeline Medallion
Stand up a Bronze/Silver/Gold retail warehouse with PySpark and SQL Server when CSVs from CRM/ERP need a single version of truth.
Install
npx skills add https://github.com/aradotso/data-skills --skill retail-etl-pipeline-medallionWhat is this skill?
- Medallion pattern: Bronze raw ingest → Silver cleansed → Gold business-ready
- Retail scenarios: shrinkage, recipe conversions, rebate tiers, multi-branch sales
- PySpark plus SQL Server stack with stored-procedure style transforms
- CSV sources from CRM/ERP into consolidated inventory and sales truth
- End-to-end pipeline design documentation for warehousing
Adoption & trust: 1 installs on skills.sh; 1 GitHub stars; trending (+100% hot-view momentum).
Recommended Skills
Paper Context Resolverlllllllama/ai-paper-reproduction-skill
Repo Intake And Planlllllllama/ai-paper-reproduction-skill
Env And Assets Bootstraplllllllama/ai-paper-reproduction-skill
Minimal Run And Auditlllllllama/ai-paper-reproduction-skill
Analyze Projectlllllllama/rigorpilot-skills
Ai Research Reproductionlllllllama/rigorpilot-skills
Journey fit
Primary fit
Medallion ETL is implementation work—layered ingestion and transforms belong in Build, not launch SEO or operate incident response. Backend/data engineering: stored procedures, Spark jobs, and warehouse layers—not frontend or agent-tooling packaging.
SKILL.md
READMESKILL.md - Retail Etl Pipeline Medallion
# Retail ETL Pipeline - Medallion Architecture Skill > Skill by [ara.so](https://ara.so) — Data Skills collection ## Overview The Retail ETL Pipeline project implements a complete data engineering solution for retail operations using the **Medallion Architecture** pattern (Bronze → Silver → Gold layers). It handles complex retail scenarios including: - **Inventory shrinkage** resolution - **Recipe conversions** for meat/poultry products - **Supplier rebate tier** tracking - **Multi-branch sales** consolidation - **Stock level** management across locations The pipeline processes raw CSV data from CRM/ERP systems through three progressive quality layers, ultimately delivering a "Single Version of Truth" for business intelligence. ## Architecture Layers ### Bronze Layer (Raw Ingestion) - Raw data ingestion from CSV files - Minimal transformation, preserving source format - Audit columns: `_loaded_at`, `_source_file` ### Silver Layer (Cleaned & Standardized) - Data type enforcement - Deduplication - Standardization (dates, currencies, product codes) - Business rule validation ### Gold Layer (Business-Ready Analytics) - Aggregated metrics - Calculated KPIs (inventory turnover, shrinkage %) - Dimensional models for BI tools ## Installation & Setup ### Prerequisites ```bash # Required tools - Docker & Docker Compose - SQL Server 2019+ - Python 3.8+ - PySpark 3.x - Apache Airflow (optional, for orchestration) ``` ### Infrastructure Setup ```bash # Clone the repository git clone https://github.com/EsraaSolimanMubarak/Retail-ETL-Pipeline.git cd Retail-ETL-Pipeline # Start SQL Server container docker-compose up -d # Wait for SQL Server to be ready docker logs -f retail-sql-server ``` ### Database Initialization ```bash # Connect to SQL Server and run schema setup sqlcmd -S localhost,1433 -U sa -P ${SQL_SERVER_PASSWORD} \ -i sql_scripts/00_create_database_and_schemas.sql ``` ## Key SQL Scripts Execution Order The pipeline consists of 13+ SQL scripts that must be executed sequentially: ```bash # 1. Create database and schemas sql_scripts/00_create_database_and_schemas.sql # 2. Bronze layer ingestion sql_scripts/01_bronze_products.sql sql_scripts/02_bronze_sales.sql sql_scripts/03_bronze_stock.sql # 3. Silver layer transformations sql_scripts/04_silver_products.sql sql_scripts/05_silver_sales.sql sql_scripts/06_silver_stock.sql # 4. Gold layer aggregations sql_scripts/07_gold_sales_summary.sql sql_scripts/08_gold_inventory_metrics.sql sql_scripts/09_gold_product_performance.sql # 5. Rebuild pipeline (if needed) sql_scripts/12_rebuild_inventory_pipeline_final_fix.sql ``` ## Core ETL Patterns ### Pattern 1: Bronze Layer Ingestion (Raw CSV → SQL) ```sql -- Example: Bronze Products Table CREATE TABLE bronze.products ( product_id INT, product_name NVARCHAR(255), category NVARCHAR(100), unit_price DECIMAL(10,2), supplier_id INT, _loaded_at DATETIME2 DEFAULT GETDATE(), _source_file NVARCHAR(500) ); -- Bulk insert from CSV BULK INSERT bronze.products FROM '/data_source/000.Hypermarket Products.csv' WITH ( FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2, TABLOCK ); ``` ### Pattern 2: Silver Layer Cleansing ```sql -- Example: Silver Products with Data Quality Rules CREATE PROCEDUR