
Retail Etl Medallion Pipeline
Stand up a Bronze/Silver/Gold retail ETL warehouse with inventory shrinkage, recipe conversions, and multi-branch sales without ad-hoc scripts.
Overview
Retail ETL Medallion Pipeline is an agent skill for the Build phase that designs and implements a Bronze/Silver/Gold retail data warehouse with TSQL, PySpark, and Airflow.
Install
npx skills add https://github.com/aradotso/data-skills --skill retail-etl-medallion-pipelineWhat is this skill?
- Medallion Bronze/Silver/Gold layers for sales, stock, and product CSV sources
- Retail business rules: shrinkage, meat/poultry recipe yield, supplier rebate tiers
- Multi-branch consolidation (Alex, Cairo, Giza) into analytics-ready gold models
- Stack: TSQL, PySpark, and Airflow for production-grade orchestration
- Hypermarket/retail dimensional modeling for inventory and sales analytics
- Three medallion layers: Bronze, Silver, and Gold
- Processes multi-branch sales (Alex, Cairo, Giza) plus stock and product catalogs
Adoption & trust: 1 installs on skills.sh; 1 GitHub stars; trending (+100% hot-view momentum).
What problem does it solve?
You have scattered retail CSVs and branch-specific sales but no layered pipeline that handles shrinkage, recipes, and supplier rebates for trustworthy analytics.
Who is it for?
Indie builders or tiny teams shipping a retail or hypermarket analytics backend who already accept SQL, Spark, and Airflow as the stack.
Skip if: Teams that only need a one-table dashboard, real-time streaming-only stacks, or non-retail domains with no inventory or multi-branch logic.
When should I use this skill?
User asks to build a retail data warehouse with medallion architecture, bronze/silver/gold layers, retail ETL with inventory tracking, or orchestrate retail ETL with Airflow and Spark.
What do I get? / Deliverables
You get a documented medallion ETL with bronze ingestion, silver business rules, gold aggregates, and orchestration hooks you can deploy and iterate on.
- Layered ETL design and transform logic for Bronze, Silver, and Gold
- Business-rule implementations for shrinkage, recipes, rebates, and branch consolidation
- Orchestration outline for scheduled retail pipeline runs
Recommended Skills
Journey fit
Canonical shelf is Build because the skill delivers a full data-engineering implementation (layers, transforms, orchestration), not ongoing ops tuning alone. Backend fits ETL, dimensional models, TSQL/PySpark logic, and Airflow orchestration rather than UI or agent packaging.
How it compares
Use instead of generic “upload CSV to BI” chat plans when you need explicit medallion layers and retail-specific transform rules.
Common Questions / FAQ
Who is retail-etl-medallion-pipeline for?
Solo builders and small teams implementing retail or hypermarket warehouses who want medallion layering, branch consolidation, and orchestrated ETL rather than manual scripts.
When should I use retail-etl-medallion-pipeline?
During Build when you are creating bronze/silver/gold layers, retail inventory and sales pipelines, Airflow-orchestrated ETL, or dimensional models for multi-branch retail analytics.
Is retail-etl-medallion-pipeline safe to install?
Review the Security Audits panel on this Prism page and inspect the skill package before letting an agent run shell, network, or dependency changes in your data environment.
SKILL.md
READMESKILL.md - Retail Etl Medallion Pipeline
# Retail ETL Medallion Pipeline Skill > Skill by [ara.so](https://ara.so) — Data Skills collection. ## Overview This project implements a production-grade **Medallion Architecture** ETL pipeline for retail/hypermarket data, handling complex business logic like inventory shrinkage, meat/poultry recipe conversions, supplier rebate tiers, and multi-branch sales consolidation. The architecture follows three data quality layers: - **Bronze Layer**: Raw data ingestion from CSV sources (sales, stock, products) - **Silver Layer**: Cleaned, standardized, and business-rule-applied data - **Gold Layer**: Aggregated, analytics-ready dimensional models The pipeline processes: - Multi-branch sales transactions (Alex, Cairo, Giza) - Product catalogs with recipe/yield conversions - Stock/inventory tracking across locations - Supplier rebate calculations ## Project Structure ``` Retail-Data-Warehouse/ ├── data_source/ # Raw CSV files (CRM/ERP exports) │ ├── 000.Hypermarket Products.csv │ ├── 001-003.*.Branch Sales.csv │ └── 004-006.*.Stock.csv ├── sql_scripts/ # TSQL stored procedures for each layer │ ├── 00_create_database_and_schemas.sql │ ├── 01-04_bronze_*.sql │ ├── 05-08_silver_*.sql │ └── 09-12_gold_*.sql ├── BI_Team_Analysis/ # Power BI dashboards └── docker-compose.yml # SQL Server container setup ``` ## Installation & Setup ### 1. Infrastructure Setup (SQL Server) Using Docker: ```bash # Start SQL Server container docker-compose up -d # Verify container is running docker ps | grep sqlserver ``` Or use an existing SQL Server instance (2017+). ### 2. Database Initialization ```bash # Connect to SQL Server and create database structure sqlcmd -S localhost -U sa -P $SQL_SA_PASSWORD -i sql_scripts/00_create_database_and_schemas.sql ``` This creates: - Database: `RetailDataWarehouse` - Schemas: `bronze`, `silver`, `gold`, `staging` ### 3. Load Raw Data to Bronze Layer Place CSV files in accessible location, then run: ```sql -- Execute Bronze layer ingestion procedures EXEC bronze.usp_LoadProducts; EXEC bronze.usp_LoadSales; EXEC bronze.usp_LoadStock; ``` Or execute all Bronze scripts sequentially: ```bash for script in sql_scripts/01_bronze_*.sql sql_scripts/02_bronze_*.sql sql_scripts/03_bronze_*.sql sql_scripts/04_bronze_*.sql; do sqlcmd -S localhost -U sa -P $SQL_SA_PASSWORD -i "$script" done ``` ## Key Architecture Patterns ### Bronze Layer (Raw Ingestion) **Purpose**: Land raw data with minimal transformation. Add audit columns only. ```sql -- Example: Bronze Products Table Structure CREATE TABLE bronze.Products ( ProductID INT, ProductName NVARCHAR(255), Category NVARCHAR(100), SubCategory NVARCHAR(100), UnitPrice DECIMAL(10,2), SupplierID INT, RecipeYield DECIMAL(5,2), -- For meat/poultry conversions LoadTimestamp DATETIME2 DEFAULT GETDATE(), SourceFile NVARCHAR(500) ); -- Bronze Load Pattern CREATE PROCEDURE bronze.usp_LoadProducts AS BEGIN TRUNCATE TABLE bronze.Products; BULK INSERT bronze.Products FROM '/data/000.Hypermarket Products.csv' WITH ( FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2, ERRORFILE = '/logs/products_errors.txt' ); -- Add audit metadata UPDATE