
Data Engineering Medallion Pipeline
Stand up a local Bronze–Silver–Gold ELT stack with MinIO, Airbyte, PostgreSQL, DBT, and Airflow for reproducible data engineering prototypes.
Install
npx skills add https://github.com/aradotso/data-skills --skill data-engineering-medallion-pipeline
What is this skill?
- End-to-end medallion flow: ingest raw to Bronze, clean to Silver, aggregate to Gold via DBT
- Stack covers MinIO S3 storage, Airbyte ingestion, PostgreSQL warehouse, Airflow DAGs, Grafana monitoring
- Includes DBT data quality tests as part of the documented pipeline
- Docker Compose oriented local deploy for solo builders proving ELT before cloud spend
- Trigger phrases include set up medallion architecture, orchestrate with Airflow, and build ELT with docker compose
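
The Bronze, Silver, and Gold layering described in the bullets above can be illustrated with a small, self-contained Python sketch. It is not code from the project, which performs these steps with Airbyte, PostgreSQL, and DBT; the record fields (`order_id`, `customer`, `amount`) and the function names `to_silver` and `to_gold` are hypothetical and chosen only for illustration.

```python
import json
from collections import defaultdict

# Bronze: raw, untyped payloads kept exactly as received.
# Hypothetical sample data; the field names are illustrative only.
BRONZE = [
    '{"order_id": "1", "customer": "acme", "amount": "19.90"}',
    '{"order_id": "2", "customer": "acme", "amount": "5.10"}',
    '{"order_id": "3", "customer": "globex", "amount": "not-a-number"}',
    '{"order_id": "4", "customer": "globex", "amount": "42.00"}',
]


def to_silver(bronze_rows):
    """Parse, type, and validate raw rows; skip rows that fail validation."""
    silver = []
    for raw in bronze_rows:
        record = json.loads(raw)
        try:
            amount = float(record["amount"])
        except (KeyError, ValueError):
            continue  # a real pipeline would log or quarantine the row
        silver.append(
            {
                "order_id": int(record["order_id"]),
                "customer": record["customer"],
                "amount": amount,
            }
        )
    return silver


def to_gold(silver_rows):
    """Aggregate cleaned rows into per-customer revenue totals."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["customer"]] += row["amount"]
    return {customer: round(total, 2) for customer, total in totals.items()}


silver = to_silver(BRONZE)  # cleaned and typed rows
gold = to_gold(silver)      # business-ready aggregate
print(gold)
```

The Bronze list is never modified: each layer is derived from the one before it, so a bad transformation can be fixed and rerun without ingesting the data again.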
Adoption & trust: 1 install on skills.sh; 1 GitHub star; trending (+100% hot-view momentum).
Journey fit
Primary fit
Medallion pipeline construction is primarily a Build activity: ingestion, modeling, and orchestration are wired together before production operations concerns dominate. The Backend subphase fits warehouse transforms, DAG orchestration, and API-adjacent data services rather than mobile UI or launch marketing.
SKILL.md
# Data Engineering Medallion Pipeline Skill

> Skill by [ara.so](https://ara.so) — Data Skills collection.

This skill enables AI agents to work with a complete data engineering pipeline implementing the Medallion Architecture (Bronze → Silver → Gold) using modern open-source tools: MinIO (S3-compatible storage), Airbyte (data ingestion), PostgreSQL (data warehouse), DBT (transformations), Apache Airflow (orchestration), and Grafana (monitoring).

## What This Project Does

The data-engineering-medallion project provides a complete end-to-end data pipeline that:

1. **Ingests** raw data from MinIO object storage into PostgreSQL using Airbyte
2. **Transforms** data through three layers (Bronze/Silver/Gold) using DBT
3. **Orchestrates** the entire pipeline with Apache Airflow DAGs
4. **Validates** data quality with automated DBT tests
5. **Monitors** infrastructure health with Prometheus and Grafana
6. **Visualizes** business metrics in Power BI dashboards

The architecture follows ELT (Extract-Load-Transform) pattern with clear separation of concerns:

- **Bronze**: Raw immutable data from sources (JSONB format)
- **Silver**: Cleaned, validated, and typed data
- **Gold**: Business-ready aggregated metrics and KPIs

## Installation & Setup

### Prerequisites

```bash
# Required
docker --version          # 20.10+
docker-compose --version  # 2.0+

# 8GB RAM minimum, 16GB recommended
```

### Clone and Initialize

```bash
git clone https://github.com/LucasGoulartCouto/data-engineering-medallion.git
cd data-engineering-medallion

# Setup environment and start all services
make setup
make start

# Verify all containers are healthy
make status
```

### Service URLs

After startup, access these interfaces:

- **Airflow**: http://localhost:8080 (admin/admin)
- **MinIO**: http://localhost:9001 (minioadmin/[from .env])
- **Airbyte**: http://localhost:8000 (create account on first visit)
- **Grafana**: http://localhost:3000 (admin/admin)
- **Prometheus**: http://localhost:9090
- **DBT Docs**: http://localhost:8085 (after `make dbt-docs`)

## Key Commands (Makefile)

```bash
# Infrastructure
make setup                  # Create .env, directories, install dependencies
make start                  # Start all Docker services
make stop                   # Stop all services
make restart                # Restart all services
make status                 # Check container health
make logs SERVICE=airflow   # View logs for specific service

# DBT Operations
make dbt-run                # Run all DBT models (bronze → silver → gold)
make dbt-test               # Run data quality tests
make dbt-docs               # Generate and serve documentation
make dbt-snapshot           # Capture SCD Type 2 snapshots
make dbt-clean              # Clean compiled artifacts

# Data Pipeline
make upload-data            # Upload sample data to MinIO
make trigger-dag DAG_ID=bronze_ingestion_dag  # Manually trigger Airflow DAG

# Development
make lint                   # Lint Python and SQL code
make format                 # Format Python code with black
make validate               # Validate Airflow DAGs and DBT models

# Cleanup
make clean                  # Remove volumes and stop services
make clean-all              # Full cleanup including Docker images
```

## Project Structure

```
data-engineering-medallion/
├── airflow/
│   └── dags/
│       ├── bronze_ingestion_dag.py       # Triggers Airbyte sync
│       ├── silver_transformation_dag.py  # Runs DBT silver models
│       └── gold_aggregation_dag.py       # Runs DBT gol