
Realtime Cinema Data Engineering Pipeline
Stand up a Kafka-to-PostgreSQL Medallion pipeline with Airflow orchestration and a Streamlit analytics front end for streaming cinema-style event data.
Install
npx skills add https://github.com/aradotso/data-skills --skill realtime-cinema-data-engineering-pipeline
What is this skill?
- Medallion Architecture: Bronze, Silver, and Gold layers on PostgreSQL
- Apache Kafka producers and consumers for event ingestion, with a stated learning target of 1M+ events (per the skill's overview)
- Apache Airflow for ELT orchestration
- Streamlit + Plotly for live visualization dashboard
- Docker Compose–based setup with Python 3.8+ virtualenv workflow
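To make the Bronze/Silver/Gold split concrete, here is a minimal in-memory sketch, not the skill's actual ELT code. It assumes the nested event shape emitted by the producer in SKILL.md (`transaction_id`, `customer`, `movie`, `showtime`, `payment`). The helper names `make_event`, `to_silver_rows`, and `revenue_by_genre` are hypothetical, as is the `movie_id` foreign key on the showtime row.

```python
from collections import defaultdict


def make_event(transaction_id, genre, amount):
    """Hypothetical helper: build a small Bronze-style nested event."""
    return {
        "transaction_id": transaction_id,
        "customer": {"customer_id": "c-" + transaction_id, "name": "Test User"},
        "movie": {"movie_id": "m-" + genre, "title": genre + " Night", "genre": genre},
        "showtime": {"showtime_id": "s-" + transaction_id, "screen_number": 1},
        "payment": {"amount": amount, "payment_method": "Cash", "currency": "USD"},
    }


def to_silver_rows(event):
    """Silver step: split one nested event into flat rows, one per table."""
    customer, movie = event["customer"], event["movie"]
    showtime, payment = event["showtime"], event["payment"]
    return {
        "silver_customers": dict(customer),
        "silver_movies": dict(movie),
        # Assumption: showtimes reference their movie through movie_id.
        "silver_showtimes": {**showtime, "movie_id": movie["movie_id"]},
        "silver_transactions": {
            "transaction_id": event["transaction_id"],
            "customer_id": customer["customer_id"],
            "showtime_id": showtime["showtime_id"],
            "amount": payment["amount"],
            "payment_method": payment["payment_method"],
            "currency": payment["currency"],
        },
    }


def revenue_by_genre(silver_batches):
    """Gold step: aggregate total ticket revenue per movie genre."""
    totals = defaultdict(float)
    for rows in silver_batches:
        genre = rows["silver_movies"]["genre"]
        totals[genre] += rows["silver_transactions"]["amount"]
    return {genre: round(total, 2) for genre, total in totals.items()}


# Bronze: raw events as they would arrive from Kafka (hand-written here).
bronze = [
    make_event("t1", "Drama", 12.5),
    make_event("t2", "Drama", 9.0),
    make_event("t3", "Comedy", 8.0),
]
silver = [to_silver_rows(event) for event in bronze]
gold = revenue_by_genre(silver)
print(gold)
```

In the real pipeline these steps would presumably run as SQL against PostgreSQL under Airflow; the sketch only shows how the same event moves from raw, to normalized, to aggregated form.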
Adoption & trust: 1 install on skills.sh; 1 GitHub star; 1/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).
Journey fit
The skill is a full implementation guide for backend data infrastructure and orchestration during the product build phase, not only for post-launch ops tuning. It fits the backend subphase because it covers warehousing layers, producers/consumers, and ELT jobs rather than mobile UI or marketing.
Common Questions / FAQ
Is Realtime Cinema Data Engineering Pipeline safe to install?
skills.sh reports 1 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.
SKILL.md
# CinéWorld Real-Time Data Engineering Pipeline Skill

> Skill by [ara.so](https://ara.so) — Data Skills collection.

## Overview

This project implements an end-to-end real-time data engineering pipeline using Apache Kafka for event streaming, PostgreSQL for data warehousing with Medallion Architecture (Bronze/Silver/Gold layers), Apache Airflow for ELT orchestration, and Streamlit for live visualization. Perfect for learning how to build production-grade streaming data pipelines that process 1M+ events.

## Installation

### Prerequisites

- Docker and Docker Compose
- Python 3.8+
- Virtual environment (recommended)

### Setup Steps

```bash
# Clone the repository
git clone https://github.com/BaidaneAyoub/realtime-cinema-data-engineering.git
cd realtime-cinema-data-engineering

# Create and activate virtual environment
python -m venv myenv
source myenv/bin/activate  # On Windows: myenv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Start infrastructure (Kafka, PostgreSQL, Airflow)
docker-compose up -d
```

**Important**: Wait 2-3 minutes for Airflow to fully initialize before proceeding.

## Architecture Components

### 1. Medallion Architecture Layers

**Bronze Layer**: Raw JSON event data ingested from Kafka

```sql
-- Bronze table stores raw events
CREATE TABLE bronze_transactions (
    id SERIAL PRIMARY KEY,
    raw_data JSONB NOT NULL,
    ingested_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```

**Silver Layer**: Normalized 3NF tables (Customers, Movies, Showtimes, Transactions)

```sql
-- Normalized dimension and fact tables
CREATE TABLE silver_customers (...);
CREATE TABLE silver_movies (...);
CREATE TABLE silver_showtimes (...);
CREATE TABLE silver_transactions (...);
```

**Gold Layer**: Materialized views for analytics

```sql
-- Business-ready aggregated data
CREATE MATERIALIZED VIEW gold_cinema_analytics AS SELECT ...
```

### 2. Kafka Event Producer

Generate and stream synthetic cinema transaction events:

```python
# producer/main_producer.py
from kafka import KafkaProducer
from faker import Faker
import json
import time
import os

fake = Faker()

# Initialize Kafka producer
producer = KafkaProducer(
    bootstrap_servers=os.getenv('KAFKA_BOOTSTRAP_SERVERS', 'localhost:9092'),
    value_serializer=lambda v: json.dumps(v).encode('utf-8')
)

def generate_ticket_sale():
    """Generate a synthetic ticket sale event"""
    return {
        "transaction_id": fake.uuid4(),
        "customer": {
            "customer_id": fake.uuid4(),
            "name": fake.name(),
            "email": fake.email(),
            "phone": fake.phone_number()
        },
        "movie": {
            "movie_id": fake.uuid4(),
            "title": fake.catch_phrase(),
            "genre": fake.random_element(['Action', 'Comedy', 'Drama', 'Horror']),
            "duration_minutes": fake.random_int(90, 180)
        },
        "showtime": {
            "showtime_id": fake.uuid4(),
            "cinema_location": fake.city(),
            "screen_number": fake.random_int(1, 10),
            "showtime": fake.date_time_this_month().isoformat()
        },
        "payment": {
            "amount": round(fake.random.uniform(8.0, 25.0), 2),
            "payment_method": fake.random_element(['Credit Card', 'Cash', 'Gift Card']),
            "currency": "USD"
        },
        "seats"