
Datatalks Data Engineering Zoomcamp
Install this skill when you want an agent to guide you through the free Data Engineering Zoomcamp labs, homework, and GCP/Docker/Terraform setup step by step.
Overview
Datatalks Data Engineering Zoomcamp is an agent skill most often used in Build (also Validate and Operate) that walks solo builders through the free nine-week DataTalks course—Docker, Terraform, Kestra, BigQuery, dbt, Sp
Install
npx skills add https://github.com/aradotso/data-skills --skill datatalks-data-engineering-zoomcamp
What is this skill?
- Maps the full 9-week Zoomcamp arc: Docker, Terraform, Kestra, BigQuery, dbt, Bruin, Spark, and Kafka with project-style
- Step-by-step environment help for Dockerized PostgreSQL, GCP free tier, and Terraform-for-zoomcamp exercises
- Module-oriented answers for cohort or self-paced learners (next cohort noted as January 2026 in SKILL.md)
- Hands-on command and setup patterns aligned to course modules, not abstract DE theory only
- Surfaces prerequisites up front: SQL, basic coding, Git, Docker, and a GCP account
- 9-week free Data Engineering Zoomcamp course structure
- Modules cover Docker, Terraform, Kestra, BigQuery, dbt, Bruin, Spark, and Kafka
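The skill surfaces prerequisites up front, so a quick local preflight can confirm the command-line tools are installed before you start Module 1. Below is a minimal Python sketch that is not part of the skill itself. The tool list (`git`, `docker`, `terraform`) and the function name `missing_tools` are assumptions for illustration. The script only checks that each executable is on `PATH`. It does not check that the Docker daemon is running or that your GCP account is configured.

```python
"""Hypothetical preflight check for Zoomcamp command-line prerequisites."""
import shutil
import sys
from typing import Callable, Iterable, List, Optional

# Assumed tool list; adjust it to the modules you are working on.
DEFAULT_TOOLS = ("git", "docker", "terraform")


def missing_tools(
    tools: Iterable[str] = DEFAULT_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH, in the order given."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites: " + ", ".join(missing))
        sys.exit(1)
    print("All command-line prerequisites found on PATH.")
```

Running it as a script exits with a non-zero status when any tool is missing, so it can gate a setup script. The injectable `which` parameter exists only to make the check easy to test.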
Adoption & trust: 349 installs on skills.sh; 1 GitHub star; 2 of 3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).
What problem does it solve?
You have enrolled in Data Engineering Zoomcamp, or are working through it self-paced, but keep getting stuck wiring up Docker, GCP, Terraform, or the weekly homework without a single agent-aware guide that follows the official module order.
Who is it for?
Solo builders actively working through Data Engineering Zoomcamp who want an agent to interpret module goals and produce concrete Docker, GCP, Terraform, dbt, Spark, or Kafka steps.
Skip if: Your team only needs a one-off SQL snippet or a managed ELT vendor playbook, with no Docker, GCP, or multi-week coursework commitment.
When should I use this skill?
User asks for help with Data Engineering Zoomcamp, DE zoomcamp environment setup, homework completion, module overviews, Docker/Terraform/GCP zoomcamp configuration, or running Spark exercises for the course.
What do I get? / Deliverables
After using the skill you have module-aligned commands, environment steps, and homework completion patterns so you can finish Zoomcamp labs and projects with less setup thrash.
- Module-aligned setup steps for Docker, GCP, and Terraform labs
- Homework completion guidance tied to Zoomcamp project structure
- Ordered module roadmap from containerization through streaming
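Before moving on to ingestion, it helps to confirm that the Module 1 services are actually listening. Below is a minimal sketch that assumes the port mappings from the SKILL.md below (`5432` for PostgreSQL, `8080` for pgAdmin) on `localhost`. It polls each port until it accepts a TCP connection. The helper name `wait_for_port` and the `SERVICES` mapping are hypothetical. An open port only proves that the container is listening. It does not verify the `root`/`root` credentials or that the `ny_taxi` database exists.

```python
"""Hypothetical readiness check for the Module 1 Docker services."""
import socket
import time

# Assumed host/port pairs, taken from the `-p 5432:5432` and `-p 8080:80` flags.
SERVICES = {"pgdatabase": ("localhost", 5432), "pgadmin": ("localhost", 8080)}


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until host:port accepts a TCP connection or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    for name, (host, port) in SERVICES.items():
        status = "ready" if wait_for_port(host, port) else "NOT reachable"
        print(f"{name} on {host}:{port}: {status}")
```

Polling with a deadline, rather than connecting once, accounts for PostgreSQL taking a few seconds to initialize its data directory on first start.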
Journey fit
Spans multiple journey phases: a primary shelf plus the alternate fits below.
The canonical shelf is Build because the skill’s job is hands-on pipeline and platform work (Docker, dbt, Spark, Kafka), not distribution or production ops alone. Its Backend category reflects data-engineering deliverables (warehousing, batch/stream processing, and orchestration) rather than frontend work or generic PM docs.
Where it fits
Stand up the Module 1 containerized PostgreSQL database and verify Docker commands before moving to ingestion labs.
Configure Kestra workflows and connectors homework without guessing how orchestration fits the Zoomcamp repo layout.
Complete a module capstone-style project to demonstrate end-to-end pipeline skills before claiming job-ready DE ability.
Apply Terraform modules for Zoomcamp GCP resources so infra matches coursework before Spark or Kafka modules.
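The Terraform example in the SKILL.md below builds the data-lake bucket name as `"${local.data_lake_bucket}_${var.project}"`. A project ID that yields an invalid bucket name therefore only surfaces as an error at `terraform apply`. Below is a minimal Python sketch that mirrors that interpolation and checks the result against a subset of the Cloud Storage naming rules:

- 3 to 63 characters.
- Only lowercase letters, digits, `-`, `_`, and `.`.
- Starts and ends with a letter or digit.
- No `goog` prefix.

The function names are hypothetical. The sketch does not cover the remaining rules, such as longer dotted names, IP-address lookalikes, and reserved words.

```python
"""Hypothetical pre-apply check for the Terraform data-lake bucket name."""
import re
from typing import List

# Mirrors `locals { data_lake_bucket = "dtc_data_lake" }` in variables.tf.
DATA_LAKE_BUCKET = "dtc_data_lake"

# Subset of the Cloud Storage naming rules; the length check treats names as dot-free.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9_.-]*[a-z0-9]$")


def bucket_name(project: str, prefix: str = DATA_LAKE_BUCKET) -> str:
    """Reproduce "${local.data_lake_bucket}_${var.project}" from main.tf."""
    return f"{prefix}_{project}"


def bucket_name_problems(name: str) -> List[str]:
    """Return readable reasons the name breaks the checked rules (empty if none)."""
    problems = []
    if not 3 <= len(name) <= 63:
        problems.append(f"length {len(name)} is outside 3-63 characters")
    if not _BUCKET_RE.match(name):
        problems.append("must use lowercase letters, digits, '-', '_', '.' and start/end with a letter or digit")
    if name.startswith("goog"):
        problems.append("must not start with 'goog'")
    return problems
```

For example, `bucket_name_problems(bucket_name("my-project-123"))` returns an empty list, while a project ID containing uppercase letters is flagged. The fixed `dtc_data_lake_` prefix uses 14 characters, which leaves 49 characters for the project ID.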
How it compares
Use as a structured course companion for the Zoomcamp syllabus, not as a generic dbt-or-Spark cheat sheet disconnected from module homework.
Common Questions / FAQ
Who is datatalks-data-engineering-zoomcamp for?
It is for solo and indie builders learning data engineering through DataTalks’ free Zoomcamp who want agent help on labs, tooling setup, and homework tied to each module.
When should I use datatalks-data-engineering-zoomcamp?
Use it during Build when implementing pipelines and backends, during Validate when completing course projects that prove your skills, and during Operate when practicing infra-as-code and streaming modules—for example Docker PostgreSQL in Module 1, dbt models mid-course, or Terraf
Is datatalks-data-engineering-zoomcamp safe to install?
Review the Security Audits panel on this Prism catalog page and the skill source before granting shell, network, or cloud access; the course itself expects Docker, Git, and GCP credentials you control.
SKILL.md
# DataTalks Data Engineering Zoomcamp

> Skill by [ara.so](https://ara.so) — Data Skills collection.

## Overview

The Data Engineering Zoomcamp is a comprehensive 9-week free course covering production-ready data pipeline development. It includes hands-on modules on containerization (Docker), infrastructure as code (Terraform), workflow orchestration (Kestra), data warehousing (BigQuery), analytics engineering (dbt), data platforms (Bruin), batch processing (Spark), and streaming (Kafka).

The course operates in cohorts (next starts January 2026) but all materials are available for self-paced learning.

## Prerequisites

- Basic coding experience
- SQL familiarity
- Python knowledge (helpful but not required)
- Git installed
- Docker Desktop or Docker Engine
- Google Cloud Platform (GCP) account (free tier)

## Course Structure

### Module 1: Docker & Terraform

**Set up containerized PostgreSQL database:**

```bash
# Create network
docker network create pg-network

# Run PostgreSQL
docker run -d \
  --name pg-database \
  --network pg-network \
  -e POSTGRES_USER=root \
  -e POSTGRES_PASSWORD=root \
  -e POSTGRES_DB=ny_taxi \
  -v $(pwd)/ny_taxi_postgres_data:/var/lib/postgresql/data \
  -p 5432:5432 \
  postgres:13

# Run pgAdmin
docker run -d \
  --name pgadmin \
  --network pg-network \
  -e PGADMIN_DEFAULT_EMAIL=admin@admin.com \
  -e PGADMIN_DEFAULT_PASSWORD=root \
  -p 8080:80 \
  dpage/pgadmin4
```

**Docker Compose for entire stack:**

```yaml
# docker-compose.yaml
services:
  pgdatabase:
    image: postgres:13
    environment:
      - POSTGRES_USER=root
      - POSTGRES_PASSWORD=root
      - POSTGRES_DB=ny_taxi
    volumes:
      - ./ny_taxi_postgres_data:/var/lib/postgresql/data
    ports:
      - "5432:5432"
  pgadmin:
    image: dpage/pgadmin4
    environment:
      - PGADMIN_DEFAULT_EMAIL=admin@admin.com
      - PGADMIN_DEFAULT_PASSWORD=root
    ports:
      - "8080:80"
```

```bash
# Start services
docker-compose up -d

# Stop services
docker-compose down
```

**Terraform GCP setup:**

```hcl
# main.tf
terraform {
  required_version = ">= 1.0"
  backend "local" {}
  required_providers {
    google = {
      source = "hashicorp/google"
    }
  }
}

provider "google" {
  project = var.project
  region  = var.region
}

# Data Lake Bucket
resource "google_storage_bucket" "data-lake-bucket" {
  name          = "${local.data_lake_bucket}_${var.project}"
  location      = var.region
  storage_class = var.storage_class

  uniform_bucket_level_access = true

  versioning {
    enabled = true
  }

  lifecycle_rule {
    action {
      type = "Delete"
    }
    condition {
      age = 30
    }
  }

  force_destroy = true
}

# BigQuery Dataset
resource "google_bigquery_dataset" "dataset" {
  dataset_id = var.BQ_DATASET
  project    = var.project
  location   = var.region
}
```

```hcl
# variables.tf
locals {
  data_lake_bucket = "dtc_data_lake"
}

variable "project" {
  description = "Your GCP Project ID"
}

variable "region" {
  description = "Region for GCP resources"
  default     = "europe-west6"
  type        = string
}

variable "storage_class" {
  description = "Storage class type for your bucket"
  default     = "STANDARD"
}

variable "BQ_DATASET" {
  description = "BigQuery Dataset"
  type        = string
  default     = "trips_data_all"
}
```

```bash
# Initialize Terraform
terraform init

# Plan infrastructure
terraform plan

# Apply infrastructure
terrafo