
Terraform IaC Data Engineering
Provision AWS S3, EC2, and IAM for data pipelines using reusable Terraform patterns and state discipline.
Install
npx skills add https://github.com/aradotso/data-skills --skill terraform-iac-data-engineering
What is this skill?
- Patterns for S3 buckets, EC2 processing instances, and IAM users, roles, and policies
- Covers Terraform state management tailored to data engineering setups
- Documented triggers: terraform data engineering setup, S3 and EC2 provisioning, pipeline infrastructure on AWS
- Installation section for Terraform CLI and AWS CLI on macOS and Linux
- Part of ara.so Data Skills collection focused on IaC for data workloads
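The state management item in the list above usually means keeping Terraform state in a remote backend instead of on a local disk. A minimal sketch, assuming an S3 backend: the bucket name, state key, and lock table are hypothetical placeholders that do not come from the skill, and both the bucket and the table must already exist before `terraform init` runs.

```hcl
terraform {
  backend "s3" {
    # Hypothetical bucket that holds the state file; create it beforehand.
    bucket = "example-terraform-state-bucket"

    # Path of the state object inside the bucket (illustrative).
    key    = "data-engineering/terraform.tfstate"
    region = "us-east-1"

    # Hypothetical DynamoDB table used for state locking.
    # It needs a string partition key named "LockID".
    dynamodb_table = "example-terraform-locks"

    # Encrypt the state object at rest.
    encrypt = true
  }
}
```

Remote state with locking helps when several engineers or CI jobs apply changes to the same pipeline infrastructure, since concurrent runs cannot overwrite each other's state.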
Adoption & trust: 1 install on skills.sh; 1 GitHub star; trending (+100% hot-view momentum).
Journey fit
Primary fit
Data engineering infrastructure is first stood up during Build when pipelines need durable cloud resources. Terraform AWS modules are integration work connecting your codebase to cloud storage, compute, and identity for ETL workloads.
SKILL.md
# Terraform IaC for Data Engineering

> Skill by [ara.so](https://ara.so), Data Skills collection.

This project provides Infrastructure-as-Code (IaC) patterns using Terraform specifically for data engineering workloads on AWS. It demonstrates how to provision and manage AWS resources (S3, EC2, IAM) needed for data pipelines and processing.

## What This Project Does

- Provisions AWS S3 buckets for data storage
- Creates EC2 instances for data processing workloads
- Manages IAM users, roles, and policies
- Demonstrates Terraform state management
- Provides reusable IaC patterns for data engineering infrastructure

## Installation

### Prerequisites

1. **Terraform CLI**

   ```bash
   # macOS
   brew install terraform

   # Linux
   wget https://releases.hashicorp.com/terraform/1.5.0/terraform_1.5.0_linux_amd64.zip
   unzip terraform_1.5.0_linux_amd64.zip
   sudo mv terraform /usr/local/bin/
   ```

2. **AWS CLI**

   ```bash
   # macOS
   brew install awscli

   # Linux
   curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
   unzip awscliv2.zip
   sudo ./aws/install
   ```

3. **Configure AWS CLI**

   ```bash
   aws configure
   # Enter your AWS Access Key ID
   # Enter your AWS Secret Access Key
   # Default region: us-east-1
   # Default output format: json
   ```

### Project Setup

```bash
git clone https://github.com/josephmachado/iac-for-data-engineering-terraform-.git
cd iac-for-data-engineering-terraform-
```

## Key Terraform Commands

### Initialize Terraform

```bash
# Initialize terraform (downloads providers, sets up backend)
terraform -chdir=terraform init

# Validate configuration files
terraform -chdir=terraform validate

# Format configuration files
terraform -chdir=terraform fmt
```

### Plan and Apply Infrastructure

```bash
# Preview changes before applying
terraform -chdir=terraform plan

# Apply infrastructure changes
terraform -chdir=terraform apply

# Auto-approve without confirmation (use with caution)
terraform -chdir=terraform apply -auto-approve
```

### Inspect Infrastructure

```bash
# List all resources in state
terraform -chdir=terraform state list

# Show details of a specific resource
terraform -chdir=terraform state show aws_s3_bucket.data_bucket

# Output specific values
terraform -chdir=terraform output

# Show current state in JSON
terraform -chdir=terraform show -json
```

### Destroy Infrastructure

```bash
# Destroy all managed infrastructure
terraform -chdir=terraform destroy

# Destroy specific resource
terraform -chdir=terraform destroy -target=aws_instance.data_processor
```

## Configuration Structure

### Basic Terraform Configuration for Data Engineering

**main.tf** - Core infrastructure definition:

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = var.aws_region
}

# S3 bucket for data storage
resource "aws_s3_bucket" "data_lake" {
  bucket = "my-unique-data-lake-bucket-${var.environment}"

  tags = {
    Name        = "Data Lake Bucket"
    Environment = var.environment
    Project     = "DataEngineering"
  }
}

# Enable versioning for data protection
resource "aws_s3_bucket_versioning" "data_lake_versioning" {
  bucket = aws_s3_bucket.data_lake.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Block public access
resource "aws_s3_bucket_public_access_block" "data_lake_public_access" {
  bucket = aws_s3_bucket.data_lake.id
  b