
IaC Data Engineering Terraform
Provision AWS S3, EC2, and IAM for data pipelines using reusable Terraform patterns and state management.
Install
npx skills add https://github.com/aradotso/data-skills --skill iac-data-engineering-terraform
What is this skill?
- Covers S3 buckets, EC2 processing hosts, and IAM roles with least-privilege patterns
- Documents Terraform and AWS CLI installation via Homebrew, plus credential setup with aws configure
- Emphasizes reproducible environments and Terraform state for data infra
- Trigger phrases include S3/EC2 setup, pipeline IaC, and state management
- From ara.so Data Skills collection for data-engineering workflows
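The least-privilege IAM pattern named in the list above can be sketched roughly as follows. This is a minimal illustration, not the skill's own configuration: the variable `data_bucket_arn`, the role name `data-processor-role`, and the policy name `data-bucket-access` are all hypothetical, and the chosen S3 actions are an assumption about what a processing host typically needs.

```hcl
# Hypothetical input: ARN of the one bucket the processing host may touch.
variable "data_bucket_arn" {
  description = "ARN of the data bucket the processing host may access"
  type        = string
}

# Trust policy: only the EC2 service may assume this role.
data "aws_iam_policy_document" "ec2_assume_role" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ec2.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "data_processor" {
  name               = "data-processor-role" # hypothetical name
  assume_role_policy = data.aws_iam_policy_document.ec2_assume_role.json
}

# Permissions: list the single bucket and read/write its objects, nothing else.
data "aws_iam_policy_document" "bucket_access" {
  statement {
    actions   = ["s3:ListBucket"]
    resources = [var.data_bucket_arn]
  }

  statement {
    actions   = ["s3:GetObject", "s3:PutObject"]
    resources = ["${var.data_bucket_arn}/*"]
  }
}

resource "aws_iam_role_policy" "bucket_access" {
  name   = "data-bucket-access" # hypothetical name
  role   = aws_iam_role.data_processor.id
  policy = data.aws_iam_policy_document.bucket_access.json
}
```

Scoping `resources` to one bucket ARN, rather than `*`, is what keeps the policy least-privilege; broader actions such as `s3:DeleteObject` would be added only if the pipeline actually needs them.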
Adoption & trust: 1 install on skills.sh; 1 GitHub star; trending (+100% hot-view momentum).
Journey fit
Primary fit
Build integrations is the canonical fit because the skill's core job is declaratively wiring up the cloud resources data engineers need before pipelines run. The Integrations label reflects Terraform modules and AWS service glue rather than application UI or pure analytics dashboards.
SKILL.md
# IaC for Data Engineering with Terraform

> Skill by [ara.so](https://ara.so) — Data Skills collection.

This project demonstrates Infrastructure-as-Code (IaC) fundamentals for data engineers using Terraform to provision AWS resources including S3 buckets, EC2 instances, and IAM configurations. It provides reusable patterns for managing data infrastructure declaratively.

## What This Project Does

- Provisions AWS S3 buckets for data storage
- Creates and configures EC2 instances for data processing
- Sets up IAM roles and policies with proper permissions
- Manages infrastructure state with Terraform
- Provides reproducible data engineering environments

## Prerequisites

Before using this project, install Terraform and the AWS CLI, then configure your credentials:

```bash
# Install Terraform
brew tap hashicorp/tap
brew install hashicorp/tap/terraform

# Install AWS CLI
brew install awscli

# Configure AWS credentials
aws configure
# Enter your AWS Access Key ID, Secret Access Key, region, and output format
```

Set up required environment variables:

```bash
export AWS_ACCESS_KEY_ID=$YOUR_ACCESS_KEY
export AWS_SECRET_ACCESS_KEY=$YOUR_SECRET_KEY
export AWS_DEFAULT_REGION=us-east-1
```

## Project Structure

```
terraform/
├── main.tf              # Main infrastructure definitions
├── variables.tf         # Input variables
├── outputs.tf           # Output values
└── terraform.tfstate    # State file (auto-generated)
```

## Core Terraform Commands

### Initialize Terraform

```bash
# Initialize the working directory and download providers
terraform -chdir=terraform init

# Validate configuration syntax
terraform -chdir=terraform validate

# Format configuration files
terraform -chdir=terraform fmt
```

### Plan and Apply Infrastructure

```bash
# Preview changes without applying
terraform -chdir=terraform plan

# Apply infrastructure changes
terraform -chdir=terraform apply

# Auto-approve without prompts (use carefully)
terraform -chdir=terraform apply -auto-approve
```

### Inspect Infrastructure

```bash
# List all resources in state
terraform -chdir=terraform state list

# Show detailed state information
terraform -chdir=terraform show

# Output specific values
terraform -chdir=terraform output
```

### Destroy Infrastructure

```bash
# Destroy all managed infrastructure
terraform -chdir=terraform destroy

# Destroy a specific resource (the address must match a resource in state)
terraform -chdir=terraform destroy -target=aws_s3_bucket.data_lake
```

## Key Configuration Patterns

### S3 Bucket for Data Storage

```hcl
# main.tf
resource "aws_s3_bucket" "data_lake" {
  bucket = "my-data-engineering-bucket-${random_id.bucket_suffix.hex}"

  tags = {
    Environment = "dev"
    Purpose     = "data-engineering"
    ManagedBy   = "terraform"
  }
}

# Random suffix keeps the bucket name globally unique
resource "random_id" "bucket_suffix" {
  byte_length = 4
}

# Enable versioning for data protection
resource "aws_s3_bucket_versioning" "data_lake_versioning" {
  bucket = aws_s3_bucket.data_lake.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Configure lifecycle rules
resource "aws_s3_bucket_lifecycle_configuration" "data_lake_lifecycle" {
  bucket = aws_s3_bucket.data_lake.id

  rule {
    id     = "archive-old-data"
    status = "Enabled"

    # Empty filter applies the rule to every object in the bucket
    filter {}

    transition {
      days          = 90
      storage_class = "GLACIER"
    }

    expiration {
      days = 365
    }
  }
}
```

### EC2 Instance for Data Processing

```hcl
# main.tf
resource "aws_instance" "data_processor" {
  ami           = "ami-0c55b159cbfafe1f0" # Amazon Linux 2
  instance_type =