
Terraform Data Engineering IaC
Provision reproducible AWS data-lake and processing infrastructure with Terraform instead of clicking through the console.
Install
npx skills add https://github.com/aradotso/data-skills --skill terraform-data-engineering-iac
What is this skill?
- Provisions AWS S3 buckets for data lake storage
- Creates EC2 instances for pipeline and processing workloads
- Manages IAM policies for least-privilege access to data resources
- Uses Terraform state to track and manage infrastructure changes
- Provides a reproducible stack for common AWS data-engineering patterns
Adoption & trust: 1 install on skills.sh; 1 GitHub star; trending (+100% hot-view momentum).
Journey fit
Primary fit
Data platform infrastructure belongs on the Operate shelf because it defines how pipelines run in production once you are past the initial product build. The Infra subphase covers IaC, state, and the cloud resource provisioning that solo builders need before pipelines can run reliably.
SKILL.md
# Terraform Data Engineering IaC

> Skill by [ara.so](https://ara.so), part of the Data Skills collection.

This project demonstrates Infrastructure-as-Code (IaC) fundamentals for data engineering using Terraform. It provisions AWS resources commonly used in data pipelines, including S3 buckets for data storage and EC2 instances for data processing workloads.

## What It Does

- **Provisions AWS S3 buckets** for data lake storage
- **Creates EC2 instances** for data processing and pipeline execution
- **Manages IAM policies** for secure resource access
- **Uses Terraform state** to track and manage infrastructure changes
- **Provides reproducible infrastructure** for data engineering environments

## Prerequisites

Before using this project, ensure you have:

1. AWS Account with appropriate permissions
2. Terraform CLI installed
3. AWS CLI installed and configured
4. IAM user with S3, EC2, and IAM permissions

## Installation

### 1. Install Terraform

```bash
# macOS
brew install terraform

# Linux
wget https://releases.hashicorp.com/terraform/1.5.0/terraform_1.5.0_linux_amd64.zip
unzip terraform_1.5.0_linux_amd64.zip
sudo mv terraform /usr/local/bin/

# Verify installation
terraform version
```

### 2. Install AWS CLI

```bash
# macOS
brew install awscli

# Linux
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install

# Configure AWS credentials
aws configure
```

### 3. Set Up IAM Permissions

Create an IAM user with the following managed policies:

- `AmazonS3FullAccess`
- `AmazonEC2FullAccess`
- `IAMFullAccess`

**Note:** For production, use fine-grained permissions instead of full access.
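As a rough illustration of what "fine-grained" can look like, the sketch below scopes S3 access to a single bucket instead of attaching `AmazonS3FullAccess`. It is not taken from this project: the variable `data_lake_bucket_name`, the policy name `data-lake-read-write`, and the chosen actions are all assumptions made for the example. It covers read/write access to data in one bucket; the identity that runs Terraform itself would need a different, provisioning-oriented set of actions.

```hcl
# Hypothetical input: the name of the one bucket this policy may touch.
variable "data_lake_bucket_name" {
  type        = string
  description = "Name of the data lake bucket (illustrative; supply your own)"
}

# Build the policy JSON from statements scoped to that bucket only.
data "aws_iam_policy_document" "data_lake_rw" {
  # Listing applies to the bucket ARN itself.
  statement {
    sid       = "ListDataLakeBucket"
    effect    = "Allow"
    actions   = ["s3:ListBucket"]
    resources = ["arn:aws:s3:::${var.data_lake_bucket_name}"]
  }

  # Object-level actions apply to the keys inside the bucket.
  statement {
    sid       = "ReadWriteDataLakeObjects"
    effect    = "Allow"
    actions   = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"]
    resources = ["arn:aws:s3:::${var.data_lake_bucket_name}/*"]
  }
}

# Customer-managed policy that can be attached to a user or role.
resource "aws_iam_policy" "data_lake_rw" {
  name        = "data-lake-read-write" # hypothetical name
  description = "Read/write access limited to a single data lake bucket"
  policy      = data.aws_iam_policy_document.data_lake_rw.json
}
```

Splitting bucket-level and object-level actions into separate statements keeps each action paired with the ARN form it applies to, which is a common source of silent "access denied" errors when the two are mixed.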
## Project Structure

```
terraform/
├── main.tf            # Main infrastructure definitions
├── variables.tf       # Input variables
├── outputs.tf         # Output values
└── terraform.tfstate  # State file (generated)
```

## Key Terraform Commands

### Initialize Terraform

```bash
# Initialize backend and download providers
terraform -chdir=terraform init
```

### Validate Configuration

```bash
# Check syntax and validate configuration
terraform -chdir=terraform validate
```

### Format Code

```bash
# Auto-format HCL files
terraform -chdir=terraform fmt
```

### Plan Infrastructure Changes

```bash
# Preview what will be created/changed
terraform -chdir=terraform plan
```

### Apply Infrastructure

```bash
# Create or update infrastructure
terraform -chdir=terraform apply

# Auto-approve without confirmation (use carefully)
terraform -chdir=terraform apply -auto-approve
```

### Destroy Infrastructure

```bash
# Remove all managed infrastructure
terraform -chdir=terraform destroy

# Auto-approve destruction (use carefully)
terraform -chdir=terraform destroy -auto-approve
```

### State Management

```bash
# List all resources in state
terraform -chdir=terraform state list

# Show detailed resource information (use an address printed by `state list`)
terraform -chdir=terraform state show aws_s3_bucket.data_lake

# List resource types and names from the raw JSON state file (requires jq)
jq -r '.resources[] | [.type, .name] | join(",")' terraform/terraform.tfstate
```

## Configuration Examples

### Basic S3 Bucket for Data Storage

```hcl
# terraform/main.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = var.aws_region
}

resource "aws_s3_bucket" "data_lake" {
  bucket = "my-unique-data-lake-bucke