
Terraform Data Engineering Infrastructure
Provision version-controlled AWS data-lake and ETL infrastructure with Terraform instead of clicking through the console.
Install
npx skills add https://github.com/aradotso/data-skills --skill terraform-data-engineering-infrastructure
What is this skill?
- Declarative AWS patterns for S3 data lakes, EC2 processing hosts, and IAM roles
- Reproducible dev, staging, and production environments from the same modules
- Secure access modeling with IAM policies tailored to analytics and pipeline workloads
- Version-controlled IaC aligned with data-engineering team workflows
- Triggered by requests to automate AWS infrastructure for data teams (per the skill metadata)
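The environment-parity bullet can be illustrated with a single root configuration that takes the environment as an input. This is a minimal sketch, not code from the skill: the variables `environment` and `project`, the `data-platform` default, and the bucket resource `aws_s3_bucket.raw` are hypothetical. The fragment assumes it sits next to a `main.tf` that already declares the AWS provider. The skill itself may organize this with reusable modules instead.

```hcl
# Hypothetical variables.tf: one input selects the target environment
variable "environment" {
  description = "Deployment environment (dev, staging, or prod)"
  type        = string

  # Reject typos such as "production" at plan time
  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "The environment must be one of: dev, staging, prod."
  }
}

variable "project" {
  description = "Prefix shared by every resource name"
  type        = string
  default     = "data-platform" # hypothetical project name
}

locals {
  # Same code, environment-specific names, e.g. "data-platform-dev"
  name_prefix = "${var.project}-${var.environment}"

  common_tags = {
    Project     = var.project
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}

# Hypothetical raw-zone bucket; every environment gets its own copy
resource "aws_s3_bucket" "raw" {
  bucket = "${local.name_prefix}-raw"
  tags   = local.common_tags
}
```

For example, `terraform -chdir=terraform plan -var="environment=dev"` previews the dev stack. Keep each environment's state separate (for example with `terraform workspace new staging`, or a distinct backend key per environment) so that applying to one environment cannot modify another. S3 bucket names are global across all AWS accounts, so a real configuration may need an extra suffix, such as the account ID, to avoid collisions.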
Journey fit
Primary fit
Data platforms run on production infrastructure, so Terraform is the natural home for durable AWS provisioning and environment parity. S3, EC2, and IAM patterns are core infrastructure-as-code work under operate/infra, even when first applied during the initial platform build.
SKILL.md
# Terraform Data Engineering Infrastructure

> Skill by [ara.so](https://ara.so) — Data Skills collection.

This project provides Infrastructure-as-Code (IaC) patterns for data engineering teams that use Terraform to provision and manage AWS resources. It demonstrates how to automate the creation of data infrastructure, including S3 buckets for data lakes, EC2 instances for processing, and IAM policies for secure access.

## What This Project Does

- Provisions AWS infrastructure designed for data engineering workloads
- Manages S3 buckets for data storage and data lake architectures
- Creates EC2 instances for data processing and ETL jobs
- Configures IAM roles and policies for secure resource access
- Provides declarative infrastructure definitions that can be version-controlled
- Enables reproducible environment creation across dev/staging/prod

## Prerequisites

Before using this project, ensure you have:

1. An AWS account with administrative access (avoid using the root user for day-to-day work)
2. Terraform installed (v1.0+)
3. AWS CLI installed and configured
4. An IAM user with appropriate permissions (full access to S3, EC2, and IAM)

### Installing Prerequisites

```bash
# Install Terraform (macOS)
brew tap hashicorp/tap
brew install hashicorp/tap/terraform

# Install AWS CLI (macOS)
brew install awscli

# Configure AWS CLI
aws configure
# Enter your AWS Access Key ID, Secret Access Key, region, and output format
```

### Setting Up IAM Permissions

Create an IAM user for Terraform and attach the following AWS managed policies:

- Full S3 access (`AmazonS3FullAccess`)
- Full EC2 access (`AmazonEC2FullAccess`)
- Full IAM access (`IAMFullAccess`)

**Note:** This setup is for development and learning. In production, use least-privilege policies.
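The least-privilege principle in the note also applies to the roles Terraform creates for pipelines. As a minimal sketch of what that can look like for an EC2-based ETL host, the fragment below grants list, read, and write access to a single prefix of one bucket and nothing else. The variable `data_lake_bucket`, its default value, the `raw/` prefix, and the role, policy, and profile names are hypothetical placeholders. The fragment assumes the AWS provider is configured elsewhere in `main.tf`.

```hcl
# Hypothetical bucket name; point this at your real data-lake bucket
variable "data_lake_bucket" {
  type    = string
  default = "example-data-lake-bucket"
}

# Trust policy: only the EC2 service may assume the role
data "aws_iam_policy_document" "ec2_assume" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ec2.amazonaws.com"]
    }
  }
}

# Permissions: list, read, and write only under the raw/ prefix
data "aws_iam_policy_document" "etl_s3" {
  statement {
    sid       = "ListRawPrefix"
    actions   = ["s3:ListBucket"]
    resources = ["arn:aws:s3:::${var.data_lake_bucket}"]

    # Restrict listing to keys that start with raw/
    condition {
      test     = "StringLike"
      variable = "s3:prefix"
      values   = ["raw/*"]
    }
  }

  statement {
    sid       = "ReadWriteRawObjects"
    actions   = ["s3:GetObject", "s3:PutObject"]
    resources = ["arn:aws:s3:::${var.data_lake_bucket}/raw/*"]
  }
}

resource "aws_iam_role" "etl" {
  name               = "etl-processor-role" # hypothetical name
  assume_role_policy = data.aws_iam_policy_document.ec2_assume.json
}

# Inline policy attached to the role
resource "aws_iam_role_policy" "etl_s3" {
  name   = "etl-s3-raw-access"
  role   = aws_iam_role.etl.id
  policy = data.aws_iam_policy_document.etl_s3.json
}

# Instance profile so an EC2 instance can carry the role
resource "aws_iam_instance_profile" "etl" {
  name = "etl-processor-profile"
  role = aws_iam_role.etl.name
}
```

Passing `aws_iam_instance_profile.etl.name` to an `aws_instance` through its `iam_instance_profile` argument gives jobs on that host temporary credentials, so no long-lived access keys need to be stored on the machine. Running `terraform -chdir=terraform validate` is a quick way to check that the fragment parses alongside the rest of the configuration.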
```bash
# Create access keys for your IAM user
aws iam create-access-key --user-name your-terraform-user

# Configure AWS CLI with these credentials under a named profile
aws configure --profile terraform

# Make Terraform and the AWS CLI use that profile in this shell
export AWS_PROFILE=terraform
```

## Project Structure

```
terraform/
├── main.tf             # Main infrastructure definitions
├── variables.tf        # Input variables (if present)
├── outputs.tf          # Output values (if present)
└── terraform.tfstate   # State file (generated)
```

## Key Terraform Commands

### Initialize Terraform

```bash
# Initialize the working directory (downloads providers)
terraform -chdir=terraform init

# Validate configuration files
terraform -chdir=terraform validate

# Format configuration files
terraform -chdir=terraform fmt
```

### Plan and Apply Infrastructure

```bash
# Preview changes without applying them
terraform -chdir=terraform plan

# Apply changes and create infrastructure
terraform -chdir=terraform apply

# Apply without the confirmation prompt (use with care)
terraform -chdir=terraform apply -auto-approve
```

### Inspect Infrastructure

```bash
# List all resources in state
terraform -chdir=terraform state list

# Show details of a specific resource
terraform -chdir=terraform state show aws_s3_bucket.data_bucket

# Print the current state in human-readable form
terraform -chdir=terraform show
```

### Destroy Infrastructure

```bash
# Destroy all managed infrastructure
terraform -chdir=terraform destroy

# Destroy a specific resource
terraform -chdir=terraform destroy -target=aws_instance.data_processor
```

## Configuration Patterns

### Basic S3 Bucket for Data Lake

```hcl
# terraform/main.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  r