
IaC Terraform Data Engineering
Provision reproducible AWS data-platform resources (S3, EC2, IAM) with Terraform instead of clicking through the console.
Overview
IaC Terraform Data Engineering is an agent skill for the Build phase that provisions AWS S3, EC2, and IAM for data platforms using Terraform best practices.
Install
npx skills add https://github.com/aradotso/data-skills --skill iac-terraform-data-engineering
What is this skill?
- Terraform patterns for S3 buckets, EC2 instances, and IAM roles aimed at data platforms
- State management and lifecycle flows including destroy paths for data infrastructure
- IaC templates oriented to reproducible, version-controlled pipeline hosting
- Prerequisites call out an AWS account and Terraform CLI before apply
- Trigger phrases cover pipeline provisioning, state management, and teardown
- Covers three AWS resource families: S3, EC2, and IAM
- Prerequisites list 2 items: AWS account and Terraform CLI
Adoption & trust: 1 install on skills.sh; 1 GitHub star; trending (+100% hot-view momentum).
What problem does it solve?
You need repeatable AWS storage, compute, and permissions for pipelines but only have ad-hoc console setup or stale runbooks.
Who is it for?
Indie data builders or full-stack solos standing up AWS-backed lakes, batch hosts, or pipeline sandboxes with Terraform for the first time.
Skip if: you only need application code without cloud provisioning, or your org is standardized on Pulumi, CDK, or non-AWS clouds with no Terraform path.
When should I use this skill?
Set up Terraform for data engineering, create AWS infrastructure with Terraform, provision S3/EC2, manage pipeline IaC, or destroy data infrastructure.
What do I get? / Deliverables
You leave with version-controlled Terraform patterns and lifecycle guidance so data infrastructure can be applied, tracked in state, and torn down consistently.
- Terraform configurations for S3, EC2, and IAM aligned to data workloads
- Documented apply/destroy and state-handling workflow for the stack
Journey fit
The canonical shelf is Build because solo builders install this while wiring the stack that pipelines and agents will run on, not only during day-two ops. Integrations also fits, since the Terraform modules connect storage, compute, and IAM into a version-controlled data-engineering baseline.
How it compares
Use for Terraform-first AWS data baselines instead of generic DevOps skills that skip S3–EC2–IAM wiring for analytics workloads.
Common Questions / FAQ
Who is iac-terraform-data-engineering for?
Solo and indie builders running data pipelines or agent workloads on AWS who want IaC templates rather than manual console provisioning.
When should I use iac-terraform-data-engineering?
During Build when you integrate cloud resources—e.g. creating buckets and roles before an ETL job ships, or during Operate when you refactor state management and controlled destroys for a data platform.
Is iac-terraform-data-engineering safe to install?
Review the Security Audits panel on this Prism page and treat any skill that touches AWS credentials and Terraform apply as high-impact before running in production accounts.
SKILL.md
# IaC for Data Engineering with Terraform

> Skill by [ara.so](https://ara.so), Data Skills collection.

This project provides Infrastructure-as-Code (IaC) templates and patterns for data engineers using Terraform to provision and manage AWS resources. It focuses on creating reproducible, version-controlled infrastructure for data platforms including S3 storage, EC2 compute instances, and IAM permissions.

## What This Project Does

- Provides Terraform configurations for common data engineering infrastructure on AWS
- Demonstrates IaC best practices for S3 buckets, EC2 instances, and IAM roles
- Shows state management and lifecycle operations for data infrastructure
- Teaches reproducible infrastructure provisioning for data pipelines

## Prerequisites

Before using this project, ensure you have:

1. **AWS Account** with root or admin access
2. **Terraform CLI** installed ([installation guide](https://developer.hashicorp.com/terraform/tutorials/aws-get-started/install-cli))
3. **AWS CLI** installed and configured ([setup guide](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html))
4. **AWS Credentials** configured via `aws configure`

## AWS IAM Setup

Create an IAM user with appropriate permissions:

1. **Create IAM User**: Navigate to AWS Console → IAM → Users → Create user
2. **Create Inline Policy**: Attach a custom policy to the user
3. **Grant Permissions**: For development/learning, grant full access to:
   - Amazon S3
   - Amazon EC2
   - AWS IAM

**⚠️ Security Note**: Full service access is NOT recommended for production. Use least-privilege policies in production environments.

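The prerequisites and IAM setup above can be sanity-checked from the shell before running any Terraform command. The sketch below is an illustration only and is not part of the project: the function names `missing_tools` and `check_prereqs` are hypothetical, and it assumes the `terraform` and `aws` executables are expected on `PATH`. `aws sts get-caller-identity` is the standard AWS CLI call that reports which account and ARN the configured credentials resolve to.

```bash
#!/usr/bin/env sh
# Hypothetical helpers (not part of this project) for checking prerequisites.

# Print the name of every given tool that is not found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Verify both CLIs exist, then ask AWS which identity the credentials map to.
check_prereqs() {
  missing="$(missing_tools terraform aws)"
  if [ -n "$missing" ]; then
    printf 'Missing required tools:\n%s\n' "$missing" >&2
    return 1
  fi
  terraform version
  aws sts get-caller-identity
}
```

Running `check_prereqs` after `aws configure` should print the Terraform version followed by the caller identity. If the reported ARN is not the IAM user created for this project, correct the credentials before initializing Terraform.
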
## Project Structure

```
terraform/
├── main.tf              # Main Terraform configuration
├── variables.tf         # Input variables (if present)
├── outputs.tf           # Output values (if present)
└── terraform.tfstate    # State file (generated)
```

## Key Terraform Commands

### Initialize Terraform

Initialize the working directory and download provider plugins:

```bash
terraform -chdir=terraform init
```

### Validate Configuration

Check if the configuration is syntactically valid:

```bash
terraform -chdir=terraform validate
```

### Format Code

Automatically format Terraform files to canonical style:

```bash
terraform -chdir=terraform fmt
```

### Plan Infrastructure Changes

Preview what Terraform will create/modify/destroy:

```bash
terraform -chdir=terraform plan
```

### Apply Configuration

Create or update infrastructure:

```bash
terraform -chdir=terraform apply
```

Terraform will show a plan and ask for confirmation. Type `yes` to proceed.

### Auto-approve (for automation)

```bash
terraform -chdir=terraform apply -auto-approve
```

### Destroy Infrastructure

Remove all resources managed by Terraform:

```bash
terraform -chdir=terraform destroy
```

## Configuration

### Basic Terraform Configuration Example

Before applying, modify `terraform/main.tf` to customize resource names:

```hcl
# terraform/main.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

# S3 bucket for data storage
resource "aws_s3_bucket" "data_bucket" {
  bucket = "my-unique-data-engineering-bucket-12345"

  tags = {
    Name        = "Data Engineering Bucket"
    Environment = "dev"
    ManagedBy   = "Terraform"
  }
}

# EC2 instance for data processing
resource "aws_instance" "data_processor" {