Infrastructure as Code with Terraform: Best Practices for Cloud-Native Teams
Infrastructure as Code is not just a convenience — it is the foundation of reproducible, auditable, and disaster-resilient cloud environments. Terraform has become the universal language for describing cloud infrastructure, but using it well in production requires discipline around modules, state, and team workflows.
Why Terraform in 2026?
Terraform by HashiCorp, along with OpenTofu, its Linux Foundation-governed open-source fork, remains the dominant multi-cloud IaC tool. Its declarative HCL syntax describes the desired state of infrastructure — provider resources, networking, compute, databases — and the Terraform plan/apply workflow calculates and executes the changes required to move from the current state to the desired state. This approach provides the core IaC benefits: version-controlled infrastructure changes, code review for every modification, repeatable environment provisioning, and documented infrastructure that survives team member turnover.
In 2026, the distinction between Terraform and OpenTofu is important to acknowledge. OpenTofu, the fork created after HashiCorp moved Terraform from the MPL to the Business Source License (BUSL) in 2023, is now fully stable and a reasonable default for greenfield projects. Migrating an existing project from Terraform to OpenTofu is generally straightforward. This guide uses standard HCL that is compatible with both.
Repository Structure: Modules and Environments
The most consequential early decision in a Terraform project is how to organize code across modules and environments. The recommended structure separates reusable modules from environment-specific configurations.
infrastructure/
├── modules/
│   ├── eks-cluster/          # Reusable EKS module
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── rds-postgres/         # Reusable RDS PostgreSQL module
│   ├── vpc/                  # Reusable VPC module
│   └── alb/                  # Reusable ALB module
└── environments/
    ├── dev/
    │   ├── main.tf           # Instantiates modules with dev config
    │   ├── terraform.tfvars
    │   └── backend.tf
    ├── staging/
    └── production/
Modules are the abstraction layer. A well-designed module encapsulates a logical infrastructure component (a Kubernetes cluster, a database, a VPC) with clearly defined input variables and output values. Environments are compositions of modules, providing environment-specific variable values. This structure enables the same infrastructure design to be deployed identically across dev, staging, and production with only the sizing and configuration changing.
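As a sketch of this composition pattern, an environment root might instantiate the database module like this (the module name, sizing values, and the VPC module outputs referenced here are illustrative, not part of a prescribed interface):

```hcl
# environments/production/main.tf (illustrative sketch)
module "orders_db" {
  source = "../../modules/rds-postgres"

  identifier             = "orders-production"
  instance_class         = "db.r6g.large"          # larger than dev
  allocated_storage_gb   = 200
  multi_az               = true                    # production enables HA
  vpc_security_group_ids = [module.vpc.db_security_group_id]  # assumed VPC module output
  subnet_ids             = module.vpc.private_subnet_ids      # assumed VPC module output
  database_name          = "orders"
  master_username        = "orders_admin"
  master_password        = var.orders_db_password  # injected at apply time, never committed
}
```

The dev environment would instantiate the same module with smaller sizing and multi_az = false; only the variable values differ between environments.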
Remote State and State Locking
Terraform's state file is the ground truth for what infrastructure exists. Storing state locally is fine for learning but dangerous for teams: simultaneous applies from different machines can corrupt state. Production Terraform must use a remote backend with state locking. The established pattern for AWS is an S3 backend with a DynamoDB table for state locking; newer Terraform and OpenTofu releases can also lock natively in S3 via the use_lockfile backend setting.
# backend.tf — remote state with locking
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "environments/production/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"
  }
}
State files often contain sensitive data (database passwords, certificates, private keys) written as outputs. Always enable S3 server-side encryption and restrict bucket access to CI/CD roles and senior engineers only. Consider using Terraform's sensitive variable marking to prevent secrets from appearing in plan output.
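The state bucket and lock table themselves must exist before the backend can be initialized. A minimal bootstrap sketch (resource names match the backend block in this section; apply it once with local state, then migrate):

```hcl
# bootstrap/state.tf — run once before configuring the S3 backend
resource "aws_s3_bucket" "state" {
  bucket = "mycompany-terraform-state"
}

resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id
  versioning_configuration {
    status = "Enabled" # recover from accidental state corruption
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "state" {
  bucket = aws_s3_bucket.state.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

resource "aws_dynamodb_table" "lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID" # the attribute name the S3 backend expects

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

Versioning on the state bucket is cheap insurance: it lets you roll back to a previous state file if an apply goes wrong.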
Writing Production-Grade Modules
A good Terraform module has three qualities: it is reusable across environments, it exposes only the variables that genuinely need to vary between uses, and it outputs the values that callers typically need to pass to other modules.
# modules/rds-postgres/variables.tf
variable "identifier" {
  description = "Unique identifier for this RDS instance"
  type        = string
}

variable "instance_class" {
  description = "RDS instance type (e.g. db.t3.medium)"
  type        = string
  default     = "db.t3.medium"
}

variable "allocated_storage_gb" {
  description = "Initial storage in GB"
  type        = number
  default     = 50
}

variable "multi_az" {
  description = "Enable Multi-AZ for high availability"
  type        = bool
  default     = false # default off; production sets true
}

variable "vpc_security_group_ids" {
  description = "Security groups that control inbound access"
  type        = list(string)
}

variable "subnet_ids" {
  description = "Private subnet IDs for the DB subnet group"
  type        = list(string)
}

variable "database_name" { type = string }
variable "master_username" { type = string }

variable "master_password" {
  type      = string
  sensitive = true
}
# modules/rds-postgres/main.tf
resource "aws_db_subnet_group" "this" {
  name       = "${var.identifier}-subnet-group"
  subnet_ids = var.subnet_ids
}

resource "aws_db_instance" "this" {
  identifier     = var.identifier
  engine         = "postgres"
  engine_version = "16.3"
  instance_class = var.instance_class

  allocated_storage     = var.allocated_storage_gb
  max_allocated_storage = var.allocated_storage_gb * 5 # storage auto-scaling ceiling
  storage_encrypted     = true

  multi_az                  = var.multi_az
  deletion_protection       = true
  backup_retention_period   = 7
  skip_final_snapshot       = false
  final_snapshot_identifier = "${var.identifier}-final"

  db_name  = var.database_name
  username = var.master_username
  password = var.master_password

  db_subnet_group_name   = aws_db_subnet_group.this.name
  vpc_security_group_ids = var.vpc_security_group_ids

  tags = {
    Environment = terraform.workspace
    ManagedBy   = "terraform"
  }
}
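To complete the module, an outputs.tf exposes the values callers typically wire into other modules. A sketch (the output names are a suggested convention; the referenced attributes are standard aws_db_instance attributes):

```hcl
# modules/rds-postgres/outputs.tf
output "endpoint" {
  description = "Connection endpoint in address:port form"
  value       = aws_db_instance.this.endpoint
}

output "address" {
  description = "DNS hostname of the instance"
  value       = aws_db_instance.this.address
}

output "port" {
  description = "Port the database listens on"
  value       = aws_db_instance.this.port
}

output "db_instance_arn" {
  description = "ARN, useful for IAM policies and monitoring"
  value       = aws_db_instance.this.arn
}
```

An application module can then consume module.orders_db.endpoint without knowing anything about the RDS internals.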
The Terraform Workflow in a Team
Solo Terraform is straightforward; team Terraform requires process discipline to avoid state conflicts, accidental applies, and configuration drift. The recommended workflow: infrastructure changes are proposed as pull requests containing Terraform code. CI runs terraform fmt -check, terraform validate, and terraform plan on every PR, posting the plan output as a PR comment. Reviewers evaluate both the code and the plan. Merging the PR triggers terraform apply automatically via CI. Nothing is ever applied manually; all changes flow through the PR pipeline.
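The CI checks described above amount to a handful of commands. A sketch of the PR-stage script (the plan step assumes cloud credentials and backend access are available to the CI runner):

```shell
#!/bin/sh
set -e                               # stop at the first failing check
terraform fmt -check -recursive      # fail the build on unformatted files
terraform init -input=false          # configure the remote backend non-interactively
terraform validate                   # catch syntax and type errors
terraform plan -input=false -out=tfplan
terraform show -no-color tfplan > plan.txt  # human-readable plan for the PR comment
```

On merge, the apply job runs terraform apply tfplan against the saved plan file, guaranteeing that exactly the reviewed changes are executed.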
Tools like Atlantis (self-hosted) or Terraform Cloud automate this workflow. Atlantis runs as a service that responds to PR comments with atlantis plan and atlantis apply commands, posting plan output and requiring approvals before applying.
Common Mistakes to Avoid
Storing secrets in tfvars files committed to Git: Use AWS Secrets Manager, HashiCorp Vault, or GitHub Actions secrets to inject sensitive values at apply time. Never commit passwords to version control.
Manual state manipulation with terraform state commands: State surgery should be a last resort. If you find yourself regularly manipulating state, it indicates a structural problem with your module design.
Not pinning provider versions: Always specify provider version constraints. A provider upgrade can introduce breaking changes that silently alter resource behavior.
Creating giant monolithic root modules: A single 5,000-line main.tf is unmaintainable and causes full plan recalculation for every change. Decompose into focused modules and separate environment stacks.
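Two of these fixes can be sketched directly in HCL (the secret name, module wiring, and version constraint below are illustrative):

```hcl
# Pin provider versions so upgrades are deliberate, not accidental
terraform {
  required_version = ">= 1.6"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # allows 5.x minor/patch releases, blocks 6.x
    }
  }
}

# Inject the DB password from Secrets Manager instead of a committed tfvars file
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "production/orders-db/master-password" # illustrative secret name
}

module "orders_db" {
  source = "./modules/rds-postgres"
  # ... other module arguments ...
  master_password = data.aws_secretsmanager_secret_version.db_password.secret_string
}
```

Note that values read this way still land in the state file, which is another reason the state bucket must be encrypted and access-restricted.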
"Treat your Terraform code with the same rigor as application code: code review, automated testing, version control, and a clear promotion path from dev to production."
Key Takeaways
- Separate reusable modules from environment-specific compositions in your repository structure.
- Always use a remote backend with state locking; local state is not viable for teams.
- Enforce the full PR workflow — plan in CI, human review, automated apply on merge.
- Pin provider versions to prevent silent breaking changes.
- Never store secrets in committed files; inject them at apply time from secrets managers.