Infrastructure as Code with Terraform: Best Practices for Cloud-Native Teams


Infrastructure as Code is not just a convenience — it is the foundation of reproducible, auditable, and disaster-resilient cloud environments. Terraform has become the universal language for describing cloud infrastructure, but using it well in production requires discipline around modules, state, and team workflows.

Why Terraform in 2026?

Terraform by HashiCorp remains the dominant multi-cloud IaC tool, now alongside OpenTofu, the open-source fork governed by the Linux Foundation. Its declarative HCL syntax describes the desired state of infrastructure — provider resources, networking, compute, databases — and the Terraform plan/apply workflow calculates and executes the changes required to move from the current state to the desired state. This approach provides the core IaC benefits: version-controlled infrastructure changes, code review for every modification, repeatable environment provisioning, and documented infrastructure that survives team member turnover.

In 2026, the distinction between Terraform and OpenTofu is important to acknowledge. OpenTofu, the MPL-licensed fork created after HashiCorp moved Terraform to the Business Source License (BUSL) in 2023, is now fully stable and a reasonable default for greenfield projects. Migration from Terraform to OpenTofu is generally straightforward. This guide uses standard HCL that is compatible with both.

Repository Structure: Modules and Environments

The most consequential early decision in a Terraform project is how to organize code across modules and environments. The recommended structure separates reusable modules from environment-specific configurations.

infrastructure/
├── modules/
│   ├── eks-cluster/        # Reusable EKS module
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── rds-postgres/       # Reusable RDS PostgreSQL module
│   ├── vpc/                # Reusable VPC module
│   └── alb/                # Reusable ALB module
└── environments/
    ├── dev/
    │   ├── main.tf         # Instantiates modules with dev config
    │   ├── terraform.tfvars
    │   └── backend.tf
    ├── staging/
    └── production/

Modules are the abstraction layer. A well-designed module encapsulates a logical infrastructure component (a Kubernetes cluster, a database, a VPC) with clearly defined input variables and output values. Environments are compositions of modules, providing environment-specific variable values. This structure enables the same infrastructure design to be deployed identically across dev, staging, and production with only the sizing and configuration changing.
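As a sketch of the composition pattern, an environment's main.tf might instantiate the database module like this. The sizing values are illustrative, and the `module.vpc` outputs are assumed to exist on the VPC module:

```hcl
# environments/production/main.tf — composes reusable modules
module "orders_db" {
  source = "../../modules/rds-postgres"

  identifier             = "orders-production"
  instance_class         = "db.r6g.large"   # production sizing
  allocated_storage_gb   = 200
  multi_az               = true             # HA on in production
  vpc_security_group_ids = [module.vpc.db_security_group_id]
  subnet_ids             = module.vpc.private_subnet_ids
  database_name          = "orders"
  master_username        = "app"
  master_password        = var.db_master_password  # injected at apply time, never committed
}
```

The dev environment's main.tf would contain the same module block with smaller sizing and `multi_az = false` — the design stays identical, only the values change.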

Remote State and State Locking

Terraform's state file is the ground truth for what infrastructure exists. Storing state locally is fine for learning but catastrophic for teams: simultaneous applies from different machines can corrupt state. Production Terraform must use a remote backend with state locking. The long-standing pattern for AWS is an S3 backend with a DynamoDB table for state locking (recent Terraform releases also support locking directly in S3).

# backend.tf — remote state with locking
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "environments/production/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"
  }
}

State files often contain sensitive data (database passwords, certificates, private keys) written as outputs. Always enable S3 server-side encryption and restrict bucket access to CI/CD roles and senior engineers only. Consider using Terraform's sensitive variable marking to prevent secrets from appearing in plan output.
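Sensitive marking is a one-line change. A minimal illustration (the variable and output names are hypothetical):

```hcl
# Mark values so they are redacted in plan/apply output
variable "db_password" {
  type      = string
  sensitive = true
}

output "connection_string" {
  value     = "postgres://app:${var.db_password}@db.internal:5432/orders"
  sensitive = true  # hidden from CLI output — but still stored in plaintext in state
}
```

Note the caveat in the comment: `sensitive` prevents values from appearing in terminal output, but they are still written to the state file in plaintext, which is exactly why the state bucket itself must be encrypted and access-restricted.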

Writing Production-Grade Modules

A good Terraform module has three qualities: it is reusable across environments, it exposes only the variables that genuinely need to vary between uses, and it outputs the values that callers typically need to pass to other modules.

# modules/rds-postgres/variables.tf
variable "identifier" {
  description = "Unique identifier for this RDS instance"
  type        = string
}
variable "instance_class" {
  description = "RDS instance type (e.g. db.t3.medium)"
  type        = string
  default     = "db.t3.medium"
}
variable "allocated_storage_gb" {
  description = "Initial storage in GB"
  type        = number
  default     = 50
}
variable "multi_az" {
  description = "Enable Multi-AZ for high availability"
  type        = bool
  default     = false  # default off; production sets true
}
variable "vpc_security_group_ids" {
  description = "Security groups that control inbound access"
  type        = list(string)
}
variable "subnet_ids" {
  description = "Private subnet IDs for the DB subnet group"
  type        = list(string)
}
variable "database_name" { type = string }
variable "master_username" { type = string }
variable "master_password" {
  type      = string
  sensitive = true
}
# modules/rds-postgres/main.tf
resource "aws_db_subnet_group" "this" {
  name       = "${var.identifier}-subnet-group"
  subnet_ids = var.subnet_ids
}
resource "aws_db_instance" "this" {
  identifier              = var.identifier
  engine                  = "postgres"
  engine_version          = "16.3"
  instance_class          = var.instance_class
  allocated_storage       = var.allocated_storage_gb
  max_allocated_storage   = var.allocated_storage_gb * 5  # auto-scaling
  storage_encrypted       = true
  multi_az                = var.multi_az
  deletion_protection     = true
  backup_retention_period = 7
  skip_final_snapshot     = false
  final_snapshot_identifier = "${var.identifier}-final"
  db_name  = var.database_name
  username = var.master_username
  password = var.master_password
  db_subnet_group_name   = aws_db_subnet_group.this.name
  vpc_security_group_ids = var.vpc_security_group_ids
  tags = {
    Environment = terraform.workspace
    ManagedBy   = "terraform"
  }
}
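The module's outputs.tf, not shown above, would expose the values callers typically wire into other modules. A plausible minimal version:

```hcl
# modules/rds-postgres/outputs.tf
output "endpoint" {
  description = "Hostname:port for client connections"
  value       = aws_db_instance.this.endpoint
}

output "db_instance_id" {
  description = "RDS instance identifier"
  value       = aws_db_instance.this.id
}

output "db_instance_arn" {
  description = "ARN, e.g. for IAM policies and monitoring"
  value       = aws_db_instance.this.arn
}
```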

The Terraform Workflow in a Team

Solo Terraform is straightforward. Team Terraform requires process discipline to avoid state conflicts, accidental applies, and configuration drift. The recommended workflow: infrastructure changes are proposed as pull requests containing Terraform code. CI runs terraform fmt --check, terraform validate, and terraform plan on every PR, posting the plan output as a PR comment. Human reviewers review both the code and the plan. Merging the PR triggers terraform apply automatically via CI. Nothing is ever applied manually — all changes flow through the PR pipeline.
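The CI checks described above reduce to a short script. A sketch of the PR-validation stage, assuming the repository layout shown earlier (the directory path is illustrative):

```shell
#!/usr/bin/env sh
set -e
cd environments/production

terraform fmt -check -recursive         # fail the build on unformatted code
terraform init -input=false             # non-interactive backend init
terraform validate
terraform plan -input=false -out=tfplan # plan output gets posted to the PR
```

The merge-triggered apply job then runs `terraform apply tfplan` against the saved plan file, guaranteeing that exactly the reviewed plan is executed.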

Tools like Atlantis (self-hosted) or Terraform Cloud automate this workflow. Atlantis runs as a service that responds to PR comments with atlantis plan and atlantis apply commands, posting plan output and requiring approvals before applying.

Common Mistakes to Avoid

Storing secrets in tfvars files committed to Git: Use AWS Secrets Manager, HashiCorp Vault, or GitHub Actions secrets to inject sensitive values at apply time. Never commit passwords to version control.
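One common pattern, sketched under the assumption that the secret already exists in AWS Secrets Manager under the illustrative name below:

```hcl
# Fetch the DB password at plan/apply time instead of committing it
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "production/orders-db/master-password"  # illustrative secret name
}

module "orders_db" {
  source = "../../modules/rds-postgres"
  # ...other inputs...
  master_password = data.aws_secretsmanager_secret_version.db_password.secret_string
}
```

The password never appears in the repository; it is resolved by the CI role's IAM permissions at apply time.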

Manual state manipulation with terraform state commands: State surgery should be a last resort. If you find yourself regularly manipulating state, it indicates a structural problem with your module design.

Not pinning provider versions: Always specify provider version constraints. A provider upgrade can introduce breaking changes that silently alter resource behavior.
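Pinning looks like this in a required_providers block; the pessimistic constraint (`~>`) allows minor and patch upgrades while blocking the next major version:

```hcl
terraform {
  required_version = ">= 1.6.0"  # pin the CLI too, not just providers

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.50"  # allows 5.50 and later 5.x, blocks 6.0
    }
  }
}
```

Committing the generated `.terraform.lock.hcl` file alongside this block ensures every machine and CI run resolves the same exact provider build.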

Creating giant monolithic root modules: A single 5,000-line main.tf is unmaintainable and causes full plan recalculation for every change. Decompose into focused modules and separate environment stacks.

"Treat your Terraform code with the same rigor as application code: code review, automated testing, version control, and a clear promotion path from dev to production."

Key Takeaways

  • Separate reusable modules from environment-specific compositions in your repository structure.
  • Always use a remote backend with state locking; local state is not viable for teams.
  • Enforce the full PR workflow — plan in CI, human review, automated apply on merge.
  • Pin provider versions to prevent silent breaking changes.
  • Never store secrets in committed files; inject them at apply time from secrets managers.
