
Drift Detection & Remediation


Infrastructure drift is the gap between what your Terraform configuration says should exist and what actually exists in the cloud. It happens constantly in production: an engineer makes a hotfix through the console, an automated process modifies a resource, a cloud provider changes a default setting. Left unaddressed, drift accumulates until a Terraform apply does something unexpected.

What Is Infrastructure Drift?

Terraform's state file is a snapshot of what Terraform last observed. Drift occurs when the real infrastructure diverges from both the state and the configuration. There are three flavours:

| Type | What happened | Terraform sees |
|---|---|---|
| Attribute drift | A resource attribute was changed outside Terraform (e.g., instance type modified in console) | Plan shows an update to restore the configured value |
| Resource deleted | A resource was deleted outside Terraform | Plan shows a create (Terraform wants to recreate it) |
| Resource added | A resource was created outside Terraform, not in state | Terraform is unaware; no plan output until imported |

The third type — resources created outside Terraform — is invisible to drift detection until you import them. The first two are what Terraform's refresh mechanism catches.
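Because Terraform has no record of the third type, finding it means comparing an inventory taken from the cloud provider against the IDs Terraform already tracks. A minimal sketch of that comparison, assuming you have already exported both lists to plain text files with one resource ID per line (the function name and file names are hypothetical, and how you produce the two lists depends on your own tooling):

```shell
# Hypothetical helper: print resource IDs that appear in the cloud inventory
# but not in the list of IDs tracked in Terraform state.
# Both arguments are assumed to be text files with one ID per line.
find_unmanaged() {
  local live_ids="$1" state_ids="$2" tmp_dir
  tmp_dir=$(mktemp -d) || return 1
  # comm requires sorted input, so sort and de-duplicate both lists first.
  sort -u "$live_ids"  > "$tmp_dir/live"
  sort -u "$state_ids" > "$tmp_dir/state"
  # -23 suppresses lines unique to the second file and lines common to both,
  # leaving only IDs that exist live but are unknown to Terraform.
  comm -23 "$tmp_dir/live" "$tmp_dir/state"
  rm -rf "$tmp_dir"
}

# Example usage (file names are placeholders):
#   find_unmanaged live-ids.txt state-ids.txt
```

Anything this prints is a candidate for import, or for deletion if it should not exist at all.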

Detecting Drift with Terraform

Every terraform plan begins with a refresh step: Terraform calls the provider API for each resource in state and compares the live attributes against what state recorded. Any differences appear in the plan as changes.

terraform plan
# Terraform will perform the following actions:
#
#   # aws_security_group.app will be updated in-place
#   ~ resource "aws_security_group" "app" {
#       ~ ingress = [
#           + {
#               cidr_blocks = ["10.0.0.0/8"]
#               from_port   = 22
#               protocol    = "tcp"
#               to_port     = 22
#             },
#           # (rule is still in the config but missing from the live security group)
#         ]
#     }
#
# Plan: 0 to add, 1 to change, 0 to destroy.

Here, an SSH ingress rule that exists in the config was removed from the security group outside Terraform. A plan caught it. An apply would restore it.

⚠️
Large state files slow down the refresh

The refresh step makes at least one API call per resource in state, so configurations managing thousands of resources can take minutes just to refresh. Use -target for an occasional focused plan, or, as the longer-term fix, split your configuration into smaller, focused root modules.
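A focused plan takes one -target flag per resource address. The sketch below builds that command line from a list of addresses; the function name and the addresses in the usage line are illustrative. Keep in mind that HashiCorp's documentation describes -target as a tool for exceptional situations rather than routine use.

```shell
# Hypothetical helper: print a targeted plan command for the given addresses.
# It only prints the command; review it, then run it yourself.
# Addresses containing quotes or brackets (e.g. module keys) need extra
# shell quoting when the printed command is executed.
targeted_plan_cmd() {
  cmd="terraform plan"
  for addr in "$@"; do
    cmd="$cmd -target=$addr"
  done
  printf '%s\n' "$cmd"
}

# Example usage:
#   targeted_plan_cmd aws_instance.app aws_security_group.app
```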

terraform plan -refresh-only

Terraform 0.15.4+ has a dedicated mode for drift inspection: terraform plan -refresh-only. It reads current resource attributes from the provider and shows what's changed — but it proposes only a state update, not real infrastructure changes:

terraform plan -refresh-only
# Terraform will perform the following actions:
#
#   # aws_instance.app will be updated in the Terraform state
#   ~ resource "aws_instance" "app" {
#       ~ instance_type = "t3.small" -> "t3.medium"
#         # (the instance was scaled up in the console)
#     }
#
# This plan does NOT make changes to real infrastructure.
# It updates the Terraform state to match current infrastructure.

# Accept the detected values into state
terraform apply -refresh-only

This is useful when you want to accept the drift — for example, an operator scaled up an instance as an emergency fix, and you want state to reflect that before updating the config. Apply the refresh-only plan, then update the configuration to match the new state.

Automated Drift Detection

Running plan manually only catches drift when someone thinks to check. Automated drift detection runs plans on a schedule and alerts when drift is found.

GitHub Actions scheduled plan

name: Drift Detection

on:
  schedule:
    - cron: '0 8 * * 1-5'   # 8am weekdays
  workflow_dispatch:

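jobs:
  detect-drift:
    runs-on: ubuntu-latest
    permissions:
      id-token: write   # needed for OIDC role assumption via role-to-assume
      contents: read    # needed for actions/checkout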
    steps:
      - uses: actions/checkout@v4

      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "~1.9"

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: us-east-1

      - name: Terraform Init
        run: terraform init

      - name: Detect Drift
        id: plan
        run: |
          terraform plan -detailed-exitcode -refresh-only -out=drift.tfplan 2>&1 | tee plan.txt
          # $? here would be tee's status; PIPESTATUS[0] is terraform's exit code
          echo "exitcode=${PIPESTATUS[0]}" >> "$GITHUB_OUTPUT"
        continue-on-error: true

      - name: Notify on drift
        if: steps.plan.outputs.exitcode == '2'
        uses: slackapi/slack-github-action@v1
        with:
          payload: |
            {"text": "⚠️ Infrastructure drift detected in ${{ github.repository }}. Review: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"}
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}

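With the -detailed-exitcode flag, terraform plan exits with code 2 when there are changes, 0 when clean, and 1 on error, which makes it CI-friendly for drift alerting. Without the flag, a successful plan exits 0 whether or not it found changes.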
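Outside GitHub Actions, the same exit codes can drive any wrapper script. The sketch below is one way to interpret them; the function name and the labels it prints are hypothetical, and the mapping only holds when -detailed-exitcode is passed to the plan.

```shell
# Hypothetical helper: translate the exit code of
# `terraform plan -detailed-exitcode` into a short status label.
classify_plan_exit() {
  case "$1" in
    0) echo "clean" ;;   # plan succeeded, nothing to change
    2) echo "drift" ;;   # plan succeeded, changes are present
    *) echo "error" ;;   # 1 (or anything unexpected): the plan itself failed
  esac
}

# Example usage. Capturing the code with `|| rc=$?` keeps the script alive
# under `set -e`, which would otherwise abort on exit code 2:
#   rc=0
#   terraform plan -detailed-exitcode -refresh-only > plan.txt 2>&1 || rc=$?
#   status=$(classify_plan_exit "$rc")
```

Treating every unexpected code as an error is a deliberate choice here: a drift check that fails silently is worse than a noisy one.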

HCP Terraform drift detection

HCP Terraform (formerly Terraform Cloud) has built-in drift detection for paid tiers. Enable it per workspace — HCP Terraform runs refresh-only plans on a schedule and sends notifications when drift is found. No pipeline setup required.

Remediating Drift

Once drift is detected, you have three options:

| Option | Command | When to use |
|---|---|---|
| Restore to config | terraform apply | The config is correct; the manual change was wrong or unauthorized |
| Accept the change | terraform apply -refresh-only, then update config | The manual change was intentional; update config to match |
| Ignore permanently | lifecycle { ignore_changes = [...] } | Some attributes are intentionally managed outside Terraform |

Restoring to config

The standard terraform apply will overwrite any manual changes with the configured values. This is the right call when the drift was unauthorized or accidental:

terraform plan   # review what will change
terraform apply  # restore the configured state

Accepting the drift

When the manual change was correct (e.g., a hotfix that needs to stay), accept it into state first, then update the config to match:

# 1. Accept the live attribute values into state
terraform apply -refresh-only

# 2. Update the configuration to match
# (edit main.tf to reflect the new instance type, tag, etc.)

# 3. Verify the plan is now clean
terraform plan   # should show: No changes.

Ignoring Drift: lifecycle ignore_changes

Some attributes are intentionally managed outside Terraform and should never be overwritten. The lifecycle meta-argument's ignore_changes list tells Terraform to skip specific attributes during plan and apply:

resource "aws_instance" "app" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.small"

  lifecycle {
    # instance_type is resized outside Terraform during incidents; don't restore it
    ignore_changes = [instance_type]
  }
}

resource "aws_autoscaling_group" "app" {
  # ... (launch template, subnets, and other required arguments omitted)
  min_size = 2
  max_size = 10

  lifecycle {
    # desired_capacity is managed by scaling policies at runtime
    ignore_changes = [desired_capacity]
  }
}

Use ignore_changes = all sparingly — it means Terraform will never update the resource after creation, which defeats much of the purpose of managing it with Terraform.

🧭
Document why you're ignoring changes

Add a comment explaining why each attribute is ignored. Future maintainers won't know if the ignore was intentional or an oversight, and may remove it — causing a surprise apply that overwrites production state.
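One way to enforce this convention is a small lint step in CI. The following sketch flags any ignore_changes line that is not directly preceded by a comment line. The function name is hypothetical, and the check is a plain-text heuristic: it recognises only # and // line comments and does not parse HCL.

```shell
# Hypothetical lint: report ignore_changes lines with no comment on the line above.
# Prints "file:line" for each offender and returns 1 if any were found.
find_undocumented_ignores() {
  awk '
    FNR == 1 { prev = "" }                 # reset at the start of each file
    /^[ \t]*ignore_changes[ \t]*=/ {
      if (prev !~ /^[ \t]*(#|\/\/)/) {     # previous line is not a comment
        printf "%s:%d\n", FILENAME, FNR
        found = 1
      }
    }
    { prev = $0 }                          # remember this line for the next one
    END { exit found + 0 }
  ' "$@"
}

# Example usage:
#   find_undocumented_ignores *.tf || echo "add a comment above each ignore_changes"
```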

Preventing Drift

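Drift remediation is reactive. The goal is to prevent drift from accumulating in the first place. One guardrail for critical resources is prevent_destroy, which makes Terraform reject any plan that would destroy the resource, so a remediation apply cannot delete or replace it by accident: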

resource "aws_rds_cluster" "primary" {
  # ...

  lifecycle {
    prevent_destroy = true
  }
}

Key Takeaways