Drift Detection & Remediation

Infrastructure drift is the gap between what your Terraform configuration says should exist and what actually exists in the cloud. It happens constantly in production: an engineer makes a hotfix through the console, an automated process modifies a resource, a cloud provider changes a default setting. Left unaddressed, drift accumulates until a Terraform apply does something unexpected.

What Is Infrastructure Drift?

Terraform's state file is a snapshot of what Terraform last observed. Drift occurs when the real infrastructure diverges from both the state and the configuration. There are three flavours:

Type	What happened	Terraform sees
Attribute drift	A resource attribute was changed outside Terraform (e.g., instance type modified in console)	Plan shows an update to restore the configured value
Resource deleted	A resource was deleted outside Terraform	Plan shows a create (Terraform wants to recreate it)
Resource added	A resource was created outside Terraform, not in state	Terraform is unaware — no plan output until imported

The third type — resources created outside Terraform — is invisible to drift detection until you import them. The first two are what Terraform's refresh mechanism catches.
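Bringing such a resource under management can be sketched with terraform import. This is a minimal sketch: the helper name adopt_resource, the resource address, and the bucket name are hypothetical, and it assumes a matching resource block already exists in the configuration.

```shell
# adopt_resource ADDRESS ID
# Records an existing cloud object in state under ADDRESS, then runs a plan
# so you can see how the live object differs from the resource block.
# Assumes a resource block for ADDRESS already exists in the configuration.
adopt_resource() {
  terraform import "$1" "$2" && terraform plan
}

# Example (hypothetical address and bucket name):
#   adopt_resource aws_s3_bucket.logs example-logs-bucket
```

Once the import succeeds, the resource is in state and later plans will report drift on it like any other managed resource. Terraform 1.5 and later also accept import blocks in configuration, which may suit a pull-request workflow better than a one-off command.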

Detecting Drift with Terraform

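By default, every terraform plan begins with a refresh step: Terraform calls the provider API for each resource in state and updates its view of that resource with the live attributes. It then compares the refreshed values against the configuration, so any attribute that no longer matches the config appears in the plan as a proposed change.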

terraform plan
# Terraform will perform the following actions:
#
#   # aws_security_group.app will be updated in-place
#   ~ resource "aws_security_group" "app" {
#       ~ ingress = [
#           - {
#               cidr_blocks = ["10.0.0.0/8"]
#               from_port   = 22
#               protocol    = "tcp"
#               to_port     = 22
#             },
#           # (config rule still present)
#         ]
#     }
#
# Plan: 0 to add, 1 to change, 0 to destroy.

Here, an SSH ingress rule that is not in the config was added to the security group outside Terraform. The plan caught it, and an apply would remove it while leaving the configured rule in place.

⚠️

Large state files slow down the refresh

The refresh step makes at least one API call per resource in state, so configurations managing thousands of resources can take minutes just to refresh. Use -target for an occasional focused plan, or structure your configurations into smaller, focused root modules.

terraform plan -refresh-only

Terraform 0.15.4+ has a dedicated mode for drift inspection: terraform plan -refresh-only. It reads current resource attributes from the provider and shows what's changed — but it proposes only a state update, not real infrastructure changes:

terraform plan -refresh-only
# Terraform will perform the following actions:
#
#   # aws_instance.app will be updated in the Terraform state
#   ~ resource "aws_instance" "app" {
#       ~ instance_type = "t3.small" -> "t3.medium"
#         # (the instance was scaled up in the console)
#     }
#
# This plan does NOT make changes to real infrastructure.
# It updates the Terraform state to match current infrastructure.

terraform apply -refresh-only

This is useful when you want to accept the drift — for example, an operator scaled up an instance as an emergency fix, and you want state to reflect that before updating the config. Apply the refresh-only plan, then update the configuration to match the new state.
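For scripting, the drifted addresses can also be read from the plan's JSON form rather than the human-readable output. This is a minimal sketch, assuming jq is installed and the refresh-only plan was saved to a file (drift.tfplan is an assumed name). The helper list_drifted is illustrative. resource_drift is the key in Terraform's JSON plan representation that holds changes detected during refresh.

```shell
# list_drifted: read `terraform show -json` output on stdin and print the
# address of every resource Terraform found changed outside of Terraform.
# Prints nothing when the plan contains no resource_drift entries.
list_drifted() {
  jq -r '.resource_drift[]?.address'
}

# Usage (assumes the plan was saved first):
#   terraform plan -refresh-only -out=drift.tfplan
#   terraform show -json drift.tfplan | list_drifted
```

Reading from stdin keeps the helper independent of where the plan file lives, and lets you try it on a saved JSON sample without touching real infrastructure.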

Automated Drift Detection

Running plan manually only catches drift when someone thinks to check. Automated drift detection runs plans on a schedule and alerts when drift is found.

GitHub Actions scheduled plan

name: Drift Detection

on:
  schedule:
    - cron: '0 8 * * 1-5'   # 8am weekdays
  workflow_dispatch:

jobs:
  detect-drift:
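    runs-on: ubuntu-latest
    permissions:
      id-token: write   # needed to assume the AWS role via OIDC
      contents: read
    steps:
      - uses: actions/checkout@v4

      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "~1.9"
          terraform_wrapper: false   # skip the wrapper script so the step sees terraform's own exit code

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: us-east-1

      - name: Terraform Init
        run: terraform init

      - name: Detect Drift
        id: plan
        shell: bash
        run: |
          # Exit code 2 means "drift found", not failure, so don't abort on non-zero
          set +e
          terraform plan -detailed-exitcode -refresh-only -out=drift.tfplan 2>&1 | tee plan.txt
          # $? would be tee's status; PIPESTATUS[0] is terraform's
          echo "exitcode=${PIPESTATUS[0]}" >> "$GITHUB_OUTPUT"

      - name: Fail on plan error
        if: steps.plan.outputs.exitcode == '1'
        run: exit 1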

      - name: Notify on drift
        if: steps.plan.outputs.exitcode == '2'
        uses: slackapi/slack-github-action@v1
        with:
          payload: |
            {"text": "⚠️ Infrastructure drift detected in ${{ github.repository }}. Review: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"}
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}

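With -detailed-exitcode, terraform plan exits with code 2 when there are changes, 0 when clean, and 1 on error, which makes it CI-friendly for drift alerting. Without the flag, a successful plan exits with 0 whether or not it found changes.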
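The exit-code convention of terraform plan -detailed-exitcode can be mapped to labels that a wrapper script or alerting step can branch on. This is a minimal sketch; classify_plan_exit is a hypothetical helper name, and treating every status other than 0 and 2 as an error is an assumption made here for safety.

```shell
# classify_plan_exit CODE
# Translate the exit status of `terraform plan -detailed-exitcode` into a label.
classify_plan_exit() {
  case "$1" in
    0) echo "clean" ;;   # plan succeeded, no changes
    2) echo "drift" ;;   # plan succeeded, changes present
    *) echo "error" ;;   # 1 (or anything unexpected): the plan itself failed
  esac
}

# Usage:
#   terraform plan -detailed-exitcode -refresh-only >/dev/null
#   classify_plan_exit "$?"
```

Keeping "error" separate from "drift" matters in a scheduled job: a failed plan (expired credentials, a provider outage) should page differently from a plan that ran and found changes.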

HCP Terraform drift detection

HCP Terraform (formerly Terraform Cloud) offers built-in drift detection as part of its health assessments on eligible paid plans. Once it is enabled for a workspace, HCP Terraform runs refresh-only plans on a schedule and sends notifications when drift is found. No pipeline setup is required.

Remediating Drift

Once drift is detected, you have three options:

Option	Command	When to use
Restore to config	`terraform apply`	The config is correct; the manual change was wrong or unauthorized
Accept the change	`terraform apply -refresh-only` then update config	The manual change was intentional; update config to match
Ignore permanently	`lifecycle { ignore_changes = [...] }`	Some attributes are intentionally managed outside Terraform

Restoring to config

The standard terraform apply will overwrite any manual changes with the configured values. This is the right call when the drift was unauthorized or accidental:

terraform plan   # review what will change
terraform apply  # restore the configured state

Accepting the drift

When the manual change was correct (e.g., a hotfix that needs to stay), accept it into state first, then update the config to match:

# 1. Accept the live attribute values into state
terraform apply -refresh-only

# 2. Update the configuration to match
# (edit main.tf to reflect the new instance type, tag, etc.)

# 3. Verify the plan is now clean
terraform plan   # should show: No changes.

Ignoring Drift: lifecycle ignore_changes

Some attributes are intentionally managed outside Terraform and should never be overwritten. The ignore_changes argument in the lifecycle block lists attributes whose differences Terraform should disregard when planning updates. The configured value is still used when the resource is first created.

resource "aws_instance" "app" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.small"

  lifecycle {
    # instance_type is resized out-of-band (e.g. by operators during incidents); don't revert it
    ignore_changes = [instance_type]
  }
}

resource "aws_autoscaling_group" "app" {
  # ... (launch template, subnets, and other required arguments omitted)
  min_size = 2
  max_size = 10

  lifecycle {
    # desired_capacity is managed by scaling policies at runtime
    ignore_changes = [desired_capacity]
  }
}

Use ignore_changes = all sparingly — it means Terraform will never update the resource after creation, which defeats much of the purpose of managing it with Terraform.

🧭

Document why you're ignoring changes

Add a comment explaining why each attribute is ignored. Future maintainers won't know if the ignore was intentional or an oversight, and may remove it — causing a surprise apply that overwrites production state.
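One way to keep this habit honest is a small check that flags ignore_changes lines with no comment directly above them. This is a rough sketch, not an established tool: find_undocumented_ignores is a hypothetical helper, and the heuristic only looks at the single preceding line for a # or // comment.

```shell
# find_undocumented_ignores FILE...
# Print "file:line" for each ignore_changes assignment whose preceding line
# is not a # or // comment. Heuristic: only the line directly above is checked.
find_undocumented_ignores() {
  awk '
    FNR == 1 { prev = "" }
    /^[ \t]*ignore_changes[ \t]*=/ && prev !~ /^[ \t]*(#|\/\/)/ { print FILENAME ":" FNR }
    { prev = $0 }
  ' "$@"
}

# Usage:
#   find_undocumented_ignores *.tf
```

Run in CI or a pre-commit hook, a non-empty result can prompt the author to add the missing explanation before the change merges.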

Preventing Drift

Drift remediation is reactive. The goal is to prevent it from accumulating in the first place:

Enforce IaC-only changes: use IAM policies or cloud organization policies to deny console/CLI modifications to Terraform-managed resources. Route all changes through pull requests.
Run drift detection on a schedule: a daily or hourly plan catches drift quickly, before it becomes an incident.
Lock down the state bucket: restrict who can modify the state file directly. Unauthorized state edits can create drift that doesn't show up in plan output, such as a resource removed from state that keeps running unmanaged.
Use prevent_destroy: for critical resources, add lifecycle { prevent_destroy = true } so that Terraform rejects any plan that would destroy them. It does not stop deletion outside Terraform; for that, rely on provider-side safeguards such as the deletion_protection argument on aws_rds_cluster.
Keep configurations focused: large monolithic configurations are harder to plan and review. Split them into small, focused root modules that are applied independently.

resource "aws_rds_cluster" "primary" {
  # ...

  lifecycle {
    prevent_destroy = true
  }
}

Key Takeaways

Drift is the gap between your configuration and real infrastructure. It accumulates from console changes, scripts, and provider-side modifications.
Every terraform plan refreshes state from the provider and catches attribute drift and deleted resources. Resources created outside Terraform are invisible until imported.
terraform plan -refresh-only shows drift without proposing infrastructure changes — use it for auditing and for accepting intentional manual changes into state.
Automate drift detection with a scheduled CI/CD plan that uses -detailed-exitcode to exit with code 2 on changes and alerts your team.
To remediate: terraform apply restores config; terraform apply -refresh-only + config update accepts the change; ignore_changes permanently exempts an attribute.
Prevent drift at the source: enforce IaC-only change workflows, lock down state, and run scheduled detection.