Drift Detection & Remediation
Infrastructure drift is the gap between what your Terraform configuration says should exist and what actually exists in the cloud. It happens constantly in production: an engineer makes a hotfix through the console, an automated process modifies a resource, a cloud provider changes a default setting. Left unaddressed, drift accumulates until a Terraform apply does something unexpected.
What Is Infrastructure Drift?
Terraform's state file is a snapshot of what Terraform last observed. Drift occurs when the real infrastructure diverges from both the state and the configuration. There are three flavours:
| Type | What happened | Terraform sees |
|---|---|---|
| Attribute drift | A resource attribute was changed outside Terraform (e.g., instance type modified in console) | Plan shows an update to restore the configured value |
| Resource deleted | A resource was deleted outside Terraform | Plan shows a create (Terraform wants to recreate it) |
| Resource added | A resource was created outside Terraform, not in state | Terraform is unaware — no plan output until imported |
The third type — resources created outside Terraform — is invisible to drift detection until you import them. The first two are what Terraform's refresh mechanism catches.
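A script can tell the first two rows apart. The sketch below assumes you saved a plan with terraform plan -out=tfplan and exported it with terraform show -json tfplan. That JSON has an optional resource_drift list describing objects that changed outside Terraform. Treating an update action as attribute drift and a delete action as a deleted resource is an assumption about how those cases are encoded. Resources created outside Terraform never appear in the list, which is consistent with the third row.

```python
def classify_drift(plan: dict) -> dict[str, list[str]]:
    """Group drifted resource addresses by drift type.

    `plan` is assumed to be the parsed output of `terraform show -json tfplan`.
    Entries in its optional "resource_drift" list describe objects that
    changed outside of Terraform since the last apply.
    """
    groups: dict[str, list[str]] = {"attribute_drift": [], "deleted": [], "other": []}
    for entry in plan.get("resource_drift", []):
        actions = entry["change"]["actions"]
        if actions == ["update"]:
            groups["attribute_drift"].append(entry["address"])
        elif actions == ["delete"]:
            groups["deleted"].append(entry["address"])
        else:
            # Keep unexpected actions visible instead of silently dropping them.
            groups["other"].append(entry["address"])
    return groups
```

To use it, load the exported file with json.load and pass in the resulting dict.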
Detecting Drift with Terraform
By default, every terraform plan begins with a refresh step: Terraform calls the provider API for each resource in state and compares the live attributes against what state recorded. Wherever the refreshed values no longer match the configuration, the plan proposes a change to bring them back.
```shell
terraform plan
# Terraform will perform the following actions:
#
#   # aws_security_group.app will be updated in-place
#   ~ resource "aws_security_group" "app" {
#       ~ ingress = [
#           + {
#               + cidr_blocks = ["10.0.0.0/8"]
#               + from_port   = 22
#               + protocol    = "tcp"
#               + to_port     = 22
#             },
#             # (other rules unchanged)
#         ]
#     }
#
# Plan: 0 to add, 1 to change, 0 to destroy.
```
Here, an SSH ingress rule that exists in the config was removed from the security group outside Terraform. A plan caught it. An apply would restore it.
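You can recompute the "Plan: 0 to add, 1 to change, 0 to destroy." line from a saved plan when a script needs the numbers rather than the text. This sketch assumes the JSON from terraform show -json on a saved plan file, where each resource_changes entry has a change.actions list. A replacement (delete then create, or create then delete) counts once as an add and once as a destroy, which matches how the summary line reports replacements. Data-source reads and no-ops are not counted.

```python
def plan_summary(plan: dict) -> tuple[int, int, int]:
    """Return (to_add, to_change, to_destroy) for a parsed JSON plan.

    `plan` is assumed to be parsed `terraform show -json tfplan` output.
    """
    to_add = to_change = to_destroy = 0
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        if actions == ["update"]:
            to_change += 1
            continue
        # A replacement lists both "create" and "delete" and counts in both columns.
        if "create" in actions:
            to_add += 1
        if "delete" in actions:
            to_destroy += 1
    return to_add, to_change, to_destroy
```

A plan whose only entry is the in-place security group update would give (0, 1, 0).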
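The refresh step issues at least one provider read per resource in state, and often several API calls. As a result, configurations managing thousands of resources can spend minutes just refreshing. To keep plans fast, split your configurations into smaller, focused root modules. -target can narrow a single plan, but Terraform intends it for exceptional cases rather than routine use.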
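To find which parts of a large configuration dominate refresh time, one rough proxy is the number of objects in state per module. This sketch assumes the JSON from terraform show -json run without a plan file. That output describes current state as values.root_module, with nested child_modules. It counts both managed resources and data sources, because a plan reads both.

```python
def resources_per_module(state: dict) -> dict[str, int]:
    """Count resources (managed and data) in each module of a JSON state dump.

    `state` is assumed to be parsed `terraform show -json` output (no plan file).
    The root module has no "address" key, so it is reported as "(root)".
    """
    counts: dict[str, int] = {}

    def walk(module: dict) -> None:
        # Each module lists only its own resources; descendants are in child_modules.
        counts[module.get("address", "(root)")] = len(module.get("resources", []))
        for child in module.get("child_modules", []):
            walk(child)

    root = state.get("values", {}).get("root_module")
    if root is not None:
        walk(root)
    return counts
```

Sort the result by count, for example with sorted(counts.items(), key=lambda kv: kv[1], reverse=True), to see which modules are most worth splitting out.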
terraform plan -refresh-only
Terraform 0.15.4+ has a dedicated mode for drift inspection: terraform plan -refresh-only. It reads current resource attributes from the provider and shows what's changed — but it proposes only a state update, not real infrastructure changes:
```shell
terraform plan -refresh-only
# Note: Objects have changed outside of Terraform
#
#   # aws_instance.app has changed
#   ~ resource "aws_instance" "app" {
#       ~ instance_type = "t3.small" -> "t3.medium"
#         # (the instance was scaled up in the console)
#     }
#
# This is a refresh-only plan, so Terraform will not take any actions to undo
# these. If you were expecting these changes then you can apply this plan to
# record the updated values in the Terraform state without changing any remote
# objects.

# Accept the drift into state:
terraform apply -refresh-only
```
This is useful when you want to accept the drift — for example, an operator scaled up an instance as an emergency fix, and you want state to reflect that before updating the config. Apply the refresh-only plan, then update the configuration to match the new state.
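When you accept drift, you can draft the config edits from the refresh-only plan itself. In the JSON form of a saved refresh-only plan (terraform plan -refresh-only -out=drift.tfplan, then terraform show -json drift.tfplan), each resource_drift entry's change.before holds the previously recorded values and change.after holds the live ones. The sketch below turns top-level differences into attribute = value lines. Rendering values with json.dumps relies on HCL accepting JSON-style literals for simple strings, numbers, booleans, lists, and maps. That is an assumption, so check the result with terraform validate.

```python
import json


def hcl_literal(value) -> str:
    """Render a JSON-compatible value as an HCL expression (simple values only)."""
    text = json.dumps(value)
    # Inside HCL quoted strings "${" and "%{" start template sequences;
    # "$${" and "%%{" are their literal escapes.
    return text.replace("${", "$${").replace("%{", "%%{")


def config_suggestions(before: dict, after: dict) -> list[str]:
    """Draft `name = value` lines for top-level attributes whose live value changed."""
    return [
        f"{name} = {hcl_literal(after[name])}"
        for name in sorted(after)
        if before.get(name) != after[name]
    ]
```

Only copy lines for arguments your configuration actually sets. Computed attributes such as id or arn can also appear in after, but the provider manages those, not your configuration.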
Automated Drift Detection
Running plan manually only catches drift when someone thinks to check. Automated drift detection runs plans on a schedule and alerts when drift is found.
GitHub Actions scheduled plan
```yaml
name: Drift Detection

on:
  schedule:
    - cron: '0 8 * * 1-5' # 08:00 UTC, weekdays
  workflow_dispatch:

permissions:
  id-token: write # required for the OIDC role assumption below
  contents: read

jobs:
  detect-drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "~1.9"

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: us-east-1

      - name: Terraform Init
        run: terraform init

      - name: Detect Drift
        id: plan
        run: |
          set +e
          # After a pipe, $? is tee's exit code; PIPESTATUS[0] is terraform's.
          terraform plan -detailed-exitcode -refresh-only -out=drift.tfplan 2>&1 | tee plan.txt
          echo "exitcode=${PIPESTATUS[0]}" >> "$GITHUB_OUTPUT"
        continue-on-error: true

      - name: Fail on plan error
        if: steps.plan.outputs.exitcode == '1'
        run: exit 1

      - name: Notify on drift
        if: steps.plan.outputs.exitcode == '2'
        uses: slackapi/slack-github-action@v1
        with:
          payload: |
            {"text": "⚠️ Infrastructure drift detected in ${{ github.repository }}. Review: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"}
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
          SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK
```
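With -detailed-exitcode, terraform plan exits with 0 when there are no changes, 1 on error, and 2 when there are changes. That makes drift easy to alert on in CI. Without the flag, a successful plan exits 0 whether or not it found changes.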
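Outside GitHub Actions, a script can consume the same exit-code convention. This sketch assumes terraform is on PATH and that terraform init has already run in the working directory. check_drift and interpret_exit_code are hypothetical helper names, and any exit code other than 0, 1, or 2 is treated as unexpected.

```python
import subprocess

STATUS_BY_EXIT_CODE = {0: "clean", 1: "error", 2: "drift"}


def interpret_exit_code(code: int) -> str:
    """Map a `terraform plan -detailed-exitcode` exit code to a status string."""
    if code not in STATUS_BY_EXIT_CODE:
        raise ValueError(f"unexpected exit code from terraform plan: {code}")
    return STATUS_BY_EXIT_CODE[code]


def check_drift(workdir: str) -> str:
    """Run a refresh-only plan in `workdir` and return "clean", "drift", or "error"."""
    proc = subprocess.run(
        ["terraform", "plan", "-refresh-only", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return interpret_exit_code(proc.returncode)
```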
HCP Terraform drift detection
HCP Terraform (formerly Terraform Cloud) has built-in drift detection for paid tiers. Enable it per workspace — HCP Terraform runs refresh-only plans on a schedule and sends notifications when drift is found. No pipeline setup required.
Remediating Drift
Once drift is detected, you have three options:
| Option | Command | When to use |
|---|---|---|
| Restore to config | terraform apply | The config is correct; the manual change was wrong or unauthorized |
| Accept the change | terraform apply -refresh-only then update config | The manual change was intentional; update config to match |
| Ignore permanently | lifecycle { ignore_changes = [...] } | Some attributes are intentionally managed outside Terraform |
Restoring to config
The standard terraform apply will overwrite any manual changes with the configured values. This is the right call when the drift was unauthorized or accidental:
```shell
terraform plan    # review what will change
terraform apply   # restore the configured state
```
Accepting the drift
When the manual change was correct (e.g., a hotfix that needs to stay), accept it into state first, then update the config to match:
```shell
# 1. Accept the live attribute values into state
terraform apply -refresh-only

# 2. Update the configuration to match
#    (edit main.tf to reflect the new instance type, tag, etc.)

# 3. Verify the plan is now clean
terraform plan   # should show: No changes.
```
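Step 3 can also run in CI. The simplest check is that terraform plan -detailed-exitcode exits 0. If you also want to report what still differs, here is a minimal sketch over the JSON of a saved plan (terraform plan -out=verify.tfplan, then terraform show -json verify.tfplan). It lists every resource and output whose action list is anything other than no-op.

```python
def remaining_changes(plan: dict) -> list[str]:
    """List resource addresses and outputs that a JSON plan would still change.

    An empty list means the plan is clean. `plan` is assumed to be parsed
    `terraform show -json verify.tfplan` output.
    """
    pending = [
        rc["address"]
        for rc in plan.get("resource_changes", [])
        if rc["change"]["actions"] != ["no-op"]
    ]
    # Output values can change too; report them with an "output." prefix.
    pending += [
        f"output.{name}"
        for name, change in plan.get("output_changes", {}).items()
        if change["actions"] != ["no-op"]
    ]
    return pending
```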
Ignoring Drift: lifecycle ignore_changes
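Some attributes are intentionally managed outside Terraform and should never be overwritten. The lifecycle meta-argument's ignore_changes list handles this case. Terraform still uses the configured value when it creates the resource, but it ignores later differences in the listed attributes when planning updates: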
```hcl
resource "aws_instance" "app" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.small"

  lifecycle {
    # instance_type is resized outside Terraform (e.g., by an operator or a
    # rightsizing process); don't revert it on the next apply
    ignore_changes = [instance_type]
  }
}

resource "aws_autoscaling_group" "app" {
  # ...
  min_size = 2
  max_size = 10

  lifecycle {
    # desired_capacity is managed by scaling policies at runtime
    ignore_changes = [desired_capacity]
  }
}
```
Use ignore_changes = all sparingly — it means Terraform will never update the resource after creation, which defeats much of the purpose of managing it with Terraform.
Add a comment explaining why each attribute is ignored. Future maintainers won't know whether the ignore was intentional or an oversight, and may remove it. The next apply would then revert a deliberate change in production.
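A scheduled refresh-only check may still report drift in attributes listed in ignore_changes, because refresh records live values regardless of that setting. If this creates alert noise, you can filter the drift report before notifying. The sketch below is hypothetical. The ignored mapping is maintained by hand to mirror your lifecycle blocks, and it matches only top-level attribute names, not nested paths such as tags["Owner"].

```python
def unignored_drift(
    drift: dict[str, dict[str, tuple]],
    ignored: dict[str, set[str]],
) -> dict[str, dict[str, tuple]]:
    """Drop drifted attributes that are deliberately exempted.

    drift:   {resource address: {attribute: (old value, live value)}}
    ignored: {resource address: {attribute names}}, a hand-maintained mirror
             of each resource's lifecycle ignore_changes list (hypothetical).
    Resources left with no drifted attributes are omitted from the result.
    """
    result: dict[str, dict[str, tuple]] = {}
    for address, attributes in drift.items():
        exempt = ignored.get(address, set())
        remaining = {name: values for name, values in attributes.items() if name not in exempt}
        if remaining:
            result[address] = remaining
    return result
```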
Preventing Drift
Drift remediation is reactive. The goal is to prevent it from accumulating in the first place:
- Enforce IaC-only changes — use IAM policies or cloud organization policies to deny console/CLI modifications to Terraform-managed resources. Route all changes through pull requests.
- Run drift detection on a schedule — a daily or hourly plan catches drift quickly, before it becomes an incident.
- Lock down the state bucket — restrict who can modify the state file directly. Unauthorized state edits create drift that doesn't show up in plan output.
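- Use prevent_destroy for critical resources. Adding lifecycle { prevent_destroy = true } makes Terraform reject any plan that would destroy the resource. It does not stop deletion through the console or API; for that, enable the provider's own deletion protection where it exists (such as deletion_protection on aws_rds_cluster).
- Keep configurations focused. Large monolithic configurations are harder to plan and review, so split them into small, focused root modules that are applied independently.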
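```hcl
resource "aws_rds_cluster" "primary" {
  # ...

  # Provider-side protection: the AWS API refuses to delete the cluster
  deletion_protection = true

  lifecycle {
    # Terraform rejects any plan that would destroy this cluster
    prevent_destroy = true
  }
}
```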
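On AWS, the first prevention step (denying out-of-band changes) might look like the policy built below. This is a sketch under stated assumptions. Terraform-managed resources carry a ManagedBy = terraform tag (a hypothetical convention), and only the role Terraform runs as may change them. The policy uses the global condition keys aws:ResourceTag/ManagedBy and aws:PrincipalArn. Not every AWS action supports resource-tag conditions, so check coverage for the services you use before relying on it.

```python
import json


def iac_only_deny_policy(terraform_role_arn: str, actions: list[str]) -> dict:
    """Build an IAM/SCP-style Deny for changes to Terraform-tagged resources.

    Hypothetical conventions: managed resources are tagged ManagedBy=terraform,
    and `terraform_role_arn` is the only principal allowed to modify them.
    Both conditions must hold for the Deny to apply.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutOfBandChanges",
                "Effect": "Deny",
                "Action": actions,
                "Resource": "*",
                "Condition": {
                    "StringEquals": {"aws:ResourceTag/ManagedBy": "terraform"},
                    "ArnNotEquals": {"aws:PrincipalArn": terraform_role_arn},
                },
            }
        ],
    }


policy = iac_only_deny_policy(
    "arn:aws:iam::123456789012:role/terraform",  # placeholder account and role
    ["ec2:ModifyInstanceAttribute", "ec2:TerminateInstances"],
)
print(json.dumps(policy, indent=2))
```

If emergency console hotfixes must remain possible, you may want to exempt a break-glass role in the same ArnNotEquals condition.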
Key Takeaways
- Drift is the gap between your configuration and real infrastructure. It accumulates from console changes, scripts, and provider-side modifications.
- By default, every terraform plan refreshes state from the provider and catches attribute drift and deleted resources. Resources created outside Terraform are invisible until imported.
- terraform plan -refresh-only shows drift without proposing infrastructure changes. Use it for auditing and for accepting intentional manual changes into state.
- Automate drift detection with a scheduled CI/CD plan run with -detailed-exitcode, which exits with code 2 on changes, and alert your team when that happens.
- To remediate: terraform apply restores the config; terraform apply -refresh-only plus a config update accepts the change; ignore_changes permanently exempts an attribute.
- Prevent drift at the source: enforce IaC-only change workflows, lock down state, and run scheduled detection.