How Drift Detection Helps Maintain a Secure Infrastructure
Drift detection is essential to maintaining efficiency while minimizing risk and giving teams confidence in their infrastructure as code (IaC) provisioning workflow.
As HashiCorp’s CISO, I understand how infrastructure as code (IaC) practices enhance security and data governance compliance. But even with a standardized IaC workflow in place, including guardrails such as policy as code, IaC isn’t a panacea.
In the real world, infrastructure will continue to be changed and updated in response to an organization’s goals and unforeseen events. It’s hard to eliminate every instance of infrastructure state modification that isn’t tracked in your source of truth. Many organizations frequently end up with state files that don’t match the actual running infrastructure — a phenomenon known as “drift”. This can dramatically undercut the reliability and security benefits of IaC.
To maximize the value of infrastructure as code, it’s important to understand the causes of infrastructure drift, what the impact can be — especially on security — and the best ways to implement drift detection and remediation to help solve the problem.
» What Causes Infrastructure Drift
Drift can occur for many reasons. First off, there may be cases where not everyone in the organization is using the established IaC workflows. That can create unrecorded differences between the infrastructure defined in code and the actual current state.
Emergencies are another common cause. In the midst of a “break-glass incident,” response management teams sometimes decide to bypass standard procedures for patching the infrastructure in order to fix the problem as quickly as possible. These kinds of shortcuts can cause changes to the resources that are tough to track and resolve in the code.
Routine system updates on cloud or service-provider platforms can also accrue over time, resulting in significant drift as your infrastructure definitions and the provider systems gradually grow apart. For example, simple API changes (often for third-party services) might affect your infrastructure without being tracked in code.
Finally, cascading effects can make drift detection even more complex. For example, changing or creating an infrastructure resource can produce unexpected associated resources that aren’t codified. Those resources then change state and affect one another without anyone being aware of it.
» The Impact of Drift on Infrastructure Security and Functionality
As cloud adoption grows, organizational resources and processes become increasingly complex, which can create inconsistencies around the state of the infrastructure. Without standard procedures, notifications, or guidelines for adjustments, even temporary changes or the smallest tweaks to infrastructure can have significant impacts on the business, including unplanned downtime, audit findings, security incidents, rework, and unused resources.
Most importantly, unrecognized infrastructure drift creates multiple security risks that need to be addressed before they become real problems. Drift can dramatically increase the probability of critical data exposures, perhaps due to mission-critical systems left open to public access by mistake or other unknown resources being left unsecured.
Additionally, development teams unaware of production environment changes not reflected in the IaC systems will almost certainly have to contend with applications “suddenly” crashing and deployment projects that unexpectedly fail.
» From Drift Detection to Drift Remediation
So, how can organizations best handle drift detection, and what can they do to remediate the situation when drift is detected? Some companies opt to build in-house tooling that checks every state file for drift at once and then emails a report to all users. But without any context behind the changes, it is difficult to tell necessary changes from unneeded ones. Plus, it’s up to you to manually update either the resource or the recorded IaC state. This approach is too time-consuming to scale.
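To make the core of such a check concrete, here is a minimal Python sketch of the comparison step: it takes the recorded state and the observed state and lists every difference. The data shapes, the `DriftFinding` type, and the `detect_drift` function are hypothetical, chosen for illustration only; this is not how Terraform or any particular tool implements drift detection.

```python
from dataclasses import dataclass
from typing import Any

MISSING = "<missing>"  # hypothetical marker for an absent resource or attribute


@dataclass(frozen=True)
class DriftFinding:
    resource: str
    attribute: str
    recorded: Any
    actual: Any


def detect_drift(recorded: dict, actual: dict) -> list:
    """Compare recorded state with observed state, resource by resource.

    Both arguments map a resource name to a dict of attributes
    (an assumed shape, not a real state-file format).
    """
    findings = []
    for resource in sorted(set(recorded) | set(actual)):
        rec_attrs = recorded.get(resource)
        act_attrs = actual.get(resource)
        if rec_attrs is None:
            # Resource is running but was never codified.
            findings.append(DriftFinding(resource, "<resource>", MISSING, "present"))
            continue
        if act_attrs is None:
            # Resource is recorded but no longer exists.
            findings.append(DriftFinding(resource, "<resource>", "present", MISSING))
            continue
        for attr in sorted(set(rec_attrs) | set(act_attrs)):
            rec_value = rec_attrs.get(attr, MISSING)
            act_value = act_attrs.get(attr, MISSING)
            if rec_value != act_value:
                findings.append(DriftFinding(resource, attr, rec_value, act_value))
    return findings
```

Given a recorded bucket with `acl` set to `private` and an observed bucket with `acl` set to `public-read`, the function returns a single finding for that attribute. Note what it cannot tell you: who made the change or why, which is exactly the missing context described above.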
The underlying solution to these challenges comes down to answering two key questions:
- How do you ensure the actual infrastructure matches the recorded infrastructure state, and that the right people are notified to take corrective action on any detected drift?
- Can the drift detection solution prioritize efficiency and simplicity by working seamlessly with the IaC platform while being a central source of information for the engineering and security teams?
» Centralized Visibility Is Critical for Drift Detection
Ultimately, teams concerned with drift should look for integrated drift-detection solutions. Ideally, this type of system would include all-in-one automated provisioning and central management so development teams can continuously monitor the infrastructure state to detect changes. Operating from a consolidated environment, the system should be able to send immediate notifications to the appropriate teams so they can take specific corrective actions any time a resource is altered.
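As a rough illustration of sending notifications to the appropriate teams rather than emailing everyone, the Python sketch below groups drifted resources by an owning team. The ownership map, the resource naming scheme (`type.name`), the fallback team, and the `route_notifications` function are all assumptions made for this example, not features of any specific product.

```python
from collections import defaultdict


def route_notifications(drifted_resources, owners, default_team="platform"):
    """Group drifted resource names by owning team and build one message per team.

    `owners` maps a resource-type prefix (the part before the first dot)
    to a team name; unknown prefixes go to `default_team`.
    """
    by_team = defaultdict(list)
    for resource in drifted_resources:
        prefix = resource.split(".", 1)[0]
        by_team[owners.get(prefix, default_team)].append(resource)
    return {
        team: f"{len(items)} drifted resource(s): {', '.join(sorted(items))}"
        for team, items in sorted(by_team.items())
    }
```

For instance, with `owners = {"bucket": "data", "sg": "security"}`, a drifted `sg.debug` produces a message for the security team only, while a resource with no known owner falls back to the platform team.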
For CISOs concerned with narrowing security gaps — both the kind they know about and the previously undetectable ones created by infrastructure drift — this type of solution can help strengthen the organization’s overall security posture without adding undue operational burdens.
Specifically, an integrated drift-detection approach can significantly reduce the potential for application downtime that could negatively impact user experience and, eventually, revenue. It can also empower teams to track and quickly address system changes, identify who made them and why, and record those changes for future reference or to adjust the standard workflow as needed.
Finally, a robust drift-detection system can boost operational agility by giving teams a consistent single source of truth from which they can collaborate. Working from the same information avoids the need to buy or develop custom tooling or deal with manual actions to refresh the state — all while granting superior visibility and accelerating time to resolution.
» Automate, Detect, and Alert
To recap, automated infrastructure provisioning offers significant productivity and security benefits. But what about when your infrastructure changes and the actual state isn’t reflected in the recorded IaC state? Drift is an unfortunate side effect of modern, dynamic infrastructure, where changes are made constantly.
To minimize the impact of infrastructure drift, you need a drift-detection system that gives your operations teams visibility and alerts the appropriate people to take action when needed. Working together systematically under a standardized process with centralized, automated tools promises to reduce risk, deliver greater system visibility and give teams the ability to resolve infrastructure issues more quickly.
HashiCorp Terraform provides built-in functionality for infrastructure automation, with workflows to build, compose, collaborate on, and reuse infrastructure as code, and it includes drift detection features. See Melar Chen’s blog post “Drift Detection for Terraform Cloud is Now Generally Available” for more information, or try Terraform Cloud for free to provision, change, and version infrastructure resources in any environment.
A version of this blog post was originally published on The New Stack.