Terraform for platform engineers

Published 7:00 PM UTC Aug 19, 2024

Platform engineering is the new standard for infrastructure, which means Terraform is becoming the core engine for platform teams. Watch Armon Dadgar outline what platform engineers need to know about Terraform.

»Transcript

Hey. Today, I wanted to do a quick video talking about what's the value of Terraform for platform engineers and platform teams more generally. I think when we start by saying platform teams or platform engineers, it's almost helpful to start by describing the different personas and what are they trying to solve for?

»Application teams

I think, for most people, it's pretty obvious what the application teams want. So if we think about an app team, they're building their application or their service. And, fundamentally, what they care about is their own application lifecycle. They care about delivering new features, fixing bugs, being able to iterate rapidly on their application.

»Platform teams

But if you think about a platform team, whether that's sort of your cloud team, your DevOps team, your platform team, the ops team, different organizations will call it different things; they actually have a different set of concerns, largely around infrastructure lifecycle.

When we think about infrastructure lifecycle, it's a little bit divorced from the app itself. Obviously it supports the application; the application's running on top of a set of infrastructure. But there's usually a whole bunch of secondary partners that platform teams have to work with.

It might be their:

Security teams
Compliance teams
Finance teams

Because you're worried about a whole bunch of other requirements: GRC (governance, risk, and compliance) requirements; is my platform PCI-compliant, for example.

I have security teams who are worried if my platform is secure. Are things patched? Am I taking care of vulnerabilities?

I care about my finance and FinOps teams because they're worried about things like the cost. Am I optimizing my cloud spending?

The platform team really has almost a primary customer, which is the app team. How do you enable them to build, deploy, manage their application? But you have a bunch of secondary customers as well to make sure the infrastructure is secure, it's patched, it's cost-effective, etc. So you have a bunch of these different constraints. When we think about them, well, what's the role of HashiCorp Terraform in enabling the platform engineers?

»Terraform for app teams

It's a little bit different than when you think about Terraform for the app teams. Terraform for the app teams is relatively simple. They're writing infrastructure as code to deploy their app. That's what they care about.

»Terraform for platform teams

When we think about it for platform teams, you're still doing the same thing. Obviously, Terraform itself doesn't change. We're still writing things as infrastructure as code, but then we start to think about how to address some of these other concerns that the platform engineering teams have. And at the heart of this, it's about starting to think about how to industrialize your approach to cloud management so that these things get baked into the process and the app teams don't have to think about them.

»Standardizing Terraform modules

I think where most of this starts is driving a level of standardization around things like Terraform modules. I don't want my application teams to have to redefine how to provision a database, how to provision my Kubernetes cluster, or how to deploy a generic Java application. As a platform organization, I want to define those things as a shared set of modules so that my developers can come in and simply consume those, oftentimes from a registry or library of these pre-written modules.

That's one piece of it: how am I simplifying the consumption of this and having a standardization of patterns?

»Policy as code

Then, when we start thinking about things like GRC, security, and these other pieces, what we want to enable those platform teams to do is still have self-service for the application teams, but bring some of those controls in so that you're not worried that a developer is taking the pattern and going way off the rails and potentially introducing a GRC issue for you.

This is where things like policy as code come in. Whether you're using Sentinel, which is a HashiCorp policy language to define various controls, or you're using something like Open Policy Agent (OPA) with Rego, you can define policy as code, and those policies can span different kinds of constraints.

It could be a GRC constraint where you have to have these certain flags enabled to make sure my infrastructure is secure. It could be a set of security constraints; what resources you're allowed to use, or you can't set an S3 bucket to be public to the internet. That could be a security policy.

Similarly, you might have cost policies that restrict what instance types you're allowed to use, or how many clusters you're allowed to ask for, or how many nodes in a VM autoscaling group you can request, things like that. They could be cost controls.

So it doesn't really matter what type of policy it is. It becomes a consistent way that, as a platform team, we can impose that on the way developers are consuming the infrastructure. We can enforce these controls without sacrificing the self-service that the application teams have. This becomes key to it.
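To make the idea concrete, here is a minimal sketch in Python of a security policy and a cost policy evaluated against a plan. It is not Sentinel or Rego, and it is not how HCP Terraform evaluates policy. It assumes a plan exported in Terraform's JSON plan format (a `resource_changes` list whose entries carry `address`, `type`, and `change.after`). The allow-list of instance types, the sample plan, and the function name `check_plan` are all hypothetical.

```python
# Hypothetical allow-list; a real platform team would choose its own values.
ALLOWED_INSTANCE_TYPES = {"t3.micro", "t3.small"}
# Canned ACL values that expose a bucket publicly.
PUBLIC_ACLS = {"public-read", "public-read-write"}


def check_plan(plan: dict) -> list[str]:
    """Return a list of human-readable policy violations for a plan."""
    violations = []
    for rc in plan.get("resource_changes", []):
        after = (rc.get("change") or {}).get("after")
        if after is None:
            # Resource is being deleted; there is nothing to validate.
            continue
        address = rc.get("address", "<unknown>")
        # Security policy: no publicly readable or writable bucket ACLs.
        if rc.get("type") == "aws_s3_bucket_acl" and after.get("acl") in PUBLIC_ACLS:
            violations.append(f"{address}: public ACL '{after['acl']}' is not allowed")
        # Cost policy: only approved instance types.
        if rc.get("type") == "aws_instance":
            itype = after.get("instance_type")
            if itype not in ALLOWED_INSTANCE_TYPES:
                violations.append(f"{address}: instance type '{itype}' is not approved")
    return violations


# Hypothetical plan fragment, shaped like Terraform's JSON plan output.
SAMPLE_PLAN = {
    "resource_changes": [
        {"address": "aws_s3_bucket_acl.logs", "type": "aws_s3_bucket_acl",
         "change": {"actions": ["create"], "after": {"acl": "public-read"}}},
        {"address": "aws_instance.web", "type": "aws_instance",
         "change": {"actions": ["create"], "after": {"instance_type": "m5.24xlarge"}}},
        {"address": "aws_instance.worker", "type": "aws_instance",
         "change": {"actions": ["create"], "after": {"instance_type": "t3.micro"}}},
        {"address": "aws_instance.old", "type": "aws_instance",
         "change": {"actions": ["delete"], "after": None}},
    ]
}

if __name__ == "__main__":
    for violation in check_plan(SAMPLE_PLAN):
        print(violation)
```

Both kinds of policy run through the same function over the same plan data, which is the point being made: the type of policy matters less than having one consistent place to enforce it.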

»Third-party product controls

Oftentimes, we might be using external tooling for some of this as well. So we might be using Palo Alto Prisma Cloud, or we might be using Wiz, or we might be using Turbonomic to do optimization on these things. Then we think about how to create an integration surface area so that we can connect these external tools that are being used for security, or privacy, or cost, or these other things, into the workflow that the developers are using.

Within something like HCP Terraform (formerly named Terraform Cloud), we might use what we call run tasks where we can interpose between a Terraform plan and apply operation.

In between plan and apply, we know what Terraform changes are going to happen, but they haven't happened yet. We can flow those out to a third-party system. They can say, hey, you're actually violating a set of security rules, block that action. That way the developers or the app teams can get immediate feedback and the platform teams can maintain control and do it with a centralized type of approach.
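As a rough illustration of that flow, the sketch below gates an apply on the results of external checks. It is written in Python and does not use the actual run tasks API. The names `gate_apply`, `deletion_guard`, and `cost_review`, and the sample plan, are hypothetical stand-ins for third-party systems that would receive the plan and report findings.

```python
from typing import Callable

# A check receives the plan and returns a list of problems (empty means pass).
Check = Callable[[dict], list[str]]


def gate_apply(plan: dict, checks: dict[str, Check]) -> tuple[bool, dict[str, list[str]]]:
    """Run every check against the plan; allow the apply only if all pass."""
    findings = {}
    for name, check in checks.items():
        problems = check(plan)
        if problems:
            findings[name] = problems
    return (len(findings) == 0, findings)


def deletion_guard(plan: dict) -> list[str]:
    """Hypothetical security check: flag any resource the plan would delete."""
    return [
        f"{rc['address']} would be deleted"
        for rc in plan.get("resource_changes", [])
        if "delete" in rc.get("change", {}).get("actions", [])
    ]


def cost_review(plan: dict) -> list[str]:
    """Hypothetical cost check that finds nothing to report."""
    return []


CHECKS = {"deletion_guard": deletion_guard, "cost_review": cost_review}

# Hypothetical plan fragment, shaped like Terraform's JSON plan output.
PLAN = {
    "resource_changes": [
        {"address": "aws_db_instance.orders", "change": {"actions": ["delete"]}},
        {"address": "aws_instance.web", "change": {"actions": ["update"]}},
    ]
}

if __name__ == "__main__":
    allowed, findings = gate_apply(PLAN, CHECKS)
    print("apply allowed" if allowed else "apply blocked")
    for name, problems in findings.items():
        for problem in problems:
            print(f"  [{name}] {problem}")
```

Because the gate sits after the plan is computed and before anything is applied, the findings reach the app team immediately while the decision to block stays with the platform team.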

There are a number of these different kinds of capabilities, and when we think about how the platform teams care about something that the app teams might not, how do we enable the platform teams to think about those concerns and enact those concerns within a common pipeline?

»Internal developer portal

Then as we think about going one step further, ultimately for most application teams, it goes back to the fact that what they really only care about is their app lifecycle. They actually don't even care about things like infrastructure as code. For many application teams, having to learn something like Terraform is sort of a distraction. They're like: "I want to work on my Java app. I couldn't care less how infrastructure as code works or how Terraform works. I still want to be able to consume these modules, but I don't have to necessarily learn Terraform."

This starts to get you into the realm of things like internal developer portals. So if we think about an internal developer portal, you'll often hear this abbreviated, because it's a mouthful, to "IDP". The goal is really how do I simplify that consumption experience for the app teams so that they don't have to think about infrastructure as code or Terraform necessarily. They can focus on, hey, I have a Java app, it needs a Mongo database, it has a Redis cluster associated with it. Make that happen for me.

And that's really where we're focused with our HCP Waypoint offering. So Waypoint really is the HashiCorp internal developer portal, and the idea is to tightly integrate that with things like HCP Terraform. The goal being that our platform engineers can instead define these golden modules to say here's how a Java app or a Redis or a Mongo cluster work. Those are defined as Terraform. The platform teams tightly control how they work, how they're configured, how we address some of these considerations.

What we're exposing to the developers and the app teams is a higher level abstraction. They're just coming and saying, give me a Java app with this set of add-ons associated with it. I don't really have to care how it works. Ultimately, over time, what that enables is what we refer to as a set of "golden patterns".

These golden patterns are obviously defined through a set of modules, and that solves the Day 1 provisioning problem. Then you get to the Day 2 challenge, which is: great, I've defined some infrastructure. It might be my Java app, I have my Mongo cluster, I have Redis. But I still have a set of Day 2 challenges: I have to deploy a new version of this Java app. I've deployed version 1, but now my development team wants to push version 2.

How do I build a new version? How do I deploy it? How do I manage that lifecycle? Maybe I need to create an index in my Mongo cluster. Maybe I need to purge the cache within Redis. So there's a set of these runbooks or Day 2 actions that somehow need to be exposed as well.

We refer to that as a set of "golden workflows". And these are 'Day 2' in the sense that after I've provisioned my infrastructure, I have a set of actions that I might need to invoke. And that's where Waypoint is designed to allow platform teams to define those things as well.

I might say: here's how to do a build of a new version of your Java app. Here's how to build an index for your Mongo cluster. Here's how to purge the cache for Redis. Each of these might be a set of defined actions, and these are defined by the platform team because they know how it should work given the definition of their infrastructure and given the set of concerns that they have around, e.g., how do I make sure that it's done in a secure way and that the user actually has the right to invoke that particular action.
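One way to picture a set of platform-defined actions with an authorization check is the Python sketch below. It is purely illustrative and is not Waypoint's API or data model. The class `Day2Action`, the action names, the role names, and the simulated results are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Day2Action:
    """A platform-defined action plus the roles allowed to invoke it."""
    name: str
    allowed_roles: frozenset
    run: Callable[[], str]


# Hypothetical catalog; the platform team owns both the steps and the permissions.
ACTIONS = {
    "purge-redis-cache": Day2Action(
        name="purge-redis-cache",
        allowed_roles=frozenset({"app-developer", "platform-engineer"}),
        run=lambda: "cache purged (simulated)",
    ),
    "create-mongo-index": Day2Action(
        name="create-mongo-index",
        allowed_roles=frozenset({"platform-engineer"}),
        run=lambda: "index created (simulated)",
    ),
}


def invoke(action_name: str, user_roles: set) -> str:
    """Run an action only if the caller holds at least one allowed role."""
    action = ACTIONS[action_name]
    if action.allowed_roles.isdisjoint(user_roles):
        raise PermissionError(f"not permitted to invoke '{action_name}'")
    return action.run()


if __name__ == "__main__":
    print(invoke("purge-redis-cache", {"app-developer"}))
    try:
        invoke("create-mongo-index", {"app-developer"})
    except PermissionError as err:
        print(f"blocked: {err}")
```

The app team only sees action names, while how each action runs and who may run it stays under the platform team's control.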

And so we want to be able to expose those at a higher level, but then really enable platform teams to define and model them with Terraform fundamentally.

»Bringing everything together

Taking a step back, I hope this is a little bit helpful as you think about when we say Terraform for platform teams, what do we really mean? It's about separation of those two personas. What does an app team want out of infrastructure versus the concerns that a platform team or an ops team is solving for? Those include a bunch of other secondary problems around privacy, GRC, security, cost, etc. And so it's really around enabling those platform teams to create standardization through shared module registries, to drive those concerns through policy as code, to have common ways of integration with third-party systems to enforce those controls, and to have a common system of record. That's also the other side of this.

I want a system of record, so I actually know the changes everyone made. What's all the infrastructure under management? Do I have visibility to all of my infrastructure to enforce those various concerns? And then ultimately over time, as we get more mature, how can we expose that to developers through an internal developer portal?

One example of it might be HCP Waypoint, but you might be building your own with Backstage. You might be integrating Terraform behind a system like ServiceNow. So there might be many front doors in terms of how a developer comes to consume it, but ultimately, as a platform team, I want a single way of managing it, enforcing policy, and enforcing controls around the whole thing.

That's really what we mean when we talk about Terraform for platform engineers. It's solving the set of concerns that the platform teams have rather than only focusing on the concerns that the application teams have. Hopefully, that was helpful and gives you some things to think about in terms of how to solve for these various challenges for all of the different people involved with building and managing infrastructure.
