Managing Policy as Code With Terraform and Sentinel
In this demo session, you'll learn how to do Cost Estimation, enforce best practices, fix config, and implement Terraform foundational policies all using the Sentinel policy as code framework.
Speakers
- Kyle Ruddy, Director, Technical Product Marketing, HashiCorp
Terraform provides cloud infrastructure automation with Infrastructure as Code, allowing organizations to codify their desired use of infrastructure and then enforce best practices for how that infrastructure is provisioned and de-provisioned. Watch this session to learn more about Terraform, cloud infrastructure automation, and approaches to managing infrastructure compliance with modules, Sentinel policies, and automated policy enforcement, and see live demos throughout.
Transcript
Hi, my name is Kyle Ruddy and welcome to this HashiConf session on managing policy as code with Terraform and Sentinel. (For more resources on Sentinel, check out our Writing and Testing Sentinel Policies Guide.)
As part of this session, we're going to dive into some of the new things that are available for Terraform. We already heard a little bit about what's happening during the keynote. But we're going to take an extra special look at some new things for Terraform and Terraform Cloud and Enterprise.
Then we're going to go into policy as code—how do we establish some guard rails for the way that we deploy our infrastructure as code. Then we have a special look at a brand new resource that's out there and available today called the Foundational Policies Library. We have a ton of demos that we're going to run through as well.
What’s New For Terraform
As of this session recording, Terraform 0.13 has been released in beta form—meaning that you can head right out to the Terraform GitHub repository and get access to the beta code and give us all kinds of feedback on what you like and what you'd like to see more of.
However, you should be looking forward to how we're making modules first-class citizens within the Terraform environment. This is due to some of the new arguments that we can include in the module blocks: arguments such as count, for_each, and depends_on, the one that I know a lot of people have been looking forward to.
This can enable us to add some strength around the way that we define our modules, how we integrate other private repositories—and other private modules from those repositories—to make our infrastructure as code that much more powerful.
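As a rough sketch of what those module arguments look like in a Terraform 0.13 configuration, consider the fragment below. The module paths, the variable names, and the environment input are hypothetical and only there for illustration; they are not from the session.

```hcl
# Hypothetical inputs, for illustration only.
variable "environments" {
  type    = set(string)
  default = ["dev", "prod"]
}

variable "enable_bastion" {
  type    = bool
  default = false
}

# Assumed local module that other modules depend on.
module "network" {
  source = "./modules/network"
}

# for_each on a module block: one module instance per environment.
module "web" {
  source   = "./modules/web"
  for_each = var.environments

  # Assumes the module declares an input variable named "environment".
  environment = each.key

  # depends_on on a module block: wait for the whole network module.
  depends_on = [module.network]
}

# count on a module block: create the module zero or one times.
module "bastion" {
  source = "./modules/bastion"
  count  = var.enable_bastion ? 1 : 0
}
```

Before 0.13, count, for_each, and depends_on were only accepted on resource blocks, so this kind of fan-out had to be written inside each module.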
We also have some new integrations into the way that we connect to Terraform Cloud. Now Terraform 0.13 will allow us to store our authentication token within the CLI—meaning that we don't have to establish that every single time that we create a Terraform configuration.
Then lastly, we have some cool integrations for taking a look and accessing third party providers—whether they're a community, whether they're a partner, whether they're provided by us at HashiCorp. It doesn't matter whether those are public through the Terraform registry or private through some of your repositories that you're using within your configurations and systems.
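A minimal sketch of how provider sources can be declared in Terraform 0.13 follows. The version constraint, the second provider, and its registry hostname are assumptions made up for the example.

```hcl
terraform {
  required_version = ">= 0.13"

  required_providers {
    # Provider published by HashiCorp on the public Terraform Registry.
    aws = {
      source  = "hashicorp/aws"
      version = ">= 2.0" # assumed constraint, adjust for your environment
    }

    # Hypothetical provider served from a private registry host.
    example = {
      source = "registry.example.com/acme/example"
    }
  }
}
```

The source address is what lets Terraform install community and partner providers automatically instead of requiring them to be placed on disk by hand.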
Terraform Cloud & Enterprise Updates
Then we also have some new updates for Terraform Cloud and Enterprise. When you log into Terraform Cloud and Enterprise, you’ll immediately notice this new “getting started” walkthrough. This is going to show you some of the high-level steps that you should take to be successful with Terraform within Terraform Cloud and Enterprise. This is going to take you through creating workspaces, planning and running your configurations—and so much more—to make the most out of your Terraform usage.
Then we have these things that are called run triggers. This allows you to start having some integration pieces between your workspaces. Conceptually, this allows us to build our workspaces in smaller, more deployable chunks. Then we can continue to build this pipeline process in between each of the workspaces to make the most of our CI/CD process.
Lastly, there have been a ton of improvements to how we manage users. We've consolidated our user management page onto one single section for each organization. We do email invites for new users. It's a great experience, and it makes it a ton easier from the management side.
Policy as Code Recap
I'm assuming everybody here knows—or at least conceptually understands—what infrastructure as code means. Being able to take code to define your infrastructure in the ways that you want it to end up existing as. The cool part about that is that I can go through and write my code to be whatever I want it to be.
However, when you take a look at that from the business perspective, maybe that's not the best way to do it. Maybe we will have some organizational guidelines or some best practices within the organization and within the services that you're using and consuming as well that you'll probably want to comply with. That's where policy as code comes into play. Let's take a look at one of the first real big pillars here:
Embed
How do we take some of these best practices, such as those from AWS or Azure, and ensure that it's mandatory to use tags in our Terraform configurations? How do we restrict the instances that we're using within each one of these services to ensure that maybe development uses a certain size and production uses a different size? Then lastly, how do we make sure that these resources are being disposed of in the time that is needed, giving them a time to live?
Enforce
Then we can look at some of our organizational guidelines—some of these best practices that have been defined internally. Such as continuing down that path of using tags and making sure that those tags are declared in the same way for each one of the resources that are being consumed. Continuing to ensure our resources that we're providing and provisioning are to the right specifications—to the right size or using the right formatting—for each one of those resources. Making sure that our development environments might not be running after hours or in certain situations where they're not going to be utilized to the fullest extent. And then lastly—the big one here—controlling costs. Making sure that we're falling in line with what the business has told us that we should be using.
View
Then we're getting into the visibility side of this. How do we ensure that we understand that across all of our deployed instances, they're consistent and using those same instance types—falling in line with all of those cost procedures and making sure that they're not running at inopportune times.
Embed infrastructure best practices
Here's where Terraform comes into play. We're starting to look at how we can use some of these private modules to enforce—and use some of these guidelines to the best of their capabilities. So we can create our own modules, we can share them within these private repositories and access them through Terraform, Terraform Cloud, and Terraform Enterprise.
We can also go through and validate that a lot of these things are as we think they are. That's the joy of being able to define all of this as code. Then lastly, being able to access them when we publish those out to our libraries for other users within our organization to continue to consume and make use of.
Enforce best practices with policies
Plus we have Sentinel. This is our policy as code engine. When we talk about policy as code, Sentinel is the workhorse that's behind it. Taking and allowing us to create all of these policies as code—defining them as code—and then applying them to our workspaces so that we can say, "Hey, these are the teams that should have access to these resources or this particular library."
We can validate that certain things are in the way that they should be for us to allow our users and our consumers to continue deploying resources as they need them. We can also allow them to go ahead, “deploy the resources you need”. We can already validate what they need to get their job done.
View infrastructure state and estimate cost
As we're talking about teams we'd be remiss if we didn't talk about state. Being able to manage and access and modify the state of the resources that are out there and deployed. One of the nice things about Terraform Cloud and Terraform Enterprise is that ability to manage that state across the team.
Plus, there's this nice addition for cost estimation, which would give you a rough guideline about what some of these resources are going to cost when they're deployed to our own environments.
Introducing the Terraform Foundational Policies Library
And one last slide before we dive into a bunch of demos here, and that's the Terraform Foundational Policies Library. This is a new resource that was released recently—just this year—it's out there; it's available on GitHub for you to check out immediately today. There are over 50 different policies that are already initially available. Things for compute, applying policies to databases, virtual networks, Kubernetes—some of your storage. All types of things are out there, and they're all based on the Center for Internet Security and the benchmarks that are established within there.
When you look at those benchmarks they’re already predefined by the CIS group, so we've created these policies and posted them out on GitHub—that’s covering all of the major cloud providers. That means as of today it's AWS, Azure, and GCP.
Demos
Enough with the slides, let's dive into some demos and show you the power of how we can make use of policy as code to enable our own workspaces for our organization. Within these demos, we're going to take a look at how we can visualize some of our infrastructure costs—through cost estimation. How we can use policies to discover some of the inopportune deployment practices—some inconsistent configurations within our environment that are trying to be deployed.
And then lastly, we're going to take a look at the Foundational Policies Library and see how easy it is to take one of those policies and apply it to one of our own existing configurations.
Cost Estimation
Here we are in Terraform Cloud. The first thing that I want to call out is this new bar across the top here. If you've logged into Terraform Cloud before, you might never have seen this. This is something that's new and helps guide anybody that's new to Terraform Cloud in their experience.
As we can see here, I've signed up for an account, which was free. I have created my organization (in this case, it's TF TMM), and then we're ready to create some workspaces and start planning and applying our configurations. We can also check out the Getting Started page that's over here. That'll take us through some additional information, such as some basics on using Terraform Cloud and how we can set up our version control systems (VCSs) such as GitHub, GitLab, or Bitbucket, whichever you're using within your environment. And then some getting started tutorials. Then we even have some more guidance around what we can do within each area.
But we're not going to do that today. Right now, we're going to create our first workspace within this organization. It's something that I've already created and provisioned. Let me switch over to my user account here. We're going to set up a workspace for our basic site demo here. Fairly simple—just pulling that Terraform configuration directly from GitHub. After a few seconds, we can see that our configuration was uploaded successfully into Terraform Cloud, and we're ready to start making use of it.
I've taken some liberties here, such as already filling in some of the variables. In this particular instance, we're deploying this website out to AWS. We need to have some basic things like which region we're going to deploy this resource to, as well as our public key, our secret access key, and our access key ID. Those are all things that are dependent upon your environment. From that point, we can hit Queue plan, and this is the same thing as if you're in the CLI typing in terraform plan.
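The workspace variables described above might map onto configuration like the following sketch. The variable names are assumptions, and supplying the access key ID and secret access key as sensitive environment variables on the workspace is one common approach, not necessarily the one used in the demo.

```hcl
# Hypothetical variable names; values are set on the Terraform Cloud workspace.
variable "region" {
  type        = string
  description = "AWS region to deploy the site into"
}

variable "public_key" {
  type        = string
  description = "SSH public key installed on the web instance"
}

provider "aws" {
  region = var.region

  # Credentials are not written here. The AWS provider reads the
  # AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables,
  # which can be stored as sensitive environment variables on the workspace.
}
```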
If we scroll down here a little bit, we can see our plan running. We're going to use Terraform 0.12.26. It's going to go through and check our configuration. We see that we have nine different resources that we're planning to add for this configuration as it's been written. Then after that, because our organization has cost estimation enabled, we'll see a rough estimate on what it will take to run this configuration.
After a few seconds, our cost estimation will pop up. We’re going to use a load balancer as part of the configuration that I imported. We can see it has an hourly cost of roughly two and a half cents, as well as an AWS compute instance of roughly again, two and a half cents. We can see some of the monthly cost estimates for that as well. Then when you start making changes to existing configurations, you'll start to see this delta over here to the right-hand side, start to change as well.
We now have an understanding of what we're deploying and how much that's going to cost if we were to deploy it. Then we get into our policy as code section here. We can see that our policy check hard failed. What does that exactly mean?
Enforcing Best Practices
Walking through this, we have two different policies that are being checked with this. We can see the one passed and one failed. This is our first policy set type—checking the AWS instance that's being used. This is checking to make sure that we're using a T2.micro instance. This is something that we could configure based on an environment or an organization's configurations. In this case, it's saying false—that is not what our configuration is using, so it failed our configuration plan here.
However, the next one that we're doing here is a timing piece. It's saying, "Hey, is it read-only Friday?" Because if it's read-only Friday, we don't want to deploy anything. We don't want to ruin anybody's Friday. And especially not anybody's weekend. These are some simplistic Sentinel rules. We can jump over to my other tab here, that's out on GitHub and check out some of these basic rules that I was able to create.
Just in 13 lines, we were able to tell Sentinel we want a desired instance type of T2.micro and then we use our main rule down here to say, "Hey, is this the configured instance type?" We can see if you step through this, that we're pulling information from Terraform Plan—from our resources—from the AWS instance of our web node and pulling out what's being applied. And we're storing that into a variable—and we want to match that to our desired instance type being that T2.micro and that's the policy that failed. The result of each one of these Sentinel rules is true or false. In this case, this one was false.
That one was a little more specific to the service that's being consumed. We can also take a look at that other policy rule that we created, which is a very simplistic one: is it Friday? We're using the time import, establishing what day of the week it is as a numeric value, and then saying, "Hey, if this thing is equal to five, because five is Friday, then fail. It's false. Don't let them provision that." The other important thing, if you notice back there, is that it was a hard failure. It stopped. We couldn't override it. We couldn't go forward any more than that.
So if we take a look at our sentinel.hcl file, this is where that enforcement level is made. Our AWS time is a soft mandatory—meaning that if you want to deploy those resources, you can give a reason and hit apply anyways. However, the instance type that's something that we're maintaining that hard consistency on. We don't want anybody to be deploying something that doesn't fall in line with what's been configured for their environment.
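A sentinel.hcl along the lines described here could look like the sketch below. The policy names and file paths are assumptions; only the two enforcement levels come from the session.

```hcl
# Soft mandatory: a failure blocks the run, but a user with the right
# permissions can override it and apply anyway.
policy "aws-time" {
  source            = "./aws-time.sentinel" # assumed file name
  enforcement_level = "soft-mandatory"
}

# Hard mandatory: a failure stops the run and cannot be overridden.
policy "aws-instance-type" {
  source            = "./aws-instance-type.sentinel" # assumed file name
  enforcement_level = "hard-mandatory"
}
```

Keeping the enforcement level in this file rather than in the policy itself means the same rule can be reused with a different level in another policy set.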
Advanced Sentinel rules
So those were some simple rules. Let's take a look at a little more advanced rules. We're back at our workspaces prompt here. We have a couple more workspaces that have been created out here. One is for development workloads, and the other one is for production workloads.
We can pick either one of these, and walk through what it looks like to deploy one of these. We've already created our variables and we're ready to go ahead and queue our plan. This is based off that prior workspace. The look and feel is going to be very similar to what we saw before. We're continuing to use 0.12.26. Once it performs our configuration, we can see that it's going to plan to add nine new resources. Then our cost estimation and our Sentinel policy checks are going to run to tell us whether or not we can continue to move forward here.
So if we look at our cost estimation here, we're using a slightly cheaper instance this time. We've fallen in line with the T2.micro instance. If we go down and take a look at our policies again, we can see that we now have three policies. However, one has failed, and only at the advisory level. We can see that our AWS time has passed because today's not Friday, so we can go there. Our instance type has passed because we're using that T2.micro instance.
Then we have a brand-new rule that's in here that's taking a look at our costing. In this case, it's checking to see if our proposed total monthly cost is going to be less than $10. In this case, looking at our cost estimation, we can see that our load balancer alone is going to be more than that $10. That's going to advisory fail and, if we're okay with that, we can continue with our planning.
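The advisory behavior comes from the entry for the cost policy in sentinel.hcl. A sketch of that entry, with an assumed policy name and file path:

```hcl
# Advisory: a failure is reported in the run output, but the run is not blocked.
policy "aws-cost" {
  source            = "./aws-cost.sentinel" # assumed file name
  enforcement_level = "advisory"
}
```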
Now, if we go back to our policy rules, in our repository out here, we can then switch our branch into the master branch. I've included another policy that's out there that's around costing. As part of this individual rule, we're making use of our tfrun package here.
From that, we can start pulling some things about costing. In this case, we can use cost estimation and then pull out our proposed monthly cost and check—is it 10 or not? In your environment, you're probably going to want to use Sentinel variables so that you can put that down there. And then you can change that based upon each workspace that you're using. Because if I were to run this against production, we're probably going to get a much larger value.
The Same Workflow from AWS to Azure
Thus far, we've done some cool things with AWS services. Let's switch it up. Let's change over to, say, Azure. Here we have a new workspace that's been deployed out here called hashicat-azure. It's a configuration that's available within the HashiCorp GitHub repository. It allows you to deploy a web server that will show you pictures of cats, so therefore Hashi and cat on Azure.
We've already configured this with some of the variables that we're going to need. If you're familiar with Azure, you know you need to have some service principal type variables, meaning your subscription ID, client ID and secret, and your tenant ID. Then for this particular configuration, we need our prefix.
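As an illustration, those values could be wired into the Azure provider as in the sketch below. The variable names are assumptions; the same credentials can instead be supplied through the ARM_SUBSCRIPTION_ID, ARM_CLIENT_ID, ARM_CLIENT_SECRET, and ARM_TENANT_ID environment variables, and the session does not say which approach the workspace uses.

```hcl
# Hypothetical variable names; values are set on the Terraform Cloud workspace.
variable "subscription_id" {
  type = string
}

variable "client_id" {
  type = string
}

variable "client_secret" {
  type = string
}

variable "tenant_id" {
  type = string
}

# Prefix used when naming the resources, for example "hashiconf".
variable "prefix" {
  type = string
}

provider "azurerm" {
  features {}

  subscription_id = var.subscription_id
  client_id       = var.client_id
  client_secret   = var.client_secret
  tenant_id       = var.tenant_id
}
```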
In this case, we're going to use HashiConf because Hey—we're watching a HashiConf session right now. We'll go ahead and queue our plan here so that we can see how some of this changes as we move from AWS over to Azure. Our plan is running. Let's see how many resources we're adding—in this case we're adding eight different resources.
Unfortunately, I'll have to apologize. My environment here broke this morning, and the cost estimation is not working—we're going to see a minor error. You should not see that in your own environments. I believe this is isolated to mine at the moment. But the session must go on.
Let's take a look at our policies that we're using for this. We can see similar policies to what we saw in our demo basic site. Here we have two policies: one passed, one failed. Again, we're continuing to make use of the time rule, and we didn't have to change anything for that policy. There's no dependency on which service is being used or which providers are being used. It's simply looking at, “Is today Friday or not?” Then our second rule here is around the Azure instance type. In this case, we're looking to see, are we using a Standard A0 VM size?
So we're going to go ahead and discard this run because I want to show you a little bit of power for Terraform Cloud. We'll discard this run saying, "Adding more later." So discard that run. Going back to an overview of our run—we've done just the one run so far that we've manually queued up in Terraform Cloud.
Let's move over to Visual Studio Code. This is my editor of choice. One cool thing with Visual Studio Code is that there's this Terraform plugin. We can see down here at the bottom that it enables things like code completion and some other IntelliSense features, on top of formatting and other things.
Anyways, it's cool to be able to access that directly within the editor of your choice. That is something that's brand new for HashiCorp—something that we recently picked up and took over ownership of.
Using the Foundational Policies Library to apply CIS benchmarks
This is our configuration. This is the main.tf. We can see some of the variables that I described within Terraform Cloud on our top couple lines here. And then the rest is directly coming from that HashiCat configuration. Now, we want to add a managed disk to this. For whatever reason, we're going to add that on here.
I'm going to cheat a little bit and bring over a resource block for our managed disk. We're saying, "Hey, we want this new disk of this name placed within our resource group that we created on the step before that—and we want it to be about a terabyte in size." We can save that. Save our file off to the left-hand side here, we can see through some Git integration with Visual Studio Code. Our main.tf file has had a change. We're going to commit that—or rather stage that in. Then we're going to add a commit message here—we're going to add a managed disk.
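A resource block for a managed disk of roughly that shape might look like the sketch below. The resource names, the storage account type, and the reference to the existing resource group are assumptions; the session only states that the disk is named, lives in the previously created resource group, and is about a terabyte in size.

```hcl
resource "azurerm_managed_disk" "data" {
  name = "${var.prefix}-data-disk" # assumed naming scheme

  # Assumes main.tf already defines a resource group under this address.
  location            = azurerm_resource_group.myresourcegroup.location
  resource_group_name = azurerm_resource_group.myresourcegroup.name

  storage_account_type = "Standard_LRS" # assumed disk tier
  create_option        = "Empty"
  disk_size_gb         = 1024 # about one terabyte
}
```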
I committed that to the local repository. Going over here to our actions we can say, "Push." We can see some actions that are happening over here. That's going out to our GitHub repository where this is located and stored. Then if we go back to Terraform Cloud, we have a new run already in motion here. And we can see that this matches up to that commit message that I used within GitHub.
We can click on this and see where we're at here. We've already processed through the plan. Instead of those eight resources that we were adding before, we're now up to nine because of that new managed disk. It looks like that just wrapped up. Our cost estimation again is not going to succeed within my environment. It's going to go ahead and head on down to our policy check afterwards.
We didn't change anything within our configuration with regards to size. We still have one passed and one failed policy. But since we added that managed disk—we're a fairly security-conscious organization—so we want to apply some Center for Internet Security benchmarks.
Let's hit our discard run here and add in a nice little message. Then, thanks to our Foundational Policies Library, we have a link here at the top of the workspace that we can click on. It's going to take us out to our GitHub repository. This is the location of all of those policies that have been pre-written for us to make use of. The README has all kinds of information about what's included, with more information about each of those pieces, and walks through how we can make use of them. But I'm going to show you a quick way for us to do that immediately.
We can start by clicking on CIS, that's the Center for Internet Security, and then we have access to the benchmarks for each of those services. In this case, we're working with Azure, so we're going to click on Azure. We just added a managed disk, and we're trying to follow some of the CIS benchmarks for storage, and they happen to have one that's for managed disks. Let's scroll down here a little bit. We can see the different benchmarks that have been created for Azure within the storage area.
Storage is not where I want to be. I believe I want to be in compute. Yes, there we go. We want to make sure that all of our managed disks are encrypted, and this is CIS 7.1. If you're using the benchmarks within your own environment, you can look that up and see what it's doing. To make use of this within our own policy set here, we just copy it and head on over to our sentinel.hcl file. In this case, we should recognize some of these, Azure time and Azure instance type, and we paste in that rule there.
The magic of what's happening here is due to our source parameter. The source parameter is on our GitHub repository where that exists—all of the hard stuff has been done and been placed out there. Let me go back to that. We can click on our 7.1. We can open that up and then we can see some of the test-based information for creating that. And then we can see the individual rule itself. These are a little more complex than the rules that I wrote. You'll notice that right off the bat—but they're extremely easy to run and apply to your own environment.
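The pasted entry has the same shape as the local policies, except that its source points at the policy file hosted in the library's GitHub repository. In the sketch below the policy name, the URL, and the enforcement level are placeholders, not the real values; copy the actual block from the library's README.

```hcl
policy "azure-cis-7.1-managed-disk-encryption" { # placeholder name
  # Placeholder URL: replace with the raw policy URL given in the
  # Foundational Policies Library README for CIS 7.1.
  source = "https://example.com/path/to/azure-cis-7.1.sentinel"

  # Assumed level; pick whichever suits your organization.
  enforcement_level = "advisory"
}
```

Because the policy is fetched from its source, nothing from the library has to be copied into your own policy repository beyond this entry.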
Going back to Visual Studio Code here—we can save our sentinel.hcl file. We can see some of the updates on the left-hand side here. We're going to add that to our change. We're going to update to or create our commit here saying that we're updating to apply CIS 7.1 to our policy sets. Then we can push that out to GitHub where Terraform Cloud is accessing the policy sets. That's the backend for those.
And you're not limited to GitHub. I should throw that out. You can use whatever VCS you prefer. Understand that there's no limitation or constraint on using GitHub there. We can queue up our plan and see: did we apply CIS 7.1 to our managed disk and the other disks within our configuration appropriately?
Our plan is going to continue to run. You've seen no changes have been made to our Terraform configurations. We're still adding nine new resources, and it's going to run through cost estimation, and it's going to hit our policy checks. There we go.
Now we can walk through our different policies here and take a look at each one of them. The bottom readout down here is checking to make sure that it's not read-only Friday. While I've been presenting this, the day has not changed. It's still not Friday, so we're still good from that aspect, and it still returned true.
Then if we go up a little bit more, we see a little more verbose output here from our CIS 7.1 policy here. The result here is false—it failed, and we can check and run through and see some additional information about what exactly failed and get some additional information. And then our top policy here, policy number one goes back to our instance type. That is still not a standard A0—so that still continues to fail.
Presentation Recap
That was a quick run-through of some of the demos that are available for making easy use of policy as code within your organization today. To review, we took a look at some cost estimation for AWS, seeing the difference between different instance types, as well as the load balancer that's been configured. We also took a look at using policy as code through Sentinel to pull out that cost estimation data to then make decisions to approve or deny the configuration plan that's been run. We also checked some corporate guidelines: based on whichever environment we're working in, are they deploying the appropriate instance size or type within that environment for that configuration?
We took a look at some infrastructure uniformity, making sure that we can apply that across several different workspaces, whether that's development or production. And then lastly we wrapped up with a look at how we can make use of the Foundational Policies Library to apply CIS benchmarks to our configurations in a quick way.
Useful Resources
There are some resources that you're going to want to check out today. First and foremost, head on out to the Terraform GitHub repository, grab a copy of Terraform 0.13, get access to those new module arguments, get access to that new way to authenticate to Terraform Cloud—as well as that new way to reach out and grab some of the new community and partner providers within the public Terraform registry and also your own private registries.
Then check out Terraform Cloud. It's free, and a customer told me this recently—it's one of the best ways out there to share remote states. I think that's a fantastic resource—plus it gets you access to some of the cost estimation as well as policy as code to establish those guard rails to your infrastructure as code-based practices. And that's the last piece here, adding checks, adding those guardrails, making sure that—as people are deploying their infrastructure—they're following not only the service best practices but your organization's best practices as well.
We've created some great pre-written policies that are out there based on the Center for Internet Security benchmarks. That's out there on the Foundational Policies Library.
To wrap up—my name is Kyle Ruddy. I'm a senior technical product marketing manager here at HashiCorp covering Terraform. Some of my contact information is on the screen. Please reach out if you have comments, questions, if you want to see some more things about individual services or anything that you prefer, I'm more than available. Thank you very much and enjoy the rest of HashiConf.