Testing Terraform Sentinel Policies Using Mocks
This talk and demo will show how you can generate Terraform Sentinel mock data (mocks) from your Terraform Cloud and Terraform Enterprise plans and use them to test your Terraform Sentinel policies with the Sentinel Simulator.
While you can test Terraform Sentinel policies in your Terraform Cloud account or on your private Terraform Enterprise server by triggering runs against workspaces, doing so has some limitations:
- Each test takes longer since you first have to run a terraform plan
- You might have to discard many runs from your workspaces
- Your workspace history might become cluttered
In contrast, using the Sentinel Simulator with mocks is faster since the tests will not run Terraform at all. It avoids having to discard runs from your workspaces. Additionally, you can copy and edit generated Sentinel mocks to test your policies against multiple combinations of resource attributes.
Finally, you can trigger automatic tests whenever you modify your policies. Note: Sentinel mock data cannot be generated from Terraform open source plans. Using the Sentinel Simulator to test Terraform Sentinel policies requires a subscription or license that includes the mock generation feature.
Speakers
- Roger Berlind, Senior Solutions Engineer, HashiCorp
Transcript
As Katie mentioned, my name is Roger Berlind. I'm from the New York area, Westchester County. I'm a senior solutions engineer with HashiCorp. I work with all of the HashiCorp products, but I specialize in Terraform and Sentinel. My contact information is there, as well.
A little bit about my background in connection with this talk. I have written a lot of the Sentinel policies that are in one of HashiCorp's GitHub repositories called Terraform Guides. I've been doing that for a while and continually improving them. I also wrote a guide called Writing and Testing Sentinel Policies for Terraform, or something like that. That's something I'll mention later.
The primary objective of this session is to show how Terraform Sentinel policies can easily be tested with Sentinel Simulator using mocks generated from Terraform plans.
Now, I should say a little bit about what mocks are. I'm going to say more about them later. But for now, what you need to know is that a mock simulates the data that Terraform normally presents to Sentinel after you run a plan. The purpose of the mocks is to use them with the Sentinel Simulator. Some of you—probably a lot of you—might have attended a talk yesterday by our lead Sentinel engineer Chris Marchesi. You would've heard him talk about Sentinel and the Simulator.
Why should you use the Sentinel Simulator and mocks?
Before I go on and give some more details, I've talked about the objective. A natural question is why would you want to use the Simulator to test your policies?
Speed up creation of new Sentinel policies
These mocks can be generated from the Terraform Cloud UI and the API. Before they existed, the only way to test a new Sentinel policy, or one that you were writing, was to run a plan against a Terraform Enterprise workspace. I see people shaking their heads. I'd run a plan, I'd wait for the plan to finish, and then I could see what the policy did. The policy check would be run automatically, and hopefully, it would work.
But even as many Sentinel policies as I've written, I still make mistakes. Then I would get some error and I'd have to go fix it. Then I would start all over again—and I’d have to wait for that plan to run every time. The plan generally takes quite a bit longer than the Sentinel policy checks. The policy checks are very quick, like a few seconds.
The main reason for using the Simulator with these mocks is you can speed up the creation of new Sentinel policies since you don't have to wait for the plans to run. When you're using the Simulator, you can use it on a machine that doesn't even have Terraform. It's separate and detached from Terraform.
Enable testing various combinations of possible attributes with a single command
I think of it as testing them simultaneously. It's probably one after another, but they're so quick that you do one command, and you're testing two, four, ten test cases. That also speeds up your process for writing new policies and for testing existing ones if you make changes to them.
I do want to do some level setting. As I mentioned, Chris Marchesi talked yesterday about what Sentinel is, but not all of you were necessarily there. I'll talk a little bit about Sentinel, then about how Sentinel is used in Terraform. That will include discussing some Terraform-specific Sentinel imports and also Sentinel Mocks. Then I'll lay out my methodology for writing and testing Terraform Sentinel policies using the Simulator. Finally, I'll do a demo of all this.
What is Sentinel, and how is it used in Terraform?
HashiCorp Sentinel is a framework for implementing governance as code in the same way that Terraform implements infrastructure as code. You're codifying what your policies are in a way where those policies are going to be run in the line of execution. Sentinel includes its own language, and it's embedded in HashiCorp's Enterprise products. Using Sentinel ensures that your governance policies are being checked—instead of sitting in a spreadsheet somewhere. It's nice to have policies in a spreadsheet or word document, but that doesn't help if they're not being enforced.
Sentinel supports fine-grained policies that use conditional logic. That means you can implement pretty sophisticated policies. Most of these policies are going to be focused on restricting what you can provision. Sometimes they can also be focused on looking at what you've already provisioned. But it's mostly about proactively making sure that the VMs, networks, subnets, load balancers, security groups, or whatever other types of resources you are going to provision with Terraform in the apply step, across any cloud, meet your policies. Obviously, as I've mentioned already, it does include a simulator.
How is Sentinel used in Terraform?
Sentinel policies are checked between the standard plan and apply steps within a Terraform run in Terraform Cloud and Terraform Enterprise. I should mention here that Sentinel, generally speaking, is used with HashiCorp's Enterprise products. In this presentation, when I say Terraform, I mean Terraform Cloud, the SaaS solution of Terraform, or I mean Terraform Enterprise, which we used to refer to as Private Terraform Enterprise (pTFE). That's your own dedicated server, either on-premises or in any of the major clouds.
First of all, the Sentinel policies are run between the plan and the apply. Violations prevent runs from progressing to the apply step unless a user with sufficient authority overrides those failed policies. Certain people can do that—the owners of an organization and people with a particular permission called "manage policies".
Sentinel policies can evaluate the attributes of existing and new resources—and also data sources based on information associated with the current run. That information includes the plan itself, the configuration—which means the code—the current state of the workspace, and other run data. I'll talk a little bit more about that later. Ultimately this ensures resources comply with all of your policies before you provision them.
This is a diagram showing how this works in practice. What you're seeing here, going from left to right, is that typically you're going to have some Terraform code loaded out of a VCS repository into a Terraform Cloud workspace. That will trigger a plan, like running terraform plan in open source Terraform.
You trigger the plan. If the plan is successful, meaning that it didn't give any errors, then Terraform Enterprise or Terraform Cloud will automatically trigger the Sentinel policy checks. If those policy checks pass, the users who are logged in and authorized will have the ability to click a button or use the CLI or the API to do the apply. That apply is what's going to provision the infrastructure. If any of the Sentinel policy checks fail, there are some people who can override that in some cases.
This is a screenshot of a run within a Terraform Cloud workspace. It's showing how we have the plan and the policy check, and then the apply. We can even see who did it—R. Berlind—that’s me—13 days ago did the apply.
Sentinel imports in Terraform
I want to move on and talk about Sentinel imports and Sentinel Mocks. Sentinel does include several standard imports. Think of imports as modules that extend the capabilities of Sentinel. Some of the standard ones are things for manipulating strings, JSON documents, and decimal numbers. You have some other things as well: time, socket addresses, etc.
Terraform adds several additional imports that are unique for Terraform. These are Sentinel imports to be used with Terraform—and there are four of these, currently. The tfplan import gives access to the Terraform plan used by the current run. The tfconfig import gives you access to the configuration—meaning the code.
You'll notice the difference here—the plan is going to show you what the value of something was. Whereas the configuration is going to show you what the code snippet looked like. Maybe you had dollar-sign-brace-something—that can be useful sometimes. The tfstate import gives you access to the current state of that workspace. The tfrun gives you access to some workspace metadata and also the new cost estimate data that was announced yesterday morning. That's in there as well.
The most common of these is the tfplan import. This is the one that people can use to stop the provisioning of resources that would violate your policies—because you're able to restrict specific attributes of specific resources. This works with any provider—it doesn't have to be an official HashiCorp provider or provider on our website. Sentinel can restrict any attribute of any resource of any provider, even if you wrote the provider yourself. That's all made available to Sentinel.
Sentinel Mocks in Terraform
Okay, so now let's talk about the mocks. Mocks, you know, like a mockingbird: they mock something else. The mocks simulate the data that the plan would make available to the Sentinel policy checks if you were running in Terraform Enterprise. You have this data, and it's human-readable, so you can take those mocks, start to copy and edit them, and start to use them with the Simulator.
There is one mock for each of the four Terraform Sentinel imports. They can be generated from recent plans using the Terraform Cloud UI or the API. Then you can copy and edit them to simulate various combinations of resource attributes. Then you use these with the Simulator—and using mocks with the Simulator has the benefits that I mentioned in my why slide. It speeds up your testing. It also makes sure that you're testing multiple scenarios very easily. That includes while you're developing the policy and, later on, if you change the policy.
Writing and testing Sentinel policies for Terraform
I'm going to move on and talk about my eight-step methodology for writing and testing Sentinel policies with Terraform. This is based on this guide, which I'll show you later. I have links at the end that I'll share with you.
So, this basic methodology is eight steps.
The first step is to create a Terraform configuration that creates certain resources. You're going to create resources that you want to restrict with a Sentinel policy. It could be an AWS instance, Azure VM, vSphere virtual machine. It could be anything at all. Any resource.
Second step is to create a Terraform Cloud workspace that uses the configuration. It's going to map to a VCS repository that contains that configuration, most likely. You then run a plan against the workspace, generate the mocks from the plan inside the UI, write a new Sentinel policy. If you're going to test policies with a simulator, you'd better have a policy. Write a Sentinel policy that restricts those resources that you are creating in step one.
Now you can start to create some test cases and test your policy with the Simulator. You can then revise your policy and test cases until all of the test cases pass. You’ll see during the demo that the Simulator prints stuff green or red. Green is good, red is bad, and you want everything to be green.
Once you've gotten to that point, you can take your policy and deploy it to a Terraform Cloud organization. That's the methodology. I'm going to focus here on the ones that are in bold face. I'm not going to focus too much on the writing of the Sentinel policies. I want to focus more on—and I'm going to demo—running a plan, generating the mocks, creating the test cases and editing them, and then running the Simulator.
I could launch into the demo at this point, but I think it's better for you if I walk through this in slides. I'll walk through the process so you get a feeling for it. Also—this is good later on—if you download this presentation once it's available, you'll have that in the slides as well.
Run a plan and generate mocks
You can run a plan against a Terraform Cloud workspace or select one that you ran within the past seven days. We keep the mock data for that plan for seven days. You then expand the plan in the TFC UI—Terraform Cloud UI—click the, “Download Sentinel Mocks” button. At that point—after a minute—this is going to download a tar.gz file with four mocks; one mock for each of the imports. Then you can extract the mock files from the tar file, and start to copy and edit them. That's this first part.
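The extraction step can also be scripted. What follows is a minimal Python sketch, assuming the downloaded archive is a gzip-compressed tar file as described above and that the mock files end in .sentinel. The function name extract_mocks and the paths in the usage note are hypothetical, chosen only for illustration.

```python
import tarfile
from pathlib import Path


def extract_mocks(archive_path, dest_dir):
    """Extract only the .sentinel mock files from a downloaded mock archive.

    Hypothetical helper: the archive layout is assumed, not taken from
    the Terraform Cloud documentation.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            # Skip directories, links, and anything that is not a mock file.
            if not member.isfile() or not member.name.endswith(".sentinel"):
                continue
            # Write under dest using the base name only, which ignores any
            # directory components stored in the archive.
            target = dest / Path(member.name).name
            with tar.extractfile(member) as src:
                target.write_bytes(src.read())
            extracted.append(target.name)
    return sorted(extracted)
```

Calling extract_mocks("mocks.tar.gz", "mocks") would return the sorted names of the mock files it wrote. Because mocks can contain sensitive data, it is worth extracting them into a directory that is not committed to a shared repository without review.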
I do want to point out—Chris had reminded me to add this in—that mocks can contain sensitive data. For instance, cloud credentials could be in there. Or Vault tokens. There could be sensitive data in there, so do treat the mocks carefully—handle them in a secure fashion.
This is a screenshot of a run in the Terraform Cloud UI with a plan that's been run. There we see the button that you click inside a plan. That was step four of the process.
Write a Terraform Sentinel policy
Step five was writing the Sentinel policies. I'll show you briefly part of a policy during the demo. But I did want to mention that in this demo, I'm using two Sentinel policies. One is restricting Google compute instances to only have certain allowed machine types: n1-standard-1, n1-standard-2, and n1-standard-4. Then the second policy is a Sentinel policy that's restricting AWS S3 buckets. Every bucket that you provision with Terraform is going to have the private ACL and be encrypted with a KMS key. This is coming from the security team.
You start to get the feeling there are probably more than two kinds of policies, but broadly speaking, there are two kinds of policies. Some policies are focused on cost containment. Trying to reduce costs and avoid people accidentally doing things that cost a lot of money. The other one is often more security-focused. Making sure that things are encrypted. There are probably a lot of other things as well. But those are two of the primary use cases for Sentinel in Terraform.
Create test case directories
I included this second policy as it's useful for illustrating how you test a policy that has multiple conditions. Walking through the process again: you've generated your mocks, you've downloaded them, and you now need to create test case directories that use the mocks. Under the directory containing your policy, you create a test directory. Under that test directory, you want to create a directory with the same name as your policy, except without the .sentinel extension.
The Sentinel Simulator is quite particular about the naming of these directories. You have to get this right; otherwise, your test cases won't be found. In the case of our restrict-gce-machine-type Sentinel policy, you create a restrict-gce-machine-type directory under the test directory. Then you can copy the tfplan mock file that you had downloaded and extracted from the workspace to that second directory. Then you can move on and create a test case that's going to be a pass test case.
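The directory naming rule is easy to get wrong by hand, so here is a minimal Python sketch of it. It assumes only what is described above: a test directory next to the policy, containing a directory named after the policy without its .sentinel extension. The helper name create_test_case_dir is hypothetical.

```python
import shutil
from pathlib import Path


def create_test_case_dir(policy_file, mock_file=None):
    """Create test/<policy name without .sentinel> next to the policy.

    Hypothetical helper illustrating the naming convention only.
    """
    policy = Path(policy_file)
    # Path.stem drops the final extension, so
    # restrict-gce-machine-type.sentinel -> restrict-gce-machine-type
    case_dir = policy.parent / "test" / policy.stem
    case_dir.mkdir(parents=True, exist_ok=True)
    if mock_file is not None:
        # Copy the extracted tfplan mock into the test case directory,
        # keeping its original file name for now.
        shutil.copy(mock_file, case_dir / Path(mock_file).name)
    return case_dir
```

For a policy file named restrict-gce-machine-type.sentinel, the function returns the path test/restrict-gce-machine-type under the policy's directory.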
Create a pass test case
I advise people to take that mock file, rename it to be mock-tfplan-pass.sentinel, and then create a test case called pass.json that looks like this. This test case has two things in it. One is it's saying that we're going to be using a mock of type tfplan, and this is the specific file we're going to be using. I should mention that the mocks have the .sentinel extension because the mocks contain Sentinel code.
Then we have a second condition in the test case, which says the main rule has to return true for this test case to pass. Remember, Sentinel policies have rules, and they always have a main rule.
Then we do the same thing with a fail test case. We make a copy of the mock file called mock-tfplan-fail.sentinel, and we have a similar fail.json test case. The main difference here is that it's pointing at the copied mock file. Then we're saying that the main rule should give false. This can be a little bit confusing because, ultimately, with the Sentinel Simulator, you want all your test cases to pass. You'd like everything to be green. This means when you have a test case that's a failure test case, you want the main rule to return false so that failure of the main rule makes the test case pass. It's a little bit odd.
Edit the mocks
You edit your mock-tfplan-pass.sentinel file so that it has legitimate values, meaning machine types that are allowed. You edit the mock-tfplan-fail.sentinel mock file so that the main rule will give false, which means we're going to use something like n1-standard-8. That's not one of the allowed types. Then you can test with the Simulator. You go up two directories, and these are the commands you can run: things like sentinel test -run=gce, or you can do the same thing and add on the verbose option.
The run argument is doing partial name matching on any part of the name of the policy. If you had multiple policies in a directory and you only want to run one of them, you can now indicate with a partial name match which ones you want to run. To test all of your policies, you would leave off the run argument, and that would test everything.
The demo
With that, we're ready to move on to the demo. Everything that I'm using in this demo is in this repository, which I'm going to show you now. This HashiConf 2019 repository over in GitHub, it's in my R. Berlind organization. We can see that there are several directories here. We have the AWS S3 bucket directory that has a main.tf file—Terraform code—that provisions an S3 bucket. I use that so that I could run a plan and generate mocks for that.
I have the corresponding GCP compute instance directory with a main.tf file that provisions two Google compute instances for the same reason: so that I could run a plan and generate mocks. Then I have my Sentinel directory with the two policies that I mentioned and my test cases.
Over here in the Terraform Cloud UI, I have an organization. This is on the SaaS solution, the public offering. I have Roger Berlind HashiConf 2019 with two workspaces. These workspaces have names that are exactly the same as those two directories with the Terraform code.
If we go look at one of these—the second one—the GCP compute instance, we can see in version control that this is pointing at that repository. We're saying this workspace takes its Terraform code from that repository, and specifically, it only reacts to changes within the GCP compute instance directory. This is pointing out that directory.
I'm going to queue a plan. I'll just call these mocks. These plans usually run quickly in Google. It should take about 30 seconds. We see I'm running a plan, the plan is queued, now it's running. You can see I'm using Terraform 0.12.6, and we see already that there are two compute instances to be added. If I scroll back up momentarily, I see the download Sentinel Mocks button, and I click that to generate the mocks and download them.
Fortunately, that was quick today. We see that it downloaded a .tar file, which I can now show in Finder on my Mac. I can double click it to extract the contents, and we can look at that. We see there are 4 mocks corresponding to the 4 imports.
We can take a look at one of them. Let's look at the tfplan mock. In fact, I'm only going to use the tfplan mocks today. I'm not using any other ones, although I've used all the imports and all the mocks in other policies.
Here I've opened up this file. We see it's provisioning a Google compute instance with demo_1, and it has various attributes. Some of the scrolling here isn't that great, but you can see the machine type is n1-standard-1.
I'm not going to use this file today. I already went through the process earlier before the demo—several days ago—of downloading these mocks, copying them, editing them. Everything I talked about in PowerPoint I've already done.
I want to move on and take a look at the policy itself. We import some different Sentinel imports: tfplan and strings. There are some functions here. I don't want to go into the details of what the functions are doing, so I've intentionally closed them within the editor. But I have one called find_resources_from_plan, which is a function that will accept a type as an argument. It will return all resources of a particular type across all modules within your Terraform configuration. That's what that one's doing.
Then I have a second function, validate_attributes_in_list, which takes three arguments: the type of resource, the attribute, and a list. This is saying that for every single instance of a resource of the given type, this attribute has to have a value that is in the given list. We can see down here I have an allowed types list with n1-standard-1, 2, and 4, like I'd mentioned in the slides. We can see I'm calling that second function with Google compute instance, machine type, and allowed types. That's what that policy is doing.
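The logic of those two functions can be illustrated with a Python analog. This is not the Sentinel code from the policy: the flat dictionary used for the plan is a hypothetical stand-in for the tfplan import, and the functions only borrow the names mentioned above to show the idea of finding all resources of a type and checking one attribute against an allowed list.

```python
def find_resources_from_plan(plan, resource_type):
    """Return all resources of the given type, keyed by address.

    `plan` is a hypothetical flat mapping of address -> resource data,
    not the real structure of the tfplan import.
    """
    return {
        address: resource
        for address, resource in plan.items()
        if resource["type"] == resource_type
    }


def validate_attributes_in_list(plan, resource_type, attribute, allowed_values):
    """Return True only if every matching resource has an allowed value."""
    valid = True
    resources = find_resources_from_plan(plan, resource_type)
    for address, resource in resources.items():
        value = resource["attributes"].get(attribute)
        if value not in allowed_values:
            # Report every violation instead of stopping at the first one,
            # so the user sees everything that needs fixing.
            print(
                f"{address} has attribute {attribute} with value {value} "
                f"that is not in the allowed list: {allowed_values}"
            )
            valid = False
    return valid


allowed_types = ["n1-standard-1", "n1-standard-2", "n1-standard-4"]
```

The main rule of the policy would then correspond to calling validate_attributes_in_list(plan, "google_compute_instance", "machine_type", allowed_types) and requiring the result to be true.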
Notice, this one is the mock-tfplan-fail—this is the fail mock. We can see it currently uses n1-standard-1 and n1-standard-2. But if we look—the fail one—they’re identical now. They're identical because I have started out with one thing left undone. I'm pretending here that I forgot to edit the mock, the fail mock. I haven't set it up to fail yet.
Having shown you those three files, I'm going to go over to my terminal where we can see what directory I'm in. We can see that I'm in R. Berlind HashiConf 2019 Sentinel. This is the clone of that repository. I've used tree to show you that inside that Sentinel directory I have the policies and the various test cases.
Let's start running some Sentinel Simulator commands: sentinel test -run=gce. The first one failed. This fail.json is all in red. It failed because we expected the main rule to be false, but we got true. The pass test case was okay. That one was green.
So let's go back to the editor, and let's fix this guy. Let's do a find—I’m going to replace all instances of n1-standard-1 with n1-standard-8, and then I'm going to save this file. I am now using a value that's not allowed in my fail mock. When I go back and rerun that same command, both test cases are passing. The fail one is passing because the main rule is returning false.
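The same find-and-replace can be done with a small script when you have many mocks to prepare. This Python sketch is an illustration only; the helper name make_fail_mock is hypothetical, and it assumes the mock is a plain text file in which the machine type appears as a literal string. It matches the value as a whole token so that a longer type such as n1-standard-16 is left alone.

```python
import re
from pathlib import Path


def make_fail_mock(pass_mock, fail_mock, old_value, new_value):
    """Copy a pass mock to a fail mock, swapping one attribute value.

    Returns the number of replacements made, so a result of 0 signals
    that the fail mock is still identical to the pass mock.
    """
    text = Path(pass_mock).read_text()
    # Lookarounds keep n1-standard-1 from matching inside n1-standard-16.
    pattern = re.compile(r"(?<![\w-])" + re.escape(old_value) + r"(?![\w-])")
    new_text, count = pattern.subn(lambda _match: new_value, text)
    Path(fail_mock).write_text(new_text)
    return count
```

Checking the returned count guards against the mistake shown in the demo, where the fail mock had not actually been edited and the fail test case came back red.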
We can do that again, but we'll add the verbose option. Now we see some additional details. Sentinel has a print function. Inside a Sentinel policy, you can do print—and print whatever you want. I happen to be in that policy—in the middle of one of those functions—I’m printing this message out.
When a resource has an attribute that's not in the allowed list, I'm printing out something saying, "Hey, it has this attribute, machine type, with a value n1-standard-8 that is not in the allowed list." I'm trying to give the person who violated the policy useful information that will let them know what resource they need to fix and why. Why do they have to fix it? Well, because they used a machine type that is not allowed.
I am going to close this and take a look at the test cases for AWS. I'm going to open up the 4 Sentinel mock files here. Here we can see, for instance, this is an S3 bucket, and ACL here is set to private. Remember, this is the pass test case. This is the one that should conform to the policy. Private is good—and we have server-side encryption stuff set up with a KMS key. That's all good—that one's fine.
This one is fail—KMS. This one, we can see that it's missing the server-side encryption. There’s no KMS key for this one. That's a test case. This one says fail ACL is the test case. The ACL is public-read. That's not allowed. It does have the server-side encryption with the KMS key. It's going to fail one of the conditions. This one is going to fail both conditions. This one has ACL public-read, and it has the server-side encryption as empty. That's our fourth test case.
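The reason for four test cases is that a policy with two conditions has four combinations of outcomes. The Python sketch below illustrates that matrix. It is not the Sentinel policy itself: the flattened bucket dictionaries and the kms_key_id field are hypothetical simplifications of what the mocks contain, and the test case names are made up for the example.

```python
def check_bucket(bucket):
    """Return a list of violation messages for one bucket (empty if compliant)."""
    violations = []
    if bucket.get("acl") != "private":
        violations.append(f"ACL {bucket.get('acl')!r} is not private")
    # An empty or missing key means no KMS encryption is configured.
    if not bucket.get("kms_key_id"):
        violations.append("bucket is not encrypted with a KMS key")
    return violations


# Each entry: (simplified bucket data, expected value of the main rule).
test_cases = {
    "pass": ({"acl": "private", "kms_key_id": "example-key"}, True),
    "fail-kms": ({"acl": "private", "kms_key_id": ""}, False),
    "fail-acl": ({"acl": "public-read", "kms_key_id": "example-key"}, False),
    "fail-acl-and-kms": ({"acl": "public-read", "kms_key_id": ""}, False),
}


def run_cases(cases):
    """Map each test case name to whether main matched the expectation."""
    results = {}
    for name, (bucket, expected_main) in cases.items():
        main = not check_bucket(bucket)
        results[name] = main == expected_main
    return results
```

All four entries in run_cases(test_cases) come back true, which corresponds to all four test cases being green, even though three of them describe buckets that violate the policy.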
When we go back to the terminal session—and I can do S3—we see all four of these test cases pass. If I go here and add the verbose flag, we get a whole bunch of messages. Specifically, we get this first test case fail ACL and KMS—it gives us the two error messages. It's saying that it's not private, and it doesn't have a KMS key. The second one—fail ACL—is saying that this bucket—this log message here—has an ACL public-read that is not private. You get the idea. Fail KMS is saying, there's a message saying that it's not encrypted with the KMS key. Then the pass one is okay.
Useful links
That's the Sentinel Simulator. That's the demo. I would like to come back to the slides. I want to give you guys some useful links. The links here are specifically for the Sentinel documentation, also the Terraform Sentinel documentation—Sentinel doc specific to Terraform. Also, documentation on generating and using the mocks.
I have a link to this GitHub repository. The fourth one is that Terraform guides repository and specifically the governance section of that. That has a lot of sample policies. Look specifically at the second generation policies that are obviously newer than the first generation policies—you’ll find those more modern, comprehensive, and more production-ready.
They're doing things like iterating across all resources, printing better error messages, doing things like giving you the full address of any resource that violates a policy. Think addresses like module.a.module.b.aws_instance.somename.0, whatever.
Then there's this link to this guide I mentioned for writing and testing Sentinel policies for Terraform. You can read it in HTML if you want, or you can download the PDF version. That has a lot of useful information. This is the second edition. I wrote one edition, and then I figured out how to make it even better.
That's basically it. I wanted to wrap up by thanking all of you for attending. I hope you found it useful and interesting. I know this was pretty deep and technical. But I hope it was useful to those people who either already are using Sentinel or who might be using Sentinel in the future. This will give you some ideas of how to go about testing them.
My contact information is here. I'll hang around up here in the front of the room for 10 minutes or so. And feel free to talk to me outside or email me, or you can reach me through the Slack. Thank you very much.