Seeding HashiCorp Vault With Terraform at Form3
This talk is going to take you through how to set up HashiCorp Vault securely using HashiCorp Terraform to give you a repeatable process.
Speaker: Kevin Holditch, Form3
» Transcript
Hello, everyone, and welcome to my talk on "Seeding Vault with Terraform." My name is Kevin Holditch, and I'm head of platform engineering at a company called Form3.
Before I get into my presentation, I just want to state that if you like any of the tech that you see in this presentation, we are hiring, so feel free to reach out to our recruitment team if you want to come and work for us.
So to set the scene for this talk, I need to explain a bit about what Form3 does as a company.
Form3 is a cloud-based API that sits over payment schemes all around the world. Banks can connect to that API to get access to those payment schemes.
As you can imagine, when you're moving millions of pounds (£), security is paramount. That's where the use case of HashiCorp Vault comes in within our platform.
A Form3 platform is an entire working version of Form3, and today we have around 10 development environments, 3 stagings, and 3 productions in different regions serving different customer bases.
In each of those platforms, we run a single Vault instance, and we have a microservice architecture at Form3. Each of those microservices has different needs for Vault.
Because we use Vault for secrets storage, each of those microservices needs an authentication mechanism to be able to authenticate with Vault and get access to the set of secrets for that microservice.
Some microservices also need the ability to write back to Vault if they're communicating with a third party and need to save sensitive information.
Each microservice has a different profile within Vault. Some of them just need read access to Vault, and some of them need write access.
But it's definitely not a one-size-fits-all problem; each microservice has a bespoke set of requirements.
» Seeding Vault with Initial Secrets
That outlines how we use Vault at Form3. What are the problems that we're facing then? What are the issues?
To configure Vault, you have to log in from inside the internal network, because the endpoint is not exposed to the internet, and you have to initialize Vault.
Then you come across the problem where you have to seed it with initial secrets. When you're building a new environment, you have to get those secrets in there for your applications to use, and it's kind of the chicken-and-egg problem.
So how do you go about seeding Vault with those initial secrets? Vault is quite hard to keep in sync because it's a manual process to set it up.
You have to log in as administrator with a root token and configure it for your applications to use. That's quite a manual process.
So at Form3's scale, with around 15 versions of our platform, you would have to make every change manually, once per platform.
If you use this manual process to add a new secret to 15 Vaults across 15 environments and keep those in sync, it's close to impossible to see what's configured in Vault.
That’s because it's not written down anywhere. It's just based on what someone's manually configuring when they log in to the environment.
And it's very slow to roll out changes because it's a manual process. At scale, with several environments, any change to Vault, such as adding a secret or changing the profile an application uses, has to be made across every environment (15, in our case), and that's very slow to do.
It's an insecure process, and at Form3 we're dealing with large volumes of money. We don't really want people to SSH into the internal network and have high-privilege access to Vault in order to make changes, especially when rollouts happen fairly often.
We really want to limit that access. And it can be very cumbersome to add and remove microservices. And at Form3, we're constantly building new products, adding new microservices. We might deprecate products, fold microservices together.
As those microservices change, we want to constantly change the configuration of Vault to match what we're currently running. It's bad practice to leave active profiles for microservices you might have removed.
We took a step back and we thought, "Vault is really good at solving the problem of saving secrets in a secure way and allowing access to those. But it's hard to configure Vault itself."
» Terraform to the Rescue
So we looked at a tool that we already use extensively at Form3 for our infrastructure: HashiCorp Terraform. We thought, "Terraform solves all these problems."
It allows you to define your infrastructure as code.
It's super fast, because you can run all of your Terraform workspaces in parallel.
It gives you an automated workflow, because you can use automated runners such as Terraform Enterprise or Terraform Cloud.
All your environments will be kept in sync, because that's what Terraform does.
It turns your source code into reality by applying it to your infrastructure.
It eliminates drift, because your source code is being matched up to your environments.
You solve the problem of being able to see what's configured in Vault. When an engineer wants to know how a microservice stores its secrets in Vault, what access it has to Vault, they can see that all written down in code in Terraform.
» Allowing Terraform-Vault Communication
But the problem then is that Vault has only got an internal API. How do you get Terraform to talk to Vault?
The way we solved that problem was to buy Terraform Enterprise and set it up in a secure AWS VPC. Each of our Form3 platforms has its own Vault deployed in its own per-environment VPC, and each of those VPCs is peered back to our company VPC, which is where Terraform Enterprise runs.
Now we can write our Vault configuration as Terraform code using the standard Vault Terraform provider, and we can have an instance of a workspace per Form3 platform or Form3 environment.
In this diagram we've got a dev workspace to configure our dev environment. We've got a staging workspace to configure our staging environment. We've got a prod workspace to configure our prod environment.
Because it's VPC-peered, we can set up a secure network route to allow Terraform access to that Vault endpoint, but no one outside of this internal network can touch Vault.
The only manual step in this setup is a one-time process when a new platform launches. You log in and initialize Vault, which you would have to do anyway, and then set up a secure AppRole with fairly high privileges.
You give that AppRole ID and secret to the Terraform workspace and set those up as sensitive write-only variables. That way Terraform has access to the Vault running in your platform. It can talk to it over that internal secure network, and you don't ever need to manually configure it again.
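The Terraform Vault provider performs the AppRole login itself once it is given the role ID and secret ID. As a rough illustration of what that exchange looks like at the HTTP level, the Go sketch below builds the login request for Vault's AppRole endpoint (POST /v1/auth/approle/login, assuming the auth method is mounted at the default approle/ path) and pulls the client token out of the response. The address and IDs are placeholders.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

// approleLogin is the JSON body Vault's AppRole login endpoint expects.
type approleLogin struct {
	RoleID   string `json:"role_id"`
	SecretID string `json:"secret_id"`
}

// loginBody serialises the role ID and secret ID for the login call.
func loginBody(roleID, secretID string) ([]byte, error) {
	return json.Marshal(approleLogin{RoleID: roleID, SecretID: secretID})
}

// loginRequest builds (but does not send) the AppRole login request.
func loginRequest(addr, roleID, secretID string) (*http.Request, error) {
	body, err := loginBody(roleID, secretID)
	if err != nil {
		return nil, err
	}
	url := strings.TrimRight(addr, "/") + "/v1/auth/approle/login"
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

// clientToken extracts auth.client_token from a successful login response.
func clientToken(resp []byte) (string, error) {
	var out struct {
		Auth struct {
			ClientToken string `json:"client_token"`
		} `json:"auth"`
	}
	if err := json.Unmarshal(resp, &out); err != nil {
		return "", err
	}
	if out.Auth.ClientToken == "" {
		return "", fmt.Errorf("no client_token in login response")
	}
	return out.Auth.ClientToken, nil
}
```

The returned token carries the AppRole's policies, which is why the role ID and secret ID belong only in sensitive, write-only workspace variables.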
» Tips for Making All This Work
For the secrets, Terraform variables work really well. You can set default values for a Terraform variable, and in most instances of your platform, say all of your development instances, you probably don't need real secrets.
You can use default values for your variables to set dummy values on a real development environment. Now, if someone's setting up a new development environment, they can apply that workspace and they don't have to set any variables because they can just use all of the dummy values.
On staging and production, where you probably need the real secrets, you can have your company administrator come in and set those values.
It also allows you to use sensitive variables in Terraform, which means they're write-only and someone can't read those back again. So it's very secure to be able to set secrets and add new secrets.
And it allows you to set different values for different platforms. If you've got one production that needs a certain secret and another production that needs a different secret, it allows you an easy way to do that.
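The fallback behaviour can be modelled in a few lines of Go. This is only an analogy for how a Terraform variable's default is replaced by an explicitly set (for example, sensitive) workspace value; the variable names and dummy values are hypothetical.

```go
package main

import "fmt"

// secretDefaults holds hypothetical dummy values that are safe on development platforms.
var secretDefaults = map[string]string{
	"scheme_api_key": "dummy-api-key",
	"hsm_pin":        "0000",
}

// resolveSecrets starts from the dummy defaults and applies any values an
// administrator has set for this platform. Unknown names are rejected so a
// typo cannot silently leave a dummy value in place.
func resolveSecrets(overrides map[string]string) (map[string]string, error) {
	out := make(map[string]string, len(secretDefaults))
	for k, v := range secretDefaults {
		out[k] = v
	}
	for k, v := range overrides {
		if _, declared := secretDefaults[k]; !declared {
			return nil, fmt.Errorf("undeclared variable %q", k)
		}
		out[k] = v
	}
	return out, nil
}
```

A development workspace would call this with no overrides, while a production workspace supplies only the real values it needs.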
» Tips for Not Leaking Vault Secrets
A few points on security that you need to adhere to so that you don’t leak your Vault secrets out to your engineers or anyone that has access to Terraform.
First, it's really important to limit who has access to Terraform state. Depending on how you set your Terraform code up, it is possible to leak your Vault secrets into the Terraform state.
I'd really recommend not giving anyone access to that Terraform state for those Vault workspaces. It limits the ability for anyone to gain access to those secrets.
It's also important to lock down the source repository branch. Otherwise, an engineer could create a branch, point the Terraform workspace at that branch, and then output all the secrets to exfiltrate them.
You need to lock the branch down in Terraform, and make sure you use a protected branch on your source control provider, such as GitHub.
You can have a protected master branch, so that the only way to make a change to Vault is via an audited PR process that can be reviewed. No one can circumvent that.
It's also very important to limit who has access to the variables, so using only sensitive write-only variables is really good, because it means someone can't read those back. If an administrator sets them, someone else can't come in and read that value.
But someone could maliciously overwrite the value if they have access to it.
It's important to think about who has access to those variables.
What are the advantages of this approach? We went through a lot of them when we talked about Terraform, but this approach lets you synchronize changes from your source code across all of your environments at the same time.
It eliminates drift between your environments, because you no longer have a manual step in your pipeline for setting up environments. We've fully automated it using Terraform.
Your configuration now lives in code, so engineers can see how Vault is configured and make changes. They might need to add extra secrets for their microservice, or change what the microservice can do within Vault; for example, it might suddenly need write access to save a secret in flight.
They can make all those changes in an audited way, and then you can have a PR process to review that into production, and Terraform would apply that to all the environments.
It's also super fast for rollout now, because once a change is merged into the main line, Terraform would apply that to all the environments at the same time versus the world we were in before, where you had to manually add that change to each environment by hand.
» Testing Locally
Now we were managing Vault using Terraform. But we still had one small problem: to test our microservice together with its Vault configuration, we had to deploy it into AWS, into one of our environments.
What's really important in software engineering is to make that testing as close to the engineer as humanly possible, because that really reduces the feedback loop, allows you to find errors faster, and speeds up development time.
What we thought was, "Can we test this Terraform configuration locally on an engineer's machine without even applying it to AWS?"
That's the problem I'm going to talk you through now: how we solved that.
The first part of the solution is that the Terraform repository that holds the code to manage Vault also contains a Dockerfile.
The Dockerfile is fairly simple. Literally all it does is take all the Terraform code, copy it into the Docker container, and then, when you start the Docker container, it runs a Terraform init and a Terraform apply.
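The container only needs to run init and then apply on start-up. As one possible way to express that (an assumption, not Form3's actual Dockerfile), here is a small Go entrypoint a Dockerfile could use as its ENTRYPOINT. -input=false and -auto-approve are standard Terraform flags that stop it prompting for input or for apply confirmation, which matters because nobody is attached to the container.

```go
package main

import (
	"os"
	"os/exec"
)

// terraformSteps returns the two commands the container runs on start-up,
// both executed in dir, the directory the Terraform code was copied into.
func terraformSteps(dir string) []*exec.Cmd {
	initCmd := exec.Command("terraform", "init", "-input=false")
	applyCmd := exec.Command("terraform", "apply", "-input=false", "-auto-approve")
	for _, c := range []*exec.Cmd{initCmd, applyCmd} {
		c.Dir = dir
		c.Stdout = os.Stdout // surface Terraform's output in the container logs
		c.Stderr = os.Stderr
	}
	return []*exec.Cmd{initCmd, applyCmd}
}

// runSteps executes the commands in order and stops at the first failure,
// so apply never runs against an uninitialised working directory.
func runSteps(cmds []*exec.Cmd) error {
	for _, c := range cmds {
		if err := c.Run(); err != nil {
			return err
		}
	}
	return nil
}
```

A main function would call runSteps(terraformSteps(dir)) and exit non-zero on error, so the test harness can tell that seeding Vault failed.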
As part of our build, when we merge a change into the main line of our Vault configuration repository, we build a new container image with those changes and push it to a registry.
It's also possible to make a branch build of that container. Now, each of the microservices that we have at Form3 has a lot of end-to-end tests that can be run as part of a test pack, and as part of those tests, we make heavy use of Docker and Docker Compose to spin up a set of Docker containers to represent the infrastructure that microservice needs.
In this example, I've got a microservice that needs PostgreSQL, Vault, and Consul. As part of the pre-test run for those end-to-end tests on the engineer's machine, we would start those 3 Docker containers.
We then fire up the Terraform container that has all of the Vault configuration code I've just described. We use environment variables to point it at the Vault running in Docker; by default, it would target the real Vault in AWS.
Terraform will now apply that configuration to the Vault running in Docker. What's really great about this is we can now test all of our changes against that local Vault, and you can also add more changes in here if you want.
And you can use that Terraform project to manage Postgres, or manage Consul, or manage any of your infrastructure in Docker. So you could also apply this technique for that.
Once Terraform has run and configured Vault, all of our end-to-end tests for that application will run, and the app will get to exercise its access to Vault and make sure that's all configured correctly.
Now, a developer can take a branch of the Vault repo, they can make their changes, they can build a new Docker container with their changes in, and then they can test that locally on their machine against their application and make sure the application has got the access to Vault that it needs to do its job.
If their application tries to reach a part of Vault it doesn't have access to, the test fails on their machine, rather than the failure only showing up after deploying into AWS.
It also means you can exercise some of the more elaborate features of Vault, such as if your app gets credentials on the fly using Vault Dynamic Secrets backed by Postgres, it's possible to set all of that up and get that working locally on your machine as well.
» Key Takeaways
Vault is an awesome tool for managing secrets in the cloud. It allows you a unified way to store your secrets and have applications authenticate to get those secrets.
But it can become unwieldy if you try and manage Vault in a manual way, which is kind of encouraged from the outset because it's a security tool and you need to seed it with these secrets and set it up.
Terraform is a great way to manage infrastructure and keep things in sync. By using this technique of using Terraform to manage Vault, the 2 things work well together to keep all of your Vaults in sync and allow you a nice administrative flow to be able to set secrets into Vault and apply those across your estate.
And lastly, it's great to be able to use Docker containers to be able to test this configuration on an engineer's machine without the need to deploy to your environment, which speeds up that feedback loop and makes your development cycle much faster.
I'd like to thank you all for listening today. My name is Kevin Holditch. Thank you very much.