HashiCorp Terraform and Vault on Oracle Cloud Infrastructure
Learn how to use Oracle Cloud Infrastructure's single pane of glass to orchestrate and manage all your infrastructure and secrets using Terraform and Vault.
Oracle Cloud Infrastructure paired with Terraform and Vault provides speed, standardization, and ease of use, allowing operators to move fast and create repeatable production environments.
In this talk, speakers Alex Ng and Leon Kuperman from Oracle will demonstrate how you can integrate Terraform and Vault with Oracle Cloud Infrastructure. The talk will also feature a quick preview of Oracle's new Terraform feature that lets you generate all your Terraform configuration in HashiCorp Configuration Language (HCL) from existing deployments.
Speakers
- Alex Ng, Principal Member of Technical Staff, Oracle
- Leon Kuperman, VP Security Products - OCI, Oracle
Transcript
Leon Kuperman: Hey, guys, how you doing? We're going to be talking about a few cool things today. A bunch of demos we've got lined up for you today, and just a fair demo warning: Alex is going to do one live, so we're taking bets over/under for the demo gods.
A couple of tenets to start with. Want to describe our partnership between HashiCorp and Oracle.
It's been a couple of years in the making, and I think we've hit this interesting balance between supporting open-source projects and Hashi's implementation of those for commercial enterprise customers and the kinds of things that we're doing in Oracle Cloud Infrastructure (OCI) for cloud and cloud-native that I think are jelling really well.
We're going to show you some of that today.
Today we're going to show you our Terraform provider for OCI, and an interesting way of managing OCI resources. One of the things Alex is going to demo is how we reverse-engineer those resources back into infrastructure as code, if you've done things a different way.
Then I'm going to demo a couple of things around some announcements we've made for Vault. So we've got some OCI-native plugins, 3 of them that we're going to explore, and I'm going to give you a demo of a few of those.
Alex?
Deploying and managing infrastructure in OCI
Alex Ng: Thanks, Leon. Good afternoon, everyone. I'm here to talk about how we deploy and manage infrastructure in OCI.
One of the challenges around that is the astounding number of resource types that we support in OCI. These are things that encompass compute, block storage, load balancers, databases, and so on.
Being able to describe the relationships between all those things and how to deploy them is a challenge for us, and it's been solved with Terraform. We think Terraform is a great tool for doing that.
We also have an OCI provider, which allows users to provision their Oracle resources in Terraform, and this is something where we work closely with HashiCorp to release this functionality on a weekly basis.
It's tightly integrated so that any new resource in OCI that can be supported is supported in Terraform from Day 1.
On top of that, we also have another component called the OCI Resource Manager. This is our hosted resource management service. Again, we work with Terraform to allow customers to collaborate on their infrastructure as code and also to manage their security accesses to any infrastructure as code that they collaborate on.
The point of OCI Resource Manager is to reduce a lot of that operational burden of collaborating on infrastructure.
As you can see, we already have great tools for deploying resources in OCI, and Terraform has been a great partner in helping us achieve that.
Helping OCI customers with Terraform
But more specifically today, I want to address a gap in this story. And that is something that a lot of our OCI customers encounter when they first use our cloud. They have a lot of interest in Terraform, but they don't have enough of an understanding of the syntax, of the functionality, to know how to leverage it to build out their initial infrastructure.
For them, it's often easier to start deploying and learning about these cloud resources through our web portal. They click through a set of guided steps and use a graphical interface to build up their infrastructure.
Over time, though, this becomes a problem, a management problem, and also a loss of fidelity about their infrastructure. Because as they add more resources through the console, their infrastructure becomes more complicated.
They have maybe tens to hundreds of instances they may need to manage, and so they lose the recipe for how to rebuild that infrastructure and duplicate it. Which is where Terraform configurations come in. They also lose the ability to manage that infrastructure and deal with configuration drift, and that's where the Terraform state files come in.
Terraform is a tool to deal with these kinds of problems. But what can customers do about this already-deployed infrastructure that they've done through the console? The options right now are unappealing.
They can rebuild from scratch, rewriting all of the infrastructure that they've already deployed as Terraform configs, or they can write in-house solutions for running the terraform import command on a per-resource basis.
Those solutions are time-consuming and not that appealing, to be honest.
How can we improve this experience for our Oracle cloud customers?
We think that the provider that I mentioned before contains a lot of the functionality that we can reuse to help rebuild and recover users' Terraform artifacts.
But before I go into how that works, I want to define some of the goals for how this should work. The Terraform artifacts that we should generate—and when I say "Terraform artifacts," I mean things like the state file and the configurations—they should be generatable by pointing at a customer's OCI compartments.
For those of you who are not familiar with what a compartment is, in OCI, a compartment is a grouping of logically related resources. Think about all the instances inside a subnet, inside a virtual cloud network. It's the unit of isolation for a customer's infrastructure.
The Terraform configurations that we generate from these compartments should be reusable and redeployable in other compartments. This is something that I think would be useful if you needed to duplicate the same infrastructure again and again and again.
Also, one of the key ways in which we do this is that the configurations we generate have to preserve the relationships between the resources. That way, if I run a terraform apply again, all the ordering of how those operations and resources get deployed is preserved.
And finally, we believe that any already-deployed infrastructure in your compartments should be migratable to Terraform for management. This is where state files come in. The state files that we generate should reflect the actual state of those resources that you created in the web portal.
Preserving already-deployed resources
I'm going to dive a little bit into how this works inside our provider. I may use a lot of terminology that is really technical, but bear with me.
On the left-hand side, we have this example of a compartment with resources that are already deployed. These are the building blocks for the Vault infrastructure that Leon is going to talk about in a bit.
This compartment contains resource types that are dependent on one another. For example, you have a virtual cloud network, a subnet, and you have 1 or more virtual machines that are behind a load balancer.
These things are dependent on one another in the sense that I need a virtual cloud network first before I can build out a subnet, and I need a subnet before I can attach instances to it. And each of these things has what we call an OCID, which is really just a unique identifier for that resource in our cloud.
How do we build out this inventory of all these resources in the user's compartment?
First, we start with this notion of data sources in our provider. We have data sources that allow you to discover resources inside a compartment. With that we're able to build out this list of resources that are related by traversing this dependency graph that I talked about earlier, wherein we found a VCN and then we should be able to find all the subnets under that VCN, and so on and so forth.
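The discovery step described here can be pictured as a walk over a parent-to-child graph. The following is only a minimal sketch under stated assumptions: the graph, the resource IDs, and the list_children helper are all hypothetical stand-ins for the provider's data sources, and the real dependency graph covers many more resource types.

```python
from collections import deque

# Hypothetical parent -> child resource types; the real provider's graph is larger.
DEPENDENCY_GRAPH = {
    "compartment": ["vcn", "load_balancer"],
    "vcn": ["subnet"],
    "subnet": ["instance"],
    "load_balancer": ["listener", "backend"],
}

# Stand-in for data-source lookups: (resource type, parent id) -> child ids.
FAKE_INVENTORY = {
    ("vcn", "compartment-1"): ["vcn-a"],
    ("subnet", "vcn-a"): ["subnet-a"],
    ("instance", "subnet-a"): ["instance-a", "instance-b"],
    ("load_balancer", "compartment-1"): ["lb-a"],
    ("listener", "lb-a"): ["listener-a"],
    ("backend", "lb-a"): ["backend-a", "backend-b"],
}


def list_children(resource_type, parent_id):
    """Pretend data source: list resources of a type under a parent."""
    return FAKE_INVENTORY.get((resource_type, parent_id), [])


def discover(compartment_id):
    """Breadth-first walk that returns (type, id, parent_id) in dependency order."""
    found = []
    queue = deque([("compartment", compartment_id)])
    while queue:
        parent_type, parent_id = queue.popleft()
        for child_type in DEPENDENCY_GRAPH.get(parent_type, []):
            for child_id in list_children(child_type, parent_id):
                found.append((child_type, child_id, parent_id))
                queue.append((child_type, child_id))
    return found


resources = discover("compartment-1")
for resource_type, resource_id, parent_id in resources:
    print(f"{resource_type:14} {resource_id:12} (under {parent_id})")
```

Because parents are always visited before their children, the resulting list is already in an order that could be deployed top to bottom.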
Once we have that, our next step is to build the resource representation, again using the data sources by populating all the required and optional attributes of those resources.
Once we have that, we can also come up with logical Terraform names for those resources. And once we have all that information, that gives us the starting point to create the Terraform HashiCorp Configuration Language (HCL) files, as well as the Terraform state files using terraform import commands.
Demo'ing the process
Let me dive a little deeper into this by showing you a demo. As Leon said, this will be a live demo, so fingers crossed.
For this demo, I've provisioned a set of resources in our web portal already. These are the same resources that I showed you earlier in the previous slide with the diagrams. This has a virtual cloud network, and within the virtual cloud network there's a subnet and 1 or more related networking resources. Each of these has its own OCID, or unique identifier.
Leon Kuperman: For those of you who haven't seen this, this is the OCI console. Next week we'll be announcing some pretty cool stuff at OpenWorld which will give you all free access to these resources. Check it out. You'll be able to get some cool stuff next week.
Alex Ng: We have all these networking resources here that have already been provisioned. And under these networking resources, we have associated compute instances as well.
For this demo, I'll try to create the Terraform configurations and state files from this already-deployed infrastructure.
The way we do that is through our existing Terraform OCI provider. As I mentioned, this provider is something that we release on a weekly basis with HashiCorp. If you want to give this provider a try, I can share the details with you later.
You can see that the extension of this command contains a command to run. Here we export the compartment that we want to generate these configs for. We have a name of the compartment that we want to target and just a place where we generate all the files.
This is traversing the predefined dependency graph that I mentioned earlier, wherein again a virtual cloud network is depended on by subnets, which are then depended on by instances, and so on and so forth.
When it's done, you will see a summary of all the different resource types that we were able to discover in that compartment. We have 10 core resources. Core is our grouping for compute, block storage, and networking resources.
We also have 6 load balancer-related resources. These are things like listeners for the load balancers, backends. I can also show you the files that were generated.
In the core file, you'll see that we have generated the HCL syntax for one or more instances that we were able to discover. The syntax is compatible with both Terraform v0.11 and v0.12, so if you haven't moved over to v0.12 yet, you can still use these generated configs.
You will also see that we have preserved the dependencies between these resources in Terraform. This is done through interpolation syntax, which references other resources.
In the case of this instance, for example, we know there's a dependency between this and the subnet. If I were to run apply on this, you would see that the resources are deployed in that order.
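To make the idea of preserved dependencies concrete, here is a small sketch of how a generator might swap raw OCIDs for references between resources. It is an illustration only, not the provider's actual code: the OCIDs, the export_ naming scheme, and the compartment_ocid variable name are assumptions, and the attribute lists are trimmed, so the output is not a complete configuration.

```python
# Discovered resources, keyed by OCID. All OCIDs and attribute values are made up.
DISCOVERED = {
    "ocid1.vcn.oc1..aaa": {
        "type": "oci_core_vcn",
        "attrs": {
            "compartment_id": "${var.compartment_ocid}",
            "cidr_block": "10.0.0.0/16",
            "display_name": "demo-vcn",
        },
    },
    "ocid1.subnet.oc1..bbb": {
        "type": "oci_core_subnet",
        "attrs": {
            "compartment_id": "${var.compartment_ocid}",
            "cidr_block": "10.0.1.0/24",
            "vcn_id": "ocid1.vcn.oc1..aaa",
        },
    },
}


def logical_names(discovered):
    """Assign a Terraform name to each OCID, e.g. export_vcn_1 (hypothetical scheme)."""
    names, counters = {}, {}
    for ocid, resource in discovered.items():
        short_type = resource["type"].rsplit("_", 1)[-1]
        counters[short_type] = counters.get(short_type, 0) + 1
        names[ocid] = f"export_{short_type}_{counters[short_type]}"
    return names


def render_hcl(discovered):
    """Render resource blocks, swapping known OCIDs for interpolated references."""
    names = logical_names(discovered)
    blocks = []
    for ocid, resource in discovered.items():
        lines = [f'resource "{resource["type"]}" "{names[ocid]}" {{']
        for key, value in resource["attrs"].items():
            if value in discovered:
                # The value is the OCID of another exported resource: reference it.
                target = discovered[value]
                value = "${" + f'{target["type"]}.{names[value]}.id' + "}"
            lines.append(f'  {key} = "{value}"')
        lines.append("}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


hcl = render_hcl(DISCOVERED)
print(hcl)
```

The subnet's vcn_id ends up pointing at the VCN resource rather than at a hard-coded OCID, which is what lets Terraform work out that the VCN has to exist first.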
One other cool thing we can do with this is upload all of these generated Terraform config files into our OCI Resource Manager service. I can show that very quickly.
Going back to the web portal here, we have this notion of stacks in our resource manager. A stack is just a collection of all the config files and state files needed to recover and manage your infrastructure.
What we can do is create a stack here, but before we do that, we should zip up all of our configuration files. Once we have them all zipped up, we can upload them into the stack.
What you'll see is a set of parameters that the resource manager was able to detect or variables that they were able to detect from the Terraform configs. For example, here is a compartment OCID.
This is an identifier where you can change the OCID you want these configs to target. Say you wanted to duplicate this infrastructure in another compartment. Modifying this value will let you do that.
And if I were to create this stack, you'll soon see that I have the ability to perform different Terraform actions on these generated configuration files. Things that you probably already use, like terraform plan and terraform apply.
We can run a plan command here. As this is running, I can show you another plan command that I have previously run with the same configuration files.
What you see here is the output that you would expect from a terraform plan command that you ran locally. You'll see that when running on this config, it's planning to add 18 resources.
So there's something missing in this demo right now, since I've already created those 18 resources in the console. Why is it telling me that I need to create them all over again? This is where the state file comes in.
Generating the state file
For the next part of the demo, we'll generate the state file. The command to do that is the same as before. All we need to do is add a flag to do that.
What it's doing right now is very similar to what we were doing before. We're discovering all of the resources in the compartment. We're also calling the import command from Terraform to bring all of those resources into the state file.
As you can see, it's going through the dependency graph again. Once it's done that, you might be familiar with all these log messages. These are the same messages you would see if you ran a terraform import command.
The great thing about Terraform is that they expose a lot of the libraries that they use that implement their command functionality. We've worked with HashiCorp to leverage a lot of that functionality here so that we can import all of those resources we discovered into a single state file.
Now that it's done, we see that it's discovered the same resources as before. But the true test of whether you were able to recover the state is of course to run a terraform plan command.
The resulting output of this command is, I would hope, that no change in infrastructure is needed, because the state file should already reflect the state of these resources. And it shows that no changes are needed.
Another thing we can do with the state file is, because it's officially managed through Terraform, we can start making changes to the infrastructure outside of Terraform and see how it deals with configuration drift.
One way we can do that is through a command-line tool called the OCI command line. With this command, we're updating the display name of one of the virtual cloud network resources that we have in our compartment, and we'll see whether it's able to detect this configuration drift.
Again, I want to emphasize, this is something that's happening outside of Terraform. This could be anyone else who has access to your infrastructure making a change that Terraform is not aware of.
We'll name this "drifted VCN."
It's updated the display name of this resource outside of Terraform without Terraform's knowledge. If we were to run a terraform plan again, you will see that it's detected this drift from the configuration, and it's now saying that it wants to move back to what my configuration said.
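The three plan results seen in this demo (18 resources to add, no changes, and a drift correction) all come from the same comparison. The sketch below is a heavily simplified model of that comparison, not Terraform's real planning logic; the plan function, the resource address, and the attribute values are hypothetical.

```python
def plan(config, state, actual):
    """Return planned actions as a dict: address -> "create" or "update".

    config: address -> desired attributes (from the generated config files)
    state:  address -> attributes Terraform last recorded (the state file)
    actual: address -> attributes as they exist in the cloud right now
    """
    actions = {}
    for address, desired in config.items():
        if address not in state:
            # Terraform has no record of this resource, so it plans to create it.
            actions[address] = "create"
            continue
        # Refresh: prefer what the cloud reports over what the state last recorded.
        current = actual.get(address, state[address])
        if current != desired:
            actions[address] = "update"
    return actions


config = {"oci_core_vcn.export_vcn_1": {"display_name": "demo-vcn"}}
cloud = {"oci_core_vcn.export_vcn_1": {"display_name": "demo-vcn"}}

# 1. Configs generated, no state file yet: everything looks new.
print(plan(config, state={}, actual=cloud))

# 2. State imported: config, state, and cloud agree, so there is nothing to do.
imported_state = dict(cloud)
print(plan(config, imported_state, cloud))

# 3. Someone renames the VCN outside Terraform: the plan moves it back.
drifted_cloud = {"oci_core_vcn.export_vcn_1": {"display_name": "drifted VCN"}}
print(plan(config, imported_state, drifted_cloud))
```

Without a state file, the tool has no way to know the resources already exist, which is why the first plan wanted to create everything again.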
At this point, you can start adding new resources to your configs, start building out your infrastructure even more using Terraform. I think this is a great tool for that right now.
This functionality is currently in preview, but we do plan to release it as part of the official Terraform OCI provider in November. Once it's part of the provider, you can download it from the HashiCorp registry and try it out against the OCI resources.
If you're interested in accessing an early version of this functionality in our provider, please reach out to me. My email address is there.
And with that, I'll hand it back to Leon.
Leon Kuperman: Thank you, Alex. That was super cool.
New plugins for Vault
Let's talk about Vault and our new plugins. Today, we are announcing 3 new plugins for Vault that will allow Vault to work natively with OCI. It's all going to be part of Vault release 1.2.3.
We're doing 3 things. The first thing is, we're integrating Vault with our native key management service.
KMS, for those of you who don't know, is a cloud-based service that allows you to safely store your encryption keys. In our case, it's stored and backed by FIPS-compliant HSM hardware, so it's a hardware security module.
The cool thing about our HSM is it gives you a very high degree of tamper-proof capability so that you know that your keys can't be touched or mangled outside of the HSM.
We're offering an identity plugin that allows Vault to understand OCI identity and talk about things like principals. IAM principals for us are users, instances like virtual machines, and also other types of resources like Docker containers, for example.
We're also announcing the ability to store your secrets securely and durably in our object store.
If you are familiar with S3, this is OCI's version of a key-value store that's highly available, highly redundant. The cool thing about this plugin is it enables HA for Vault.
If we can get time to do the demo, I'm going to show you guys how we do leader election with Vault using object store as the backend.
The KMS plugin allows for automated unseal of your Vault. As you start up and initialize Vault, you have to go through this initialization process. Normally that involves 3 people getting together and taking part of their key material, and you assemble it all together. That is an operational headache.
With unseal, we're allowing KMS to store the encryption key, the master key for Vault. And we'll show you in the demo how you can initialize it using our IAM and instance principals with no operator intervention, which is pretty cool.
That automated unseal is a big deal for us.
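For context on the manual process that auto unseal replaces: by default Vault splits its key into shares using Shamir's Secret Sharing, and a threshold of key holders must come together to unseal. The toy sketch below shows the idea with integers and a 3-of-5 split. It is an illustration only and not Vault's implementation; the function names and the seeded random generator are assumptions made to keep the demo repeatable.

```python
import random

PRIME = 2**127 - 1  # A Mersenne prime; all arithmetic is done modulo this value.


def split_secret(secret, shares, threshold, rng):
    """Split an integer secret into `shares` points; any `threshold` recover it."""
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coefficients = [secret] + [rng.randrange(1, PRIME) for _ in range(threshold - 1)]
    points = []
    for x in range(1, shares + 1):
        y = sum(c * pow(x, power, PRIME) for power, c in enumerate(coefficients)) % PRIME
        points.append((x, y))
    return points


def recover_secret(points):
    """Lagrange-interpolate the polynomial at x = 0 to get the secret back."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        numerator, denominator = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                numerator = (numerator * -xj) % PRIME
                denominator = (denominator * (xi - xj)) % PRIME
        # Modular inverse via Fermat's little theorem (PRIME is prime).
        secret += yi * numerator * pow(denominator, PRIME - 2, PRIME)
    return secret % PRIME


rng = random.Random(7)  # Seeded for a repeatable demo; never do this for real keys.
master_key = 123456789
key_shares = split_secret(master_key, shares=5, threshold=3, rng=rng)

# Three operators bring their shares: the key is recovered.
print(recover_secret(key_shares[:3]) == master_key)
# Only two operators show up: interpolation gives a different value.
print(recover_secret(key_shares[:2]) == master_key)
```

With a KMS-backed seal, no one has to gather shares like this at startup, because the key protecting Vault stays inside the key management service.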
The last one is the object store plugin. This is the reliable backend that will store your secrets in a durable way, and then will also allow for that high-availability scenario that we're going to demo.
I'm not doing the unseal demo live. Let's hit the video.
Auto unseal
This is our console, as you guys saw before. The first thing we need to do before we can show you the unseal capability is show you how to work with a KMS key, so we're going to do that right now.
We have to do that by navigating the menu here and going into "Security and Key Management." We already have a Vault demo key that we've created. What we need out of here are 3 attributes.
We need the OCID, which is our global ID. This is going to be the pointer to that key. Then we need the management and encryption endpoints.
Once we have those 3 assets, we can go into our plugin configuration file, which I'm going to show you in a second, and configure the key management plugin to use this cloud-based resource.
Let's take a look at how to do that.
I've got 2 nodes up here. Node 1 is my primary. This is going to be my master. Then I've got node 0, which is our failover in this case.
I want to start both of them, but before I do that, let's take a look at the config file. I've got my key ID that I've entered, and I'm also showing the crypto endpoint and the management endpoint, and all of these 3 things are really important for the plugin to function correctly.
Let's start Vault up as just a regular server. We're going to point to our configuration file. Great, it started up, but we can't use it yet, because it's still sealed. Let's take a look at the configuration. We'll show that to you in just a second.
We have both primary and secondary running. You see in the status that the thing is sealed, and so we can't really get it running. Now we're going to do an init. The init is normally where you would have to use that 3-person multi-factor algorithm to get operators to open up and unseal a Vault.
But here, because the key material has never left KMS, we're able to do this all automatically using a combination of instance principals and our KMS plugin. Now we've got an unsealed Vault, and we're ready to start working. So that's demo No. 1.
The auth plugin
I'm going to show you a little bit about how to use auth and identity within the plugin.
A little bit about IAM and instance principals. Before we can get to how Vault uses these principals to do its job, let's talk a little bit about what these principals are and how we use them.
What I'm showing you right now is something called a "dynamic group." A dynamic group is nothing more than a set of resources that we will reference as a security group with its own OCID. Things will be part of that dynamic group, but we'll be able to associate that group with something inside of Vault.
In this case, we're going to associate it with a policy, and I'll show that to you in a second.
We don't have time to go over a lot of the IAM right now, but if you ping us later, we'll show you where all of the documentation is for IAM.
Now let's talk about the plugin. The auth plugin is in GitHub along with the rest of the release we already showed you.
Let's take a look at a demo policy. We have a role here that associates the OCID for that dynamic group along with the demo policy. What I want to do is take a look at what it's capable of doing.
In our case, we're going to take a look at the demo policy, and it's going to show us that we've got some capabilities around create, update, read, list, and so forth.
But now let's start a session. Let's do something meaningful with our principals.
This is node 0, and node 0 has an instance principal. Already in OCI, node 0 is associated with this dynamic group. Theoretically we should be able to log onto Vault without providing any credentials, just because we're using the concept of this principal.
We're going to use our login command, and you're going to see that we're authenticated with a specific duration. But notice how I didn't have to give a root certificate or any other type of credentials. This is the power of IAM gluing things together for you. It's creating a trusted relationship between this node and the Vault.
What can we do with this? In our config file, we see that there's a secrets path. And what I want to do is go grab a secret. This shows that the authentication worked and we're able to do something meaningful within Vault.
We're going to dump a value out of the Vault, and we see that we have a database password of "mysecret" that we were able to get, again, with no credentials, no certificates required in real time, just based on our instance principals and management policy.
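The chain in this demo runs from dynamic group to role to policy to capabilities. A rough model of that chain is sketched below. Every OCID, role name, and path is a made-up placeholder, the prefix matching is a simplification of how Vault matches policy paths, and the sketch skips the step where the caller's identity is actually verified.

```python
# All OCIDs, role names, and paths below are made-up placeholders.
DYNAMIC_GROUP_MEMBERS = {
    "ocid1.dynamicgroup.oc1..demo": {"ocid1.instance.oc1..node0"},
}

ROLES = {
    "demo-role": {
        "dynamic_group": "ocid1.dynamicgroup.oc1..demo",
        "policies": ["demo-policy"],
    },
}

POLICIES = {
    "demo-policy": {"secret/demo/": {"create", "update", "read", "list"}},
}


def login(role_name, instance_ocid):
    """Issue a token (here: a list of policies) if the instance is in the role's group."""
    role = ROLES[role_name]
    members = DYNAMIC_GROUP_MEMBERS.get(role["dynamic_group"], set())
    if instance_ocid not in members:
        raise PermissionError(f"{instance_ocid} is not in {role['dynamic_group']}")
    return list(role["policies"])


def is_allowed(token_policies, path, capability):
    """Check whether any policy on the token grants the capability on the path."""
    for policy_name in token_policies:
        for prefix, capabilities in POLICIES[policy_name].items():
            if path.startswith(prefix) and capability in capabilities:
                return True
    return False


token = login("demo-role", "ocid1.instance.oc1..node0")
print(is_allowed(token, "secret/demo/database", "read"))
print(is_allowed(token, "secret/demo/database", "delete"))
```

The node never presents a password or certificate of its own in this model; membership in the dynamic group is what earns it the policy.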
I think we have time for our last demo.
The object storage plugin
I'm going to show you guys a failover between an active node and a passive node in real time. What we're going to do is to use leader election to do that. We've got 2 buckets configured, a data bucket, and a leaderlock bucket.
The data bucket is just used for storing secrets. The leaderlock is what's going to be used for figuring out who needs to be the master and who needs to be in standby HA.
We've got some policy again in IAM that sets this all up for us, some detail that basically allows these buckets to be associated with the dynamic group that we were just talking about.
Again, object storage details are on the site. We can talk about it after this session. Let's do the demo now.
We've got 2 environments. We've got node 0, and we've got node 1, and you'll see that node 0 is configured for storage. It's HA-enabled and we've got the configuration value around where the leaderlock is going to go.
They're both in started mode, and the node on the left has the acquired lock, which means it is currently primary in the cluster.
What I want to do is, I want to figure out, how do we shut down node 1 and have node 0 pick things up nicely and automatically?
Let's take a look at Vault status. Here's the HA cluster. The left is on standby, and the right is currently primary. We hit control C on the primary and what happens? A couple seconds later, the HA picks up its config and says, "I just acquired my lock."
It did so through object store on the OCI backend, and all of a sudden we have a primary that is without any downtime whatsoever, all done because of that HA object store capability.
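The failover shown here depends on both nodes competing for a single lock in the leaderlock bucket. The sketch below models that with an in-memory lock that expires unless it is renewed. This is an assumption about the general shape of lock-based leader election, and the plugin's actual locking protocol may differ; LockBucket, role_of, and the timings are hypothetical.

```python
class LockBucket:
    """In-memory stand-in for the lock bucket: one lock object with an expiry."""

    def __init__(self):
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, node, now, ttl):
        """Take or renew the lock if it is free, expired, or already ours."""
        if self.holder in (None, node) or now >= self.expires_at:
            self.holder = node
            self.expires_at = now + ttl
            return True
        return False


def role_of(node, bucket, now, ttl=10.0):
    """A node is active while it holds the lock, otherwise it stands by."""
    return "active" if bucket.try_acquire(node, now, ttl) else "standby"


bucket = LockBucket()

# Both nodes start; the first one to reach the bucket becomes active.
print(role_of("node-1", bucket, now=0.0), role_of("node-0", bucket, now=1.0))

# node-1 keeps renewing while it is alive, so node-0 stays on standby.
print(role_of("node-1", bucket, now=8.0), role_of("node-0", bucket, now=9.0))

# node-1 is stopped and no longer renews; its lock runs out at t=18.
print(role_of("node-0", bucket, now=17.0))
print(role_of("node-0", bucket, now=19.0))
```

A real backend would need the storage service to make the take-or-renew step atomic, since two nodes can race for the lock at the same moment.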
Those are the 3 quick things we wanted to show you about the plugin. Hope you guys liked that.
In summary
Just to summarize, a couple of high-level points that I want to leave you with.
Customers choose OCI for their enterprise workloads. Why? Because we have a very strong compute and cost performance. We have a very strong set of bare-metal instances that customers can leverage, and extremely predictable behavior over time, so you don't have this randomization of how your instances are running.
We have the world's best autonomous self-healing, self-patching database that requires no downtime. This is a really cool announcement, because we can take the power of Vault, where customers are using Vault to do encryption and secrets management and integrate that into an enterprise platform that customers need and are using today.
Alex, what would be the one point that you would leave folks with from a Terraform perspective?
Alex Ng: I would say that for folks who are already using Terraform, or any tool for that matter, that you love and know how to use, OCI can provide those tools for you to deploy and manage your resources.
Leon Kuperman: Thank you, guys. Thanks very much.
Alex Ng: Thank you.