Cloud Native Azure Infrastructure Deployment Using Terraform
Hear Microsoft Azure's insights into how users typically deploy cloud-native workloads, provision and configure Kubernetes clusters, and package serverless solutions.
As more workloads start their life (or their new life) in the cloud, they need both the platform and the tooling capable of delivering deployments and upgrades that are repeatable, scalable, and fast. Microsoft Azure and HashiCorp Terraform are integrated to provide a great platform and provisioning experience for the cloud.
In this session, Microsoft will walk through common scenarios that they see their customers adopting in Azure for deploying cloud-native workloads, from provisioning and configuring Kubernetes clusters to packaging serverless solutions as HCL. They will also share the roadmap of what is next for Azure and Terraform.
Speakers
- Eugene Chuvyrov, Senior Cloud Architect, Microsoft
- Mark Gray, Senior Program Manager, Microsoft
Transcript
Mark Gray: My name is Mark Gray. I'm a program manager at Microsoft. My job is to make sure that Terraform works great on Azure.
I have an engineering team that works with HashiCorp, primarily on the provider itself, making sure that the AzureRM provider exposes all of the features and functionality of Azure so that you can provision them. In addition to that, we work on the ecosystem around Terraform in Azure: the Visual Studio Code extensions, the Cloud Shell integration, and so on.
I'm here with Eugene.
Eugene Chuvyrov: Hey, guys. Good afternoon. I'm Eugene Chuvyrov. I'm on the commercial software engineering team at Microsoft. Our team works directly with customers to enable them in our cloud.
This session is a set of case studies from working directly with customers who use Terraform: what we've found, plus some artifacts you can use right away.
Mark Gray: We'll talk about how a couple of customers are using Terraform in their deployments to Azure. And we'll also talk about some tooling that you guys can download and use based on best practices for deploying Kubernetes to Azure.
We'll go over Project Bedrock, a toolset that you can download and use. We'll talk about Axon and Oracle, and then we'll wrap up with Azure DevOps and how you can use it with Terraform, including its integrations.
Kubernetes in the cloud with Project Bedrock
Eugene Chuvyrov: Let's start with Bedrock. What is Project Bedrock? As I mentioned, I'm on the commercial software engineering team with Microsoft, where we work directly with some of the largest customers that you can imagine in our cloud.
Bedrock is a set of guidelines that we have put together to help our customers provision and build an enterprise-grade Kubernetes deployment in the cloud.
It uses the GitOps workflow. What all those buzzwords mean is that if you are looking at enterprise-grade Kubernetes deployments, you can use these guidelines, which are open source, to get a head start. We developed them iteratively with our largest customers.
What are the basic principles of Project Bedrock? Leveraging cloud-native momentum, we provide a set of guidelines for deploying complex microservices into this system. It uses Azure Kubernetes Service (AKS), and it captures best practices for deploying AKS into your infrastructure on Azure.
The critical piece is that we build these guidelines from real-world experience, and they are used by Microsoft customers.
The URL for it is https://github.com/Microsoft/bedrock. You can also Bing for it, or Google for it, whichever you prefer: Project Bedrock on Azure.
Those are the main components of Bedrock. There's Kubernetes; there's a Terraform piece, which is why it matters for this session; and it uses GitOps principles. Those are the 3 topics I'll cover shortly.
The other 2, higher-level definitions and operational tools, are also very important, and I'll leave those for you to discover on your own.
Terraform's role in Project Bedrock
Why am I talking about this project at HashiConf, in this session on Terraform? The most important piece is that we provide you a set of templates that you can readily deploy today as part of this project.
As you probably know, Terraform provides a declarative way to deploy your infrastructure. What we packaged as this project is a set of predefined templates, a set of preconfigured environments for common enterprise scenarios that we see today. It also includes multi-operator support.
Preconfigured templates today include Azure single-region deployment, Azure multi-region, multi-region Key Vault, and a couple of projects we are working with today. As far as deployment, you get global redundancy out of the box.
Essentially, today you can run terraform plan and terraform apply, fill in certain variables, and get a ready-made, enterprise-grade Kubernetes environment in your subscription.
An example of global redundancy would look like this. We have Azure API Management and Azure Traffic Manager deployed in front. Behind them are 2 Azure regions, each with an Application Gateway and a Kubernetes cluster. Certain software pieces are also deployed for you to enable GitOps-based configuration management of those Kubernetes clusters.
Let me switch to the Terraform code to show you exactly what you get as part of this project. Once you clone the repo, you get a set of Terraform templates. If I open the cluster folder and go to environments, you can see there are multiple environments provided here.
If I look at main.tf, you can see that we reference those modules from the GitHub repo, which means you can copy them locally. There's no dependency on any particular file or folder. You can just copy it and work with it directly from within your environment.
You can see we deploy a basic virtual network and then provide a set of variables to provision an Azure Kubernetes Service cluster, which will be the first step in your enterprise-grade Kubernetes deployment.
That's simple. As you can see, we try to modularize everything. But one very important thing I want to mention: with the Terraform 0.12 release just a few months back, there were certainly breaking changes between versions. If you use 0.11, you can't necessarily switch to 0.12 out of the box and expect everything to work.
What we decided for this project, after several conversations, was to do separate releases: one for 0.12 and one for 0.11.
Customers who are still on earlier versions of Terraform can still use this project. However, going forward we are leveraging 0.12, and this strategy has worked out fairly well for us.
If you're deciding between 0.11 and 0.12, all of our future investment in this project will be based on Terraform 0.12.
GitOps workflow
Let me just cover a few other important pieces that we provide.
Besides providing a set of enterprise-ready Terraform templates to provision Kubernetes infrastructure in Azure, this deployment lets you operate that infrastructure easily, after it's provisioned, using the GitOps operating model.
Side note: Kubernetes is a declarative system. You specify a set of resources in a textual manifest that declares the desired state, and Kubernetes works out how to make the infrastructure match it. What that means is, if we can declare the state in a source file, we can have a piece of software, in this case Weave Flux, that pulls the system toward what it should be by reconciling the declared state with the current state.
This is where the GitOps workflow comes in. Developers or operators submit a pull request describing how the Kubernetes infrastructure should look. The in-cluster daemon watches the repo and reconciles the current state of the cluster with the desired state.
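The reconcile step described above can be illustrated with a small Python model. This is a sketch under stated assumptions, not how Weave Flux is implemented: desired state (from the Git repo) and current state (from the cluster) are modeled as plain dicts mapping a hypothetical resource name to its spec, and one pass computes the create, update, and delete actions that move the cluster toward what the repo declares.

```python
# Minimal sketch of one GitOps reconciliation pass.
# Assumption: desired state (Git) and current state (cluster) are plain dicts
# of resource name -> spec. Not Flux's implementation; illustration only.
from typing import Dict, List, Tuple

State = Dict[str, dict]
Action = Tuple[str, str]  # (verb, resource name)


def reconcile(desired: State, current: State) -> List[Action]:
    """Plan the actions that move `current` toward `desired`."""
    actions: List[Action] = []
    for name in sorted(desired):
        if name not in current:
            actions.append(("create", name))  # declared in Git, missing in cluster
        elif current[name] != desired[name]:
            actions.append(("update", name))  # drifted from the declared spec
    for name in sorted(current):
        if name not in desired:
            actions.append(("delete", name))  # no longer declared in Git
    return actions


def apply_actions(desired: State, current: State, actions: List[Action]) -> State:
    """Return a new state with the planned actions applied."""
    new_state = dict(current)
    for verb, name in actions:
        if verb == "delete":
            del new_state[name]
        else:
            new_state[name] = desired[name]
    return new_state


if __name__ == "__main__":
    desired = {"web": {"replicas": 3}, "api": {"replicas": 2}}
    current = {"web": {"replicas": 1}, "worker": {"replicas": 1}}
    plan = reconcile(desired, current)
    print(plan)  # [('create', 'api'), ('update', 'web'), ('delete', 'worker')]
    print(apply_actions(desired, current, plan) == desired)  # True
```

Running the pass again after applying it yields no actions, which is the property that lets an in-cluster daemon loop safely.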
All of this is enabled by Project Bedrock. Hopefully you see the value of GitOps: declarative configuration, operations via pull request, and a full history in source control. You get all of that out of the box if you follow the guidelines we provide.
To recap, with Bedrock we provide an easy-to-use Kubernetes deployment, and you get the GitOps workflow out of the box. There's a tool we use called Fabrikate that lets you specify Kubernetes configuration at a higher level; you don't have to use it if you don't want to. We also deploy some service mesh capabilities.
Hopefully you use this project or at least take a look at it. If you are in the very beginning of your enterprise Kubernetes journey, it could provide you a good set of guidelines. If you're somewhere in the middle, hopefully there's still some good value that you can derive from that. Over to you, Mark.
Axon's move to Azure
Mark Gray: Now we're going to dive into a very interesting customer use case that had success deploying to Azure with Terraform. The company's called Axon. I'll spend a few minutes talking about what Axon does because it's kind of interesting, from my perspective. Their whole philosophy is around building devices and software to protect life.
They work with the military and police to develop things like tasers and body cameras, to ensure that, not only are they protected themselves, but they don't need to use lethal force when dealing with bad guys.
As you can imagine, with body cameras, these videos have to go somewhere. There's going to be tons and tons of data that they collect, but just collecting that data and sticking it somewhere is not going to be useful. So they have ways of using AI to reason on this data so that they can use it for evidence in court for the police officers.
They deploy to Azure for a number of different reasons. They migrated to Azure from another cloud, I think, or maybe on-prem. They ended up migrating 20 petabytes of data over to Azure, and they're collecting 2-plus petabytes of data each month. The amount of data is just growing and growing and growing.
What they really like about Azure is that the storage just seems endless to them. They don't need to think about it. They keep shoving more and more data up to Azure, and Azure just keeps growing. That's a really great thing, from their perspective.
In addition to that, they like that it's global. They have customers worldwide, and they can deploy to any number of different regions and take advantage of that, just go where the customer is.
Another reason they like Azure is data governance. As you can imagine, all of this PII kind of data, they don't want that to get out. So they have strict governance that they apply to that, and they're relying on Azure to be able to do that as well.
Deploying new services with Terraform
They use Terraform to deploy new services. Those services, at least when they started out, were made up of a virtual machine, storage, and networking. They deploy to different regions, to the commercial cloud, and to the government cloud. Since the initial deployment, they've grown to include things like Azure Functions, Logic Apps, and other first-party Azure services, and they're deploying all of that using Terraform.
They have had a great experience with it. They use Terraform because they can do more with less. It makes things less complex. They don't have to worry about manually writing scripts and all of that kind of stuff.
And it's taking their time-to-deploy down significantly. When they were doing it using scripts and manually, it would take weeks, and now they're taking it down to hours. They're getting a lot of benefit out of Azure and Terraform together.
But it has not always been sunshine and rainbows, or super-smooth for them. They started out a year and a half, 2 years ago deploying to Azure with Terraform. And they have worked very closely with our team as well as HashiCorp to make sure that they had the coverage that they needed.
It was a bit painful to start, but they worked closely with us, and we worked closely with them, to make sure they were successful. That's something we're really proud of: it's not always going to be super-smooth, but you work through it and get this stuff done.
Then it turns into this project where, right now, they can deploy any new software, any new service, any cloud environment very quickly and easily using Terraform.
Axon’s environment
I'll show you Axon’s environment conceptually and then show you some pieces of the code. Because they started out a couple of years ago, they started on 0.11. For new services, they're moving to 0.12.
But the structure that they use from a Terraform perspective is interesting, and I'll show you the code and how they lay it out. They make use of a bunch of modules. They have their own modules that they build for doing things like configuring NSGs (Network Security Groups) and configuring the network and configuring load balancers.
They have modules for each of those things that encode their organization's best practices. They have a central configuration that builds on those modules, plus variable files for their different environments. All they do when they deploy is use a different variable file with the same configuration, and everything just works.
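The "one configuration, one variable file per environment" pattern can be modeled as layering environment values over shared defaults. This Python sketch is only an illustration of the idea: Terraform itself resolves variables from variable blocks and .tfvars files, and the variable names and values here (location, vm_count, vm_size) are hypothetical, not Axon's.

```python
# Sketch: one shared configuration, one variable file per environment.
# Hypothetical variable names and values; Terraform performs this resolution
# itself from variable blocks and .tfvars files.
from typing import Any, Dict

REQUIRED = object()  # sentinel for a variable declared without a default

DECLARED: Dict[str, Any] = {
    "location": REQUIRED,
    "vm_count": 2,
    "vm_size": "Standard_D2s_v3",
}


def resolve_variables(declared: Dict[str, Any], env_values: Dict[str, Any]) -> Dict[str, Any]:
    """Layer environment-specific values over the shared defaults."""
    unknown = set(env_values) - set(declared)
    if unknown:
        # Design choice for this sketch: treat typos in a var file as errors.
        raise ValueError(f"undeclared variables: {sorted(unknown)}")
    resolved: Dict[str, Any] = {}
    for name, default in declared.items():
        if name in env_values:
            resolved[name] = env_values[name]
        elif default is REQUIRED:
            raise ValueError(f"missing required variable: {name}")
        else:
            resolved[name] = default
    return resolved


if __name__ == "__main__":
    dev = resolve_variables(DECLARED, {"location": "westus2"})
    prod = resolve_variables(DECLARED, {"location": "eastus", "vm_count": 6})
    print(dev)
    print(prod)
```

The same declared set serves every environment; only the small per-environment mapping changes, which mirrors swapping one variable file for another.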
They use Jenkins at the core of that. That's their orchestrator, to take the files and deploy those things. They also take advantage of Packer.
They have a small set of images that they use. Instead of building a different image and doing complex configurations on their machines, they build a small set of images, and those images are used in their VMs when they deploy them. For any configuration they need to do after the fact, they use cloud-init to apply a simple configuration afterwards.
And what's really intriguing to me is that they've also built a shell script that wraps all of this up. Jenkins runs that shell script to do everything, from plan to apply to pulling in state.
All they're doing is calling a simple shell script and that does things like connecting to the remote backend. All you have to do is say, "Run this thing; send it to this environment," and it goes and does it. It abstracts a whole bunch of that stuff away as well. It's kind of cool how they did that.
They're using the azurerm backend for state. They store state there and restrict the permissions so people don't have access to it. And they're deploying to Azure itself: the VMs and all that.
Cloud Shell, with Terraform inside
Let's jump into a quick demo to show you what their code looks like.
I'm going to show you a couple of things here. I've pulled up, not their exact code, but a model of how they structure their code, in my Azure Cloud Shell. How many of you have used the Azure Cloud Shell? It looks like about half of you.
The Azure Cloud Shell is just a command line, either PowerShell or Bash, inside of the Azure portal.
You open that up and you have access to everything in Azure. What's nice about it is a lot of tools that you need for infrastructure management are already there, including Terraform. You open the Cloud Shell, Terraform's there.
It's the latest version. We update it on a regular basis: every time Cloud Shell is updated, roughly every 3 weeks, it picks up the latest version of Terraform.
Eugene Chuvyrov: If you guys haven't worked with Cloud Shell, the cool thing about it is that you log in to the Azure portal, click the Cloud Shell button, and you're automatically authenticated and ready to provision infrastructure. It's great for ad hoc development efforts, or just iterating on something.
Mark Gray: And you can access it from Visual Studio Code as well. There's an extension that gives you access to Cloud Shell right from within Visual Studio Code. This is essentially Axon's setup from a Terraform configuration perspective.
They have that script that I was telling you about.
Eugene Chuvyrov: It looks like you're showing a Visual Studio Code interface built-in.
Mark Gray: Yes, I am. And you'll see in a second that we have IntelliSense in there. If I go into the Terraform folder and run tf-deploy.sh, you can see some of the options that they have.
When you run it, you tell it what you want to do (plan, apply, or destroy; there are some defaults) and the environment you want to deploy to. It takes what's in the main folder and runs that configuration. That configuration is broken out into individual consumable pieces that use the modules over here.
And the deployments folder has their different environments, each in its own tfvars file. The tfvars file has all of the variables that are passed into the configuration. It gives them a lot of flexibility: if they're going to deploy to another cloud, they add a new tfvars file, run the script, and it just gets deployed. They do all of that through Jenkins, but you can see conceptually how they do it here.
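The wrapper's interface (an action plus an environment) can be modeled in a few lines. This is a Python sketch, not Axon's tf-deploy.sh. The assumptions are: commands run from inside the main configuration folder; each environment has a hypothetical ../deployments/<env>.tfvars and ../deployments/<env>.backend.hcl, the latter holding azurerm backend settings such as storage_account_name, container_name, and key; and apply/destroy are auto-approved because a CI job runs them non-interactively. It only builds the terraform command lines.

```python
# Sketch of a tf-deploy.sh-style wrapper: "run this action against this environment".
# Hypothetical layout: run from the main config folder, with per-environment
# ../deployments/<env>.tfvars and ../deployments/<env>.backend.hcl files.
import shlex
from typing import List

VALID_ACTIONS = ("plan", "apply", "destroy")


def build_commands(action: str, environment: str,
                   deployments_dir: str = "../deployments") -> List[List[str]]:
    """Return the terraform invocations for one action in one environment."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    var_file = f"{deployments_dir}/{environment}.tfvars"
    backend_file = f"{deployments_dir}/{environment}.backend.hcl"

    # Re-initialize against this environment's remote state. -reconfigure
    # ignores any backend settings saved from a previous environment.
    init = ["terraform", "init", "-input=false", "-reconfigure",
            f"-backend-config={backend_file}"]
    main = ["terraform", action]
    if action in ("apply", "destroy"):
        main.append("-auto-approve")  # CI runs without a human at the prompt
    main.append(f"-var-file={var_file}")
    return [init, main]


if __name__ == "__main__":
    for cmd in build_commands("plan", "dev"):
        print(shlex.join(cmd))
    # A real wrapper would execute each command instead of printing it, e.g.
    # subprocess.run(cmd, check=True, cwd="main")
```

Keeping command construction separate from execution makes the wrapper easy to test and lets the same logic run from Jenkins or a laptop.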
In the code editor in Cloud Shell, we have good syntax highlighting. If you need to make changes to the code, you can come in here and make the changes in a really efficient way.
The Terraform connection in the Oracle-Microsoft cloud link
Eugene Chuvyrov: All right, perfect. So that was Axon. And the next customer we're going to talk about is Oracle.
Microsoft and Oracle linked up their clouds. Today in the keynote we saw how you could link up a VM cluster with an EKS cluster using Consul service mesh gateways. How about we take it one step further and link up 2 clouds?
So there's Oracle Cloud and there's Microsoft's cloud. This got a little bit of coverage on TechCrunch, 2 or 3 months ago.
Why would we do that? Why would we link up 2 clouds? The idea is to let each cloud do what they're best at.
For Oracle Cloud, that's managing Oracle databases: management, patching, and so on. For us, that's running your applications and your infrastructure on top of Oracle. If we link the clouds with very fast connectivity, we can give you the best of both experiences. The idea is to use the private infrastructure backbone that enables this high-speed connectivity.
Why are we talking about this in a Terraform session? Well, when we talk about linking different clouds, the most natural tool of choice that comes to mind would be Terraform. Because Terraform is multi-everything, as you've seen in the session this morning in the keynote.
A couple of quick points: This is the first time any cloud provider has done this, linking 2 clouds. Why did we use Terraform? Clearly, for the multi-cloud, cross-cloud deployment model. Both companies speak Terraform; we probably speak very few other languages in common.
It allows us to iterate quickly, with community involvement, and community involvement is critical for us. We welcome community contributions. If you contribute to the Terraform provider for Azure or to our modules, thank you very much. Please continue doing so.
Terraform is open source. A lot of issues we run into are out there in the open, and there's a chance somebody will pick it up and help us solve the issues.
As a result of those efforts, thanks to the open-source model, you can do a few things in Terraform that you can't even do with PowerShell or the Azure CLI, just because somebody jumped ahead and said, "You know, this particular piece of functionality, we're going to go ahead and address that."
Another factor in linking these clouds is integrated identity and integrated support, where customers can call a single support number.
The cloud link architecture
This is what the basic architecture of the solution looks like. The blue is Azure; the red is Oracle. As I mentioned before, Oracle hosts the database tier, and Oracle actually manages it for you. Your database resides on Oracle Cloud, and via high-speed connectivity you're able to talk to the pieces in blue: the app tier and the network gateway that enables this connectivity model.
Looking at this diagram now, and thinking back to this morning's keynote, there's certainly an opportunity to evaluate service mesh gateways and see how they perform relative to the configuration we have today.
Because in this infrastructure today, you have to be very careful about planning your address space. The address space in Oracle Cloud cannot overlap with the address space in Azure. This certainly requires coordination and planning on both sides, and it's something that could potentially be eliminated with Consul. That's definitely a project for the future.
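Because the Azure and Oracle Cloud address spaces must not overlap, it can help to check the planned ranges before running terraform apply on either side. A minimal sketch using Python's standard ipaddress module; the CIDR ranges are hypothetical examples, not values from the interconnect repo.

```python
# Check that planned Azure and Oracle Cloud (OCI) address spaces don't overlap.
# The CIDR ranges below are hypothetical examples.
import ipaddress
from itertools import product
from typing import Iterable, List, Tuple


def find_overlaps(azure_cidrs: Iterable[str], oci_cidrs: Iterable[str]) -> List[Tuple[str, str]]:
    """Return every (Azure, OCI) pair of networks that overlap."""
    azure = [ipaddress.ip_network(c) for c in azure_cidrs]
    oci = [ipaddress.ip_network(c) for c in oci_cidrs]
    return [(str(a), str(o)) for a, o in product(azure, oci) if a.overlaps(o)]


if __name__ == "__main__":
    azure_vnets = ["10.0.0.0/16", "10.1.0.0/24"]
    oci_vcns = ["10.1.0.128/25", "172.16.0.0/16"]
    print(find_overlaps(azure_vnets, oci_vcns))  # [('10.1.0.0/24', '10.1.0.128/25')]
```

An empty result means the two sides' plans are compatible; any pair returned is a conflict that both teams need to resolve before the circuit is useful.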
The components start with the Azure-OCI Cloud Inter-Connect, the basic networking pieces that let the clouds talk to each other. In addition to the basic interconnect, we're releasing as open source the scripts to provision Oracle applications in Azure. Oracle JD Edwards and Oracle E-Business Suite are going to be added to this repo; you can search for it or just go there directly.
All right, I'm going to step through the code really quick. All of this is open source, again.
In an ideal world, all you would need to do is clone this repo, run terraform plan and terraform apply, and everything would be provisioned for you. Unfortunately, we're not quite there yet, because the connectivity pieces have to happen on both sides.
From the Azure side, you can see here's InterConnect-1, and you can see a very simple setup procedure from the Terraform standpoint, but otherwise a fairly complicated thing to set up. You set up what is known as an Azure ExpressRoute circuit.
The ExpressRoute circuit is what enables high-speed connectivity with Oracle Cloud over the private backbone. Once we run terraform plan and terraform apply, we have an ExpressRoute circuit set up. Oracle Cloud, on its end, has to provision its high-speed connectivity through Oracle FastConnect.
When this linkage happens, you have high-speed connectivity available to you. On the Azure side this was automated via Terraform, while Oracle, at this point, still needs to do part of it manually on the backend. Once that happens, the circuit is complete.
Then there's step 2 in InterConnect. Here we set up the additional Azure-specific infrastructure that allows this connectivity to be fully realized.
This step uses the AzureRM provider to set up an Azure virtual network gateway and its public IP. Once these 2 steps are complete, connectivity is established and the 2 clouds are linked.
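The two-phase flow above (Azure creates the ExpressRoute circuit, Oracle completes FastConnect on its side, then step 2 adds the gateway) implies a gate between phases. This Python sketch assumes the gate reads the circuit's service provider provisioning state; ExpressRoute reports values such as NotProvisioned, Provisioning, and Provisioned, and the AzureRM provider exports it as service_provider_provisioning_state. The step names returned here are hypothetical; the talk doesn't say how the repo sequences this.

```python
# Sketch of the gate between the two InterConnect steps.
# Assumption: step 2 should only run once the ExpressRoute circuit's service
# provider provisioning state is "Provisioned" (Oracle has completed FastConnect).
# The returned step names are hypothetical.
from enum import Enum


class ProviderState(Enum):
    NOT_PROVISIONED = "NotProvisioned"
    PROVISIONING = "Provisioning"
    PROVISIONED = "Provisioned"
    DEPROVISIONING = "Deprovisioning"


def next_step(provisioning_state: str) -> str:
    """Decide what to do next given the circuit's provider provisioning state."""
    state = ProviderState(provisioning_state)  # raises ValueError on unknown values
    if state is ProviderState.NOT_PROVISIONED:
        return "hand-off-to-oracle"  # Oracle side still has to provision FastConnect
    if state is ProviderState.PROVISIONING:
        return "wait"  # provider-side provisioning still in progress
    if state is ProviderState.PROVISIONED:
        return "apply-step-2"  # create the virtual network gateway and public IP
    return "stop"  # circuit is being torn down


if __name__ == "__main__":
    for s in ("NotProvisioned", "Provisioning", "Provisioned"):
        print(s, "->", next_step(s))
```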
The main point is that Terraform allowed us to do this. It gave us a common language and allowed us to link up the clouds. I'm not saying it couldn't have been done by other means, but it would certainly have been a much more difficult conversation.
Azure DevOps and Terraform
Mark Gray: We wanted to talk a little bit about Azure DevOps. Like I mentioned before, I'm part of a team that focuses on making sure some open-source software works well with Azure. Jenkins and Spinnaker and Ansible and Terraform are all products that we make sure work well.
That being said, we do have a first-party CI/CD solution that a lot of Microsoft teams and other customers use. One thing about Azure DevOps that I don't think a lot of people know is how cloud-agnostic it is.
It obviously runs in our cloud, but it can deploy to any cloud, or to on-prem. As long as it has access to your on-prem environment, it can deploy there as well. There are a lot of great things about that.
From a Terraform perspective, a lot of customers who use Azure DevOps and Terraform with Azure want them to work well together, and there are a number of projects that add Terraform extensions to Azure DevOps.
Those extensions let you use Terraform easily with Azure DevOps. And some of the things that those extensions do are things like setting up your remote state easily and allowing you to save the state there, so it's in a secure place.
One of the things that you'll get into as soon as you start using Terraform in any cloud is making sure that you deal with secrets correctly. And a lot of times you have a chicken-and-egg problem where, just to get set up, we need to have a secret somewhere.
With Azure DevOps, you can set up a service connection so that people can just create their pipelines and do their deployments. That bootstraps that piece as well.
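One way to keep the bootstrap secret out of the repo is to have the pipeline inject it as environment variables and fail fast if any are missing. The AzureRM provider reads service principal credentials from ARM_CLIENT_ID, ARM_CLIENT_SECRET, ARM_SUBSCRIPTION_ID, and ARM_TENANT_ID. The check below is a hypothetical pre-flight step, not part of the Azure DevOps Terraform extensions.

```python
# Hypothetical pre-flight check run before any terraform command in a pipeline.
# The AzureRM provider reads service principal credentials from these
# environment variables, so nothing secret has to live in the repo.
import os
from typing import List, Mapping, Optional

REQUIRED_ARM_VARS = (
    "ARM_CLIENT_ID",
    "ARM_CLIENT_SECRET",
    "ARM_SUBSCRIPTION_ID",
    "ARM_TENANT_ID",
)


def missing_credentials(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names (never the values) of required variables that are unset or empty."""
    source = os.environ if env is None else env
    return [name for name in REQUIRED_ARM_VARS if not source.get(name)]


def preflight(env: Optional[Mapping[str, str]] = None) -> None:
    """Raise before Terraform runs if any credential variable is missing."""
    missing = missing_credentials(env)
    if missing:
        raise RuntimeError("missing Azure credentials: " + ", ".join(missing))


if __name__ == "__main__":
    try:
        preflight()
        print("credentials present")
    except RuntimeError as err:
        print(err)
```

Reporting only variable names keeps the secret itself out of pipeline logs.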
Eugene is going to show how we use Azure DevOps as part of deploying the Azure modules.
When any employee of Microsoft creates a module that we're going to release, we have a pipeline that goes through and runs tests for those modules, so that we can release them to the public with confidence.
Eugene Chuvyrov: Since our team works directly with customers, Azure DevOps, or CI/CD tooling of some kind, always factors into the conversation whenever infrastructure is in scope. Most frequently we use Azure DevOps, partly because it's a first-party product.
Second, the amount of free stuff you get with Azure DevOps is very significant. You get unlimited private repos as part of Azure DevOps. For open-source projects, we let you run continuous builds, and GitHub repos are also free up to a certain number of contributors.
This is what the interface looks like; it's the demo we're running downstairs at the booth. There's a YAML pipeline that defines the basic stages of the build. We define some variables, set up the stages, and say, "This YAML pipeline will control what gets released."
What does this YAML look like? The important part is the Terraform task. Everything in Azure DevOps is centered around tasks, and there's a Terraform task that performs the basic operations for you. In this case, the first task runs Terraform init: we pass in certain variables and initialize Terraform.
The next task runs terraform plan, where we also pass in some variables and get the output back. Finally, we run terraform apply, also via a task. As a result of this deployment, you have infrastructure deployed to Azure in your subscription.
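The init, plan, apply sequence that the Terraform tasks perform can be sketched as ordered stages that stop at the first failure. This is Python, not the Azure DevOps YAML or the task's actual implementation. Saving the plan with -out and applying that file is an assumption about good practice (it makes apply run exactly what plan showed), the file name tfplan is hypothetical, and the runner is injected so the sketch can be exercised without Terraform installed.

```python
# Sketch of the init -> plan -> apply stages as sequential steps that stop at
# the first failure. Not the Azure DevOps task implementation; the saved plan
# file name "tfplan" is an assumption.
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], int]  # takes a command, returns an exit code

STAGES: List[List[str]] = [
    ["terraform", "init", "-input=false"],
    ["terraform", "plan", "-input=false", "-out=tfplan"],
    # Applying the saved plan runs exactly what the plan stage showed.
    ["terraform", "apply", "-input=false", "tfplan"],
]


def run_pipeline(run: Runner, stages: Sequence[Sequence[str]] = STAGES) -> List[str]:
    """Run each stage in order; raise on the first non-zero exit code."""
    completed: List[str] = []
    for cmd in stages:
        code = run(cmd)
        if code != 0:
            raise RuntimeError(f"stage '{cmd[1]}' failed with exit code {code}")
        completed.append(cmd[1])
    return completed


if __name__ == "__main__":
    # Dry run with a fake runner; a real one could be
    # lambda cmd: subprocess.run(cmd).returncode
    print(run_pipeline(lambda cmd: 0))  # ['init', 'plan', 'apply']
```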
As Mark mentioned, we've adopted a similar approach for testing the modules we develop. Many of you are familiar with the Terraform Registry.
Generally we try to stick to it, and I know we could do better at that. We've introduced testing for all the modules we've released. As part of the Azure DevOps pipeline, we're essentially not deploying any infrastructure.
Instead, we write a set of tests that we execute in a Docker container against the infrastructure we define, and we report whether each one succeeds or fails. That's also available via Azure DevOps, and I can go in-depth on how we do this at our booth downstairs.
Thank you for coming.
Mark Gray: Yeah. Thank you, everyone.