How AGL Uses HashiCorp Terraform Enterprise and Sentinel to Enable Cloud Native Capabilities
This is a guest blog case study written by Lachlan White. He works as a DevOps Architect at AGL Energy, Australia’s largest private developer of renewable energy assets. Lachlan will be speaking at HashiDays Sydney (April 6-7), HashiCorp's first-ever conference in the Asia Pacific region.
As a leading Australian energy provider, AGL, like most large enterprises, has a long history of using data centers. We began our shift to the cloud about five years ago, when we became a launch customer of Microsoft Azure in Australia. We had been experimenting with cloud infrastructure in Azure's South-East Asia location as a proof of concept, but after the launch in Australia, we quickly became one of the largest consumers of Azure in the southern hemisphere.
Our initial move into the cloud was like many others. It was exciting, with so much opportunity for innovation and learning, so our teams tried to go at it all at once. The result was the inevitable accumulation of technical debt, differences in practice, and battles for control, which reduced delivery and extended lead times. It also meant we had no enterprise-wide, endorsed best practices for how to provision, secure, connect, or run applications and infrastructure. That is when we decided that we needed to take control of managing these moving parts ourselves.
» Establishing Consistency
Three years ago, we started creating an in-house capability in the cloud, and defined how that capability would enable a better experience for our customers. We launched our Customer Experience Transformation (CXT), one of the largest ever customer transformation programs for a utilities company in Australia. As one of the largest electricity and gas providers in the country, we are always looking at how we can disrupt the industry with technology for the benefit of our customers. We are constantly focused on how iterating our digital footprint can enable an improved customer experience, and the use of cloud technology as a delivery vehicle gives us a large amount of flexibility in how we achieve this.
When the program finished, the results were positive for our customers. Internally, however, we had varying levels of maturity with regard to cloud technology. Our digital teams were far ahead, while some of our traditional infrastructure teams had not progressed as far. We started to look for a way to build on the lessons of our Customer Experience Transformation while also enabling quicker and more mature adoption of cloud technologies.
» Consistency at Scale
Last year we started the AAA Program (Automate, Accelerate, AGL). It was created to look at how we could use the cloud at a large scale, but in a much more efficient and collaborative way than we previously had.
HashiCorp products helped us do that. We realized that while we have a large footprint in one cloud provider, we also have a small footprint in Amazon Web Services, and each of the two requires a different skill set. We wanted a strategic way to minimize that difference, which is why we landed on Terraform.
Automating the provisioning of infrastructure at the scale of a utility company isn’t an easy undertaking. We have a number of business-critical systems with high uptime requirements. Terraform is supporting us in provisioning all of our infrastructure, from the tiniest APIs to our largest SAP systems. By using tools such as Terraform and Packer, we have been able to achieve consistency of delivery by creating reusable modules that can be used at scale across the enterprise. It has enabled us to be free from the bottlenecks of hardware and poorly configured environments by enabling the consumption of cloud at scale, without compromising on consistency.
» Constantly Evolving Best Practices
When we started the acceleration project, the core technical team had about 10 to 15 people who were trying to build out this new way of working. We had a strong relationship with Microsoft throughout the process, but were really pushing the boundaries of its products. The team was a mix of people with deep Microsoft backgrounds as well as others with a lot of AWS experience, leading to interesting discussions about “Amazon fixes this problem in this way” and “Microsoft fixes it in this way”. This led us to ask “but what’s the best way?” Is there an abstraction to achieve those goals?
Sentinel was one of the ways in which we were able to realize the kind of capability that we wanted.
A simple example from the Terraform Enterprise documentation shows how to ensure that every resource carries at least one tag:
import "tfplan"

main = rule {
  all tfplan.resources as r {
    r.attr contains "tags" and
      length(r.attr.tags) > 0
  }
}
» Confidence in Compliance
We also have a body of people involved in approving the use of services. Each service goes through a process to ensure it meets our security and compliance needs before it is rubber-stamped to show it meets our enterprise standards. Our developers are always going to be more aware of new features and services that are announced by cloud providers, and they want to use them as soon as possible if it’s going to provide additional capability.
Our previous way of working meant a long process of manual reviews and meetings to ensure a service was up to AGL standards. Once a service was approved, it was up to the engineers or developers deploying it to keep it to standard, constantly referring to the list of standards and approved services that can be used on the cloud. From a cultural perspective, that approach did not match how we want to deliver capability. Instead, we’ve whitelisted all approved Terraform modules within a Sentinel policy.
For example, we want to ensure a team deploys the approved module for a web app in Azure. We can do this with the policy below:
import "tfconfig"
import "strings"

// CONSTANTS
tfe_url = "mycompany.domainname.com.au"
valid_module_source = [
  tfe_url + "/myCompany/infrastructure/azurerm/modules/web-app"
]

// RULES
// Every module call in the configuration must use a whitelisted source.
check_valid_modules = rule {
  all tfconfig.modules as _, module {
    valid_module_source contains module.source
  }
}

// RUN
main = rule {
  check_valid_modules
}
If our developers are using only pre-approved modules, the policy passes without issue. If they’ve introduced a module that we’ve not yet approved, the policy check fails with an error.
This workflow has enabled a central point of governance across our infrastructure deployments. It ensures that we have a single point of compliance no matter how the deployment is occurring. As long as Terraform Enterprise is called, our Sentinel Policies are enforced, and because these policies are driven through code it’s not a bottleneck. It’s all automated, so we get the compliance without the degradation in delivery time.
As you can see, Sentinel is one of the controls we can use to govern policy automatically across the deployment cycles that run through Terraform Enterprise. If we needed to enforce a similar policy across multiple clouds, we could simply add another module that looks at AWS infrastructure.
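As a rough sketch of what that multi-cloud extension might look like, the same whitelist could hold one approved module per cloud. This reuses only the `tfconfig.modules` lookup from the policy above. The AWS module path is a hypothetical placeholder and not an actual AGL module, and the sketch assumes each approved module is published under the same private registry host.

```sentinel
import "tfconfig"

// CONSTANTS
tfe_url = "mycompany.domainname.com.au"

// ASSUMPTION: the AWS entry is a hypothetical path added for illustration;
// only the Azure entry comes from the original policy.
valid_module_source = [
  tfe_url + "/myCompany/infrastructure/azurerm/modules/web-app",
  tfe_url + "/myCompany/infrastructure/aws/modules/web-app",
]

// RULES
// Every module call must come from the approved list, whichever cloud it targets.
check_valid_modules = rule {
  all tfconfig.modules as _, module {
    valid_module_source contains module.source
  }
}

// RUN
main = rule {
  check_valid_modules
}
```

Keeping one list for both providers means the rule itself does not change when a module for another cloud is approved; only the list grows.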
» Collaboration and Centralization of Governance
If a new service has been introduced that our developers want to use, they can open up a pull request on the policy to have the new service assessed. If it meets our security and compliance requirements it will be included in the whitelisted set.
To ensure our developers and engineers can effect change, we need to uplift the operating model surrounding them. We open the doors in terms of who can contribute, and then we centralize the governance of it.
This is really at the heart of what open source is: enabling everybody to contribute equally. Within large enterprises, there are controls that are needed, but if we can automate the management of the majority of these controls, we can enable an amazing culture and development experience. Taking advantage of tools like Terraform Enterprise enables us to do so at scale.
» Next Steps
We’ve spent the last year building capability in terms of how we want to build, develop, and innovate in cloud at AGL. The next few years are the fun part: we can start to build on top of these foundational capabilities and really reap the rewards. These benefits are passed on to our customers through quicker innovation and therefore quicker capability development.
From a technology point of view, I see a lot of value in continuing to use Sentinel. Taking advantage of the recent updates such as Cost Estimation will be a lovely addition to the governance pieces we are building.
There are some exciting pieces of tooling coming out of HashiCorp these days. Vault and Consul are very appealing to me and are definitely something I am playing around with at the moment.
But I’m still enjoying looking at ways to extend Terraform. The great thing about Terraform is if there’s not a provider that you want to use, you can build the provider. If the provider is there you can add to it. It’s just an amazing open source community. So, what’s next is really only limited to whatever your imagination is willing to drive.