Monitoring as Code with Terraform Cloud and Checkly
Learn how to use the HashiCorp Terraform Provider for Checkly to automate infrastructure-monitoring setup and configuration.
This guest post was written by Hannes Lenke, co-founder and CEO at Checkly, the active monitoring platform for developers.
HashiCorp Terraform Cloud enables you to seamlessly provision infrastructure as code, consolidating configurations in source control and bringing transparency and replayability to a previously manual workflow. The same approach can be used to define the way our APIs and web apps are monitored.
In this blog post, you will learn how to use Terraform together with Checkly to provision your monitoring setup as code.
» Manual Provisioning is a Bottleneck
But before we dive into how to use Terraform and Checkly together, it’s important to understand the issues created by manual provisioning of monitoring checks.
The story of the world’s fourth-largest retailer, Germany’s Schwarz Group, which operates the Lidl and Kaufland grocery store and hypermarket chains, mirrors that of many Checkly customers. At Schwarz Group, several teams relied on manual procedures to manage their monitoring checks across many websites and connected backends, even when Terraform was already being used to manage infrastructure. This clash of approaches presented multiple challenges.
» Handling Checks at Scale Produces Large Overheads
The need to provision monitoring checks for multiple large APIs and websites meant internal users had to spend large amounts of time going through repetitive manual flows. With changes being rolled out to the target applications on a daily basis, the cost was significant.
» Low Transparency Makes Cross-Team Collaboration Harder
Manual flows meant users had to create tickets in order to have new monitoring resources provisioned for them, or request permission in advance to apply the changes themselves. In turn, the central IT team needed to work through different UIs and flows based on the service provider, and the resulting monitoring configurations then lived on separate platforms.
This made it difficult to maintain consistency across the entire infrastructure while avoiding duplication of effort across teams. It also complicated the task of auditing changes, making it difficult to review wrongly configured monitoring checks, thereby lengthening an important feedback loop.
» Non-Agile Workflow Slows Delivery
Eventually, the speed of checks-provisioning could not match the pace at which the target applications were evolving. This was the result of a mismatch of approaches: the CI/CD workflow through which the websites and APIs were iterated upon on one side vs. the fully manual approach on the other. In response to these issues, Schwarz Group’s central services team went looking for an approach that mirrored the existing infrastructure-as-code (IaC) workflow.
» Monitoring as Code
Applying lessons learned from IaC, monitoring as code (MaC) brings check definitions closer to the source code of the application by having them written as code. A declarative approach means that the user does not need to specify how the provisioning happens, or which specific actions and calls need to be made, but rather what the final results should look like.
This method allows check definitions to live in source control, which in turn boosts cross-team visibility, as sharing access to a repository is often simpler and cheaper than sharing seats on different monitoring-service providers. Additionally, having a history detailing every change increases transparency and makes it easier to roll back changes in case of incidents.
With software such as Terraform taking over the provisioning of monitoring checks, hundreds or thousands of checks can be created or edited in a matter of seconds. This is a game-changer for development, operations, and DevOps teams, allowing them to reallocate time spent on manual configuration towards improving the coverage and robustness of their monitoring setup.
According to Andreas Lehr, Team Lead at Schwarz Group IT, "Checkly integrated with Terraform enables us to quickly create, modify, and deploy API and browser checks for a broad and diverse audience of internal customers. The codified workflow ensures full transparency, thanks to built-in auditing and documentation!"
To summarize, MaC is revolutionizing the way monitoring is configured by providing:
- Better scalability through faster, more efficient provisioning
- Increased transparency and easier rollbacks via source control
- Unification of previously fragmented processes in a CI/CD workflow
Just like the Schwarz Group, any Terraform user can reap the benefits of monitoring as code. In the next section of this post, we will guide you through configuring a MaC setup with Terraform Cloud and Checkly.
» The HashiCorp Terraform Verified Provider for Checkly
The HashiCorp Terraform Verified Provider for Checkly allows users to configure API and synthetic monitoring checks as part of their existing infrastructure codebase. These checks then run on a schedule or on-demand to monitor single functionalities or end-to-end user scenarios over time, alerting the responsible contacts as soon as any misbehavior is detected.
» Monitoring APIs as Code
Here’s how it works: A Checkly API check makes an HTTP request to an API endpoint and examines the response, ensuring it is both correct and quick enough, according to parameters specified by the user. If these conditions are not met, the user is alerted through channels such as OpsGenie, PagerDuty, Slack, and SMS.
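To make the "correct and quick enough" logic concrete, here is a minimal JavaScript sketch of how a single check result could be classified. It is an illustration under stated assumptions, not Checkly's actual implementation: the function name classifyResult, the shape of the result object, and the degradedMs and maxMs thresholds are all hypothetical.

```javascript
// Hypothetical sketch: classify the outcome of one API check run.
// `degradedMs` and `maxMs` are assumed response-time thresholds in milliseconds.
function classifyResult(result, thresholds) {
  const { assertionsPassed, responseTimeMs } = result;
  const { degradedMs, maxMs } = thresholds;

  // An incorrect response, or one slower than the hard limit, fails the check.
  if (!assertionsPassed || responseTimeMs > maxMs) {
    return "failing";
  }
  // A correct but slow response is flagged as degraded.
  if (responseTimeMs > degradedMs) {
    return "degraded";
  }
  return "passing";
}

const thresholds = { degradedMs: 5000, maxMs: 10000 };
console.log(classifyResult({ assertionsPassed: true, responseTimeMs: 320 }, thresholds));  // passing
console.log(classifyResult({ assertionsPassed: true, responseTimeMs: 7200 }, thresholds)); // degraded
console.log(classifyResult({ assertionsPassed: false, responseTimeMs: 320 }, thresholds)); // failing
```

Only a "failing" result would trigger an alert in this sketch; treating slow responses as a separate "degraded" state is a design choice that keeps slowdowns visible without paging anyone.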
As an example, let’s create an API check against a demo website. The goal is to ensure the users of the webshop can request a list of available books. The first step is to add the Checkly Terraform provider to our Terraform file; we will use it to define every aspect of the check. In this tutorial, we will do all our work in the main.tf file:
# The value is supplied by Terraform Cloud through TF_VAR_checkly_api_key
variable "checkly_api_key" {}

terraform {
  required_providers {
    checkly = {
      source  = "checkly/checkly"
      version = "0.8.1"
    }
  }
}

provider "checkly" {
  api_key = var.checkly_api_key
}
We also need to add a resource for the API check. Let's keep things simple and specify a few key parameters, including the name, schedule, locations, and assertions:
resource "checkly_check" "webstore-list-books" {
  name                      = "list-books"
  type                      = "API"
  activated                 = true
  should_fail               = false
  frequency                 = 1 # run every minute
  double_check              = true
  ssl_check                 = true
  use_global_alert_settings = true

  # Response-time thresholds, in milliseconds
  degraded_response_time = 5000
  max_response_time      = 10000

  locations = [
    "eu-central-1",
    "us-west-1"
  ]

  request {
    url              = "https://danube-webshop.herokuapp.com/api/books"
    follow_redirects = true

    # The endpoint must answer with HTTP 200
    assertion {
      source     = "STATUS_CODE"
      comparison = "EQUALS"
      target     = "200"
    }

    # The JSON body must contain exactly 30 items
    assertion {
      source     = "JSON_BODY"
      property   = "$.length"
      comparison = "EQUALS"
      target     = "30"
    }
  }
}
We have two assertions against the response here:
- We are asserting that the status code is 200
- We are checking the number of items returned as part of our response to make sure all the data we expect is being sent back.
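To illustrate what these two assertions amount to, here is a small JavaScript sketch that evaluates them against a stand-in response. It is a simplification under stated assumptions rather than how Checkly evaluates assertions internally: resolveSource and evaluateAssertions are hypothetical helpers, only the EQUALS comparison and the two sources used above are handled, and the response is a fake object instead of a real HTTP call.

```javascript
// Hypothetical helper: read the value an assertion refers to from a response.
// Only the two sources used in this post are supported.
function resolveSource(assertion, response) {
  if (assertion.source === "STATUS_CODE") {
    return response.statusCode;
  }
  if (assertion.source === "JSON_BODY" && assertion.property === "$.length") {
    // "$" is the parsed body; ".length" is the number of items in the array.
    return JSON.parse(response.body).length;
  }
  throw new Error(`Unsupported assertion source: ${assertion.source}`);
}

// Hypothetical helper: evaluate every assertion and report the outcome.
function evaluateAssertions(assertions, response) {
  return assertions.map((assertion) => {
    if (assertion.comparison !== "EQUALS") {
      throw new Error(`Unsupported comparison: ${assertion.comparison}`);
    }
    const actual = resolveSource(assertion, response);
    // Targets are strings in the Terraform config, so compare as strings.
    return { ...assertion, actual, passed: String(actual) === assertion.target };
  });
}

const assertions = [
  { source: "STATUS_CODE", comparison: "EQUALS", target: "200" },
  { source: "JSON_BODY", property: "$.length", comparison: "EQUALS", target: "30" },
];

// Fake response with 30 placeholder books, standing in for the real endpoint.
const response = {
  statusCode: 200,
  body: JSON.stringify(Array.from({ length: 30 }, (_, i) => ({ id: i + 1 }))),
};

const results = evaluateAssertions(assertions, response);
console.log(results.every((r) => r.passed)); // true
```

If the endpoint returned only 29 books, the second assertion would fail and the check as a whole would be reported as failing.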
Terraform users can have changes in a configuration stored in GitHub automatically applied to a linked Terraform Cloud workspace as soon as they are merged into the master branch. We want every successful push to master on our Git repository to be automatically applied to our Terraform Cloud workspace. For this reason, under Settings > General, the apply method is set to Auto apply.
The Apply Method section of the Terraform Cloud workspace's General Settings.
We also need to create a free account on Checkly. Once that is done, we can fetch our Checkly API key from our Checkly Account Settings and add it as an environment variable on Terraform Cloud, under the Variables section, with the key TF_VAR_checkly_api_key.
We can now commit our changes. As soon as we have them merged into the master branch, the current run will appear on our Terraform Cloud dashboard.
Once that is done, we will see our new API check appear on our Checkly dashboard.
The check will now run every minute, monitoring the status of our endpoint from the locations we selected. Should it fail, it will immediately alert us on our channel(s) of choice.
» Monitoring E2E Scenarios
In order to make sure our web app is functional for end users, we need to monitor key user journeys on our frontend as well. Checkly leverages Puppeteer and Playwright to automatically go through the most important flows of your web app, just like a user would. As soon as one breaks, it will alert you, just like with API checks.
Let's look at an example: we want to keep an eye on the login flow of our online bookstore, so we write or record the following script using Playwright:
const { chromium } = require("playwright");
// Launch Chromium and open a fresh page
const browser = await chromium.launch();
const context = await browser.newContext();
const page = await context.newPage();
// Open the webshop and fill in the login form
await page.goto("https://danube-webshop.herokuapp.com/");
await page.click("#login");
await page.type("#n-email", "user@email.com");
await page.type("#n-password2", "supersecure1");
await page.click("#goto-signin-btn");
// The check passes only if the login confirmation becomes visible
await page.waitForSelector("#login-message", { state: "visible" });
await browser.close();
Let's save the file in scripts/login.js, and then reference it in our main.tf file:
resource "checkly_check" "login" {
  name                      = "Login Flow"
  type                      = "BROWSER"
  activated                 = true
  should_fail               = false
  frequency                 = 10
  double_check              = true
  ssl_check                 = false
  use_global_alert_settings = true

  locations = [
    "us-west-1",
    "eu-central-1"
  ]

  script = file("${path.module}/scripts/login.js")
}
Let's also commit and merge these changes to have them reflected on Checkly.
Our check is now fully configured and will run every 10 minutes as specified, informing us if anything goes wrong with our login flow.
We can keep going and add as many checks as we want: Checkly and Terraform Cloud scale seamlessly together, and many users manage thousands of checks.
» Conclusion
Monitoring as code extends the benefits of infrastructure as code to monitoring, making complex real-world infrastructure more resilient and delivering a better end-user experience through better observability.
As shown in the Schwarz Group example, MaC with Terraform Cloud and Checkly helps improve reliability through an efficient and transparent workflow, which is a force multiplier for large IT organizations.
That is it. No matter the size of your team or business, you now know all you need to get started and combine the power of Terraform Cloud and the Checkly Terraform provider. Happy monitoring!