The coldest Monday with a $1 million cloud bill: Terraform to the rescue
How a cryptomining attack led to one startup's overhaul of its cost management systems with the help of Terraform.
Imagine this "cold Monday": you work at a small startup, and you wake up to a bill of more than $1 million from Google App Engine, accrued over the weekend. What happened?
This was the experience of the KubeCloud founder at a former startup, where a compromised service key led to a cryptomining attack.
» What was the issue?
This talk doesn't cover the practices for preventing service key theft (hint: check out secrets management best practices), but it does cover the other guardrails you should have in place for cloud cost monitoring and limits. These were the issues the speaker identified:
Poor cleanup processes for long-running resources (see: ephemeral workspaces)
Risky complexity due to the usage of many shell scripts
No org-wide restrictions on resource count and type (see: Sentinel)
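To make the last issue concrete, here is a minimal sketch of a restriction on resource count and type. It uses Terraform's built-in variable validation rather than Sentinel, so it only guards the module it lives in and is not an org-wide policy. The variable names, the allowed machine types, and the limit of 10 are assumptions for illustration, not details from the talk.

```hcl
# Hypothetical guardrails inside a single module. An org-wide equivalent
# would be written as a policy (for example, in Sentinel) instead.

variable "machine_type" {
  type        = string
  description = "Machine type for compute instances."

  validation {
    # Only allow a short list of small, inexpensive machine types (assumed list).
    condition     = contains(["e2-small", "e2-medium"], var.machine_type)
    error_message = "The machine_type must be one of: e2-small, e2-medium."
  }
}

variable "instance_count" {
  type        = number
  description = "Number of compute instances to create."

  validation {
    # Cap the number of instances one configuration can request (assumed cap).
    condition     = var.instance_count >= 0 && var.instance_count <= 10
    error_message = "The instance_count must be between 0 and 10."
  }
}
```

With these in place, a plan that passes an unlisted machine type or an instance count above the cap fails before any resources are created.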
Their first steps included setting billing alerts and converting some of those shell scripts to Terraform code, but the big fix was quotas.
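As a rough sketch of what a billing alert can look like when it is defined in Terraform, the following uses the Google provider's google_billing_budget resource. The variable names, the budget amount, and the thresholds are illustrative assumptions, not values from the talk.

```hcl
terraform {
  required_providers {
    google = {
      source = "hashicorp/google"
    }
  }
}

# Assumed inputs: the billing account ID and a monthly budget in whole USD.
variable "billing_account_id" {
  type        = string
  description = "ID of the billing account to watch."
}

variable "monthly_budget_usd" {
  type        = number
  description = "Monthly budget in whole US dollars."
  default     = 5000
}

resource "google_billing_budget" "monthly" {
  billing_account = var.billing_account_id
  display_name    = "Monthly cloud spend budget"

  # With no budget_filter block, the budget covers the whole billing account.
  amount {
    specified_amount {
      currency_code = "USD"
      units         = tostring(var.monthly_budget_usd)
    }
  }

  # Alert when actual spend reaches 50% of the budget.
  threshold_rules {
    threshold_percent = 0.5
  }

  # Alert when forecasted spend reaches 90% of the budget.
  threshold_rules {
    threshold_percent = 0.9
    spend_basis       = "FORECASTED_SPEND"
  }
}
```

A budget like this only sends notifications and does not cap spend, which fits the talk's point that quotas were the bigger fix.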
» Quotas and Terraform
The startup began building a large matrix of cloud vendor quotas, but it needed a way to manage all of those quotas automatically.
In this talk you'll see a demo of how Terraform was used as a state engine to manage these quotas. While it's not a typical use case, it worked very effectively, essentially creating an in-house dashboard for cloud cost quota management and visibility.
Here's an example main.tf for this quota management use case in Terraform: