Streamlining secrets management at Canva with HashiCorp Vault
One system to rule them all, and stop the sprawl.
Canva was using AWS KMS and other secrets management tools, but that didn't stop secret sprawl. Adopting a secrets management tool is a prerequisite for better security, but you'll see little improvement in security or developer experience if you don't have a strategy to go along with the tools.
» The prior state of Canva
Sprawl: Secrets were scattered across hundreds of AWS accounts, 1Password, encrypted app properties, and even some developer laptops. It's not a secure picture.
High-effort rotation: Annual secrets rotation was a huge effort for the company, lowering engineering productivity whenever rotation time came around. Sprawl was part of the cause, but not the whole cause.
"We had to stop what we were working on and divert engineers away from priorities."
Reuse (not the good kind): Some groups started using the same secret in multiple environments to reduce effort, but that introduced even more security risks along with a bigger outage blast radius.
Too many touchpoints: There were also a lot of unnecessary touchpoints (read: manual effort) and elevated permissions required in the process.
Slow audits: Audits took a lot of time.
Hard to integrate: Previous solutions couldn't integrate with what they needed.
The results of the issues above are pretty obvious: Lots of lost hours of work, and lots of risk.
» Searching for a standard org secrets manager
That's when they centralized secrets management around HashiCorp Vault, which met every requirement on their checklist.
In fact, Canva evaluated three other secrets management vendors, and Vault was the only one that met all of their requirements.
» Canva with Vault
Once they adopted Vault, Canva platform engineers went to work preparing their security platform to provide an extremely reliable and easy developer experience. Testing and observability were very important.
They set up a dashboard to monitor Vault clusters based on SLO userflows and even implemented Vault chaos testing.
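The post doesn't describe how Canva's dashboard gathers its data, so what follows is only a minimal sketch of one way a cluster probe could feed an availability SLO. It relies on Vault's `/v1/sys/health` endpoint and its documented default status codes. The function names, the choice of which codes count as "healthy", and the example URL are assumptions made for illustration.

```python
import urllib.error
import urllib.request

# Default status codes documented for Vault's /v1/sys/health endpoint.
HEALTH_STATES = {
    200: "active",
    429: "standby",
    472: "dr_secondary",
    473: "performance_standby",
    501: "uninitialized",
    503: "sealed",
}


def classify_health(status_code: int) -> str:
    """Map a /v1/sys/health status code to a readable node state."""
    return HEALTH_STATES.get(status_code, "unknown")


def probe(base_url: str, timeout: float = 2.0) -> int:
    """Return the health status code of one Vault node.

    urllib raises HTTPError for non-2xx responses, so the code is read from
    the exception. Connection failures (URLError) propagate to the caller.
    """
    url = f"{base_url.rstrip('/')}/v1/sys/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def availability(status_codes, healthy=frozenset({200, 429, 473})):
    """Fraction of probes that landed on a node counted as healthy.

    Which codes count as healthy is a policy choice (assumed here), not
    something Vault prescribes. Returns None when there are no probes.
    """
    codes = list(status_codes)
    if not codes:
        return None
    return sum(code in healthy for code in codes) / len(codes)
```

A scheduler could call `probe("https://vault.example.internal:8200")` (a hypothetical address) at a fixed interval and chart `availability(...)` over a rolling window. A chaos test could then check that the ratio stays above the SLO target while nodes are deliberately sealed or stopped.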
Why go through all this effort? It's all about building developer trust:
"We were on a mission to change the culture. For the changing culture to succeed, we needed our engineers to trust us, and the way that happens is by building good reliable products. You saw the road map and how much we focused on testing, observability, and chaos testing before even thinking about moving anything into the [Vault] cluster. That was all by design to ensure that we don't impact the trust once we begin migrating people's secrets onto our platform."
The best way to improve the developer experience for secrets management is to remove its visibility and manual touchpoints from the experience. It should be invisible. The developer should barely realize they're managing secrets.
"They'll just get some sort of key with a click or two, and then plug that key into their target client. The secrets management system should take care of issuing the secret to the correct client, and integrate with a wide array of products and all the major cloud providers."
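To make the "plug that key into their target client" idea concrete, here is a rough sketch of a client reading a secret over Vault's HTTP API. It assumes a KV version 2 secrets engine and token authentication sent in the `X-Vault-Token` header. The post doesn't say which engines or auth methods Canva uses, and the mount name, secret path, and helper names are hypothetical.

```python
import json
import urllib.request


def build_kv2_request(base_url: str, mount: str, path: str, token: str):
    """Build a GET request for a KV v2 secret (read path: /v1/<mount>/data/<path>)."""
    url = f"{base_url.rstrip('/')}/v1/{mount}/data/{path}"
    return urllib.request.Request(url, headers={"X-Vault-Token": token})


def extract_kv2_secret(body: bytes) -> dict:
    """Pull the key/value pairs out of a KV v2 read response.

    KV v2 nests the secret under data.data, alongside data.metadata.
    """
    payload = json.loads(body)
    return payload["data"]["data"]


def read_secret(base_url: str, mount: str, path: str, token: str,
                timeout: float = 2.0) -> dict:
    """Fetch and decode one secret; HTTP and network errors propagate."""
    request = build_kv2_request(base_url, mount, path, token)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_kv2_secret(resp.read())
```

In a setup like the one the quote describes, the developer would rarely write this by hand. The platform would issue the token and a sidecar, agent, or SDK would make the call, which is what keeps secrets management out of the developer's way.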
» Vault's impact at Canva: By the numbers
They closed a whole category of risk in the business by removing direct engineering access to secrets kept in Vault.
87.5% reduction in processes around secret provisioning. In the past it involved a 12-step runbook, talking to several teams, and hoping your permissions work across the board. Now you talk to one team and things just work.
1.2 million secrets issued by Vault in May 2024, and growing.
100% of secrets can be attributed back to an owner with access to a complete audit trail in seconds.
Greater developer, security, compliance, and auditing team satisfaction.