Agile with compliance: Terraform and Vault at Deutsche Börse AG
See how Deutsche Börse built compliance as code and DevSecOps into their cloud deployment workflows with Terraform and Vault Enterprise.
» Transcript
My name is Christian Tuffers. I've been working at Deutsche Börse since 2016. I'm the co-founder of the Cloud Center of Excellence at Deutsche Börse. Before that, I spent a long time with Accenture. I've been in the public cloud area for 10 years.
I have the opportunity to drive Deutsche Börse into the public cloud. Many of you might know Deutsche Börse. If you still watch traditional television, shortly before eight you have "Börse vor acht" — and you still see the traditional floor trading in the background. That's, to be honest, only for show, because all the trading today is fully digitalized.
» About Deutsche Börse Group (DBG)
We at Deutsche Börse see ourselves as a market infrastructure provider. We provide the markets — the best-known ones are Xetra, which is for securities trading, and Eurex, which is for derivatives trading. But we also operate the largest European energy exchanges with EPEX and EEX. We also have the largest gas markets, plus a lot of different trading venues, foreign exchange, and so on.
Trading is at the core of our business, but we also have what we call pre-trade and post-trade to have an end-to-end value chain. Pre-trade means we want to give our customers all the information they need to build their trading strategies. This is a lot about data, data analytics, having access to real-time data, and access to pre-calculated indices. The DAX is the best-known index, but we calculate over 7,000 indices in real time, all the time.
Then we have this post-trade area — everything which comes after the trading has happened. This is about settlement, about custody. This is about the value-added services we provide to our customers — fund services, liquidity management, and collateral management.
» DBG IT roadmap: Cloud is key
Everything is powered by IT. As you can imagine, a trading venue is not a common business. There aren’t that many trading venues in the world. Most of the applications are custom-built. These are not the kind of software packages you can buy from the market.
» Public cloud offers proven benefits
We are a heavy IT company, so the cloud was a super important piece for us to look at. In the beginning, like many others, we looked at it mostly from the agility and cost perspective. Interestingly enough, also quality — there's a nice example which I'll come to later.
But for us, the cloud is the basis for innovation topics. Look back at the value chain we have — the trading, the pre-trade, and the post-trade. The growth we are currently seeing is not necessarily in the trading part, because trading is a fixed business from this perspective. At the moment, we grow a lot by acquisitions. If you follow the news, there's a public offer out there to take over another company.
There's a lot in this pre-trade and post-trade. This is a lot about data and analytics. For example, enriching what we can give to the clients, enriching that with machine learning information, or having a strong analytics platform where the customers can directly access our feeds and conduct analytics on the fly. This is super powerful.
» Public cloud lays the foundation for innovation
It's the same for post-trade. If you have services — things like fraud detection, tax services, liquidity management — and you can power them with AI and machine learning, it can be very powerful.
Then there's a hot topic at the moment: digital assets and blockchain. A lot of innovation is happening around the tokenization of assets and building a digital asset trading venue.
Deutsche Börse will not close its datacenters, so we are not going all in. If you look at the value chain, the core trading part today — from a technology perspective — is not yet possible to put in the cloud. We require a latency that so far cannot be matched by the cloud service providers. We are always in very close collaboration with them because they like technology changes and challenges. But at the moment, there are still some significant gaps that prevent us from running Xetra, for example, in a public cloud.
Having said that, we use the cloud for testing. Especially for our flagship trading platform, we spin up hundreds of instances every night using spot instances to run random tests to ensure the system's quality — so quality has become an important driver for cloud adoption for us as well.
» DBG's way into the cloud
I think I mentioned this morning — our public cloud journey is, I guess, similar to that of many other companies. We really started in 2016, onboarding the first cloud service provider. Obviously, in IT, the benefit of cloud was quickly visible — so a lot of developers started spinning up development and test instances there.
We then adopted the additional hyperscalers. Today we have all three US-based hyperscalers in our portfolio. From the beginning we put a strong emphasis on infrastructure as code — first with the cloud-native flavors, but then also with Terraform Community Edition — in order to create landing zone concepts for teams to follow.
Then we followed the normal path and started with production workloads. The first really big milestone for us was to have material workload in the cloud. We are a highly regulated business — we are, for example, classified as critical infrastructure. To push out something that is part of a core business process requires a lot of approvals inside the company, but also close collaboration with our regulators.
So, it was less a technical challenge than a process challenge — establishing the process to get approvals from everyone, understanding how we notify regulators, and so on. At the end of last year we also announced a strategic partnership with Google. This will be our main cloud service provider going forward, where a lot of the innovation topics will be placed.
We are not getting rid of the multi-cloud strategy; we will still use multi-cloud. We also need to have an exit strategy for regulatory purposes, for example. But going forward, our strategic partner will be Google.
» Collaborative cloud audit group
A lot of topics here are around security and compliance. One thing which is not technical but which helped us a lot to be where we are today is the Collaborative Cloud Audit Group. This is something that we initiated in 2017 for the first time.
We spoke with our regulators and said we want to make use of the public cloud — US-based hyperscalers. They were skeptical, but not negative about it. They were really supportive, but they said: whatever you do, you need to stay in control. You need to have full control of the outsourcing chain, which means you also need an unrestricted right to audit — and you need to actually exercise that unrestricted right to audit.
You can imagine when we went to an American hyperscaler for the first time and said we want to conduct a fully-fledged audit — that was not compatible with their culture, I would say. We had in-depth discussions and workshops with the cloud service providers and our regulator, and we created this concept: we founded a group of financial institutions here in Germany. In the meantime, Dutch, Italian, and Spanish banks have also joined to conduct joint audits of the cloud service providers.
This also helps us because we are a pretty small institution, and if we had to audit someone like AWS or Microsoft alone, the audit scope would be far too big for us. Now, in this collaborative audit group, we have over 40 financial institutions, and it runs on a yearly basis. They conduct audits with all three hyperscalers. And, as everyone participates and takes over certain control domains, we can have a fully-fledged audit. So, we really demonstrate to our regulator that we are in control.
This is not so nice for me because, as the Cloud Center of Excellence, I'm the first line of defense. I then inherit all the findings into my team, so I have to chase the cloud service providers to get fixed whatever we find. But it works really nicely.
» Our cloud control framework
You've already seen that regulatory requirements are a big topic for us. We are part of the critical infrastructure. We are under very close scrutiny, especially by BaFin, because we hold banking licenses in Germany with our Clearstream bank but also with Eurex Clearing. We also have the Luxembourg regulator, the CSSF, because we operate Clearstream in Luxembourg. And everything is under the authority of the European Banking Authority — the EBA. This is only a small set; there's more.
We also have the MAS in Singapore. We have FINMA in Switzerland. So, a lot of different regulators that we have to somehow satisfy. If you look at their requirements and combine them with the common industry frameworks that have been out there for a long time — we've focused a lot on ISO 27000; now we are moving more to CCM because that's more compatible with what the cloud service providers use as well — then, together with our organization-specific requirements, this defines our cloud control framework.
In this cloud control framework, you always have two aspects. The first is the security and compliance of the cloud, which is the responsibility of the cloud service provider and where, with the Collaborative Cloud Audit Group, we now also have a good vehicle to monitor that on top of the reports they provide on a regular basis.
The second part, on the right, is about security and compliance in the cloud — what we are doing in the cloud, where we are responsible. This means we have to bake cloud into all the normal day-to-day IT operations on one side. On the other side, we have to develop the platforms with the corresponding policy configurations. But then — and this is the focus here — we also need a standardized way of how we want to deploy into the cloud, how we want to do DevSecOps.
» Cloud control plane
This has then been translated into what we call the cloud control plane. This is a bit of a flow from left to right — from change over to run. This is everything we would like to provide — or already provide — to all the different teams. There are infrastructure-as-code blueprints. There are base images that we provide as part of our image bakery. There are reference architectures, and then there's the important part that was already alluded to — policy as code.
This is all happening before something actually gets deployed. Then the deployment itself happens, where a lot of security topics are covered. Then you have whatever comes afterward — for example, drift detection, logging, audit, and monitoring our cloud estate with the CNET solution. Everything is then supported by a common observability layer. Not everything is in place today at Deutsche Börse. We are working on it, but we are getting better and better, I would say.
» Platform building blocks
This is a different viewpoint. It also alludes to what Dave mentioned before — really leveling up the maturity of cloud adoption. Similar to many others, I would say, we started cloud adoption a little bit naively.
Everyone wants to first get traction in the cloud and learn things. On one side, the cloud service providers have also matured certain things — for example, how they handle networks and identity and access management in the cloud. But newer services are also coming out in the security domain — and things like Terraform Enterprise, for example.
This has really driven the notion that we, as a Cloud Center of Excellence, also have to change. In the past, we were merely a provider of cloud access to the different teams, making sure everyone could jump into the cloud and start developing, testing, and putting workloads in there. Now it's really about having a platform view: having platforms for change, a standardized delivery factory, standardized delivery vehicles, a standardized toolset.
But then also, in the different areas of infrastructure as a service and platform as a service, moving to a much more platform-oriented approach. This is pretty much in place. We are also now working on the next step — we want a centralized platform for dealing with recovery situations.
Today, we still leave it to individual applications to decide on their backup and recovery strategies. But going forward, I think it becomes obvious that this doesn't make too much sense. If one application fails over to one region and another to a different region, then afterwards they are not able to connect to each other. It makes much more sense to have this as a centralized vehicle.
» Centralized provisioning
When we started the partnership with Google, this was also an opportunity for us to change the rules of playing in the cloud. On one side, because we now have a large transformational program which has been set up as part of this partnership. I would also say the awareness of security and compliance is much different than it was a couple of years ago. In the beginning — many of you might know it — the cloud was seen a little bit as a playground for IT, for developers. Now there's a broad interest to have this at the highest level of security and compliance.
We said from the beginning that whatever goes into the cloud in the future has to go via a standardized toolchain. In our case, that means we put Terraform Enterprise in here because — you are the experts, you know it — it has the big advantage that we can have a policy check at that point in time.
So, before things actually reach the cloud, we can already check whether certain rules are fulfilled. As you might know, once it's in the cloud and you want to change things, you have to run after people. Of course, you can always threaten: look, if you don't do it, we'll shut down your machine, or we'll put a deny policy on top — or whatever.
But then, do you really want to do that in production and be called at night if there's an incident? We had some tough discussions on these topics. It's much better to prevent it before it reaches the cloud, and this is what Sentinel policies help us do. A lot of development effort is now going into these policies.
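As an editorial sketch of how such a pre-deployment check is wired up in Terraform Enterprise: a policy set repository carries a `sentinel.hcl` file that declares each policy and how strictly it is enforced. The policy name here is a placeholder.

```hcl
# sentinel.hcl — declares the policies in this policy set and how strictly
# Terraform Enterprise enforces them on every plan.
policy "restrict-to-eu" {
  source = "./restrict-to-eu.sentinel"

  # hard-mandatory: a failing check blocks the apply outright, so
  # non-compliant infrastructure never reaches the cloud.
  enforcement_level = "hard-mandatory"
}
```

With hard-mandatory enforcement, a failing check stops the run before apply, which is exactly the prevent-before-it-reaches-the-cloud behavior described above.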
I think the other big topic for us — and we saw that mentioned by Dave before — is credentials. When you have some history in the cloud — a history of service accounts, service principals, and all this stuff — it's pretty messy. Especially when you were not very strict at the beginning about keeping tight control over how you manage this stuff.
Now it's getting better because the tooling is getting better; you can onboard them with a lot of tools. But having something like HashiCorp Vault in place from the beginning would have saved us a lot of headaches, I would say. Where we have legacy in the cloud today, that's still the situation.
Then, super nice is drift detection. We very often have regulators with us, and I sit in meetings with them and have to explain how asset management works in the cloud, and so on. Something they like is what they call in German the Soll-Ist-Vergleich — the target-versus-actual comparison. I tried to translate it into English, but it's pretty difficult; I'm not sure there's an exact notion of it, but it's comparing what you would like to have versus the reality. This is a fundamental principle for them. If you can demonstrate to them that you have a working Soll-Ist-Vergleich, a lot of the topics are already smoothed out with the regulators.
Drift detection is, for me, the answer to that. It lets me say: here's the code, here's the Soll — this is how it should be — and here's the reality of how it is in the cloud. Drift detection will show me if there's any deviation. This, for us, is a super nice feature.
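A minimal sketch of how this can be switched on, assuming the workspace is itself managed with the `tfe` provider; the organization and workspace names are hypothetical:

```hcl
# Enable drift detection (health assessments) on a Terraform Enterprise
# workspace, so the platform continuously compares the real cloud estate
# against the last applied configuration — the Soll-Ist-Vergleich.
resource "tfe_workspace" "app" {
  name         = "my-app-prod" # hypothetical workspace name
  organization = "dbg-cloud"   # hypothetical organization

  assessments_enabled = true # run periodic drift checks on this workspace
}
```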
» Policy as code
For the policies, there are different flavors you can use. Our focus at the moment is mainly on one and four. One is really about making sure the infrastructure and services we provide in the cloud follow certain standards and definitions — security baselines. Four is where we try to codify certain regulatory policies. The simplest example is that certain applications can only live in the EU, for data residency purposes. This, at the moment, is the focus.
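As a hedged sketch, the `restrict-to-eu.sentinel` policy referenced earlier could look something like this, assuming GCP storage buckets and an assumed list of permitted locations:

```sentinel
import "tfplan/v2" as tfplan

# Locations considered EU-resident for this example (assumed values)
allowed_locations = ["EU", "EUROPE-WEST1", "EUROPE-WEST3"]

# All storage buckets being created or updated in this plan
buckets = filter tfplan.resource_changes as _, rc {
    rc.type is "google_storage_bucket" and
    rc.mode is "managed" and
    (rc.change.actions contains "create" or rc.change.actions contains "update")
}

# Every bucket must live in an allowed EU location
main = rule {
    all buckets as _, rc {
        rc.change.after.location in allowed_locations
    }
}
```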
There can be more. In the future, five could be interesting — for example, when we start going more into FinOps practices, to make sure that whatever is provisioned is also fitting from a cost perspective.
» Policy as code benefits
Having policy as code is obviously a super benefit. In the past, policies were written as Word documents or Excel files. It's very cumbersome for people to understand what's meant by the policies, how to translate them into code, and also how to check whether you're compliant or not.
For me, the biggest benefit of policy as code is that I have it in GitHub; it's openly available for everyone. There's a clear process for who is allowed to change these policies. We also have clear version control in there — and everyone has access and can download these policies and run their stuff against them.
» Temporary credentials
I think this is the future — I loved the announcement today. Having, for example, Vault as a managed service where you only have these secrets in there without running the Vault infrastructure yourself — I think that's super key. I don't want the hassle with all the credentials, because we, as a platform, are the ones getting called if there's a credential exposure. It always comes down to the Cloud Center of Excellence — hey, what did you do? What mess did you create? And this is, for me, also super important.
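A minimal sketch of the temporary-credentials pattern, assuming a Vault server with an AWS secrets engine mounted at `aws` and a role named `deploy` already configured; all names here are placeholders:

```hcl
provider "vault" {
  address = "https://vault.example.com:8200" # hypothetical Vault address
}

# Ask Vault for short-lived AWS credentials at plan/apply time,
# so no long-lived keys are stored in code or variables.
data "vault_aws_access_credentials" "deploy" {
  backend = "aws"    # mount path of the AWS secrets engine
  role    = "deploy" # role defining the permissions of the credentials
}

provider "aws" {
  region     = "eu-central-1"
  access_key = data.vault_aws_access_credentials.deploy.access_key
  secret_key = data.vault_aws_access_credentials.deploy.secret_key
}
```

The credentials are leased and expire on their own, so an exposed key is worth far less than a static one.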
» Drift detection
Whenever there is a drift, you have to decide what to do with it. Sometimes there can be a good reason for it. Perhaps there was an incident, and someone didn't have time to go through the full chain and just had to make some changes in production — all fine.
But whatever we do, whenever there is a drift, in the first instance it creates a security incident. It's then up to the SOC and CERT team to chase it — and then it can probably be resolved. With this approach, it's always ensured that nothing goes undetected. There's a clear process if someone has an incident and flags it afterward, to get the change back into the normal development cycle.
» Lessons learned and next steps
» Training and communications
I think training and communications are super important from the beginning. We already had a lot of Terraform skills in the company — many teams had already adopted Terraform open source — but there's still a shift in moving to Terraform Enterprise, and you have to take everyone with you. There are also other tools out there that people like to use. When we made the announcement — sorry, you have to stop now — we had to offer a lot of training to get them onboarded as well.
» Standardize the workflow
We had intense discussions: what's the best branching strategy, the best workspace approach, and so on. It's important not to give too many choices. Otherwise, you end up again with a hassle of different options and setup configurations that you don't want. I see some smiling here — I guess a lot of similar experiences.
» A robust platform
Working with lighthouse application teams is always important. And we put a lot of emphasis on making sure the platform itself is super robust. We still use the enterprise products today — we have Terraform Enterprise and Vault Enterprise. We are now thinking about having a parallel track that is purely SaaS-based. Our reasoning is that on one side we have the super highly regulated stuff, which is covered by the enterprise products.
But we also have a lot of acquisitions — a lot of smaller companies that are pretty independent from an IT perspective. They're not integrated into our networks, for example, and the identity and access management integrations are also not so great. We are thinking about onboarding them on a cloud-based toolchain, still giving them the opportunity to leverage the same policies — the same approach for everyone — but giving them a little bit more freedom on the other hand.
» Central capacity to help onboarding
You cannot just set up the platform and then tell the teams: off you go now. So we really established a core team — with Accenture, in this case — which is always there, helping us onboard all the different product IT teams.
This is where we are today. There are always things we can improve on. There are a couple of things around the automation of drift detection, which has recently gone live. We have this with the SOC and CERT team now, but there's more automation we would like to do — for example, identifying the asset owners, the right support teams, and so on.
We are still continuing to roll this out in our organization, so that all the different teams from all the different areas are onboarded. And we are still thinking about adding features like image scanning, for example. So there are a couple of things where we can still improve. I think these are exciting times; we are pretty far along on our cloud journey and always looking forward to the next steps. Thanks.
» Q&A Session:
Emcee:
I think we can allow a few questions — one or two — before we go into the lunch break.
Audience member 1:
You still want to use some cloud-native capabilities. What is your vision on being open enough for potential vendor changes in the future, while not having to completely reinvent the wheel, and not being too locked in?
Christian Tuffers:
To repeat the question: how do we balance multi-cloud and still have cloud-native tooling?
At the moment, I would say all three clouds are pretty equal. We had this announcement with Google recently, so we are now pushing more strongly into Google, but so far it was pretty even, I would say. For me especially, I would love to build this control plane around the cloud platforms. Then I can have Google in there, I can have Microsoft in there, and I can have AWS in there — as long as I have this shift left and a standardized toolset for everyone on one side, and on the other side everything feeds into a common security posture, a security monitoring solution, or a cloud-based, task-based team solution.
Then, on the other side, I'm also capturing everything. If I have this, then in the middle I can be more flexible, and I can also exchange things. People can also have their own GCP organization; they can have their own Azure tenant or whatever, as long as they follow the principles.
Audience member 2:
You said you're multi-cloud, and you also have something in your own datacenters?
I heard that BaFin is very sensitive about not having everything in one Configuration Management Database. How do you handle this — feeding everything into one CMDB?
Christian Tuffers:
Did you say they don't want it in one single CMDB?
Audience member 2:
No, they're very sensitive — they want it.
Christian Tuffers:
Exactly, that's correct — that's true. We have a central configuration and asset management database. At the moment — it has been a long discussion and quite some development effort — we are feeding all the information about whatever happens in the cloud, event-driven, into a central configuration management system.
I'm not such a big fan of it, and I would love to have a more cloud-native approach going forward. But at the moment, at least, this satisfies the requirement. Once you have satisfied the requirements and ticked some boxes, you have time to think a little more strategically about what you want to do. I think leveraging what the cloud service providers offer as asset management solutions might also be a good option going forward.
Audience member 3:
You offer a lot of services. How do you offer these services to your application teams? Do you have any portals, like Backstage or something else, through which they can use these platform capabilities?
Christian Tuffers:
No, not so much. We have the normal communication channels — pretty standard: SharePoint, Teams channels, or whatever. But what the teams can do in the cloud is pretty much up to them. We don't offer them a service where they can order a machine or whatever. They have access to Terraform Enterprise with a private module registry — with storage bucket modules, for example. That serves as a kind of service catalog for the things they can do in the cloud.
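For illustration, consuming such a module from the private registry could look like this; the hostname, organization, and module names are hypothetical placeholders:

```hcl
# Pull a vetted storage bucket module from the Terraform Enterprise
# private module registry instead of writing the resource by hand.
module "app_bucket" {
  source  = "tfe.example.com/dbg-cloud/storage-bucket/google"
  version = "~> 1.2"

  name     = "my-app-data"
  location = "EU" # stays compliant with the EU data-residency policy
}
```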
We don't want everyone who wants to do something cloud-related to have to come to us, because then we would be a super bottleneck. They should have certain freedoms; they can also develop their own modules. We do some checking, but they have the possibility to develop their solutions freely.
There is a service allow-list, of course. We don't allow all services; it depends on the application criticality. There is some risk assessment we do: what encryption is supported? Is it a global or local service? Typical things like that, but that's it.
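Such an allow-list could also be expressed as a Sentinel policy. A sketch with hypothetical service names, checking the declared configuration rather than the plan:

```sentinel
import "tfconfig/v2" as tfconfig

# Hypothetical set of services cleared by the risk assessment
allowed_types = [
    "google_storage_bucket",
    "google_compute_instance",
    "google_bigquery_dataset",
]

# All managed (non-data) resources declared in the configuration
managed = filter tfconfig.resources as _, r {
    r.mode is "managed"
}

# Every resource must use an approved service
main = rule {
    all managed as _, r {
        r.type in allowed_types
    }
}
```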
Emcee:
Last question.
Audience member 4:
First of all, thank you very much for a very interesting presentation. You are working in a very regulated market — I can imagine enabling cloud services in such a market is a bit like moving a mountain a meter to the left. You mentioned, if I remember right, a collaborative audit group. Could you explain more about how that works and how you got the regulators on board?
Christian Tuffers:
This was a major achievement, and it's a win-win-win situation. On one side for the cloud service providers, because not every financial institution is knocking on their door individually saying: we want to have an audit.
On the other side for us because, as mentioned, from a sizing perspective — we do have some auditors, but they're occupied with other audits, and we cannot send them all to audit the cloud service providers, which are super large.
And third, also for the regulator itself. A regulator like BaFin has a high interest in having a consistent view. If we do an audit of Microsoft and Deutsche Bank does its own audit, they would not be completely in sync, and that doesn't really help. So we organized this kind of workshop with a regulator, a cloud service provider, and a financial institution to come up with this concept.
Now there's a natural interest for every financial institution, because you're obliged to execute this right to audit — you have to do it. So this is self-running. There's always a chair, and I think the financial institution holding it changes each year. Then there's always the question: which financial institutions want to participate in the Google audit? Who wants to participate in the AWS audit?
They come together, they define the audit scope, and in the meantime they have also defined a common methodology. That's very important because I think auditors are like architects — put three in a room, and everyone has a different view of what they want to do and how they do it. But they agreed on a common approach.
The cloud service providers have also gotten used to it. It's a common process now: every year there's an AWS audit, a Google audit, and a Microsoft audit. There can also be audits of additional providers if there's sufficient interest — if someone wants to audit Oracle Cloud or IBM Cloud and there are sufficient participants, that's all covered in the same framework. As mentioned, in the meantime I think about 50 institutions, including all the big banks, are in there.
Emcee:
Christian, many thanks for all those insights you shared with us. I think it's been super interesting. Big round of applause for Christian, please.
Christian Tuffers:
Thanks.