How Deutsche Börse uses Terraform Enterprise to accelerate cloud migration speed and improve security posture
See how Deutsche Börse organized its business-led cloud transformation and used Terraform Enterprise to execute it.
» Transcript
Having been here last year, I've already given a glimpse of what we are doing at Deutsche Börse. Also, thanks to Susan for setting the stage, because what I would like to talk about is exactly how to get from the 78% who have adopted cloud into the 10% who are now getting the business value out of it. Of course, we also want to be part of the 8% who do it right.
I think this is our ambition. We at Deutsche Börse see ourselves—on one side, yes, we are a financial services company—but we also see ourselves as a technology company. Technology is in our DNA. If you look at all our core business processes around trading, clearing, and settlement, everything is fully digitized and running on purpose-built, custom-developed systems that we develop and operate.
» Introducing Deutsche Börse Group
My name is Christian Tueffers. I'm responsible for the cloud center of excellence at Deutsche Börse, and I had the pleasure to join this cloud journey from day one. Quickly, I mean, most of you will know Deutsche Börse because we are the largest central stock exchange here in Germany.
I think it’s important to understand that classic financial trading is only one part of the trading platform. Of course, it's our core trading area. But if you look at the last couple of years, other trading areas have really emerged.
We operate the largest power exchange in Europe, for example. Then there are all the topics around digital assets, crypto, and FX trading. These are also all platforms we try to accommodate. We really want to provide a broad, very diverse portfolio to our customers where they can trade all asset classes.
Next to the trading, which is the core of our systems, we have what we call pre-trade. This is what you see on the left upper side. This provides our customers with all the data and information they need to implement their proper trading strategies. The most known one is that we provide the indices—for example, the DAX 40. But we have over 7,000 real-time indices, which are calculated all the time. We also want to provide real-time data to our customers. A lot of our business is providing our customers with access to up-to-date, real-time information.
Then, on the lower part, we have what we call post-trade. Everything that happens after the trade has been executed, trades have to be settled. We also have an application called Vault, which is about vaulting the physical securities as well.
It gets digitized, of course, going forward. But a lot of Vault is also giving our customers the ability to manage all their collateral and liquidity and provide value-added services around taxes, tax calculation, and other kinds of things. We have the ambition to provide an end-to-end value chain to our customers. And, as mentioned, information technology, or IT, is at the heart of all these systems.
» Our multi-year cloud strategy
I already showed this a bit last year. This is our journey into the cloud. We started in 2016. I think this looks pretty familiar to you guys. Typically, you start with a proof of concept. We shifted some development and test workloads into the cloud, and we started doing some multi-cloud activities very early, as well.
I think we did a lot of things wrong as well. There were a lot of lessons learned. That's quite natural. I think we also did a couple of things right. We established a cloud center of excellence at a very early stage. We called it Cloud Team at the beginning. Later on, it was renamed. From the beginning, we said access to the cloud is via predefined landing zones. So, we provisioned the landing zones. We enabled the networking, we baked in the security. And then we hand this over to the different teams, which then, according to their roles and rights, can provision the cloud infrastructure.
We were super early adopters of Terraform at that point in time within our team, and we also spread this into the organization. This was quite helpful later on when we decided to move to Terraform Enterprise, because there was already pretty broad knowledge of Terraform across the company. I also want to highlight that we are pretty proud of initiating the collaborative cloud audit group, which, for us, was fundamental to moving regulated workloads into the cloud.
What does it mean for us? In the meantime, this collaborative cloud audit group has over 40 financial services institutions across Europe, and they jointly do full-fledged audits of cloud service providers. We now have an audit scheme. We audit all three hyperscalers every year, and then, obviously, we work through all the audit items we find.
Those first six or seven years were mainly driven by IT. So, it was really IT for IT. We, in the cloud center of excellence, had to do a lot of legwork and groundwork, speaking to the different teams: What's the advantage of moving to the cloud? Why would you move? Do you have security concerns? This is how we can help you. The approach was, from that perspective, very opportunistic. It was more of a bottom-up approach, and with this, we got around 30% of our workloads into the cloud.
I think, honestly, if we had continued this way—yes, we could have achieved a bit more, but probably not much more. That's why there was a strategic decision at the beginning of last year, when we announced a strategic partnership with Google Cloud.
A strategic partnership is not only announcing that you work together; there's also a commercial construct behind it. On one side, we commit to putting large parts of our workloads into Google Cloud. On the other side, we get discounts back, that's clear, but also a large investment into driving our next transformation.
» Business-led cloud transformation
So, this was the kickoff of our Hyperion transformation program. The big difference is—and this is the link back to what Susan mentioned—that we are no longer speaking about a bottom-up, IT-for-IT kind of transformation; now we have a business-led transformation. Hyperion as a program is really steered from the top. Our CEO now has quarterly reviews with the CEO of Google Cloud, and the program management also sits directly under the CEO.
» Cloud migration
This is also reflected in the structure of the transformation program. We have four main work streams, and only one of them is still purely IT—the first one, which I'm also heading. This is the cloud migration work stream. This is, I would say, the bread-and-butter work stream.
This is what we need to do to get to the commit. This is where we have established a migration factory where we systematically take all the applications, analyze them, and put them into the Google Cloud. With that, we have the ambition to move over 70% of our complete IT estate into the cloud overall.
Quick side note: Why is it not 100%? If you look at our IT landscape, we still operate ultra-low latency trading systems. And as of today, from a technical perspective, you are not yet able to put that into the public cloud. We work heavily with the cloud service providers on this because they accept the challenge. But as of today, that's not yet possible.
From that perspective, in this work stream, we also track the overall program business case. We are the main driver for the overall commit and the main driver to get the workloads into the cloud. We are also responsible, for example, for closing down datacenters and for the overall business case. We will continue to work there, and you’ll see that later on in the cloud security topics.
» Data and analytics
The next three work streams are really the more interesting ones because they will drive our business value going forward. The first one is data and analytics. That's also the more common one. Here, it's about making access to all our data—which we have in all the different systems—available to everyone. We do this by putting a data mesh on top and commercializing it by establishing a DBG marketplace. This will then be the one-stop shop where our customers can get access to all our data. And we underpin that with a centralized management and governance structure for data.
» Digital asset program
The next one is more important. We have all our existing trading systems; now let's speak about the trading of the future. This means we will build a completely new trading platform for digital assets in the cloud, cloud-native from that perspective. And not only that, but also making sure that—with all the knowledge we have from running trading platforms—it fulfills all the regulatory requirements.
With this, if you look at all the existing crypto exchanges, this will be a big change. Our ambition with this digital asset platform is that it also fulfills the requirements for regulated market operations, so we can offer it to all our institutional-grade customers. Then, we need to make sure that we can support all kinds of digital assets, be it crypto spot or crypto derivatives, and connect the on- and off-chain worlds.
» Our digital securities platform
This is underpinned by our digital securities platform. Here we are speaking about how we can digitize the complete issuance process for securities from beginning to end. This is something we already started a couple of years ago, and we are now using the opportunity to build it out in a full-fledged manner. Here, too, we have access to both centralized and decentralized market infrastructure. With this, we come to a completely new world where we have same-day issuance of new securities, improving the process and dramatically reducing the costs of issuing new securities.
» Cross-streams support transformation
These four are the main work streams of the Hyperion program, and they are underpinned by a couple of cross-streams that drive the overall security and compliance posture:
» Cloud security
Cloud security is looking into how we fundamentally want to do things differently than we did in the past. We want to make sure that whatever our teams are doing in the cloud is secure by default. Everything should be baked in already: in the landing zones we are providing, in the way they deploy into the cloud, and in the way they develop the applications going forward. This should then move to an approach where all our teams are compliant by default and in a permanent state.
As a financial services institution, in the past, we were constantly chased by findings. The regulators always come in, and we have lots of findings that we have to work on. We want to turn this around. We want to show them that what we have is compliant per se.
It is also very important that our security teams understand the cloud, so that all the security services can be moved into it. We need to move away from supporting security in the cloud with on-prem tools and have all the security services in the cloud as well. This also ties in very closely with the cloud governance stream that we have established. It's about having one overarching cloud compliance control framework where all the bits and pieces work together.
» Cloud governance
This is also about streamlining how we bring workloads into the cloud. In the past, that has been a very complex process internally. You partly need regulatory approval, and you need approval from all the different second-level functions that you have in your company.
So this is now also about streamlining this—and making sure the ambition we have to put 70% of our workloads into the cloud is not hampered by missing internal processes. And also streamlining all the communication around our Hyperion program with our different regulators, which we are supporting.
» Culture and enablement
Last but not least, and this, to be honest, was also a lesson learned: we did not have this work stream there on day one, and we suffered a bit. The lesson learned is that it is super important to have it right from the beginning. So this is the culture and enablement work stream, which ensures everyone in the company is aware of the transformation program and what it means for them and for the company itself.
So, we have multiple formats for communication: newsletters, workshops, all that kind of stuff. But it's also about making sure we have a standardized way of approaching the re- and up-skilling of our people: putting learning journeys in place and working with the different teams to see where there are skill gaps that can be addressed. And last but not least, looking into how we adapt our hiring strategy to find the right skills on the market.
» Cloud control framework
So, this, in a nutshell, is the Hyperion program. If you look a little into cloud governance and the cloud control framework, this is the canvas in which we operate. On one side, the controls are derived from all the regulatory requirements that we are getting. For example, from the BaFin, the CSSF—which is a Luxembourg regulator—because large parts of our business are located in Luxembourg. Obviously, the EBA, ECB, and you can name multiple more there.
Then, you have the industry standards; the most important ones for us are actually ISO 27000 and the CCM, the Cloud Controls Matrix. Then, of course, we have our own internal specific policies, which we are looking at.
This all defines the cloud control framework, which we base on the CCM as the main vehicle we're leveraging there. If you look into it, there are always two parts. One part is what the cloud service provider is doing—the controls and the security of the cloud. This is also where the collaborative cloud audit group I mentioned before plays a large part. With that, for example, we ensure that the cloud service provider is fulfilling all the demands and regulatory requirements that we have.
But then, there's also a large part where we—as the customer and the consumer—are responsible: what we bake into the cloud controls and into the existing processes. Things like risk management, business continuity management, and so on. But on the other side, we also make sure that we do our homework in defining all the policies in our organization and in Microsoft Azure—that the landing zones are compliant and that we also have the vehicle to monitor the different clouds.
Now—and this is also why it's highlighted a little here—this is where the HashiCorp tool set comes into the picture, with DevSecOps. We also want to ensure we streamline the way we provision cloud infrastructure and deploy applications into the cloud.
» Centralized DevSecOps pipeline
This was not directly connected to Hyperion, but Hyperion helped us set the scene. We made a strategic decision that the only way cloud infrastructure can be provisioned going forward is via Terraform Enterprise. That is the standard now.
Terraform Enterprise has a couple of advantages for us. It can be connected to our internal identity and access management system, so we have a proper system in place. It's also an internally managed system: it's highly available, and we have a DR capability in place.
It helps us streamline, control, and funnel everything that goes into the cloud through this one place, where we can then turn on policies with policy as code, and where we can later also use drift detection and corresponding alerts to catch whatever is not compliant.
» Overview of Terraform private registry modules
So, it is super important for us to do this, on one side, to increase the security posture, because we want to make sure that whatever goes into the cloud is already compliant from that stage. But the second thing is that we also use it to accelerate our move into the cloud, as I mentioned before, because Terraform already allows you to develop many pre-baked modules.
We use the private registry here and developed a module for every Google service, in this case. We didn't develop them out of the blue, because there are already existing Terraform modules out there from the cloud service providers, but we adapt them to bake in the security.
We do it with our internal group security, which develops a security baseline for every Google service that spells out what the security requirements are. This is then turned into code—where possible, it is baked directly into the predefined modules.
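To give a rough idea of what that looks like, here is a minimal, simplified sketch of such a pre-baked module, using Cloud Storage as an example. The module layout, variable names, and the specific baseline settings shown (uniform bucket-level access, public access prevention, mandatory customer-managed keys) are illustrative assumptions, not the actual Deutsche Börse baseline:

```hcl
# Illustrative sketch of a hardened Cloud Storage module as it could be published
# to the Terraform Enterprise private registry. Names and settings are hypothetical.

variable "name" {
  type        = string
  description = "Bucket name"
}

variable "project_id" {
  type        = string
  description = "Target GCP project"
}

variable "location" {
  type        = string
  description = "Bucket location"
  default     = "europe-west3"
}

variable "kms_key_name" {
  type        = string
  description = "Customer-managed KMS key; mandatory by design, so no default"
}

resource "google_storage_bucket" "this" {
  name     = var.name
  project  = var.project_id
  location = var.location

  # Security baseline from group security, baked in so that module consumers
  # cannot switch it off
  uniform_bucket_level_access = true
  public_access_prevention    = "enforced"

  encryption {
    default_kms_key_name = var.kms_key_name
  }
}
```

Application teams then consume modules like this from the private registry instead of using the raw resources directly, so the security baseline travels with every deployment.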
We took the most important services first, typically GCE, GKE, Cloud Functions, Cloud SQL, Cloud Storage, and so on, to have all the core modules available. Afterward, we moved to a more demand-driven approach. So, when someone developing something new requires a new Google service, it goes through a full process.
Our enterprise architecture team first analyzes whether the service makes sense in our portfolio. If that's OK, it goes to group security, which develops a security baseline. If that's OK, it comes to us, and we develop the Terraform module for it.
We try to make this as quick as possible, but it's also important for the developers to understand that this is not a one-day process. Typically, if we have proper demand management in place, it should not be a blocking factor. That's very important, because we want to accelerate cloud adoption, not hamper it. So, working closely with the different teams is key to understanding when something new is coming up.
» Overview of Terraform Sentinel policies
In parallel with the modules we are developing, we also develop the corresponding set of Sentinel policies. If you look at the different security requirements, some of them can be baked into the predefined module and some cannot. Some of them translate, for example, into a GCP organization policy on the Google side; others we then translate into Sentinel policies.
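To give a flavor of what such a policy can look like, here is a simplified Sentinel sketch that requires customer-managed encryption keys on Cloud Storage buckets. The attribute handling and the policy itself are illustrative, not one of the production policies:

```sentinel
# Simplified sketch: require customer-managed encryption keys (CMEK) on all
# Cloud Storage buckets created or updated in a plan. Illustrative only.
import "tfplan/v2" as tfplan

# All google_storage_bucket resources being created or updated in this run
buckets = filter tfplan.resource_changes as _, rc {
    rc.type is "google_storage_bucket" and
    rc.mode is "managed" and
    (rc.change.actions contains "create" or rc.change.actions contains "update")
}

# A bucket passes if it configures an encryption block with a KMS key
bucket_uses_cmek = func(bucket) {
    enc = bucket.change.after.encryption else []
    return length(enc) > 0 and enc[0].default_kms_key_name is not ""
}

main = rule {
    all buckets as _, bucket {
        bucket_uses_cmek(bucket)
    }
}
```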
The same approach applies here. We make sure this is first done for all the most-used services, and afterward we see which additional services require policies. We have established a policy lifecycle process, which hopefully makes it quite agreeable for the different teams to adopt.
Once we have developed the policies, we put them into an advisory mode for a couple of weeks to give every team a chance to adopt them and learn what they need to change. Typically, after three or four weeks, we turn them into mandatory mode.
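In Terraform Enterprise, that switch is simply the enforcement level configured on the policy set. A hypothetical sentinel.hcl, with made-up policy names and paths, could look like this:

```hcl
# sentinel.hcl for a policy set; policy names and paths are made up for illustration.

policy "gcs-bucket-require-cmek" {
  source            = "./policies/gcs-bucket-require-cmek.sentinel"
  enforcement_level = "advisory"        # rollout phase: violations only warn
}

policy "compute-instance-no-public-ip" {
  source            = "./policies/compute-instance-no-public-ip.sentinel"
  enforcement_level = "hard-mandatory"  # established policy: violations block the run
}
```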
Mandatory has two flavors in HashiCorp; you can have soft and hard mandatory. For us, hard mandatory is the more important one. This is the really interesting one because it drives things like enforcing encryption and requiring you to bring your own key, among other things. It is super important that, if you do this, you make sure you have an exception management process in place. This was also a bit of a lesson learned in the beginning: there are always exceptions, and how you treat and manage them is important.
With the security group, we put an exception management process in place in parallel. On one side, this means having a structured and audited workflow where you know exactly what the exceptions are. On the other side, it means making sure that the exceptions we decide on are baked into the overall system, so you don't have to handle them again for every new deployment.
For example, if you have to be exempt from one policy, you want to make sure that this is covered for the complete workspace in Terraform and not for every deployment you do. There was also some custom work and adaptation that we had to do to get this in place.
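Purely as a sketch of one way this could be wired up (the workspace name and the exemption mechanism shown here are hypothetical, not the actual implementation), a policy can consult a workspace exemption list via the tfrun import:

```sentinel
# Hypothetical sketch of a workspace-wide exemption inside a Sentinel policy.
import "tfrun"

# Workspaces with an approved, documented exception for this control. In practice,
# this list would be maintained through the audited exception workflow.
exempt_workspaces = [
    "legacy-app-prod", # hypothetical workspace name
]

workspace_is_exempt = rule {
    tfrun.workspace.name in exempt_workspaces
}

# Placeholder for the actual control, e.g. the CMEK check shown earlier
control_passes = rule { true }

main = rule {
    workspace_is_exempt or control_passes
}
```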
» What’s next?
This is where we are at the moment. All in all, I would say we have developed around 70 policies so far. We assessed that we need about 150 policies for the Google services we're currently using. New services are always coming, so this needs to be an ongoing process.
It’s also super important to note that, at the moment, this Hyperion program focuses very much on Google as the main cloud service provider. But we still have a multi-cloud strategy, and we still need it, because we always need an exit strategy.
So, the challenge for us now is that we need to make sure that in the next couple of months, we bring a second cloud service provider to exactly the same level. We need the same approach, same set of pre-baked modules, same set of Sentinel policies, and the same set of controls at the end, for a second cloud service provider.
I've also shown that we focus on providing basic Terraform modules for each Google service. I think there's a high need and interest in having some more value-added, abstracted Terraform modules—which, for example, combine multiple resources under the hood. That's something we have to look into now. But this will also be demand-driven, to see what people need.
I already mentioned the exception management process. Beyond that, as you have seen, we have been working in the cloud for multiple years, and a lot of teams are used to working with the cloud with different kinds of tools.
On one side, it is about convincing them to move to one approach. It's always the stick and the carrot: convincing on one side, but on the other side, closing all the loopholes to make sure that the only way in is now via Terraform Enterprise.
This will be done on multiple levels. Network controls, for example, ensure that provisioning can only happen from certain areas, and identity and access management ensures that only certain service accounts can provision infrastructure. There are multiple ways of doing that.
Training and knowledge are super important, and establishing a community of practice is something we are also working on. Right now, we are still doing it from a central perspective, developing the policies and the Terraform modules. But I would love it if this were later taken over by a community approach where different experts and all the different teams can contribute.
Obviously, the control will still sit centrally, so only one team approves certain policies and does certain quality checks. But to make sure that we are not a blocking factor from a capacity perspective, we need more hands and shoulders working on this together.
I think that's what I wanted to talk about. Thanks a lot.