Case Study

How Vault Enterprise is used at the London Stock Exchange Group to reduce risk and improve DevEx

Learn how HashiCorp Vault secrets management is used at the London Stock Exchange Group.

»Transcript

Akash Gupta:

Paul and I are going to talk about how we implemented HashiCorp Vault within LSEG (London Stock Exchange Group) and how it helps us reduce security risks and improve the developer experience. My name is Akash Gupta. I'm the platform engineering lead for LSEG. My main responsibility is to build, develop, and design enterprise CI/CD tools for LSEG. I also help teams onboard onto the DXOne platform.

Paul Cavanaugh:

Good morning. My name's Paul Cavanaugh. I'm the Field CTO for a boutique DevSecOps and DevX company that works closely with LSEG.

Akash Gupta:

This is our DXOne core engineering team. We have David Jones, who is Head of Engineering Experience and Productivity. We have Naveen, Sam, Lakshmanan, and Huseyin, who is our RSC HashiCorp engineer, with us.

»What are we going to talk about today? 

We are going to talk about how we are implementing DXOne at LSEG. We'll talk about some of the opportunities we saw to improve the security posture of LSEG. We'll spend some time on our Vault capabilities and the secrets engines and auth methods we have enabled.

We'll spend some time on how we have implemented the Vault platform in the organization, our secrets journey, and the milestones. We'll spend some time on how we onboard application teams onto the DXOne platform—and how it's a seamless onboarding experience. In the end, we will talk about some of the key learnings and our adoption strategy.

»Introducing the London Stock Exchange Group 

Paul Cavanaugh:

Many of you, I'm sure, are aware that the London Stock Exchange has a proud history of over 300 years. It's grown through mergers and acquisitions. It's diversified as well. Today it has a global footprint, and some of the most critical services and economic functions in the world depend on the London Stock Exchange and its technology services.

As an example, we service around 45,000 clients in 170 countries. That's a global footprint. We service 400,000 individuals across those 45,000 clients. We have around 26,000 colleagues in around 60 countries. More than 50% of those are in the software engineering or operations functions. So, we truly are a technology shop. In terms of criticality, over $442 billion daily flows through the FX platforms, and $367 billion daily, on average, in fixed income.

As an example, we hold pricing information on around a hundred million securities—that's real-time and historic. One of the databases that powers that has 87 trillion data entries, and we need to service it in real time for real-time trading. So, low-latency stuff. The services we deliver depend on technology and on getting it right the first time, every time—and quickly.

»DXOne—our DevEx ecosystem

It was interesting listening to Ruben from Lloyds earlier on about their journey: the need to improve developer experience, remove friction from the process, and give developers, engineers, and operations staff the right, modern tools to pave, build, and develop applications and infrastructure—reliably and in a well-governed way.

Our response to that is to build what we're calling DXOne. That's our integrated developer platform. It has many components, best-of-breed tools, strategic tools, and strategic partnerships. 

As a regulated entity, we need to make sure we build compliant software that's reliable and resilient, and that we can move faster and faster whilst staying in control and compliant—to make sure we delight our customers and deliver change ever faster to permit business innovation.

Akash Gupta:

As part of the DXOne program, one of the intents was to understand the gaps we had. That gave us the opportunity to improve the security posture.

We did a lot of brainstorming sessions with security architects, security leads, and developer leads to understand how they manage secrets within the organization. We started with a security strategy document. The intent was to understand the gaps we had and how teams were using different secrets and different tools in the organization, and to seize the opportunity to improve this in the new solution.

»The opportunities

»Multiple secret management tools

As part of the strategy document, we identified a lot of opportunities to improve. First, we noticed that teams were using multiple secrets management tools. That's because of LSEG's long history of mergers and acquisitions.

»No single source of truth 

The challenge is that there is no single source of truth for secrets management, and no lifecycle for tokens or secrets. We wanted to provide a strategic tool for application teams to use, so that they have a single source of truth to start with.

»Use of hardcoded secrets

We want the new solution to switch to a dynamic secret engine and not have hardcoded secrets in the codebase in the first place. 

There are a lot of issues and limitations with hardcoded secrets because if a secret is exposed to humans, it defeats the purpose of using Vault and other tools. We want to make sure that in the new solution, secrets are not exposed to humans.

The other challenge with hardcoded secrets is that teams need to rotate them manually, and they tend to have a high TTL (time-to-live). We want the new secrets to have a low TTL and to be dynamic, so teams can rotate those secrets automatically.

»Infrastructure 

As DXOne adoption grew, we wanted our infrastructure to be aligned with customer expectations—the new infrastructure we are building should be highly available and highly resilient. So, this is one of the opportunities we saw to improve in the new solution when building the DXOne platform.

»RBAC 

We noticed there was no consistent behavior in how teams were accessing secrets. There were no defined policies on who could access what secrets. We wanted to make sure that in the new solution we have a consistent RBAC model, so that we define who can access which secrets and how they can securely retrieve them.

»Patterns and inner-source

Next, as part of the strategy document, we identified that a lot of teams were trying to do the same thing. We wanted to make sure there are templates and patterns defined so they do not need to repeat themselves—and that they have a clear, strategic way to consume secrets from different CI/CD pipelines.

»Network complexity 

This, again, is because of mergers and acquisitions. When you do a lot of mergers and acquisitions, you end up with network complexity. There are boundaries. Teams can access some of the tools in the CI/CD estate and not others. We wanted to make sure the new, strategic solution offers a seamless experience for accessing the strategic enterprise tools.

With all of these findings and the gaps we noted in the strategy document, the intent was to take the document through a leadership review and then evaluate some of the enterprise tools on the market. The intent was to do a lot of POCs and understand which tool would be the best for us to start with. We selected Vault Enterprise, which we found was the best fit for our requirements.

»Vault capabilities at LSEG

How did we start building with Vault Enterprise? We learned from the past tooling and the strategy document, and we wanted the strategic tool to address all of those issues.

We started by making sure all of these secrets engines and auth methods are aligned with the strategy and roadmap. We started with some of the auth methods: we are using the JWT auth method and OIDC. The intent here is that everything should be aligned with the strategic approach. We do not want to start by enabling a lot of auth methods and a lot of secrets engines.
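
As a rough illustration of the configuration side, enabling a JWT auth method with the Terraform Vault provider can look like the minimal sketch below; the mount path and discovery URL are hypothetical placeholders, not LSEG's configuration.

```hcl
# Minimal sketch: enable a JWT auth method that trusts a CI/CD platform's
# OIDC issuer, so pipeline jobs can log in with their identity tokens.
resource "vault_jwt_auth_backend" "cicd" {
  path               = "jwt-cicd"               # hypothetical mount path
  oidc_discovery_url = "https://ci.example.com" # hypothetical issuer
  bound_issuer       = "https://ci.example.com"
}
```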

We started with some of the secrets engines, like the KV secrets engine and the dynamic secrets engines: database, AWS, Azure, and Google Cloud. We wanted to address some common use cases for how teams and the organization can use service accounts.

We enabled the LDAP secrets engine as well. One key secrets engine we enabled is the custom secrets engine. We wanted to make sure we use Vault for more than just the basic native secrets engines. We wanted to increase the capability so that we can address the secret zero problem as well. We'll talk about the custom secrets engine and the AWS and Azure dynamic secrets engines in a bit.
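
To give a flavor of how lightweight mounting these engines is, here is a hedged Terraform sketch with illustrative paths—a KV v2 engine for static secrets and a database engine for dynamic database credentials.

```hcl
# Sketch: mount a KV v2 secrets engine and a database secrets engine.
# Paths are illustrative only.
resource "vault_mount" "kv" {
  path    = "kv"
  type    = "kv"
  options = { version = "2" } # KV version 2 adds versioned secrets
}

resource "vault_mount" "database" {
  path = "database"
  type = "database" # issues short-lived, dynamic database credentials
}
```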

Being a regulated organization, whenever we want to enable a secrets engine or auth method, we have a full governance process to make sure we enable the capability in the Vault platform properly. We go through requirements gathering with the different teams, and we make sure we do spikes and POCs around it.

We then go ahead with the architecture and design. We review all those auth methods and secrets engines in the different forums to make sure they're approved by security. Once they're approved by security and governance, we implement those secrets engines and auth methods.

»Dynamic secret engine

Once we started enabling some of the secrets engines and auth methods, the first secrets engine we used extensively was the Azure secrets engine. Through the partnership with Microsoft, a lot of teams were coming up with different use cases. We wanted to address all those use cases with the capability we have in Vault.

Obviously, a dynamic secrets engine enables you to have low-TTL tokens. It enables you to dynamically provision and retrieve secrets securely in the CI/CD pipeline. The dynamic secrets engine configuration is very simple. As a platform team, you just need to onboard the teams and enable Vault roles for that particular application team's CI/CD agents. The agent authenticates to Vault and fetches a token for the CSP (cloud service provider), and then you can deploy your resources to the CSP.

The most important and interesting thing here is that—because of the different use cases we have—we supported multiple tenants in Azure, so teams can retrieve different secrets from different subscriptions and resource groups. We configured the Vault SPN (service principal) with different combinations of subscriptions and resource groups so we can scope each secret to the defined tenant, subscription, and resource group.
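
A hedged sketch of what that scoping can look like with the Terraform Vault provider; the mount, role, subscription, and resource group values are hypothetical.

```hcl
# Sketch: an Azure secrets engine role whose dynamic service principals
# are scoped to a single resource group in a single subscription.
resource "vault_azure_secret_backend_role" "app_team" {
  backend = "azure-bu1" # hypothetical mount, one per tenant/business unit
  role    = "app-team-a"
  ttl     = "15m"       # short-lived, matching the low-TTL goal
  max_ttl = "1h"

  azure_roles {
    role_name = "Contributor"
    scope     = "/subscriptions/SUB_ID/resourceGroups/app-team-a-rg" # hypothetical
  }
}
```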

»Custom secret engine 

We wanted to make sure we did not just build static and dynamic secrets engines; we wanted to use Vault Enterprise to unlock more capabilities. When we were having a lot of conversations with different stakeholders, we noticed some teams still wanted to use Vault just to maintain KV secrets.

We wanted to make sure teams do not use Vault only for static use cases. It somewhat defeats the purpose of using Vault if we are just maintaining static secrets in it.

We wanted to understand all the other use cases where teams need to maintain static secrets in Vault, and that's where we came up with the custom secrets engine approach.

The custom secrets engine allows you to plug any backend into Vault to provide dynamic functionality to application teams, so that they do not need to manage static secrets or API keys manually. This is one of the key features we worked on with HashiCorp Vault.

We discussed with a lot of stakeholders how we could address this requirement. We put together the problem statement, we had a discussion with HashiCorp, and then we started building that custom secrets engine within the organization. There are use cases like this where, if you have a requirement to manage certain tokens within Vault, you'll want to customize the engine so that it works within your environment.

That's where you need to customize those secrets engines somewhat so they work well within the LSEG environment, and that's where we had a lot of engagement with HashiCorp before we went live. Our plan is to contribute back to the OSS community as well.

It required a lot of effort to build a custom secrets engine within the organization. But once you build the custom secrets engine capability in Vault, it's very easy to implement.

The intent here is to provide teams with functionality so that they do not need to manage different tokens or different API keys from different tools. When you enable CI/CD capabilities in the organization, you need to manage the different tokens, and Vault can help you manage those tokens dynamically.

Once you configure this dynamic secrets engine in Vault, the flow starts with a job in CI/CD. The job authenticates to Vault through JWT authentication, and Vault then calls back to the CI/CD tool.

Consider that there are lots of CI/CD tools. This is the point where Vault needs to talk to those tools and obtain tokens based on the JWT scope you provided. Vault then gives you the token it retrieved from that CI/CD tool.
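
As a sketch of the mechanics: once a custom plugin has been registered in Vault's plugin catalog, mounting it looks like any other engine. The plugin name and paths below are hypothetical, not the actual LSEG engine.

```hcl
# Sketch: mount a registered custom secrets engine that brokers short-lived
# CI/CD tool tokens. Plugin name and mount path are hypothetical.
resource "vault_mount" "cicd_tokens" {
  path = "cicd-tokens"
  type = "vault-plugin-secrets-cicd" # hypothetical plugin in the catalog
}

# A pipeline job would then, after its JWT login, read a dynamic token from a
# role-scoped path such as cicd-tokens/creds/<role> (path is illustrative).
```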

Once you configure this, it helps you address the secret zero problem. It helps you make sure that Vault is being used as the single source of truth. It helps you ensure that teams are not handling those secrets manually and are not hardcoding them either.

It helped us address a lot of use cases where teams would otherwise need to go to the Vault UI and manage secrets manually. We wanted to address those use cases in the new solution and make sure there are only limited cases where they need to log in to the Vault UI through the OIDC auth method.

»Vault platform design for LSEG

Let's talk about what our platform looks like. We invested a lot of time in defining the right namespace model. First, we tried the root namespace. We felt it didn't work very well, so we wanted to implement a different namespace model in the organization.

»Namespaces

The intent was to make sure it gives us a very good operating model so that we can enable auth methods and secrets engines per business unit requirements. So, we had a lot of brainstorming sessions with different stakeholders. We started with defining the namespace. The root namespace didn't work out because it was not aligned with our business requirements.

Next, we tried having a namespace per CSP. We were not able to align our compliance and governance requirements with this. Then, finally, we settled on a namespace at the business unit level. It's very lightweight, and you can always modify and revisit it in the future if there are any changes to the business units.

It allows us more flexibility in how we enable auth methods and secrets engines. It gives us isolation between business unit one and business unit two. It helps keep us aligned with security and compliance requirements.

On the right side of the diagram is how we maintain replication and disaster recovery based on business requirements. So, if a business unit's requirement is not to replicate data outside the UK or across Europe, you can always honor that requirement. That's the way we have implemented the business unit and multi-tenancy model within LSEG.
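
In Terraform terms, a business-unit namespace model can be sketched like this; the namespace names are illustrative.

```hcl
# Sketch: one namespace per business unit, giving each BU isolation and its
# own place to enable auth methods and secrets engines. Names illustrative.
resource "vault_namespace" "bu1" {
  path = "business-unit-1"
}

resource "vault_namespace" "bu2" {
  path = "business-unit-2"
}

# Engines and auth methods are then mounted inside a namespace, for example
# via a provider alias configured with namespace = "business-unit-1".
```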

»Our infrastructure 

As we were going ahead with more adoption and development of the tool, we wanted our platform to be highly available and highly resilient.

We invested a lot of time to make sure we have a CI/CD pipeline with single-click deployment and zero-downtime upgrades on multi-cloud, multi-region, highly available infrastructure. We also use Packer, so we have custom Vault images, which helps us deploy Vault much faster. We embed the license and most certificates in the Packer images themselves.
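
A hedged sketch of what a custom Vault image build can look like in Packer HCL; the base AMI, region, package source, and file paths are hypothetical.

```hcl
# Sketch: bake Vault Enterprise, its license, and certificates into a custom
# image so deployments are fast and repeatable. All values are hypothetical.
source "amazon-ebs" "vault" {
  region        = "eu-west-2"
  source_ami    = "ami-0123456789abcdef0" # hypothetical hardened base image
  instance_type = "t3.small"
  ssh_username  = "ec2-user"
  ami_name      = "vault-enterprise-${formatdate("YYYYMMDDhhmm", timestamp())}"
}

build {
  sources = ["source.amazon-ebs.vault"]

  provisioner "file" {
    source      = "vault.hclic" # license embedded into the image
    destination = "/tmp/vault.hclic"
  }

  provisioner "shell" {
    inline = [
      "sudo yum install -y vault-enterprise", # assumes a configured repo
      "sudo mv /tmp/vault.hclic /etc/vault.d/vault.hclic",
    ]
  }
}
```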

»Secrets journeys, roadmap, and milestones

Paul Cavanaugh:

We've heard the overall positioning for the platform. We've heard elements of our journey, but I wanted to bring that to life for you. As Akash has already explained, we've come from a landscape of different issues and different ways of managing secrets.

Looking around the marketplace, it became pretty obvious that we needed to change ourselves to improve our own posture and practices. The Vault community was a key learning platform for us. That heavily informed our secrets management strategy. We've been executing on that, which led to a decision, and now we're into the implementation and build stage.

We went live last year with Vault Enterprise and we've been incrementally building features as Akash has been describing. In a moment, we'll go into a couple of interesting use cases about how we're addressing developer experience and removing friction from the process.

Our ambitions for next year are effectively to get to that HA platform. Again, we're multi-cloud today but need to get to HA. That gives us the ability to provide the runtime support we're building towards. Very much incremental: crawl, walk, run. We've been delivering for a period of time now, evolving our capabilities in response to business needs.

»Where are we today? 

Here's a set of bullet points covering the main issues we started off with as pain points for the organization. If we've got a tick, it doesn't mean we've cracked it entirely. It means we're well on the road: we have adequate management and mitigation of those things, and we'll continue to evolve those features.

The two that we're really focusing on at the moment are HA (high availability)—that's the second point—and, in the middle, the onboarding, and acceleration of onboarding, of our apps, infrastructure, and developers onto our DXOne platform, of which secrets management is a fundamental and essential capability.

»DXOne Onboarding

Akash Gupta:

We do not want to onboard users or teams onto the DXOne platform; we want to onboard applications—as in products. Within this, we do not have any separate onboarding for Vault Enterprise as such.

»ServiceNow forms 

We want the onboarding experience for the teams to be seamless. So, we provide them with a platform and a ServiceNow self-service form where they can onboard their applications onto all our DX tools.

The onboarding form is pretty automated. It gets started as soon as the user submits the ServiceNow form. We ask for all the information we need to onboard applications. This is required to define our RBAC model for the different CI/CD tools. We also gather information about how we can onboard application teams based on their AWS and Azure subscriptions.

»Manual request approval

Once they fill out the ServiceNow form, it goes to the application owner to get the request approved. We have internal tooling to ensure that all the applications we onboard onto DXOne are active in those systems.

Once application owners approve those requests, the automated CI/CD pipeline will be triggered on the back end. This pipeline ensures that particular applications get onboarded to the DXOne platform. 

As the automated CI/CD pipeline progresses, it goes through different internal stages. First, we create the LDAP groups and make sure those groups are synced with Azure AD (Active Directory). We have a recertification process to ensure we have a JML (joiners, movers, leavers) process defined to manage user group memberships.

»Tools onboarding

Once we create those groups in Azure AD, we onboard the CI/CD tools. As I mentioned, this onboarding typically gets teams' applications onto the different CI/CD tools.

»Vault onboarding 

Once we go ahead with the tools onboarding, it triggers Vault onboarding. We have Lambda endpoints, which get triggered internally from the CI/CD tool onboarding itself. As we progress, we create Vault roles and bound claims for that particular application team as part of the onboarding.
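
A sketch of what such a role with bound claims might look like; the claim names and values are hypothetical and depend on the CI/CD platform's JWT.

```hcl
# Sketch: a JWT role that only accepts tokens whose claims match one
# application's pipeline. Claim names and values are hypothetical.
resource "vault_jwt_auth_backend_role" "app_deploy" {
  backend    = "jwt-cicd"         # hypothetical auth mount
  role_name  = "app-12345-deploy" # hypothetical application ID
  role_type  = "jwt"
  user_claim = "sub"

  bound_claims = {
    project_path = "lseg/app-12345" # hypothetical claim from the CI JWT
  }

  token_policies = ["app-12345"]
  token_ttl      = 900 # 15 minutes
}
```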

»Creation of Vault policies template 

After this, we create a Vault policy from our policy template. This gives them enough access to read and write secrets using the CI/CD tools.
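
As a hedged example, a policy template can use identity metadata so one template serves every onboarded application; the mount accessor and metadata key below are hypothetical.

```hcl
# Sketch: a templated ACL policy that scopes each application team to its
# own KV path via entity alias metadata. Accessor and key are hypothetical.
path "kv/data/{{identity.entity.aliases.auth_jwt_b2c3d4.metadata.app_id}}/*" {
  capabilities = ["create", "read", "update", "delete", "list"]
}

path "kv/metadata/{{identity.entity.aliases.auth_jwt_b2c3d4.metadata.app_id}}/*" {
  capabilities = ["read", "list"]
}
```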

»Secrets engines onboarding

Next, we onboard them onto the different secrets engines, based on the requirements captured in the ServiceNow form. We get those details and use that data to onboard them onto the different secrets engines.

»Closing the request 

Once we've progressed through all this, we close the request and make sure the requester who submitted it gets notified. The whole process now completes much sooner—it used to take around three to four days because the application team needed to onboard onto different tools separately.

We want to make sure it's a horizontal RBAC model and horizontal onboarding, so teams do not need to fill out different ServiceNow forms to get onboarded onto different tools.

We wanted to make sure there is a single ServiceNow form and entry point so that they can get onboarded onto DXOne—as one product—instead of onboarding onto different tools separately.

»Granular self-service Vault secrets engine 

Once we onboard applications onto the DXOne platform, we provide developer and owner roles based on the RBAC model they define. That limits how they can access, read, and write secrets in Vault.

When we started introducing this—we started with an early adopters phase—we got some feedback from the teams that they wanted more granular access to Vault.

They wanted more granular access and separation of duties defined within those groups to determine how they access Vault secrets. So, we went ahead with a more granular, self-service model for the Vault secrets engines.

Since we were using policy templates, it was a lot easier to cater to any new requirements—for instance, when there is a need to define RBAC roles one level down, where teams want to define separation of duties and custom capabilities in the policies.

We get a payload describing how they want to onboard their sub-applications inside Vault. Once we get this information, we trigger the Terraform pipeline. This lets teams get onboarded onto Vault through a self-service mechanism when there are custom requirements.
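
A sketch of how such a payload can drive the Terraform run; the variable shape, application ID, and paths are hypothetical.

```hcl
# Sketch: a payload describing sub-applications drives per-path policies.
# Variable shape, names, and paths are hypothetical.
variable "sub_apps" {
  type = map(list(string)) # sub-app name => allowed capabilities
  default = {
    "payments-api" = ["read", "list"]
    "payments-ui"  = ["create", "read", "update", "list"]
  }
}

resource "vault_policy" "sub_app" {
  for_each = var.sub_apps
  name     = "app-12345-${each.key}"
  policy   = <<-EOT
    path "kv/data/app-12345/${each.key}/*" {
      capabilities = ${jsonencode(each.value)}
    }
  EOT
}
```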

»Vault PKI secrets engine

We got a lot of requirements relating to how developers manage different PKI secrets in the organization. We partnered with the LSEG PKI team to enable this feature. The intent was to provide a CI/CD capability that allows teams to rotate and manage PKI certificates. That's going to help a lot of teams manage certificates in the organization.

It helps teams manage certificates on the fly. They can get certificates through the CI/CD pipeline and do not need to depend on other tools to generate those certificates manually.

This also addresses the secret zero problem, because the secrets are not exposed to humans and are securely accessible through the CI/CD pipeline.
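
A sketch of the configuration side, assuming an intermediate CA already signed by the enterprise PKI; the mount path, domain, and TTLs are hypothetical.

```hcl
# Sketch: mount a PKI secrets engine and define a role that pipelines can
# use to issue short-lived certificates. Values are hypothetical.
resource "vault_mount" "pki_int" {
  path                  = "pki-int"
  type                  = "pki"
  max_lease_ttl_seconds = 7776000 # 90 days
}

resource "vault_pki_secret_backend_role" "internal_web" {
  backend          = vault_mount.pki_int.path
  name             = "internal-web"
  allowed_domains  = ["internal.example.com"] # hypothetical domain
  allow_subdomains = true
  max_ttl          = "720h" # 30 days
}

# A pipeline then requests a certificate from pki-int/issue/internal-web.
```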

»Key learnings

We have a lot of key learnings from our adoption and onboarding journey: 

»Fail fast approach and stakeholder involvement

First, we want a fail-fast approach. Whenever we introduce any new secrets engine or auth method, we go ahead with early adopter feedback. While working with early adopters, we want to fix all the issues the teams are facing as soon as possible.

»Strategic intent

We want to drink our own champagne—to make sure we onboard our own applications and the tooling capabilities we have in DXOne first.

»Communicate and promote 

The next learning was to ensure we had all the how-to user guides in the organization. When we onboard any teams, it's a lot easier for them to understand the capabilities of the tools and the platform if we have all the how-to user guides. We have a lot of user guides, demos, and videos so teams can get started without coming back to us after the onboarding.

Last but not least, we have a very good partnership with HashiCorp and our DevSecOps partner, which has helped us a lot in building this platform and the Vault capabilities in the organization.

Thank you.

Paul Cavanaugh:

Thank you.
