Transforming developer environments with service mesh at SPH Media
See how HCP Consul and HCP Vault lift a massive operational burden off SPH Media’s shoulders.
Update: Effective November 2024, HashiCorp has deprecated HCP Consul Central (aka HCP Consul management plane) as part of an effort to streamline offerings and focus on delivering the best possible solutions to customers. This helps accelerate product development in other areas of Consul to meet customer demands. For information on the latest Consul features and capabilities, please go to our Consul product page.
Be sure to check out the case study: Singapore media leader modernizes its cloud operating model along with this talk.
» Transcript
Yong Wen Chua:
Good afternoon, everyone. Thanks for taking time out to listen to our talk. Today, we'll be talking about how we're using Consul, specifically HCP Consul, to help our developers worry less about networking in terms of connectivity and security. We have a pre-recorded demo at the end to see how we got this working, and we'll link you to an open source Terraform repository where we have everything we've used to set up what we're going to demonstrate at the end.
» Who are we?
Niro and I are both from SPH Media. We're the publisher of the five main broadsheet newspapers in Singapore — in print and digital form. We also have several radio stations and publish some lifestyle magazines. We have a decent advertising footprint in Singapore in print, digital, and physical form. Physical meaning, like billboards at shopping malls. We have been around for more than 150 years, and our goal is to be the trusted news source on Singapore and Asia.
» The cloud platform engineering team
Niro and I are part of the cloud platform engineering team at SPH Media, where we take care of managing the infrastructure layer on these three main cloud providers that we use: AWS, GCP, and Alibaba Cloud.
On top of that, we manage and run common services like Terraform Cloud, HCP Consul, HCP Vault, GitHub, and others. For these services, we provide reusable code assets to aid that usage, such as Terraform modules in our private registry, GitHub actions, templates, template repos, and code snippets.
Our goal is to make all these platforms as self-serviceable as possible. For example, if you want to create an AWS account, you just submit a pull request to a repository; it goes through the entire CI/CD process and gets checked and reviewed. Once it's merged in, an entire automated process kicks off to create an AWS account for you. All of this is in service of the applications we run on top of the platforms we manage. These three are the main newspapers among the five that I mentioned earlier.
» Some context
What are we trying to do here? The context today will be primarily about running things on AWS using Kubernetes, specifically Elastic Kubernetes Service, or EKS. We'll be using HCP Consul and HCP Vault to demonstrate what we have built so far. We are not using anything special in AWS or EKS. So, in theory, you can make use of what we're talking about to build this on GCP and other cloud providers if you want to.
We are talking about networking. So, we have to show this comic first — why networking is always very difficult to talk about and reason about. Ultimately, our goal is to help our developers not have to worry about it and figure this out at the platform level for them.
This is the current state of affairs: We have this massive hub-and-spoke spider web network of VPCs in our AWS environment. At the bottom left, we have the on-premise datacenter, which hosts things like our printing press and other legacy applications that are still on-premise. Connection through to the cloud is via AWS Direct Connect into a transit gateway network that we host in a hub VPC. Around it, you can see all the blue VPCs — all the other AWS accounts we have that are attached to the transit gateway network.
» Networking pain points
What problems do we have? Right now, we have about 200+ AWS accounts. Each of these AWS accounts has its own VPCs. If we network them all together into this transit gateway network, we have to manage the IP addresses for all of them so they do not overlap. Otherwise, there's no way to route traffic between them. It turns out that the 10.0.0.0/8 space is not that big, and we have a massive spreadsheet where we note which VPC is using what IP address.
It's also quite difficult to tell who can talk to what. Imagine you have one VPC and want to manage connection to and from 199 other VPCs. You can imagine how difficult it is to manage the security group rules in that context. It's made even worse when the IP address ranges of the other VPC networks are not contiguous or are differently sized. This is not a very good scenario to be in.
» Our wish list
This is a laundry list of what we are looking for in a solution, in no specific order. The thing I want to point your attention to is multi-tenancy. We want to deploy something that works once and can serve multiple tenants without having to manage hundreds of separate installations. We also want to be able to route traffic between VPCs with overlapping CIDRs.
How does it look in the future? In theory, we want to have a future isolated state where each VPC is not connected by default, and we only establish connection when necessary. But we still need to worry about how services between VPCs communicate with each other.
How do we allow the on-premise datacenter to communicate to AWS and vice versa? And how do we connect HCP services to our AWS VPCs without going through the public internet? It turns out that we cannot actually physically cut the proverbial cables between each of the VPCs. We just have to use different networking constructs to achieve what we want.
As I've stated earlier, we'll be using Consul to try to achieve this. We'll be talking about how we use Consul to route traffic between VPCs, and we'll not really be talking about how we handle traffic within a VPC.
» Introducing our two use cases
The first is how we secure service-to-service communication between services in different VPCs. These VPCs can be in different AWS accounts and might have overlapping CIDRs. The second use case that we'll be looking at is a more specialized version of the first one, where we want to connect HCP Vault to our databases for the purpose of the database secrets engine. Imagine you have an RDS database in a VPC whose CIDR overlaps with another VPC's. How does Vault know which RDS to connect to?
As I said earlier, we'll be using HCP Consul to host the control plane for us. With HCP Consul, we don't have to worry about maintaining the control plane, and it helps upgrade the Consul cluster as necessary. So, it's a massive operational burden lifted off our shoulders. HCP Consul also gives us some enterprise-only features that are helpful to our setup.
» Multi-tenancy with admin partitions
We achieve multi-tenancy with Consul using admin partitions. Admin partitions exist at a level below the cluster and above namespaces. Admin partitions have several isolation properties that help us achieve multi-tenancy.
For example, you can have overlapping CIDRs between admin partitions. Each admin partition also has its own set of defaults and settings that do not interfere with the others. Service-to-service communication across admin partitions is also not allowed by default unless you export the service from one admin partition to another. We also bridge traffic between admin partitions using mesh gateways. Mesh gateways can handle routing traffic across networks with overlapping CIDRs.
We'll be working with the following assumption for the rest of the slides: one VPC per AWS account and one admin partition per AWS account — so a one-to-one-to-one relationship.
» Transit gateway and private NAT gateway
We're back to a modified version of this spider web. The difference here is that we have added two different types of subnets and a private NAT gateway. Ultimately, we still need a way for traffic between admin partitions to flow.
One easy way to do this would be just sending it through the internet, but that's quite unsatisfactory for two main reasons. First, we don't want to pay AWS egress costs to the internet. Second, there are no guarantees about how traffic is routed through the public internet. So, ideally, we want to keep everything within AWS.
We are back to this network with the private NAT gateway. There are the two types of subnets I was talking about: the red non-routable subnets and the blue routable subnets. I'll explain in a later slide how we use route tables to send traffic from one type of subnet to the other.
This is the high-level overview about how you can choose IP addresses for your subnets. In summary, only the non-routable subnets can overlap with each other. Everything else cannot have overlapping CIDRs.
I'll demonstrate how traffic can go from the non-routable part of the network to the routable part. Imagine you have something in the non-routable part of the network that wants to access HCP Vault, for example. The traffic will flow from the non-routable subnet to the private NAT gateway. There, the gateway will perform source address translation, replacing the source with the private NAT gateway's own IP address.
The private NAT gateway gets its IP address from the routable subnet range, and then the traffic goes on to Vault, for example. Then, there will be return traffic back from Vault. When the return traffic reaches the private NAT gateway, the gateway knows whom to forward it to. In this setup, there's no way for anything in the routable part of the network to open a connection back into the non-routable part of the network.
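As a rough Terraform sketch — with a placeholder subnet name, not our actual module — a private NAT gateway is just a NAT gateway with private connectivity placed in a routable subnet:

```hcl
# Minimal sketch: a private NAT gateway in a routable subnet. It has no elastic IP
# and only translates source addresses to its own routable VPC IP, so nothing on the
# routable side can initiate a connection back into the non-routable subnets.
resource "aws_nat_gateway" "private" {
  connectivity_type = "private"
  subnet_id         = aws_subnet.routable_a.id # placeholder routable subnet
}
```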
» Consul gossip requirements
Just remember that in a normal Consul setup, there's the control plane, and there are client agents deployed next to your workloads. In our scenario, we want to deploy workloads in the non-routable part of the network, which is the red part here.
The first requirement is usually that all Consul clients must be able to connect to the control plane. If you put your Consul clients and the workload in the non-routable part of the network, there's no problem reaching the control plane. But there's also a requirement that the servers must be able to initiate connections back to those agents themselves, which is not possible based on what I've explained earlier.
» Consul data plane
What's the solution to this? We make use of Consul Dataplane to help us out. Consul Dataplane is a replacement for the Consul client agents for applications that are running on Kubernetes. The kubelets on the nodes already perform some of the jobs of a Consul agent — health checking, for example — so there's no actual need to replicate that.
We can replace the DaemonSet that is deployed on every single EKS node with a dataplane container that sits next to the Envoy proxy, which is already injected as a sidecar to your service. The dataplane takes care of talking to the Consul control plane to get the information the Envoy proxy needs to do its thing.
Now, we have fewer networking requirements. The workload needs to reach the control plane, but not the other way around, and it's no longer necessary to deploy a DaemonSet across all your nodes. This enables us to deploy applications on Fargate, for example, where there's no way to deploy a DaemonSet pod in the first place.
The Consul agent pods also usually require a host port, which is not allowed on Fargate, so that requirement is removed as well. It's also easier to upgrade your control plane independently from your workloads because there's no longer a need to keep the Consul agent pods' version in sync with the Consul control plane.
This is a very complicated diagram of our setup, and I'll zoom into individual parts later on to explain what we've built here. As you can see, there are the blue routable parts of the network and the red non-routable part. I've also added the green part, which is the public subnets — just to demonstrate that in this setup, we can still access the public internet from our networks.
The first part at the top of our diagram is the VPC IP Address Manager (IPAM). This is an optional component that you may or may not want to use. The whole point of this component is to issue IP addresses to your VPC subnets — we didn't want to keep using a spreadsheet to assign IP addresses, so we went for this.
The table at the bottom right is where we have allocated IP address ranges for the routable and non-routable parts of the network. The non-routable range can be reused however many times you want across all your VPCs, while the routable 10.128/9 range — the blue row at the top — is the one that's managed by IPAM.
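A rough Terraform sketch of this idea — the resource names, region, and netmask length below are illustrative assumptions, not our production values:

```hcl
# Hypothetical IPAM setup that hands out non-overlapping routable CIDRs to VPCs,
# replacing the IP-allocation spreadsheet.
resource "aws_vpc_ipam" "this" {
  operating_regions {
    region_name = "ap-southeast-1"
  }
}

resource "aws_vpc_ipam_pool" "routable" {
  address_family = "ipv4"
  ipam_scope_id  = aws_vpc_ipam.this.private_default_scope_id
  locale         = "ap-southeast-1"
}

resource "aws_vpc_ipam_pool_cidr" "routable" {
  ipam_pool_id = aws_vpc_ipam_pool.routable.id
  cidr         = "10.128.0.0/9" # the routable range from the table
}

# Each application VPC then requests its secondary (routable) CIDR from the pool.
resource "aws_vpc_ipv4_cidr_block_association" "routable" {
  vpc_id              = aws_vpc.app.id # placeholder VPC
  ipv4_ipam_pool_id   = aws_vpc_ipam_pool.routable.id
  ipv4_netmask_length = 24
}
```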
» HVN and TGW
On the bottom right, you can see HVN, which is the VPC that HCP manages for us. And inside, there's Vault and Consul, which we've deployed on HCP. On the left side, you can see the VPC, where the transit gateway is set up.
We've created, as I said earlier, a partition for this entire VPC. The transit gateway lives in the routable part of the network. We've also put a mesh gateway and an ingress gateway in this VPC. The purpose of the ingress gateway is for something outside the mesh to reach into the mesh — in this case, HCP Vault. And the mesh gateway is the mechanism by which traffic flows to the other partitions.
» Application VPC
This is how it looks from an application point of view. This admin partition B looks exactly the same as the left side of the diagram, which is partition A. There are still the blue routable and the red non-routable subnets. Note that this VPC has two CIDRs. At the bottom is the overlapping non-routable CIDR, which is the primary CIDR of the VPC. And at the top, there's the secondary CIDR, which is managed and issued by IPAM.
We also attach a transit gateway to this network. It doesn't matter which subnet you attach the transit gateway to. You just need to make sure you attach it to at least one subnet per availability zone. Otherwise, AWS will not route the traffic to the AZ.
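A minimal Terraform sketch of that attachment — the gateway and subnet names are placeholders — spanning one subnet per availability zone:

```hcl
# Hypothetical transit gateway attachment: one subnet per AZ, otherwise AWS will not
# route traffic for that AZ through the transit gateway.
resource "aws_ec2_transit_gateway_vpc_attachment" "app" {
  transit_gateway_id = aws_ec2_transit_gateway.hub.id
  vpc_id             = aws_vpc.app.id
  subnet_ids = [
    aws_subnet.routable_a.id, # AZ a
    aws_subnet.routable_b.id, # AZ b
    aws_subnet.routable_c.id, # AZ c
  ]
}
```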
Let's first look at the route table for the non-routable subnets. As you can see, the first two rows are the primary and secondary CIDRs of this VPC, which are simply routed locally. Nothing special about that. The third row is where our setup differs from the usual — we route traffic destined for the routable 10.128/9 CIDR range through the private NAT gateway, as I mentioned earlier. The final route simply sends everything else through the public NAT gateway to the internet.
It's a similar picture for the routable subnets, where the second and third rows are just local traffic, and the first row sends everything else to the internet. The difference is in the fourth row, where, instead of sending routable traffic through the private NAT gateway, we send it through the transit gateway to the rest of the network.
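In Terraform terms, a minimal sketch of the two route tables might look like this (route table, gateway, and CIDR names are placeholders based on the talk):

```hcl
# Non-routable subnets: routable 10.128/9 traffic goes through the private NAT
# gateway; everything else exits via the public NAT gateway. Local VPC routes are implicit.
resource "aws_route" "nonroutable_to_routable" {
  route_table_id         = aws_route_table.non_routable.id
  destination_cidr_block = "10.128.0.0/9"
  nat_gateway_id         = aws_nat_gateway.private.id
}

resource "aws_route" "nonroutable_default" {
  route_table_id         = aws_route_table.non_routable.id
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.public.id
}

# Routable subnets: the same 10.128/9 destination goes to the transit gateway instead.
resource "aws_route" "routable_to_tgw" {
  route_table_id         = aws_route_table.routable.id
  destination_cidr_block = "10.128.0.0/9"
  transit_gateway_id     = aws_ec2_transit_gateway.hub.id
}
```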
» Service to service
How does traffic flow from service to service? On the left side is one admin partition; on the right side is another admin partition. Both of the red subnets have overlapping CIDRs in the 10.0/16 range. Imagine we have a service in A that wants to talk to B. How would that work?
This is how the traffic flows from A to B: First, service A connects to its local Envoy proxy, which routes the traffic to the local mesh gateway. That mesh gateway forwards the traffic to the mesh gateway in admin partition B, which then load balances it across however many replicas there are in admin partition B — so fairly straightforward. There's a bit more setup to do, but Niro will talk about that later.
» Vault to RDS
It's a similar picture for how we connect Vault through this entire setup to an RDS database. We want to connect Vault's database secrets engine to an RDS instance in partition A, which is on the top left side. This is how the network traffic will flow: Vault connects to the ingress gateway in the admin partition at the bottom, and the ingress gateway forwards the traffic to the mesh gateway in the same admin partition.
The mesh gateway will then forward the traffic through the transit gateway network to admin partition A. And because RDS is not a service that's part of the mesh, we have to use the terminating gateway to forward the traffic to it eventually.
» Use case #1: service to service
Now, Niro will talk about how we actually get this to work in our setup:
Nirosan Paramanathan:
Thank you, Yong Wen. Our first use case is service-to-service communication. We have two admin partitions — partition A and partition B. We have deployed the Consul dataplane components using consul-k8s, and the frontend and the backend are deployed in separate partitions — in VPC A and VPC B.
Let's look at the setup: At the top pane is the frontend K8s cluster, which is connected to partition A, and at the bottom pane is the backend K8s cluster, which is connected to partition B. As you can see, the frontend and backend are using the same overlapping CIDR — the 10.0 segment. Let's try to connect to the backend using its IP address first. We configure the frontend pod to connect to the backend using its IP address, then go into the frontend pod and try to access the application.
Here, I'm making a curl request to the backend, and we can see it's timing out because both of them are in the overlapping CIDR, so it's not routable. As you can see in the fake service example, the frontend and backend are not able to connect to each other.
To make this setup work, we need to route the traffic via the Consul mesh gateway. To do that, we have to add certain Consul config entries. First, we need to export the services contained in the backend partition to the frontend partition. Here, we are exporting the backend service and the mesh gateway to the frontend partition.
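A minimal sketch of that export in Consul's HCL config entry format (the consul-k8s CRD equivalent works the same way); the partition and service names mirror the demo and are assumptions:

```hcl
Kind      = "exported-services"
Name      = "partition-b" # the partition that owns the exported services
Partition = "partition-b"
Services = [
  {
    Name      = "backend"
    Consumers = [{ Partition = "partition-a" }]
  },
  {
    Name      = "mesh-gateway"
    Consumers = [{ Partition = "partition-a" }]
  }
]
```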
Next, we need to add intentions. All intentions are denied by the default policy, so we need to add intentions that allow the frontend services to talk to the backend services. We also need service defaults at the frontend to send the outbound traffic via the local mesh gateway — so here, we have defined the mesh gateway mode as local. And lastly, since we are running on EKS, which doesn't support transparent proxies, we need to add explicit upstream annotations, as shown here.
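Sketches of those two config entries, again in Consul's HCL format with the demo's assumed names (the explicit upstream annotation on the frontend pod is not shown here):

```hcl
# Allow the frontend in partition A to call the backend in partition B.
Kind      = "service-intentions"
Name      = "backend"
Partition = "partition-b"
Sources = [
  {
    Name      = "frontend"
    Partition = "partition-a"
    Action    = "allow"
  }
]
```

```hcl
# Send the frontend's upstream traffic out through its local mesh gateway.
Kind      = "service-defaults"
Name      = "frontend"
Partition = "partition-a"
Protocol  = "http"
MeshGateway {
  Mode = "local"
}
```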
Let's go back to our recorded demo. We have deployed mesh gateways in the frontend and the backend, which are deployed in non-overlapping CIDRs — 10.130 and 10.131. First, let's try to connect to the backend via the local mesh gateway. You can see the upstream URL has changed to localhost now. Now, let's tail the logs at the backend. Then we go into the frontend pod and try to access the application.
Since we are now routing the traffic via the mesh gateway, we can see a 200 response from the backend, and we can see the logs coming in on the backend pods as well. If you look at our fake service, we can see both the frontend and backend are connected to each other. This is how it looks in the Consul UI. We can see the frontend is deployed in partition A, and in the frontend service's upstreams tab, we can also see the backend service is configured.
» Use case #2: HCP Vault to RDS communication
As a reminder, Yong Wen mentioned that the traffic will originate from HCP Vault — going via the transit gateway to the ingress gateway, through the local and remote mesh gateways, then to the terminating gateway — and finally to the RDS.
To make this setup work, we also need to add certain Consul config entries. First, we need to register RDS as an external service. Next, we need to create a terminating gateway for the RDS service. Importantly, the terminating gateway's role must have the right service permissions for the RDS service.
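The RDS endpoint itself is registered through Consul's catalog as an external, non-mesh service; assuming it is registered under the name rds in partition A, a minimal terminating-gateway config entry might look like this:

```hcl
# The terminating gateway forwards mesh traffic out to the external rds service.
# Its ACL role needs service:write on "rds" for this to be accepted.
Kind      = "terminating-gateway"
Name      = "terminating-gateway"
Partition = "partition-a"
Services = [
  {
    Name = "rds"
  }
]
```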
Next, we create an ingress gateway for the RDS service with TCP listeners. Then, we need to add intentions to allow the ingress gateway to talk to the RDS service. As you can see, our RDS is deployed in a non-routable CIDR — the 10.0 segment — and we have deployed our ingress gateway in the routable CIDR — 10.130.
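Sketches of those two entries, using a hypothetical partition name for the transit gateway VPC's partition; exact partition placement and cross-partition references depend on your topology:

```hcl
# TCP listener on the ingress gateway so something outside the mesh (HCP Vault)
# can reach the rds service through the mesh.
Kind      = "ingress-gateway"
Name      = "ingress-gateway"
Partition = "partition-tgw" # hypothetical partition for the transit gateway VPC
Listeners = [
  {
    Port     = 5432
    Protocol = "tcp"
    Services = [
      { Name = "rds" }
    ]
  }
]
```

```hcl
# Allow the ingress gateway to dial the rds service.
Kind      = "service-intentions"
Name      = "rds"
Partition = "partition-a"
Sources = [
  {
    Name      = "ingress-gateway"
    Partition = "partition-tgw"
    Action    = "allow"
  }
]
```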
Let's try to create a database secrets engine. Then, let's try to create a database connection. I'm selecting the Postgres database plugin and giving it a connection name. First, we will try to connect using the RDS DNS name itself, which is deployed in a non-routable CIDR. When we try to create the database connection this way, since the RDS is in a non-routable CIDR, it times out. If I fast-forward a bit, you can see the timeout happening.
If we change the RDS endpoint to the ingress gateway, we are able to create the database connection immediately. Now, let's try to add a role to retrieve a dynamic secret from HCP Vault. I'm adding the necessary SQL statements here to create the role. Then, let's try to generate credentials. Here, we can see we are able to generate dynamic credentials from HCP Vault. This means HCP Vault is able to reach the RDS via the ingress gateway.
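For reference, the same configuration can be expressed in Terraform with the Vault provider; the connection URL, credentials, role name, and SQL statements below are illustrative placeholders — the key point is that the host is the ingress gateway, not the RDS endpoint:

```hcl
# Mount the database secrets engine.
resource "vault_mount" "db" {
  path = "database"
  type = "database"
}

# Point the Postgres connection at the ingress gateway's routable address and TCP
# listener port instead of the non-routable RDS endpoint.
resource "vault_database_secret_backend_connection" "postgres" {
  backend       = vault_mount.db.path
  name          = "app-db"
  allowed_roles = ["app-role"]

  postgresql {
    # Placeholder credentials and hostname.
    connection_url = "postgresql://vaultadmin:CHANGE_ME@ingress-gateway.example.internal:5432/postgres"
  }
}

# Role whose creation statements Vault runs to mint dynamic credentials.
resource "vault_database_secret_backend_role" "app" {
  backend = vault_mount.db.path
  name    = "app-role"
  db_name = vault_database_secret_backend_connection.postgres.name
  creation_statements = [
    "CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}';",
    "GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";",
  ]
}
```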
» What's next?
We need to think about operationalizing our setup. We have around 200+ AWS accounts. How do we set it up as part of our account pipeline itself? Also, how do we monitor this on a large scale? And, importantly, how do we help our developers make sense of this setup?
Here's our GitHub link for the open source demo setup. You can clone and try it all yourself. Thank you.