
HashiCorp Consul: Applying knowledge from years of experience and listening to our customers

James Phillips gives a systematic overview of Consul's different network models, how they work, what kind of use cases they serve, and how prepared queries can provide the glue that keeps service discovery simple.

James Phillips, HashiCorp's Consul lead, gives an in-depth discussion of:

  • Consul Enterprise features -- especially the advanced network models and their use cases

  • Clustering (within a data center)

  • Federation (between data centers)

  • Gossip Pools (a SWIM implementation)

  • Isolation Topologies and Network Areas

Transcript

I want to introduce myself, James Phillips. I'm the Consul Lead at HashiCorp, and my membership in Generation X should prohibit me from using this phrase, but when you all cheered for Consul 1.0, that gave me the feels, so thank you. That was really cool. I know that sounded weird coming from me and my hair is gray and everything, but I appreciate that. Anyway, the topic of this is Consul and complex networks. I'm excited to see such a turnout for this because it sounds pretty nerdy, but this is actually stuff where we've applied knowledge from years of experience using Consul ourselves and from listening to our customers and users on the road to 1.0, so let's get into it.

We'll start with just a quick introduction to Consul to give you an overview. As was mentioned in the keynote, Consul is kind of the piece that connects all of our tools together and is used in your own infrastructure to connect your applications and services together. It combines a few different aspects in interesting ways. The heart of Consul is service discovery, configuration, and orchestration, so when you have your application broken up into a series of services, you need ways for the services to locate each other. It's a really basic requirement once you have microservices.

You can register services with Consul, and a service is really just an IP and a port number. There's various ways to do it: statically with configs or via our APIs. Then you can discover them using our HTTP APIs and using DNS. The idea there is that Consul can glue into any existing applications that you have. They don't have to integrate with Consul directly. If they can look up a hostname, then they can use Consul's service discovery.
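
A minimal sketch of what that looks like (the service name, port, and file path here are made up for illustration): a service definition is just a small JSON file the local agent loads, and then anything that can do a DNS or HTTP lookup can find it.

    # /etc/consul.d/web.json, a hypothetical service definition loaded by the local agent
    {
      "service": {
        "name": "web",
        "port": 8080
      }
    }

    # Discover healthy instances over DNS (the agent's DNS interface listens on port 8600 by default)
    $ dig @127.0.0.1 -p 8600 web.service.consul SRV

    # ...or over the HTTP API
    $ curl http://127.0.0.1:8500/v1/catalog/service/web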

We do some really interesting load balancing basically by just shuffling the results from DNS queries, so for your internal services talking to one another, you don't need to deploy separate load balancers as part of that. You can remove moving pieces once your stuff is in service discovery. You can manage dynamic parts of your runtime configuration using our key value store, so you can store settings for your applications. You can use it to turn features on and off at runtime, and you can integrate with anything in a lot of different ways. You can do first-class integrations: things like Vault and Nomad just talk to Consul directly using our client libraries.

You can use the DNS interface to integrate with some crusty thing you don't even know how to compile anymore. You can still use Consul because it can do DNS lookups, and then we have other tools like Consul Template and envconsul that let you render out config files from Consul data and inject environment variables. There's a lot of ways to glue Consul into your applications.

Service discovery is useful on its own, but if your services are actually down, it doesn't make much sense to discover them or give them out in response to requests for where's my database, where's this middle service. By integrating health checks with service discovery, Consul can give you up-to-date, useful information about healthy instances of services. It gives you the ability to pull services out automatically based on the status of health checks rather than having to have people take action and remove them. There's a bunch of different styles of checks there, and there's even a time-to-live check so you can require that your service talk to Consul on some periodic basis.
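
For example (a sketch; the service name, port, and check names are placeholders), an HTTP check runs on an interval, and a TTL check requires the service itself to report in before the TTL expires:

    # Hypothetical service definition with an HTTP health check
    {
      "service": {
        "name": "api",
        "port": 9090,
        "check": { "http": "http://localhost:9090/health", "interval": "10s" }
      }
    }

    # Hypothetical standalone TTL check; the service has to keep it alive
    {
      "check": { "id": "api-heartbeat", "name": "API heartbeat", "ttl": "30s" }
    }

    # The service reports in periodically, e.g.:
    $ curl -X PUT http://127.0.0.1:8500/v1/agent/check/pass/api-heartbeat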

Then, finally, Consul supports high availability. Consul itself is highly available. It uses the Raft consensus algorithm to run multiple servers, so your state is protected even in the event of a server failure. You can federate multiple Consul clusters together; for several years now, Consul's had the ability to federate clusters, so you can have your infrastructure distributed across multiple data centers and multiple continents and it works with that out of the box. We'll see some examples of these other aspects later in the talk, but there are some pretty interesting things you can do with federated clusters around failover and some automatic capability that's pretty interesting, so we'll have some examples there.

Consul itself runs as an agent. You run the agent on every node in your infrastructure. It's a single Go binary with no dependencies, so it's super easy to deploy. It doesn't require a bunch of other tools; it isn't composed of other things. You run the Go binary, you're done. Your applications always just talk to the agent running on their local machine, whether it be a physical host or a VM, so the initial service discovery problem of where's Consul is solved by just talking to your local agent. The agent takes care of finding other agents in the cluster and finding where the healthy servers are. There's a lot of complexity that we take on with the agent design rather than having some client that has to keep track of where the Consul servers are and things like that. The bulk of this talk will get into some details about how that works.

You run the agent everywhere. You run a smaller set of agents, usually three or five on a separate set of machines and you run those in server mode. Those keep some state for the cluster and have some special roles, so we'll talk about that. They provide a consistent view so when you query Consul to find services, you're actually interacting with the servers but that's hidden under the hood by how the agent provides an interface that masks all the details of the actual Consul cluster from you.

This is one of the first talks, at least that I've given, that's going to focus on some Consul Enterprise features. This gives a rundown of Consul Enterprise at a high level. This is a paid version of Consul that adds things like automated backups and automated upgrades. We have some interesting features around read scalability for the servers, and redundancy zones let you take advantage of having servers in different availability zones. We do some automatic management there, but the interesting thing here and the focus of this talk is the last bit, which is the advanced network models. We'll actually get into four different network models. Two of these are open source and two of these are in Enterprise, and these are all in our shipping versions of Consul right now. We'll go through what these models are for, why we added them, what use cases they support, and we'll show some examples of what they look like when you're using them.

There's basically two different things you're trying to do with these different network models. One is clustering. When you have a set of machines that are logically related, they may be physically located in the same place or they're used for the same type of application, Consul lets you connect those together in a cluster and we call that a Consul data center. Each data center has one set of Consul servers that are part of that cluster and then basically any number of agents. Consul can run with clusters that have just a handful of machines up to tens, even hundreds of thousands of machines in the same cluster. For clustering, there's two models. There's something called LAN Gossip and then there's something called Network Segments, which is available in Enterprise, so we'll go through those.

The second class of network models are models that support federation. Once you've created a cluster and you want to connect it with another cluster that might be physically located somewhere else, it might be on another continent, you create a relationship between the servers in the cluster and you federate them. The clients don't participate in a federation but by federating data centers together, you give clients access to resources that are in those remote data centers.

You can think of it as being able to find an instance of a service that's over in another data center or on another continent, for redundancy. It may be for having a centralized data center that's managed by a team that maybe runs your Vault setup and then having a data center per team or per application for isolation. There's a lot of different use cases and we'll show some examples, but the idea of federation is you take your Consul data centers and you join them together. There's two models here that we'll talk about. One is called WAN Gossip and then one is called Network Areas, which is in the Enterprise version of Consul.

One thing that I'll probably throw out there and say a million times is the word gossip or gossip pools, so it's worth defining what that is because that's not necessarily a super well-known thing. What a gossip pool is, is a set of agents that are connected together and they're basically running an algorithm that's based on an academic paper published under the name SWIM, which stands for Scalable Weakly-consistent Infection-style process group Membership. It's a way for a group of machines to collectively learn about each other and keep a shared understanding of which machines are alive in a cluster. That's the bottom line of what all this stuff means.

There are three really interesting properties that are really useful for a service discovery system like Consul. One, it basically forms a distributed failure detector, so the machines, by participating in this algorithm, can figure out when one of their peers is no longer available. That's sort of an automatic property that you get out of this algorithm, and it scales really, really well. Unlike a central thing that might have to go check 10,000 machines, by the process of these 10,000 machines participating in this algorithm together, they can learn who's alive and who's not.

There's also a property that it has a broadcast mechanism so you can get information out to all the machines in the cluster really, really quickly and there's a shared list of all the agents so they can kind of know, not necessarily in a consistent fashion but more of an eventually consistent way, who all's in the cluster, who's coming, who's going and there's an anti-entropy process. Eventually, the drift gets cleaned up and they all end up in the same state.

This is used all over Consul. We use it for the client agents to discover where the Consul servers are, so when a server gets replaced, the new one will be broadcast out to the rest of the cluster through this mechanism. It enables this second item here about health checks. We can talk about this more in detail, but it enables us to have basically a push model for health check updates. If you imagine you have a cluster with 10,000 agents running and each one of those agents is running potentially hundreds of services that have hundreds of health check statuses, a traditional health checking system that has some centralized thing trying to poll 10,000 machines times whatever number of services is not going to scale well.

The way Consul solves it is the agents only update the Consul servers when a health check status has changed. If something goes from passing to failing or failing to passing, we'll get an update. That works great. It scales really well. The only problem there is that if you have a node just die, it's not going to tell you that it died after it's dead, so by having a distributed failure detector, you can close that loop: that node stopped giving me updates, I learned from this algorithm that it's down, and I can mark its health checks as failed. It lets us scale health checking in a really huge way by exploiting the properties of this system.

Then, finally, the AP events get fed into the CP Raft system; this is AP and CP in the CAP theorem sense. When the gossip pool learns about an event like a server coming or going or an agent going offline, there's a loose shared understanding that converges over time across the whole cluster, but in Consul, when you make a query, you want to have a consistent answer: is this host there or not? We feed those available and partition-tolerant events into our consistent and partition-tolerant system via Raft to give a consistent answer at the leader. Whoever the current leader is takes the events from the gossip pool and puts them in the catalog by running them through Raft. It's a very interesting property that we marry those two styles of systems together to get a lot of benefits inside of Consul.

That's a lot of words, and it's hard to explain this stuff because there's a lot of subtle things. This is just kind of an animation showing what the gossip pool is doing. These bubbles all represent different Consul agents. In this case, the agent on node A is doing a health check on node X. It sends a probe. It doesn't get an ack. It asks its peers to probe it. They send a probe. They didn't hear back. It also tried to send a TCP message to it. It didn't hear back.

There's a lot of detail here, and actually, Jon Currey's Lifeguard talk tomorrow will cover a lot of work we did to make this reliable and work really well even at very large scales, but there's kind of a process where some extra vetting goes on to make sure that node is really gone, because the consequences of declaring it failed are pretty high. Eventually, that process completes and this node gossips out that that node is failed. Now, there's a shared understanding in the cluster that node X is dead.

There's a node on the end there that didn't get the message, so through the periodic anti-entropy sync process, they compare notes and the lone node realizes that X is gone and reacts. We have published papers. This paper in particular gives details about how we changed the base SWIM design, things we did in our implementation to make that work, and then additional things we did on top of it, optimizing for behavior we saw in real-world cloud environments, so it's definitely worth a look. Yeah, Jon Currey's talk tomorrow is going to cover this in detail.

Applying the clustering, federation, and gossip concepts to how Consul actually uses them, we'll go through that now. Here's the concept of operations. We have a collection of machines. Each box represents a machine and each Consul-colored box represents a Consul agent running on that machine. We've got six clients and three servers. They're logically related, so we've got them in something called DC1, like data center one. This is a collection of machines that are all related, so the shaded background represents that they're all participating in one of these gossip pools together.

Raft runs. We elect the leader, so that one's the leader, and it begins replicating state to the other servers. We have a highly available setup with three servers. We have a leader. We have clients. What can we do with this? Here's a basic example. This client wants to write the word world into the hello key, so some application on that node makes a request to its local client. The client forwards that request to just one of the servers; it doesn't know anything about who the leader is. Internally it gets forwarded to the leader. It gets written and it gets replicated, so that's just a really simple example of writing a key.
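
From the command line, that flow looks roughly like this (a sketch using the hello/world key from the example; any agent will do the forwarding for you):

    # Ask the local agent to write the key; it forwards to a server, which
    # forwards to the leader, which replicates it via Raft.
    $ consul kv put hello world

    # Read it back through any agent.
    $ consul kv get hello
    world

    # The HTTP API also exposes the lower-level consistency modes mentioned
    # below, e.g. a stale read that any server can answer.
    $ curl 'http://127.0.0.1:8500/v1/kv/hello?raw&stale'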

When you want to request that same key back, same thing. In this case, we're asking the leader to make sure we have a consistent result. There are some different modes that Consul offers to let you read from just any server but you might get a stale result. There's a lot of sort of low-level controls like that but we have a basic data center doing service discovery, all the features of Consul. Let's say you've got another data center you brought up on … The first one DC1 was in California. DC2 is on the East Coast. You replicated all your stuff. You have another set of machines that are independent. They're in their own gossip pool and you want to be able to make requests to get service discovery and key value information from one data center to the other.

You've got two clusters. Now you federate them, so you create a relationship between the sets of Consul servers in the two data centers, and then, once you've done that, a client can say, "Hey, put this key over on the servers in data center two." Consul, because the clusters are federated, knows how to route to some server over there and how to send it to the leader, so now, by making a simple relationship between your two sets of Consul servers, you can make requests between them. You can discover services over there. You can get at the config information. They're completely interconnected.
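
A sketch of what that cross-data-center access looks like (dc2 matches the naming in the example; the key and service names are the hypothetical ones from earlier):

    # From an agent in data center one, write a key into data center two.
    $ consul kv put -datacenter=dc2 hello world

    # Or discover a service that's registered in dc2 over DNS.
    $ dig @127.0.0.1 -p 8600 web.service.dc2.consul SRV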

The whole point of this talk is to describe all the mechanisms we have to make that work. The end result of all these mechanisms is to have it just work like that. The reason we've had to make more advanced networking models is that it's not always possible to have simple network configuration, so we've learned from our customers and our users that I can't connect all my data centers worldwide in a big mesh. I have sets of things that can't talk to each other or I don't trust this team, so I want them in this data center because I don't want to run more servers but I need to isolate them from everybody else. These are the use cases we'll get into in detail.

We'll start with the first model for clustering, which is LAN Gossip. This is what you get when you join agents together to form a data center. It's in the open-source version of Consul. It's been there since day one. This is the bread and butter of what you do. It's basic clustering. The agents are all homogeneous, so they're all treated the same and they're in a full mesh. Service discovery works via DNS and HTTP. The KV store works. There's some interesting nearest-neighbor routing. We'll show some examples of that. What do you use this for? Maybe you've got a web application, so you've got several services there. You've got a database. You've got your Consul servers. You want your web service to find the user service or the search service and make requests to it. Super simple. You're running a client agent on each machine along with your services. You've got your three Consul servers.

Maybe things get more complicated, so maybe you start running multiple instances of each of those services in something like an auto scaling group. If in the first case you could have gotten away with putting something in a config file, in this case you really can't, because they're coming and going depending on load, so you've got this more complicated setup. If you were using Consul from the start, your services don't really care. They're just going to do DNS lookups for something like give me search.service.consul, and Consul will give you back a healthy one.
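
That lookup is just plain DNS against the local agent, for example:

    # Only healthy instances come back, and the answers are shuffled on each
    # query, which is the simple load balancing mentioned earlier.
    $ dig @127.0.0.1 -p 8600 search.service.consul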

Here's another use case. Maybe you just have a huge compute cluster, so you're running Nomad and you've got 10,000 Nomad clients all in one Consul data center, so you have some completely automated thing placing stuff potentially anywhere. Now, you're relying on Nomad's integration with Consul to register jobs as they're placed on the various machines, so that they can find each other. These are all basic use cases, but they can extend from a simple web app with a handful of services to some massive cluster with thousands of worker machines with jobs being dynamically placed and moved and killed on the fly.

As we mentioned, LAN Gossip depends on having a high-speed, low-latency network. This is really designed for good networking where everything is available and close. You wouldn't run LAN Gossip across different continents; you don't want ping times over a handful of milliseconds, that kind of thing. Raft also depends on timely sending of messages between your Consul servers, so there's a soft real-time aspect implied here, and in the end, you do have to have a full mesh. As we saw in the gossip example, node A chose to probe node X, but in general, any node could choose to probe any other node as part of the gossip system. You basically have to have every machine in this cluster open to speak to, by default, port 8301 over both UDP and TCP.

For security, the gossip protocol uses AES and a shared key, and then the RPC, so those black lines we saw going from the clients up to the servers and between the servers, that's all secured with TLS. We do have an ACL system so you can protect the state of the servers and set up roles for your different applications and how they can access Consul, and 0.9.3 added an RPC rate limiter, so you can, as an operator, control the ability of your clients to make requests of the servers. You can set up a token bucket kind of scheme to limit them. That was a community submission too, which was really cool.
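
A hedged sketch of what those knobs look like in an agent config (the key, file paths, and numbers are placeholders; consul keygen generates a gossip key, and the limits block is the 0.9.3 RPC rate limiter):

    {
      "encrypt": "pUqJrVyVRj5jsiYEkM/tFQ==",
      "verify_incoming": true,
      "verify_outgoing": true,
      "ca_file": "/etc/consul.d/tls/ca.pem",
      "cert_file": "/etc/consul.d/tls/agent.pem",
      "key_file": "/etc/consul.d/tls/agent-key.pem",
      "limits": {
        "rpc_rate": 100,
        "rpc_max_burst": 1000
      }
    }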

To form a cluster, there's a few ways to do it. There's a manual consul join command, and then there's a few different ways to automate it. You can automate it via a list of IPs or a DNS name, and then we've got some newer features that let you find other agents using cloud instance metadata. Once you've joined, we will keep your agent up to date with any other agents that come and go, but there is that initial bootstrapping problem: you basically have to join one other agent somewhere to learn about the rest of the cluster, and from that point forward, you'll be good.

If you had an agent running and you knew an IP and you're just an operator, you could type consul join, some IP address, and it'll join. We have support for DNS names. A lot of people will use Terraform to set up Consul servers, and then they'll use Terraform to keep a DNS record up to date someplace with where the servers are, so you can join against a maintained DNS record. DNS looks like this. Then, if you're on a cloud provider, you can say, "Hey, I'm in AWS, just query for some instances that have a Consul tag with this value."
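
Concretely (the addresses, DNS name, and tag values here are placeholders), those join styles look something like this:

    # Manual join against a known agent
    $ consul join 10.0.1.5

    # Automated joins in the agent config: static IPs, a maintained DNS record,
    # or cloud auto-join by instance metadata
    {
      "retry_join": [
        "10.0.1.5",
        "consul.mycompany.internal",
        "provider=aws tag_key=consul tag_value=server"
      ]
    }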

That looks like this. It'll discover some IPs by doing a query against AWS and then it will join with those. That's super nice because you can even have something like an auto scaling group replacing your Consul servers if they fail, and this will just query for them and find them, so it's super, super nice. Once you've set up a cluster and you run consul members, they can all see each other. There's a shared set of nodes. You can access any feature of Consul from any agent running in that cluster. They're all joined together. They're dynamically running this protocol to check each other's health. Everything's good.

One little side note here: because they're randomly probing each other at regular intervals, they actually are taking round-trip time measurements between each other that we can feed into a model, and we can use those round-trip times in our queries, so Consul actually takes advantage of that and lets you sort results: say give me anything running this service sorted by distance from this node, or give me the closest instance to me. There's some cool features we get that sort of ride along as benefits of the gossip mechanism.

We have something called a prepared query, and I won't go into all the details here, but there's a near-agent option. This is the normal lookup: you might say find me cache.service.consul, which gives you some list of IPs back. With a prepared query defined like that, you can say nearest-cache and it'll sort the results based on the round-trip time estimate from the agent, so it's pretty cool.
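
Roughly what that query looks like via the HTTP API (a sketch; the query name nearest-cache is made up, but Near set to _agent is the option that sorts by distance from the requesting agent):

    # Create a prepared query that sorts results by round-trip time from
    # the agent that runs it.
    $ curl -X POST http://127.0.0.1:8500/v1/query -d '{
        "Name": "nearest-cache",
        "Service": {
          "Service": "cache",
          "Near": "_agent"
        }
      }'

    # Then it's just another DNS name.
    $ dig @127.0.0.1 -p 8600 nearest-cache.query.consul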

That was our first clustering model, LAN Gossip. The open source version of Consul also includes something called WAN Gossip. This is a mechanism to let you federate different Consul clusters together. Instead of joining all the agents with all the other agents like you do for LAN, this just sets up a relationship between the Consul servers. An important property is that the service information and the key value store information is local to each data center, so there's no replication implied in this model. If you have a data center in New York and you have one in San Francisco, they both have independent key value stores, they have independent sets of service registrations, and they can query between each other, so you can write and read on either side as much as you want, but they are two independent pools of data. We offer some tools to do replication, like consul-replicate, which is the open source tool we have to let you replicate parts of the key value store, but there's no built-in replication.

Here's a use case for that where you have basic geographic redundancy. On the top, I've got a Consul data center in New York and on the bottom, I have one in Amsterdam. I federate them using WAN Gossip, so now I've built a relationship between the servers and I can discover services across them. If all the instances of some service in New York go down, I can fail over and find the ones in Amsterdam. There's a lot of different ways to use the same pattern in your infrastructure.

Here's another use case. You can imagine this within one physical data center. Maybe I have a team that manages a shared Vault cluster that's used among many different teams, and then I put each team in its own Consul data center. This is an isolation use case where maybe I want the reports service to be able to get at the Vault server and I want the payments service to be able to use it, but I don't want the reports service to be able to go into the payments database. Notice here that I had to run a set of Consul servers for each of these different data centers, so they're all still independent clusters. Each one has its own KV store. They're fully independent, but by federating them, I can discover where the Vault server is from the other two. Those types of use cases.

It uses the same SWIM implementation that LAN Gossip uses, but it's tuned for a low-speed, high-latency network, so this is designed for geographic, world-spanning ping times. It's fine with that. You have to mesh all the servers together, so any server in the federation has to be able to reach any other server on those ports and protocols. The gossip encryption uses the same shared key design. The RPC uses TLS. Each data center is an independent failure domain, so one might drop offline, but hopefully nothing in one data center can affect the other Consul-wise. They have independent servers. They have independent Raft setups. They're not replicating data.

There's an interesting feature that's new in Consul's [inaudible 00:28:48] eight series called soft fail. With the initial cut of this, let's say you had three different data centers that are federated: one in San Francisco, one in New York, and one in Amsterdam. In the earlier versions, if New York and Amsterdam were having trouble communicating with each other but San Francisco and New York were fine, the flapping between those two could actually affect the ability for San Francisco to talk to New York. We fixed that, so now we take the information from the WAN Gossip pool about servers being down and feed that in as informational, but we don't actually stop using a server unless we're actually getting failing requests.

You can have a problem with some of your members in this pool, but it doesn't affect all the other members who may have connectivity, because you might have different links that are vastly different performance-wise and reliability-wise. Soft fail is a super interesting update that we've done recently, if you have older experience with this. The rate limiter and the ACL story are the same.

To form federations, it's really similar to LAN Gossip, so you can manually join and you can use the cloud stuff. Once you join, you can see all your servers. A new thing is you can see the data centers that are federated with you, so those are different things you can make requests against. I can ask what the services in Amsterdam are while sitting on one of the New York servers or agents. I can ask for details about a particular service, say I want to find all the redis instances over there.
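
A sketch of those cross-data-center requests (the data center name ams is a placeholder for however the Amsterdam cluster is actually named):

    # See the servers in the WAN pool and the data centers federated with us
    $ consul members -wan
    $ curl http://127.0.0.1:8500/v1/catalog/datacenters

    # Ask the remote data center's catalog for its services, or for all
    # passing instances of redis over there
    $ curl 'http://127.0.0.1:8500/v1/catalog/services?dc=ams'
    $ curl 'http://127.0.0.1:8500/v1/health/service/redis?dc=ams&passing'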

There's also the ability, you can see there are two sets of addresses, to use one address for service discovery within a data center and then some other address if you're being reached from outside the data center. A lot of people have hybrid setups: they might have some infrastructure that they run themselves on premises and they're interacting with two different cloud providers or whatever. If you have NAT or VPN type setups, this means you can still usefully discover a service and give out an address for when it's being reached from the outside.
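
A hedged sketch of the agent options involved (the addresses are placeholders): you advertise one address inside the data center and a different one for the WAN, and with translation turned on, queries arriving from another data center get the WAN address back.

    {
      "advertise_addr": "10.0.1.10",
      "advertise_addr_wan": "203.0.113.10",
      "translate_wan_addrs": true
    }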

You can use DNS to find things, so this is: I'm in New York and I'm trying to find an instance of redis in Amsterdam. KV works in this example. I wrote a key in Amsterdam and I'm trying to read it back from New York. That doesn't work because the New York KV store is independent, so I need to fetch it from Amsterdam as well. This is an interesting example: if I've got a federation, and for this use case we have our two data centers, I can actually create this query and say that if you don't find it in New York, try looking for it in Amsterdam. This is a query template, so it applies to any DNS lookup that's in the .query.consul namespace. I defined a query called ha, so if I dig for ha-redis.query.consul, it finds the ones in New York. Everything's fine.
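
Roughly what such a failover query template looks like (a sketch; the ha- prefix matches the example, and ams stands in for the real Amsterdam data center name):

    # Any lookup starting with "ha-" resolves the rest of the name as a
    # service, and fails over to Amsterdam if nothing healthy is found locally.
    $ curl -X POST http://127.0.0.1:8500/v1/query -d '{
        "Name": "ha-",
        "Template": { "Type": "name_prefix_match" },
        "Service": {
          "Service": "${name.suffix}",
          "Failover": { "Datacenters": ["ams"] }
        }
      }'

    $ dig @127.0.0.1 -p 8600 ha-redis.query.consul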

Say something bad happens and all those go down in New York. I run the same query again and I get the ones in Amsterdam, so I defined a static list of data centers to fail over to and Consul just does that under the hood. The neat thing about this is that the query is registered centrally. That's an object that's created in Consul, and then applications just use the name and run the query, so that failover logic was defined in one place and it's consumed by any number of applications just doing these DNS lookups. That can be changed dynamically at runtime, and it's logic that you don't have to put in all your applications to understand how to fail over.

Advanced clustering. We looked at a basic clustering model with LAN Gossip. Consul Enterprise adds an advanced clustering feature called Network Segments. It's very similar to how LAN Gossip works, but it's applicable when you can't have a full mesh among all your agents. It lets you create different segments within a cluster, groups of things that can talk to each other network-wise, and you can keep them distinct so they don't have to fully mesh. Let's look at some examples of that. On a previous slide, within one data center, we were running a bunch of different clusters just for isolation, and we had to run three sets of servers, one for each. This is a more cost-effective version of that configuration using Network Segments.

We have one shared set of Consul servers in the default segment, and then I've created three other segments for the Vault, payments, and reports apps, and those are all distinct segments. Each segment has to have a relationship with the servers, but the segments don't interoperate with each other, so maybe my network rules allow just the traffic to Vault so they can both use Vault, but they don't allow any other traffic on any other ports between those segments, and none of the gossip has to flow between those segments. Each of those three, plus the servers, is an independent gossip pool. This was a super common request we got from a lot of large-scale Consul users: they said, "Well, these are kind of related, but we don't really want to run a whole bunch of Consul servers, especially for this thing with only three other agents in it."

There's no way that this thing is going to be allowed to do gossip with all these other servers. By being able to compartmentalize your data center into different segments, you can meet your network requirements and not have to run a massive set of Consul servers. Since you're sharing servers, you also can share your KV store more easily, and your all-in-one service catalog. For cases like where Vault is shared but nothing else is, you can mix and match like that. You can have a small number of shared services that are maybe in the default segment, but otherwise the segments are independent network-wise.

It's all really similar to what LAN Gossip requires, except that each segment's on its own port. Encryption's the same story. TLS is the same. The client agents don't have to have any connectivity outside of their segment except to the Consul servers, so they're each in their own isolated pool. When you form clusters with segments, on the servers you basically just configure the list of available segments and assign them port numbers, and potentially different interfaces if the server is a multi-homed thing. On the client side, you simply list which segment you're in, and then you join the same way as you did for LAN Gossip, so it's really easy to use.
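
A rough sketch of those configs (segment names and ports are placeholders based on the example; these options are Enterprise-only, so they exist only in that build):

    # Server config: declare the segments and give each its own port
    {
      "segments": [
        { "name": "vault",    "port": 8303 },
        { "name": "payments", "port": 8304 },
        { "name": "reports",  "port": 8305 }
      ]
    }

    # Client config: say which segment this agent belongs to, then join the
    # servers the same way as before
    {
      "segment": "payments",
      "retry_join": ["10.0.1.5"]
    }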

When you have a completed cluster, you can list the members; this is run on a server, so we can see across all the segments, and then you can isolate your list to just a particular segment. KV works the same. The catalog basically works the same, but there is the ability to filter to find a service inside of your segment, and we can show an example of that. Using the same prepared query feature, we just set a filter. We say find that service name, but find the one that's in the segment of the agent making the request, so if I do local DB from the payments segment, I get that DB's address, and if I'm running that from the reports segment, I get a different one. The service name is the same, but I'm getting the one that's scoped to my segment.

Then, finally, this is a similar concept but applied to federation. We have an advanced federation model called Network Areas. This, again, is available in Consul Enterprise. This is a use case where you want to do federation, you want to join different Consul data centers together, but you can't put all the servers in a full mesh. All the other federation behavior is essentially identical, but you have a topology maybe like this. This is a case where you want a central hub that may be used for management or some centrally managed thing, and then you want to run each tenant in a totally independent federated cluster, and you don't want any interaction between the tenants.

A super common use case for this is maybe you have something like Vault or some shared resource in the hub. Then, in the tenants, you have untrusted things. In use cases we've seen from customers, sometimes people have to run some software from a third party; maybe their business involves interfacing with a bunch of different companies and they have to run some other company's software that's less trusted, and they want to use Consul but they want to isolate it, and they don't want their different customers' infrastructures to be able to talk to each other at all. This hub-and-spoke type of model works really well for that. The tenants can go to the hub, but tenants can't talk to each other.

Another super, super common use case for this is if you have a massive geo-distributed thing but not all of your sites can be connected in a full mesh. If you were to do this with the WAN Gossip model, you would have to have connectivity between, say, Amsterdam and Singapore, and maybe you can't have that. This just wasn't possible before, and it affects people that are running at a massive global scale. How does this work? It's the same SWIM implementation that WAN Gossip uses, so it has the same relaxed timing expectations. It's fine with 80-millisecond ping times, that kind of thing.

You need a full mesh for all the servers in a given area, but the areas get defined as basically pairs of data centers. We'll show an example. The cool thing is RPC and gossip just use TLS. You get rid of managing a shared encryption key worldwide; you just use TLS for everything. To form these, you basically define an area on each side. In New York, I create an area with Amsterdam. In Amsterdam, I create an area with New York. Then I join them up. Then, once the relationship is established and they're connected up, I can see the servers from both sides from either side.
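
A sketch of what that looks like with the operator commands (ams again stands in for the Amsterdam data center name, and the address is a placeholder):

    # In New York: create an area peered with Amsterdam, then join its servers
    $ consul operator area create -peer-datacenter=ams
    $ consul operator area join -peer-datacenter=ams 203.0.113.20

    # The same is done on the Amsterdam side pointing back at New York,
    # and then you can see the servers on both sides:
    $ consul operator area members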

Once it's set up, it works just like WAN Gossip. You can make those cross-data-center requests; all the stuff just feels the same. You are using TLS for everything, so you don't have to deal with gossip keys, which is super nice. It has the same soft fail behavior, so a problem in one geographic area won't affect another one. Then, a wrap-up example. Because we have those round-trip time estimates, instead of listing the alternate data centers, you can create a query that says just try the next two closest ones and Consul will automatically fail over for you. Same example as before: New York, everything goes down, and it figured out to use Amsterdam, but it did that based on round-trip time estimates. You didn't have to configure that. If you added a new data center that was better, it would just start using it, or if one went offline, it'll pick the next best one.
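
That version of the failover query looks roughly like this (same hypothetical ha- template as before, with a nearest-N failover instead of a static list):

    # Fail over to the two nearest other data centers by round-trip time.
    $ curl -X POST http://127.0.0.1:8500/v1/query -d '{
        "Name": "ha-",
        "Template": { "Type": "name_prefix_match" },
        "Service": {
          "Service": "${name.suffix}",
          "Failover": { "NearestN": 2 }
        }
      }'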

In conclusion, take our existing tagline and add "on any network topology." These four models came out of experience with lots of real-world use of Consul. We think we've got pretty good coverage for small teams, companies trying to compartmentalize things, companies who are still running things together but want to compartmentalize network-wise, and massive globally distributed CDN-type stuff. We can accommodate all these different use cases in a set of models that, although they're different, feel the same to the user who's using Consul. They can all be used simultaneously, so you can incrementally adopt these. You don't have to move everything worldwide over to one, and it exploits the gossip properties that we have in interesting ways, and those properties apply across all the different models. That's it. Thank you.
