
Seamlessly migrate from Consul service discovery to service mesh

Try this example method for transitioning from Consul service discovery to service mesh without affecting uptime or development teams.

Migrating from HashiCorp Consul service discovery to service mesh is a smart move for platform teams looking to boost their applications’ security, observability, and availability, all without requiring modifications from their development teams. This blog post will briefly introduce you to the advantages of moving to a service mesh and provide a step-by-step, no-downtime migration guide.

»Service mesh benefits

Why is a service mesh migration worth your time? Here are some of the benefits:

»Advance security

Service mesh allows teams to quickly enforce zero trust security principles using mTLS on all east/west traffic, significantly reducing the risk of unauthorized access and data breaches. Platform teams can enable an application’s existing service discovery DNS lookup to allow for both HTTP and mTLS connections. This allows all applications to transition to using mTLS connections without impacting any of their dependent services (such as downstreams).

»Enhance observability

A service mesh also provides application teams with new capabilities such as distributed tracing and data plane metrics. Distributed tracing acts like a GPS tracking system for each request, providing detailed insights into its journey across services, and helping quickly pinpoint bottlenecks and performance issues. Data plane metrics offer real-time insights into traffic flows between microservices that include requests per second, error rates, and advanced L7 features such as retries. These insights can improve decision-making and lead to higher application availability.

»Increase resilience

Service mesh improves application availability by automatically handling retries, rate limiting, circuit breaking, and timeouts, helping to ensure that services remain accessible and performant, even under adverse conditions. Applications in a service mesh use traffic splitting for blue/green or canary deployments to reduce risks associated with updates and new releases.

»Improve multi-tenancy scalability

If you need to give users self-service capabilities in multi-tenant environments or meet higher compliance requirements, consider upgrading to Consul Enterprise. With the power to manage their own namespaces or even entire service meshes, Consul Enterprise gives teams the autonomy to innovate and streamline operations. It ensures team isolation, enabling the safe management of application deployments and resilience strategies.

Beyond operational agility, Consul Enterprise empowers teams to comply with rigorous regulations by offering L3/L4 networking control over service mesh connections, FIPS 140-2 compliance, and full audit logs. This enhanced level of governance and flexibility allows teams to fine-tune their service ecosystems to meet specific operational demands and compliance needs.

»Migration to service mesh

Now that we’ve explored the top reasons to switch to Consul service mesh, it’s time to walk through the migration, step by step.

We’ll begin with an overview of the Amazon EKS cluster and the Consul components that will be deployed. In this guide, the Consul server and example services will be deployed on the same EKS cluster for simplicity. However, the principles and steps outlined are also relevant for environments utilizing virtual machines or a combination of platforms. The EKS cluster in this guide will run a legacy api service, using only service discovery, and the new mesh-enabled web service that is accessible only through the Consul API gateway. The diagram below shows the initial environment that will be set up:

Amazon EKS cluster with Consul installed. The web service is located within the service mesh using an Envoy proxy to direct requests to the api service. api is outside the mesh, has no proxy, and relies solely on service discovery.


To streamline the initial setup, the following key steps are condensed into bullet points, with detailed step-by-step instructions available in the README.md for this project’s GitHub repo.

  • Provision infrastructure: Use HashiCorp Terraform to provision an AWS VPC and EKS cluster. This includes cloning the repository, initializing Terraform, and applying infrastructure as code to set up the environment.
  • Connect to EKS: Update the kubeconfig with the EKS cluster details using the AWS CLI and set up convenient kubectl aliases for easier management.
  • Install the AWS LB controller: Set up the AWS Load Balancer Controller to map internal Network Load Balancers or Application Load Balancers to Kubernetes services. The Consul Helm chart will use AWS LB annotations to properly set up internally routable mesh gateways and make the Consul UI externally available.
  • Install Consul Helm chart: Deploy the example Consul Helm chart values enabling the following components:
    • TLS: Enables TLS across the cluster to verify the authenticity of the Consul servers and clients
    • Access Control Lists: Automatically manage ACL tokens and policies for all of Consul
    • connect-inject: Configures Consul’s automatic service mesh sidecar injector
    • api-gateway: Enables the Consul API gateway and manages it with Kubernetes Gateway API CRDs
    • sync-catalog: A process that syncs Kubernetes services to Consul for service discovery
    • cni: Facilitates service mesh traffic redirection without requiring CAP_NET_ADMIN privileges for Kubernetes pods
    • metrics: Exposes Prometheus metrics for Consul servers, gateways, and Envoy sidecars
  • Set up DNS forwarding in EKS: Configure DNS forwarding within EKS to allow service discovery via Consul.
  • Deploy service using Consul service discovery: Deploy api and Kubernetes catalog-sync to automatically register the service with Consul the same way VMs register services using Consul agents.
  • Deploy service using Consul service mesh: Deploy the service web into the mesh. Mesh-enabled services aren’t available externally without a special ingress or API gateway allowing the traffic. Set up the Consul API gateway with a route to web so it’s accessible from the browser.
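
For the DNS forwarding step in the list above, the following is a minimal sketch of the kind of CoreDNS server block that forwards lookups for the consul domain to Consul DNS. The function name coredns_consul_stanza is hypothetical, and the IP address you pass in is an assumption that must be replaced with the ClusterIP of the Consul DNS service in your cluster. The project README remains the reference for the exact steps.

```shell
# Print a CoreDNS server block that forwards *.consul lookups to Consul DNS.
# The function name is hypothetical; pass in the real ClusterIP of the
# Consul DNS service running in your cluster.
coredns_consul_stanza() {
  consul_dns_ip="$1"
  if [ -z "$consul_dns_ip" ]; then
    echo "usage: coredns_consul_stanza <consul-dns-cluster-ip>" >&2
    return 1
  fi
  cat <<EOF
consul:53 {
    errors
    cache 30
    forward . ${consul_dns_ip}
}
EOF
}
```

The printed block would typically be appended to the Corefile held in the coredns ConfigMap in the kube-system namespace. The name of the Consul DNS service depends on the Helm release name, so look it up with kubectl get services in the consul namespace rather than assuming it.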

The steps above complete the initial setup. Consul is installed on EKS, and the web service is operational within the service mesh, directing requests to the api service outside the mesh, which utilizes service discovery exclusively. The Consul API gateway has been set up with routes to enable external requests to web. Run the command below to retrieve the URL for the Consul API gateway and store it in a variable for future use. The external address may take a couple of minutes to propagate, so be patient.

export APIGW_URL=$(kubectl get services --namespace=consul api-gateway -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
nslookup ${APIGW_URL}
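
Because the address can take a few minutes to propagate, it may be convenient to poll instead of rerunning nslookup by hand. The helper below is a sketch: wait_for_dns, the default of 30 attempts, and the 10-second delay are all illustrative choices, and it assumes your nslookup exits with a non-zero status when a name does not resolve.

```shell
# Poll until a hostname resolves, or give up after a number of attempts.
# wait_for_dns and its defaults (30 attempts, 10s apart) are illustrative.
wait_for_dns() {
  host="$1"
  attempts="${2:-30}"
  delay="${3:-10}"
  if [ -z "$host" ]; then
    echo "usage: wait_for_dns <hostname> [attempts] [delay-seconds]" >&2
    return 1
  fi
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Assumes nslookup returns non-zero while the name is unresolvable.
    if nslookup "$host" >/dev/null 2>&1; then
      echo "resolved: $host"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "gave up waiting for: $host" >&2
  return 1
}
```

For example, wait_for_dns "${APIGW_URL}" would block until the gateway hostname resolves or the attempts run out.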

Once the gateway is resolvable, use the generated URL below to access web and verify the initial environment is working as expected.

echo "http://${APIGW_URL}/ui"

The image above shows the expected response: web is within the mesh accessing api.service.consul, which is located outside the mesh. Traffic between web and api is HTTP and unencrypted.

Now it’s time to migrate api into the service mesh:

»Migrate services into the service mesh

To smoothly migrate services into the service mesh, we'll follow a clear, three-step approach:

  • Step 1: Enable permissive mode
  • Step 2: Enforce mTLS
  • Step 3: Use virtual services

»Step 1: Enable permissive mode

To begin, you need to migrate api into the mesh. It’s crucial that HTTP requests to api.service.consul continue to function for downstream services not in the service mesh, while services within the mesh use mTLS for secure communication.

The first step is implementing permissive MutualTLSMode for api, allowing it to accept both HTTP and mTLS connections.

The api service shown in the diagram is now in the service mesh and uses an Envoy proxy set to permissive mode, which allows it to support both HTTP and mTLS traffic.


To enable permissive mode, set mutualTLSMode to permissive in the ServiceDefaults for api. Here’s an example:

apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceDefaults
metadata:
  name: api
  namespace: api
spec:
  protocol: http
  mutualTLSMode: "permissive"

Create a new deployment for api enabling service mesh and apply these new ServiceDefaults to enable permissive mode:

kubectl apply -f api/permissive_mTLS_mode/init-consul-config/servicedefaults-permissive.yaml
kubectl apply -f api/permissive_mTLS_mode/api-v2-mesh-enabled.yaml
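
After applying the ServiceDefaults, you may want to confirm the mode that was actually stored before sending traffic. The sketch below reads the mutualTLSMode field back from the ServiceDefaults resource with kubectl; the function name check_mtls_mode and its argument order are hypothetical.

```shell
# Compare the mutualTLSMode stored in a ServiceDefaults resource with an
# expected value. Function name and argument order are illustrative.
check_mtls_mode() {
  namespace="$1"
  service="$2"
  expected="$3"
  actual=$(kubectl -n "$namespace" get servicedefaults "$service" \
    -o jsonpath='{.spec.mutualTLSMode}') || return 1
  if [ "$actual" = "$expected" ]; then
    echo "ok: $service mutualTLSMode=$actual"
  else
    echo "mismatch: $service mutualTLSMode=${actual:-<unset>} (expected $expected)" >&2
    return 1
  fi
}
```

At this point in the walkthrough, check_mtls_mode api api permissive should report ok. The same helper can be reused with strict once mTLS is enforced.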

Refresh the browser a few times and watch how the same requests from web to api.service.consul are routed to both api (non-mesh) and api (mesh) deployments. Consul uses a weighted round-robin load balancing algorithm by default to distribute requests from web across both api deployments.

web sends requests to api.service.consul evenly to the mesh and non-mesh api deployments. web sees no difference between these two api deployments.
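
If you would rather count responses than refresh the browser, a rough tally can be built from the "name" fields in the JSON responses. This is a sketch only: tally_names is a hypothetical helper that does plain text matching instead of JSON parsing, and it assumes each response contains lines shaped like "name": "api (mesh)", as in the captured response shown later in this post.

```shell
# Count how often each "name" value appears in JSON text read from stdin.
# Plain text matching, not a JSON parser. It assumes the input contains
# fields written like:  "name": "api (mesh)"
tally_names() {
  grep -o '"name": *"[^"]*"' \
    | sed 's/^"name": *"//; s/"$//' \
    | sort | uniq -c | sort -rn
}
```

Feed it the output of a loop of requests to the gateway, for example several curl -s calls piped into tally_names. Which path returns JSON depends on the example application, so treat the request URL as an assumption to verify. Names other than api, such as web itself, will also appear in the tally.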


After verifying the api (mesh) deployment is working with the original DNS lookup api.service.consul, remove the original api (non-mesh) deployment:

kubectl -n api delete deployment api-v1

Newly onboarded services can run in permissive mode while other downstream and upstream services are migrated to the service mesh in any order. This ensures a smooth transition for all services.

Services can be onboarded to the mesh upon their next release using an annotation or by enabling the entire namespace, which doesn’t require changes from the development team. While in permissive mode, requests to the original service discovery name api.service.consul will be over HTTP. Verify this by sniffing the incoming traffic to the api pod while refreshing the browser to generate traffic:

kubectl debug -it -n api $(kubectl -n api get pods --output jsonpath='{.items[0].metadata.name}') --target consul-dataplane --image nicolaka/netshoot -- tcpdump -i eth0 src port 9091 -A
 
 
Targeting container "consul-dataplane". If you don't see processes from this container it may be because the container runtime doesn't support this feature.
Defaulting debug container name to debugger-v77g6.
If you don't see a command prompt, try pressing enter.
{
 "name": "api (mesh)",
 "uri": "/",
 "type": "HTTP",
 "ip_addresses": [
   "10.15.3.183"
 ],
 "start_time": "2024-02-16T19:46:35.805652",
 "end_time": "2024-02-16T19:46:35.827025",
 "duration": "21.372186ms",
 "body": "API response",
 "code": 200
}
 

Follow these steps to migrate all downstream and upstream services into the service mesh without impacting service availability or development teams.

»Step 2: Enforce mTLS

After migrating all dependent downstream services into the mesh, disable permissive mode and start enforcing secure mTLS connections for all requests to api. To avoid any downstream service changes or disruptions, configure the service mesh to properly handle the original DNS lookups, so web can continue making requests to api.service.consul.

The api service shown in the diagram is switched to strict mutualTLSMode with dialedDirectly enabled. This enforces mTLS for all existing requests to api.service.consul.


During this step, switch api from permissive to strict mutualTLSMode to enforce mTLS for all requests. To ensure that downstream services such as web, which use api.service.consul, aren’t impacted, enable dialedDirectly in the transparent proxy settings. This enables a TCP passthrough on the api service’s Envoy sidecar proxy and enforces mTLS on requests going to the api pod IP. Requests for api.service.consul are therefore routed to the api pod IP, where the proxy is now listening and enforcing mTLS. Both settings can be updated while the api service is running.

To enable strict MutualTLSMode and dialedDirectly, update the api ServiceDefaults.

kubectl apply -f ./api/permissive_mTLS_mode/init-consul-config/intention-api.yaml
kubectl apply -f ./api/permissive_mTLS_mode/servicedefaults-strict-dialedDirect.yaml.enable
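
The ServiceDefaults file applied by the second command is not reproduced in this post. As a sketch of what strict mode with dialedDirectly might look like, the helper below prints a manifest that uses the same fields that appear, commented out, in the ServiceDefaults example under Additional security recommendations. The function name is hypothetical and the file in the repository may differ.

```shell
# Print a ServiceDefaults manifest that enforces mTLS while still accepting
# connections dialed directly to the pod IP.
# The function name is hypothetical; the repo's file may differ.
api_servicedefaults_strict() {
  cat <<'EOF'
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceDefaults
metadata:
  name: api
  namespace: api
spec:
  protocol: http
  mutualTLSMode: "strict"
  transparentProxy:
    dialedDirectly: true
EOF
}
```

The output can be reviewed first and then piped to kubectl apply -f - if it matches what you intend.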

Note: Before enabling strict mutualTLSMode, a service intention is created first to ensure web is authorized to make requests to api.
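
For readers who have not written a service intention before, the sketch below prints a minimal ServiceIntentions manifest that allows web to call api. The resource name web-to-api and the function name are illustrative, and the intention-api.yaml file in the repository may be structured differently.

```shell
# Print a ServiceIntentions manifest that authorizes web to call api.
# The metadata name and the function name are illustrative only.
api_intention_allow_web() {
  cat <<'EOF'
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceIntentions
metadata:
  name: web-to-api
  namespace: api
spec:
  destination:
    name: api
  sources:
    - name: web
      action: allow
EOF
}
```

Without an allow intention in place, requests from web could be rejected once api stops accepting unauthenticated traffic, which is why the intention is applied before strict mode.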

Now all requests to api.service.consul are being encrypted with mTLS:

kubectl debug -it -n api $(kubectl -n api get pods --output jsonpath='{.items[0].metadata.name}') --target consul-dataplane --image nicolaka/netshoot -- tcpdump -i eth0 src port 9091 -A
 
 
 
Targeting container "consul-dataplane". If you don't see processes from this container it may be because the container runtime doesn't support this feature.
Defaulting debug container name to debugger-g669d.
If you don't see a command prompt, try pressing enter.
20:18:34.047169 IP api-v2-b45bf7655-9kshs.9091 > 10-15-3-175.web.web.svc.cluster.local.43512: Flags [P.], seq 148:626, ack 3559, win 462, options [nop,nop,TS val 3923183901 ecr 2279397636], length 478
E....;@....'
...
...#.....k.f4m............
.. ...................6.@nW.S._r"h....m.@;U....WyY........h........m......q.B.......N.Y}.F.A.{y..^..........]..@0.zv">Y#.....6.n.z..Oh.6.p..G.....9...@0.zv.y.......#U.......h.o..w6.....`.\......*...N..u.".U...`\.;....M..=.....$..,....e...T`.I/.a.z.$;...c........z..Y..q...W.."...........%.*...
.3..Y/.....a..R(..6..0...Ka`.GIt._.Dn...N......L k..j...ch.7)'......m/........3....t."....r..4|t7..Q..vfs.....I..*..|..4m%......c..!w7u..s.......t.,.....EF7....Bd...P..........E....h..3;n..........+.
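
Comparing the two captures by eye works, but a rough check can be scripted. The helper below is a heuristic sketch: classify_capture is a hypothetical name, and it only looks for readable HTTP or JSON markers like those in the permissive-mode capture. The absence of those markers suggests encryption but does not prove it.

```shell
# Classify captured payload text read from stdin.
# Heuristic only: it looks for readable HTTP/JSON markers and reports
# their absence. Function name and marker list are illustrative.
classify_capture() {
  if grep -Eq 'HTTP/1\.[01]|"code": *[0-9]+|"body":'; then
    echo "plaintext"
  else
    echo "no plaintext markers found"
  fi
}
```

Save the tcpdump output to a file and run classify_capture against it with input redirection. The permissive-mode capture should report plaintext, and the capture taken after enforcing mTLS should not.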
 

Congratulations! You have successfully migrated an existing service into the Consul service mesh and enforced mTLS without requiring any changes from development.

»Step 3: Use virtual services

For development teams to take full advantage of the L7 traffic capabilities such as retries, rate limits, timeouts, circuit breakers, and traffic splitters, they will want to start using virtual services. For example, web would stop making requests to api.service.consul and start using api.virtual.consul.

Once web is updated to use the virtual address, it will have immediate access to all L7 traffic routing rules applied to api. These capabilities provide huge improvements in service availability that any development team will appreciate, and they can make this change at their convenience. Here’s how:

Deploy web-v2, which has been updated to use api.virtual.consul. Refresh the browser until you see requests from web-v2 route to the new virtual address (you may need to clear the cache). Once validated, delete web-v1 to ensure all requests use the new virtual address:

kubectl apply -f api/permissive_mTLS_mode/web-v2-virtualaddress.yaml.enable
kubectl -n web delete deploy/web-v1
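
When scripting this cutover, it may be safer to delete web-v1 only after web-v2 has finished rolling out. The sketch below assumes the deployments are named web-v1 and web-v2 in the web namespace, as the text suggests. The function name and the 120-second timeout are illustrative.

```shell
# Delete the old deployment only after the new one has finished rolling out.
# The function name and the 120s timeout are illustrative; the deployment
# names web-v1 and web-v2 are assumed from the text.
cutover_web() {
  if kubectl -n web rollout status deploy/web-v2 --timeout=120s; then
    kubectl -n web delete deploy/web-v1
  else
    echo "web-v2 is not ready; leaving web-v1 in place" >&2
    return 1
  fi
}
```

A completed rollout only shows that the new pods are ready. It does not replace the browser check that requests are reaching the virtual address.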

The web-v2 deployment was updated to use the new virtual address api.virtual.consul to make upstream requests to api.

web is now making requests to api.virtual.consul. That means api can now create traffic splitters to support canary deployments, or retries to improve availability, and web will automatically apply them with every request to api. Once all downstream services are using the virtual address, disable dialedDirectly for api to ensure L7 traffic patterns are being applied to all future requests (included in the ServiceDefaults recommendation example below).
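
As an illustration of the kind of L7 rule that becomes effective once web uses the virtual address, the sketch below prints a ServiceRouter manifest that adds retries and a request timeout for api. The values (3 retries, 5s timeout) and the function name are illustrative assumptions, not settings taken from this project.

```shell
# Print a ServiceRouter manifest that retries failed requests to api.
# Retry count, timeout, and function name are illustrative values only.
api_router_with_retries() {
  cat <<'EOF'
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceRouter
metadata:
  name: api
  namespace: api
spec:
  routes:
    - match:
        http:
          pathPrefix: /
      destination:
        numRetries: 3
        retryOnConnectFailure: true
        requestTimeout: 5s
EOF
}
```

Because the rule is attached to api, the team that owns api can adjust it without any further change to web.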

»Additional security recommendations

Following the migration, there are several ways to further secure your service mesh. First, remove the MutualTLSMode line from the service defaults for each service. This will enforce the strict mode and reduce misconfiguration risks for a critical security setting:

apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceDefaults
metadata:
  name: api
  namespace: api
spec:
  protocol: http
  #mutualTLSMode: "strict"
  transparentProxy: 
    #dialedDirectly: true

Next, disable the allowEnablingPermissiveMutualTLS setting mesh-wide so no services can enable permissive mode in the future and bypass mTLS.

Note: If services were already able to set their MutualTLSMode=permissive, this mesh-wide setting will not override those services already running in permissive mode because doing so could impact service availability. Those services must first remove permissive MutualTLSMode, as recommended above:

apiVersion: consul.hashicorp.com/v1alpha1
kind: Mesh
metadata:
  name: mesh
  namespace: consul
spec:
  #allowEnablingPermissiveMutualTLS: true
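
Since the mesh-wide setting does not override services that are already permissive, it can help to list any that remain before relying on it. The sketch below filters the ServiceDefaults resources in all namespaces; list_permissive_services is a hypothetical helper name.

```shell
# List ServiceDefaults resources still set to permissive, as namespace/name.
# The function name is hypothetical. Resources without a mutualTLSMode show
# up as <none> in the MODE column and are skipped by the awk filter.
list_permissive_services() {
  kubectl get servicedefaults --all-namespaces --no-headers \
    -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,MODE:.spec.mutualTLSMode' \
    | awk '$3 == "permissive" { print $1 "/" $2 }'
}
```

An empty result suggests no service is still accepting plain HTTP through permissive mode.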

Additionally, secure the mesh by setting meshDestinationsOnly: true to restrict any service from making external requests. A terminating gateway would now be required to authorize all external requests:

apiVersion: consul.hashicorp.com/v1alpha1
kind: Mesh
metadata:
  name: mesh
  namespace: consul
spec:
  #allowEnablingPermissiveMutualTLS: true
  transparentProxy:
    meshDestinationsOnly: true

Apply these additional security recommendations using the following commands:

kubectl apply -f api/permissive_mTLS_mode/init-consul-config/servicedefaults-std.yaml.enable
kubectl apply -f web/init-consul-config/mesh-secure.yaml.enable

»Recap

Transitioning from Consul service discovery to service mesh brings immediate enhancements in zero trust security and observability. By following the three-step approach described in this blog post, platform teams can smoothly transition to service mesh without modifying current application configurations. This approach benefits organizations that have numerous development teams, operate in silos, or face communication hurdles.

Initially, enabling permissive MutualTLSMode allows services to support both HTTP and mTLS connections, ensuring accessibility across mesh and non-mesh services. Subsequently, enforcing mTLS secures all traffic with encryption, and setting dialedDirectly supports all existing requests using Consul DNS. Finally, adopting virtual services unlocks advanced Layer 7 traffic-management features, allowing developers to enhance service reliability at their own pace by simply updating request strings from service to virtual.

As your service mesh and multi-tenant ecosystem grow, you might encounter increasing demands for self-service options and higher compliance standards. Learn how Consul Enterprise extends the foundational capabilities of Consul with enhanced governance, multi-tenant support, and operational agility, ensuring organizations can meet the demands of complex service ecosystems and regulatory standards with ease.
