Multi-Cluster Service Discovery in Kubernetes and Service Mesh


In this blog series, we will dig into specific challenge areas for multi-cluster Kubernetes and service mesh architectures, along with considerations and approaches for solving them. For our first post, we’ll focus on service discovery and how to approach it in multi-cluster Kubernetes environments.

What is Kubernetes Service Discovery? 

Service discovery is the backbone of distributed systems and microservices architectures: it is how a system learns which services exist in an environment so that they can be connected to each other. This matters because these services are dynamic, unlike traditional software, which had static network locations tied to physical hardware. Microservices are constantly changing (being updated, scaled up and down, and redeployed), and they need a way to “discover and connect” with each other at all times.

This is true for Kubernetes environments, where application pods are deployed and redeployed to different nodes across a cluster. Kubernetes has a built-in service discovery mechanism called “Kubernetes Services” that lets clients talk to a virtual IP and be routed at run time to one of the pods selected by that service. In this model, the backing pods can come and go, but the client only ever talks to the cluster virtual IP for that service. A Kubernetes service mesh takes this a step further and builds on this service discovery to provide more sophisticated health checking, request-level load balancing, and fallback policies. Outside of Kubernetes, one may use ZooKeeper, Consul, or Netflix OSS Eureka to provide a backing registry on which to build service discovery. However, what happens when you try to discover Kubernetes services in different clusters, or more importantly, different networks?
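
To make the single-cluster model concrete, here is a minimal sketch in Python of how a Service's label selector resolves to a changing set of pod IPs behind one stable virtual IP. The `Pod` and `Service` classes, the IP addresses, and the round-robin choice are illustrative assumptions for this post; they are not the Kubernetes API, and real clusters pick backends through kube-proxy or a mesh sidecar rather than code like this.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Pod:
    name: str
    ip: str
    labels: dict
    ready: bool = True


@dataclass
class Service:
    name: str
    cluster_ip: str  # stable virtual IP that clients talk to
    selector: dict   # label selector that picks the backing pods

    def endpoints(self, pods):
        """Return the IPs of ready pods whose labels match the selector."""
        return [
            pod.ip
            for pod in pods
            if pod.ready
            and all(pod.labels.get(k) == v for k, v in self.selector.items())
        ]


# Hypothetical pods: two "reviews" replicas (one not ready) and one "ratings" pod.
pods = [
    Pod("reviews-1", "10.0.0.1", {"app": "reviews"}),
    Pod("reviews-2", "10.0.0.2", {"app": "reviews"}, ready=False),
    Pod("ratings-1", "10.0.0.3", {"app": "ratings"}),
]
reviews = Service("reviews", "10.96.0.10", {"app": "reviews"})

# The client only knows the virtual IP; the backend is chosen at run time.
# Round-robin is a simplification used here for illustration.
backends = cycle(reviews.endpoints(pods))
for _ in range(3):
    print(f"{reviews.cluster_ip} -> {next(backends)}")
```

Note that the selector only sees pods in its own cluster, which is exactly the limitation the rest of this post is about.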


Challenges of Kubernetes Service Discovery Across Networks

Things start to get more complicated as you think about how to architect service discovery to work across clusters or networks. How does a service in one network find services in another network? Is there a single, global service registry? Do we replicate services between boundaries? Do we try to share a single network space across everything?

There needs to be an intelligent way to expose services and locate them across any number of clusters, so that the necessary connections can be made in a way that is trusted by both ends.

Specific challenges that arise for multi-cluster patterns:

  • Service to service relationships: traversing across boundaries
  • Traffic policies: manually applied per cluster vs. global 
  • Fault domains: trust and tolerance 
  • Topology changes 


Gloo Mesh for Multi-Network Service Discovery

A service mesh enables service-to-service communication and typically plugs in with a service discovery mechanism for a specific cluster or network. But a service mesh doesn’t automatically solve the multi-network problem. A service mesh can route to services outside of its network, but how does it know which services exist and where to route requests for them?

That’s where Gloo Mesh, an open-source service-mesh management plane, helps. 

Gloo Mesh can register clusters (and non-Kubernetes workloads) and build a global registry across networks. Gloo Mesh can then orchestrate each mesh (potentially deployed 1:1 with a cluster or network) by updating the mesh with vital cross-network service discovery information. For example, with Istio, a very popular service mesh implementation built on Envoy Proxy, Gloo Mesh can discover which services run on which clusters and publish that information to each Istio control plane (istiod) by creating ServiceEntry resources that point to services in other clusters. That gives a single Istio mesh awareness of services running in other clusters. Gloo Mesh can do this for all of the Istio control planes across multiple networks and create the illusion of a “VirtualMesh” where all of the meshes work in concert to provide routing and service discovery across any network.
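
To give a feel for what such a resource looks like, here is a hedged sketch in Python that builds an Istio ServiceEntry pointing at a service in another cluster. The field names (`hosts`, `location`, `resolution`, `ports`, `endpoints`) come from the Istio ServiceEntry API, but the function name, the `<service>.<namespace>.svc.<cluster>.global` hostname scheme, the gateway address, and the metadata are assumptions made for illustration. This is not the exact resource that Gloo Mesh generates.

```python
import json


def build_service_entry(service, namespace, remote_cluster,
                        gateway_address, port, gateway_port=15443):
    """Build a ServiceEntry dict that exposes a remote service to the local mesh.

    The hostname scheme and resource naming are hypothetical; 15443 is the
    port conventionally used by Istio gateways for cross-network mTLS traffic.
    """
    host = f"{service}.{namespace}.svc.{remote_cluster}.global"
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "ServiceEntry",
        "metadata": {
            "name": f"{service}-{remote_cluster}",
            "namespace": "istio-system",
        },
        "spec": {
            "hosts": [host],
            "location": "MESH_INTERNAL",
            "resolution": "DNS",
            "ports": [{"number": port, "name": "http", "protocol": "HTTP"}],
            # Traffic for the host is sent to the remote cluster's gateway.
            "endpoints": [
                {"address": gateway_address, "ports": {"http": gateway_port}}
            ],
        },
    }


# Example with made-up names: expose "reviews" from cluster-2 to the local mesh.
entry = build_service_entry(
    service="reviews",
    namespace="default",
    remote_cluster="cluster-2",
    gateway_address="gateway.cluster-2.example.com",
    port=9080,
)
print(json.dumps(entry, indent=2))
```

The point of the sketch is the shape of the data: the local mesh gets a routable hostname for the remote service, and the endpoint behind that hostname is the other cluster's ingress point rather than an individual pod.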

Gloo Mesh diagram

With this automated service discovery across multiple networks and clusters, Gloo Mesh can be used to build things like global priority failover, multi-cluster traffic routing policies, and access control. In Gloo Mesh, this works as follows:

  • Operators register their clusters/meshes with Gloo Mesh
  • Gloo Mesh begins service discovery and builds a master index of which microservices are running in which clusters/networks
  • Gloo Mesh automates building of the service-discovery entry on each mesh control plane (i.e., through replicating Kubernetes Services or mesh-specific resources like ServiceEntry in Istio)
  • Microservices within a single mesh can use this information locally to route to services outside of their network
  • Users can build failover, routing, and access control policies on top of this service-discovery model
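
The workflow above can be sketched as a small in-memory registry. This is a toy model in Python, assuming hypothetical names (`GlobalRegistry`, `remote_entries_for`, `failover_order`) and a simple "local cluster first, then remotes in alphabetical order" failover policy. It illustrates the idea only and does not reflect Gloo Mesh's implementation or API.

```python
from collections import defaultdict


class GlobalRegistry:
    """Toy global index of which services run in which clusters."""

    def __init__(self):
        self._clusters = {}

    def register_cluster(self, cluster, services):
        """Step 1: an operator registers a cluster and its discovered services."""
        self._clusters[cluster] = set(services)

    def index(self):
        """Step 2: build the master index of service -> clusters hosting it."""
        idx = defaultdict(set)
        for cluster, services in self._clusters.items():
            for service in services:
                idx[service].add(cluster)
        return dict(idx)

    def remote_entries_for(self, cluster):
        """Step 3: entries to create on `cluster` for services hosted elsewhere."""
        entries = {}
        for service, clusters in self.index().items():
            remotes = sorted(clusters - {cluster})
            if remotes:
                entries[service] = remotes
        return entries

    def failover_order(self, service, local_cluster):
        """Step 5: a simple priority policy, local first and then remotes."""
        clusters = self.index().get(service, set())
        local = [local_cluster] if local_cluster in clusters else []
        return local + sorted(clusters - {local_cluster})


# Made-up clusters and services for illustration.
registry = GlobalRegistry()
registry.register_cluster("cluster-1", ["reviews", "ratings"])
registry.register_cluster("cluster-2", ["reviews", "details"])

print(registry.remote_entries_for("cluster-1"))
print(registry.failover_order("reviews", "cluster-1"))
```

In this model, each mesh only ever consumes the entries computed for it, so routing decisions stay local to the mesh while the index itself stays global.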


Get Started with Gloo Mesh

We invite you to check out the project and join the community. Enterprise support for Istio service mesh is also available for those looking to operationalize service mesh environments; request a meeting to learn more.