Multi-Cluster Service Discovery in Kubernetes and Service Mesh Engineering
July 7, 2020

In this blog series, we will dig into specific challenges of multi-cluster Kubernetes and service mesh architectures, along with the considerations and approaches for solving them. In this first post, we focus on service discovery and how to approach it in multi-cluster environments.

What Is Service Discovery?

Service discovery is the backbone of distributed systems and microservices architectures. It is how an environment keeps track of which services exist and where they are, so that they can be connected to each other. This matters because services are dynamic, unlike traditional software, which had static network locations tied to physical hardware. Microservices are constantly changing (being updated, scaled up and down, and redeployed), so services need a way to discover and connect with each other at all times.
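To make the idea concrete, here is a minimal sketch of a generic service registry: instances register themselves, keep sending heartbeats, and drop out of lookups once they stop. It is a toy, in-memory model that assumes simple TTL-based liveness; the `Registry` class and its method names are hypothetical and do not mirror any particular product's API.

```python
import time
from typing import Callable, Dict, List, Tuple


class Registry:
    """Toy in-memory service registry with TTL-based liveness (illustrative only)."""

    def __init__(self, ttl: float = 30.0, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._seen: Dict[Tuple[str, str], float] = {}  # (service, address) -> last heartbeat

    def register(self, service: str, address: str) -> None:
        # Registering and heartbeating both just refresh the instance's timestamp.
        self._seen[(service, address)] = self.clock()

    heartbeat = register

    def deregister(self, service: str, address: str) -> None:
        self._seen.pop((service, address), None)

    def lookup(self, service: str) -> List[str]:
        """Return live addresses for a service, evicting instances whose TTL has lapsed."""
        now = self.clock()
        for key in [k for k, t in self._seen.items() if now - t > self.ttl]:
            del self._seen[key]
        return sorted(addr for (svc, addr) in self._seen if svc == service)


# A fake clock keeps the example deterministic.
now = [0.0]
reg = Registry(ttl=10.0, clock=lambda: now[0])
reg.register("reviews", "10.0.1.5:9080")
reg.register("reviews", "10.0.2.7:9080")
now[0] = 8.0
reg.heartbeat("reviews", "10.0.2.7:9080")  # only this instance keeps heartbeating
now[0] = 15.0
print(reg.lookup("reviews"))  # ['10.0.2.7:9080']
```

Callers only ever ask for a service by name; which instances answer can change from one lookup to the next, which is exactly the property dynamic microservices need.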

This is true for Kubernetes environments, where application pods are deployed and redeployed to different nodes across a cluster. Kubernetes has a built-in service discovery mechanism, Kubernetes Services, that lets clients talk to a stable virtual IP and be routed at run time to one of the pods selected by that Service. In this model, the backing pods can come and go, but the client always talks to the Service's cluster virtual IP. A service mesh takes this a step further, building on this service discovery to provide more sophisticated health checking, request-level load balancing, and fallback policies. Outside of Kubernetes, one may use ZooKeeper, Consul, or Netflix OSS Eureka as a backing registry on which to build service discovery. But what happens when you try to discover services in different clusters, or more importantly, on different networks?
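The Kubernetes Service model can be sketched as a label selector that picks ready pods, plus a stable virtual IP in front of them. This is a simplified illustration, not how Kubernetes is implemented: the real system tracks endpoints through the API server and programs routing with kube-proxy, whereas here a hypothetical `VirtualIPRouter` class picks backends round-robin. The IPs and labels are made up.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Pod:
    name: str
    ip: str
    labels: Dict[str, str]
    ready: bool = True


@dataclass
class Service:
    name: str
    cluster_ip: str           # the stable virtual IP clients talk to
    selector: Dict[str, str]  # label selector, as in a Service spec


def endpoints(svc: Service, pods: List[Pod]) -> List[str]:
    """IPs of ready pods whose labels match every key/value in the selector."""
    return [p.ip for p in pods
            if p.ready and all(p.labels.get(k) == v for k, v in svc.selector.items())]


class VirtualIPRouter:
    """Stand-in for kube-proxy: maps a cluster IP to a current backend (round-robin here)."""

    def __init__(self, services: List[Service]):
        self.by_ip = {s.cluster_ip: s for s in services}
        self._counter = itertools.count()

    def route(self, cluster_ip: str, pods: List[Pod]) -> str:
        backends = endpoints(self.by_ip[cluster_ip], pods)
        if not backends:
            raise LookupError(f"no ready endpoints behind {cluster_ip}")
        return backends[next(self._counter) % len(backends)]


pods = [
    Pod("reviews-1", "10.1.0.4", {"app": "reviews"}),
    Pod("reviews-2", "10.1.0.9", {"app": "reviews"}),
    Pod("ratings-1", "10.1.0.12", {"app": "ratings"}),
]
svc = Service("reviews", "10.96.0.20", {"app": "reviews"})
router = VirtualIPRouter([svc])
print(router.route("10.96.0.20", pods))  # 10.1.0.4
pods[0].ready = False  # a pod goes away; the client keeps using the same virtual IP
print(router.route("10.96.0.20", pods))  # 10.1.0.9
```

The client-facing address never changes; only the set of endpoints behind it does. Note that everything here is scoped to one cluster's pod list, which is exactly what breaks down once services live in other clusters or networks.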


Challenges of Service Discovery Across Networks

Things start to get more complicated as you think about how to architect service discovery to work across clusters or networks. How does a service in one network find services in another network? Is there a single, global service registry? Do we replicate services between boundaries? Do we try to share a single network space across everything?

We need a way to expose services and locate them across any number of clusters so that the necessary connections can be made, and to do so in a way that is trusted by both ends of the connection.

Specific challenges that arise in multi-cluster patterns include:

  • Service-to-service relationships: traversing cluster and network boundaries
  • Traffic policies: applied manually per cluster vs. globally
  • Fault domains: trust and tolerance
  • Topology changes


Service Mesh Hub for multi-network service discovery

A service mesh enables service-to-service communication and typically plugs into a service discovery mechanism for a specific cluster or network. But it doesn't automatically solve the multi-network problem. A service mesh can route to services outside of its network, but how does it know which services exist and where to route requests for them?

That’s where Service Mesh Hub, an open-source service-mesh management plane, helps. 

Service Mesh Hub can register clusters (and non-Kubernetes workloads) and build a global registry across networks. It can then orchestrate each mesh (potentially deployed 1:1 with a cluster or network) by updating it with the cross-network service discovery information it needs. For example, with Istio, a popular service mesh built on Envoy Proxy, Service Mesh Hub can discover which services run in which clusters and push that information to each Istio control plane (istiod) by creating ServiceEntry resources that point to services in other clusters. This gives a single Istio mesh awareness of services running in other clusters. Service Mesh Hub can do this for all of the Istio control planes across multiple networks, creating the illusion of a single “VirtualMesh” in which all of the meshes work in concert to provide routing and service discovery across networks.
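For a sense of what such a ServiceEntry might contain, the sketch below builds one as a plain dictionary. It is an assumption-laden illustration, not Service Mesh Hub's actual output: it follows the convention from Istio's replicated-control-plane multicluster docs of that era, where a remote service gets a `<name>.<namespace>.global` host, an otherwise-unused virtual address (such as 240.0.0.2), and an endpoint at the remote cluster's ingress gateway on port 15443. The function name and the gateway hostname are hypothetical.

```python
import json
from typing import Dict


def service_entry_for_remote(name: str, namespace: str, port: int,
                             remote_gateway: str, vip: str,
                             gateway_port: int = 15443) -> Dict:
    """Build a ServiceEntry (as a dict) that points a mesh at a service in another cluster.

    Assumption: mirrors Istio's ~2020 replicated-control-plane multicluster pattern,
    not the resources Service Mesh Hub actually generates.
    """
    return {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "ServiceEntry",
        "metadata": {"name": f"{name}-{namespace}"},
        "spec": {
            "hosts": [f"{name}.{namespace}.global"],  # name local clients will call
            "location": "MESH_INTERNAL",              # still inside the mesh's mTLS boundary
            "ports": [{"name": "http1", "number": port, "protocol": "http"}],
            "resolution": "DNS",
            "addresses": [vip],                       # virtual IP for the remote service
            # Traffic is sent to the remote cluster's ingress gateway, which forwards it on.
            "endpoints": [{"address": remote_gateway, "ports": {"http1": gateway_port}}],
        },
    }


entry = service_entry_for_remote("httpbin", "bar", 8000,
                                 remote_gateway="cluster2-gw.example.com", vip="240.0.0.2")
print(json.dumps(entry, indent=2))  # kubectl apply accepts JSON as well as YAML
```

Generating these by hand for every service and every cluster pair is tedious and error-prone, which is the bookkeeping a management plane is meant to automate.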

With this automated service discovery across multiple networks and clusters, Service Mesh Hub can be used to build capabilities such as global priority failover, multi-cluster traffic routing policies, and access control. In Service Mesh Hub, this works as follows:

  • Operators register their clusters/meshes with Service Mesh Hub
  • Service Mesh Hub begins service discovery and builds a master index of which services are running in which clusters/networks
  • Service Mesh Hub automatically creates service-discovery entries on each mesh control plane (i.e., by replicating Kubernetes Services or creating mesh-specific resources such as ServiceEntry in Istio)
  • Services within a single mesh can use this information locally to route to services outside of their network
  • Users can build failover, routing, and access control policies on top of this service-discovery model
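The steps above can be modeled in a few lines. This is a hypothetical sketch, not Service Mesh Hub's implementation or API: `GlobalIndex`, `remote_entries`, and `pick_cluster` are illustrative names, cluster health is assumed to be supplied from elsewhere, and failover is simplified to "first healthy cluster in a priority list".

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set


class GlobalIndex:
    """Hypothetical master index of which services run in which registered clusters."""

    def __init__(self) -> None:
        self.clusters: Dict[str, Set[str]] = {}  # cluster name -> services discovered there

    def register_cluster(self, cluster: str, services: List[str]) -> None:
        # Steps 1-2: an operator registers a cluster, and discovery records its services.
        self.clusters[cluster] = set(services)

    def where(self, service: str) -> List[str]:
        """All clusters currently running the service."""
        return sorted(c for c, svcs in self.clusters.items() if service in svcs)

    def remote_entries(self, local: str) -> Dict[str, List[str]]:
        """Step 3: remote services a given mesh should be told about, and where they run."""
        entries: Dict[str, List[str]] = defaultdict(list)
        for cluster in sorted(self.clusters):
            if cluster == local:
                continue  # the local control plane already knows its own services
            for svc in sorted(self.clusters[cluster]):
                entries[svc].append(cluster)
        return dict(entries)


def pick_cluster(index: GlobalIndex, service: str,
                 priority: List[str], healthy: Set[str]) -> Optional[str]:
    """Step 5: priority failover, i.e. the first healthy cluster in `priority` running the service."""
    running = set(index.where(service))
    for cluster in priority:
        if cluster in running and cluster in healthy:
            return cluster
    return None


index = GlobalIndex()
index.register_cluster("us-east", ["reviews", "ratings"])
index.register_cluster("us-west", ["reviews"])
index.register_cluster("eu", ["ratings"])

# What the us-west mesh needs to learn about (e.g. as ServiceEntry resources):
print(index.remote_entries("us-west"))  # {'ratings': ['eu', 'us-east'], 'reviews': ['us-east']}

# Prefer us-west, and fail over to us-east when us-west is unhealthy:
print(pick_cluster(index, "reviews", ["us-west", "us-east"], {"us-west", "us-east"}))  # us-west
print(pick_cluster(index, "reviews", ["us-west", "us-east"], {"us-east"}))             # us-east
```

Keeping one global index and deriving per-mesh entries from it means each control plane only has to consume local resources, while failover and routing policies can be reasoned about globally.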


Get Started

Service Mesh Hub was updated and open sourced in May and has recently started holding community meetings to expand the conversation around service mesh. We invite you to check out the project and join the community. Enterprise support for Istio service mesh is also available for those looking to operationalize service mesh environments; request a meeting to learn more.
