Handling Service to Service Failover at the Edge or Service Mesh

In this blog series, we dig into specific challenge areas for multi-cluster Kubernetes and service mesh architectures, along with considerations and approaches for solving them.

The previous blog post covered Multi-Cluster Service Mesh Failover and Fallback Routing. Failover routing is one of the many reasons organizations look to adopt a service mesh, in addition to service to service communication, tracing, and more.

However, service to service failover routing across multiple clusters can also be accomplished without a service mesh, using an API/Edge gateway instead.

This blog post will demonstrate and compare both approaches to achieving service to service failover to help you choose the method best suited to your use case and requirements.

Technologies covered in this blog post include:

  • Gloo, a next generation API / Edge Gateway built with Envoy Proxy
  • Gloo Federation, Gloo’s multi-cluster configuration and traffic management capability
  • Service Mesh Hub, a multi-cluster service mesh control plane


Service to Service Failover without Service Mesh

Let’s say you don’t need to secure Service to Service communication inside your cluster (because it’s handled at the application level, for example), but you still need Service to Service failover.

Gloo Federation can support this use case with federation of multiple API gateways.

Gloo is a feature-rich gateway with function-level routing, support for legacy apps, microservices and serverless, discovery capabilities, tight integration with leading open-source projects, and more. Gloo is designed to support hybrid applications, in which multiple technologies, architectures, protocols, and clouds can coexist.

Gloo Federation enables you to configure and manage multiple Gloo instances in multiple Kubernetes clusters and create Failover rules across them in case a Service becomes unavailable in the local cluster.

In this scenario, Gloo is used to secure the Edge on the first cluster and send the request to the right Service in the same cluster, depending on the configuration you specified.

You can create a FailoverScheme to determine the Service you want to use on a remote cluster when the corresponding local Service becomes unavailable:

kubectl apply --context kind-kind1 -f - <
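
As a minimal sketch of what such a manifest can look like, assuming the `fed.solo.io/v1` FailoverScheme schema described in the Gloo Federation documentation: the registered cluster names (`kind1`, `kind2`) and the Upstream names below are hypothetical placeholders, not the values used in this demo.

```yaml
# Hypothetical example: all names are placeholders chosen for illustration.
apiVersion: fed.solo.io/v1
kind: FailoverScheme
metadata:
  name: failover-scheme
  namespace: gloo-system
spec:
  # The local Upstream that should be protected by the failover rule.
  primary:
    clusterName: kind1                    # assumed name of the local cluster
    name: default-echo-blue-8080          # assumed Upstream name
    namespace: gloo-system
  # Remote Upstreams to fall back to, grouped by priority.
  failoverGroups:
  - priorityGroup:
    - cluster: kind2                      # assumed name of the remote cluster
      upstreams:
      - name: default-echo-green-8080     # assumed Upstream name
        namespace: gloo-system
```

In this sketch, `primary` identifies the local Upstream, and each entry under `failoverGroups` lists remote Upstreams that can receive traffic once the primary is considered unhealthy. Detecting that state generally depends on health checking being configured for the primary Upstream.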

I configured the service to reply with blue-pod on the local cluster and with green-pod on the remote one.

If I send a request to the local cluster, I get the expected response from blue-pod.


Now, let’s scale down the service in the local cluster:

kubectl scale deploy/echo-blue --replicas=0

If I send a request to the local cluster again, the response now comes from green-pod on the remote cluster.


As you can see, the request has been automatically sent by Gloo on the local cluster to the other Gloo instance running on the remote cluster, and as a user I didn’t experience any downtime.

I can also easily see in the Gloo Federation UI which Services have a Failover configured, and I did not need a service mesh to achieve any of this.


Service to Service Failover with Service Mesh

At Solo.io, we are obviously big fans of service mesh and believe that it is the future of new application development as microservices and Kubernetes patterns grow in adoption. There are many good reasons to use a service mesh for your applications. In a previous blog, I explained how to set up the service to service failover use case between two Istio clusters using Service Mesh Hub.

But can we combine the two approaches?

The best of both worlds

What if I want to secure the Edge with Gloo (to get things like external authentication, WAF, OPA, advanced transformations, …), secure internal Service to Service communication with a Service Mesh (like Istio, AppMesh, …), and also implement service to service failover?

Good news! You can do that.


In a nominal situation, requests coming from your end users go through the Gloo Gateway, which reaches the productpage service using mTLS. The productpage service then reaches the reviews service using standard Istio routing.


If the local reviews service becomes unavailable, the Failover configured by Service Mesh Hub allows the local productpage service to seamlessly reach the remote reviews service (through the Istio Ingress Gateway of the remote cluster).


If the local productpage service becomes unavailable, the Failover configured by Gloo Federation allows the local Gloo instance to seamlessly reach the remote productpage service (through the Gloo Gateway of the remote cluster). The remote productpage service then reaches the reviews service using standard Istio routing.


Get Started

Solo.io offers a variety of solutions for service mesh environments, including Service Mesh Hub, Enterprise Subscription for Istio, and a Developer Portal for Istio. Gloo API Gateway is available in open source and enterprise editions, with Federation available in the enterprise edition.