Cross-cluster service communication with service mesh

In this blog series, we dig into specific challenge areas for multi-cluster Kubernetes and service mesh architectures, along with the considerations and approaches for solving them.

The previous blog post covered Identity Federation for Multi-Cluster Kubernetes and Service Mesh, which is the foundation for cross-cluster service communication.

Istio is the most popular Service Mesh technology and is designed to handle multi-cluster traffic.

First of all, let’s have a look at the different multi-cluster deployment models available with Istio.

Istio multi-cluster deployment models

There are two ways to configure Istio so that services running on different clusters can communicate with each other.

  • The shared control plane approach

You need a flat network between the clusters, but even if you meet this requirement, I wouldn’t recommend this approach for availability reasons.

What happens if the cluster where the control plane is running becomes unavailable?

  • The replicated control planes approach

In that case, there’s a control plane on each cluster, so there are no availability concerns.

There are also no specific networking requirements: all you need is to be able to reach the Istio Ingress Gateway of the other cluster(s).

However, it’s far more complex to configure and maintain.

We assume you’ve read the previous blog post and understand why it’s important to set up each Istio cluster with a different trust domain, and how to federate the identities of the different clusters.

Now that each service has a unique identity across all the Istio clusters, and all the clusters share a common root certificate, we still need to solve several other challenges:

  • How does one cluster know about the services running on the other clusters?
  • How can a service in one cluster reach a service in another cluster?

Cross-cluster service communication the hard way

We start with the following deployment:

Istio has been deployed on each cluster and we want to allow the productpage service on the first cluster to send requests to the reviews service on both clusters.

We’ll do it step by step.
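
Throughout this walkthrough, each manifest has to be applied to the right cluster. Here is a minimal sketch of how we’ll do that, assuming two kubeconfig contexts named cluster1 and cluster2 (the context names and file names are illustrative):

# List the available contexts and check that both clusters are reachable
kubectl config get-contexts

# Resources for the first cluster (ServiceEntry, DestinationRule, VirtualService)
kubectl --context=cluster1 apply -f first-cluster-resources.yaml

# Resources for the second cluster (EnvoyFilter, DestinationRule)
kubectl --context=cluster2 apply -f second-cluster-resources.yaml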

ServiceEntry on the first cluster

First of all, the first cluster isn’t aware of the reviews service running on the second cluster.

So, we need to create a ServiceEntry to define it and to tell the first cluster how to reach it:

apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: reviews.default.svc.cluster2.global
  namespace: istio-system
spec:
  addresses:
  - 253.124.25.94
  endpoints:
  - address: 172.18.0.230
    labels:
      cluster: cluster2
    ports:
      http: 15443
  hosts:
  - reviews.default.svc.cluster2.global
  location: MESH_INTERNAL
  ports:
  - name: http
    number: 9080
    protocol: TCP
  resolution: DNS

The address `253.124.25.94` must be unique, so you need to manually track which IP addresses you use for each ServiceEntry.

The endpoint corresponds to the Istio Ingress Gateway of the second cluster (generally exposed through a Kubernetes Service of type LoadBalancer). Note the label we set: we’ll use it later to target this ServiceEntry from the DestinationRule we’re going to create next.
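
If you need to look up that address, here’s a quick sketch, assuming the gateway is exposed as a LoadBalancer Service named istio-ingressgateway in the istio-system namespace (on some platforms the field contains a hostname instead of an IP):

# External address of the second cluster's Istio Ingress Gateway
kubectl --context=cluster2 -n istio-system get svc istio-ingressgateway \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}'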

Finally, the host value `reviews.default.svc.cluster2.global` needs to end with the .global suffix so that the name can be resolved by Istio’s CoreDNS plugin.

To understand why we use this suffix, we need to have a look at the CoreDNS config of Istio:

.:53 {
 errors
 health
 
 # Removed support for the proxy plugin: https://coredns.io/2019/03/03/coredns-1.4.0-release/
 grpc global 127.0.0.1:8053
 forward . /etc/resolv.conf {
  except global
 }
 
 prometheus :9153
 cache 30
 reload
}

The `grpc global 127.0.0.1:8053` directive forwards all requests for *.global hosts to the istio-coredns plugin, which is listening on port 8053.
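
To check that resolution works from the first cluster, you can run a lookup from any pod that has nslookup available; a sketch, assuming a sleep test deployment exists in the default namespace:

# The .global name should resolve to the virtual IP defined in the ServiceEntry (253.124.25.94)
kubectl --context=cluster1 exec deploy/sleep -- nslookup reviews.default.svc.cluster2.global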

DestinationRule on the first cluster

The next step is to create a DestinationRule to define the different subsets:

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews.default.svc.cluster2.global
  namespace: istio-system
spec:
  host: reviews.default.svc.cluster2.global
  subsets:
  - labels:
      cluster: cluster2
    name: version-v3
  - labels:
      cluster: cluster2
    name: version-v1
  - labels:
      cluster: cluster2
    name: version-v2
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL

You can see that we use the same FQDN we have defined above (reviews.default.svc.cluster2.global) and we create one subset for each version of the reviews service running on the second cluster.

And the label cluster: cluster2 allows us to target the ServiceEntry we created in the previous step.

VirtualService on the first cluster

Now, we can create a VirtualService to define how we want the traffic to be split across the two clusters.

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
  namespace: default
spec:
  hosts:
  - reviews.default.svc.cluster.local
  http:
  - route:
    - destination:
        host: reviews.default.svc.cluster2.global
        subset: version-v3
      weight: 75
    - destination:
        host: reviews.default.svc.cluster.local
        subset: version-v1
      weight: 15
    - destination:
        host: reviews.default.svc.cluster.local
        subset: version-v2
      weight: 10

With this VirtualService, we tell Istio we want the requests for `reviews.default.svc.cluster.local` to be sent to:

  • version v3 of the reviews service running on the second cluster, 75% of the time
  • version v1 of the reviews service running on the local cluster, 15% of the time
  • version v2 of the reviews service running on the local cluster, 10% of the time

Note that we use the subsets we have defined in our DestinationRule.
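
To observe the split, you can send a batch of requests to the reviews service and see which versions answer; a rough sketch, assuming a test pod with curl (for example the sleep sample) runs in the default namespace of the first cluster:

# Each request hits the local reviews hostname; the VirtualService decides whether it
# stays local (v1/v2) or is routed to cluster2 (v3). With the bookinfo sample, responses
# that contain a rating come from v2 or v3, while v1 responses do not.
for i in $(seq 1 10); do
  kubectl --context=cluster1 exec deploy/sleep -- \
    curl -s http://reviews.default.svc.cluster.local:9080/reviews/0
  echo
done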

EnvoyFilter on the second cluster

With all the Istio CRDs we have created on the first cluster, we are now able to send traffic to the second cluster, but we still need to configure the second cluster to tell it how to handle this incoming traffic.

Let’s start with the EnvoyFilter:

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: cross-cluster-traffic
  namespace: istio-system
spec:
  configPatches:
  - applyTo: NETWORK_FILTER
    match:
      context: GATEWAY
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.sni_cluster
        portNumber: 15443
    patch:
      operation: INSERT_AFTER
      value:
        name: envoy.filters.network.tcp_cluster_rewrite
        typed_config:
          '@type': type.googleapis.com/istio.envoy.config.filter.network.tcp_cluster_rewrite.v2alpha1.TcpClusterRewrite
          cluster_pattern: \.cluster2.global$
          cluster_replacement: .cluster.local
  workloadSelector:
    labels:
      istio: ingressgateway

We modify the configuration of Envoy running in the Istio Ingress Gateway to replace the .cluster2.global suffix with .cluster.local, so the traffic is handled as if it came from a local service.
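
You can verify that the filter was injected into the gateway’s Envoy configuration; a sketch, assuming istioctl is installed and the flags below match your istioctl version:

# Find the ingress gateway pod on the second cluster
GW_POD=$(kubectl --context=cluster2 -n istio-system get pod -l istio=ingressgateway \
  -o jsonpath='{.items[0].metadata.name}')

# Dump the listener config on port 15443 and look for the tcp_cluster_rewrite filter
istioctl --context=cluster2 proxy-config listener "$GW_POD" -n istio-system \
  --port 15443 -o json | grep tcp_cluster_rewrite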

DestinationRule on the second cluster

We still need to create a DestinationRule to define our subsets on this cluster:

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews
  namespace: default
spec:
  host: reviews.default.svc.cluster.local
  subsets:
  - labels:
      version: v3
    name: version-v3
  - labels:
      version: v1
    name: version-v1
  - labels:
      version: v2
    name: version-v2
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL

You can see that we now use the version: v* labels in each subset to target the Pods carrying those labels.
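
A quick way to double-check that these subsets match real Pods, assuming the bookinfo reviews deployments run in the default namespace of the second cluster:

# Each reviews Pod should carry a version=v1, v2 or v3 label matching the subsets above
kubectl --context=cluster2 -n default get pods -l app=reviews --show-labels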

We’re done!

So, configuring cross-cluster service communication manually is doable, but it’s complex, error-prone, and doesn’t scale well.

Cross-cluster service communication made easy

Everything we’ve done manually above can be achieved by creating a single Service Mesh Hub CRD:

apiVersion: networking.smh.solo.io/v1alpha2
kind: TrafficPolicy
metadata:
  namespace: service-mesh-hub
  name: simple
spec:
  destinationSelector:
  - kubeServiceRefs:
      services:
        - clusterName: cluster1
          name: reviews
          namespace: default
  trafficShift:
    destinations:
      - kubeService:
          clusterName: cluster2
          name: reviews
          namespace: default
          subset:
            version: v3
        weight: 75
      - kubeService:
          clusterName: cluster1
          name: reviews
          namespace: default
          subset:
            version: v1
        weight: 15
      - kubeService:
          clusterName: cluster1
          name: reviews
          namespace: default
          subset:
            version: v2
        weight: 10

This CRD is much easier to read and maintain.

Service Mesh Hub then creates all the complex plumbing for us.
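
Applying it is a single command; a minimal sketch, assuming the TrafficPolicy is saved as traffic-policy.yaml and the Service Mesh Hub management cluster uses a kubeconfig context named mgmt (both names are illustrative):

# Apply the TrafficPolicy to the management cluster; Service Mesh Hub translates it into
# the Istio resources (ServiceEntry, DestinationRule, VirtualService, EnvoyFilter) on the
# registered clusters
kubectl --context=mgmt apply -f traffic-policy.yaml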

Get started

We invite you to check out the project and join the community. Solo.io also offers enterprise support for Istio service mesh for those looking to operationalize service mesh environments, request a meeting to learn more here.