The evolution of multicluster support in Istio 1.8

Istio 1.8 has just been released, and one of the areas that has evolved is its multicluster support.

If you are familiar with Istio, you probably know the two multicluster deployment models that were available in previous versions:

  • Shared control plane

In this configuration, a single Istio control plane was deployed, and Pods running on different clusters generally communicated with each other directly (when the clusters were on the same network).

The Istio control plane communicated with the Kubernetes API servers of the other clusters to discover the workloads running on them.

  • Replicated control plane

In this configuration, an Istio control plane was deployed on each cluster, and Pods running on different clusters communicated with each other through the Istio Ingress gateways. Note that each Istio control plane only communicated with the Kubernetes API server running on the same cluster.

This configuration was the most popular one because it provided higher availability.

What has changed in Istio 1.8?

If you look at the documentation, you will find four configurations:

  • Primary-Remote

This configuration is the same as the Shared control plane configuration with a single network.

  • Primary-Remote on different networks

This configuration is the same as the Shared control plane configuration with different networks.

  • Multi-Primary

This configuration is similar to the Replicated control plane configuration, but the Pods running on different clusters can communicate with each other directly, and the Istio control plane can now discover workloads running on the other clusters (by accessing the remote Kubernetes API servers).

  • Multi-Primary on different networks

This configuration is the same as the Replicated control plane configuration, but the Istio control plane can now discover workloads running on the other clusters (by accessing the remote Kubernetes API servers).

So what has really changed?

As you can see, not much; it’s just an evolution.

You get more flexibility: you can now take the best features of the previous Shared control plane configuration (direct communication between Pods when a flat network is available, automated discovery of remote workloads) and of the Replicated control plane configuration (higher availability, no flat network required) and build your own architecture.

How does Endpoint Discovery work?

What is this mysterious Endpoint Discovery feature?

If you look at the Istio documentation, you’ll find an explanation of how to enable it (by creating secrets containing the kubeconfigs of the remote clusters), but nothing else.
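For the record, enabling it boils down to one command per remote cluster; a sketch, assuming two kubectl contexts named cluster1 and cluster2:

# Give cluster1's control plane access to cluster2's API server
istioctl x create-remote-secret --context=cluster2 --name=cluster2 | \
  kubectl apply -f - --context=cluster1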

So, I decided to have a look at it!

I followed the documentation to deploy Istio on two Kubernetes clusters using the Multi-Primary on different networks configuration.
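In that configuration, each cluster gets its own control plane, with mesh, cluster, and network identifiers set at install time. A minimal sketch of the IstioOperator values for the first cluster (the names follow the Istio documentation):

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    global:
      meshID: mesh1
      multiCluster:
        clusterName: cluster1
      network: network1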

Then, I created the secrets and deployed the Sleep and HelloWorld services as shown here.

This means that I now have the Sleep and HelloWorld-v1 services running on my first Kubernetes cluster and the HelloWorld-v2 service running on my second Kubernetes cluster.

When I curl the HelloWorld service from the Sleep service on the first cluster (using the command shown after this list), I randomly get one of the two outputs below:

  • helloworld-v1-5b75657f75-jg6jm
  • helloworld-v2-7855866d4f-dm9gq9
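The test itself is a simple command run from the Sleep Pod; a sketch, assuming the sample applications are deployed in the sample namespace as in the Istio documentation:

kubectl exec -n sample -c sleep deploy/sleep -- \
  curl -sS helloworld.sample:5000/hello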

This proves that Endpoint Discovery works correctly.

I was expecting the Istio control plane to create new ServiceEntries for the discovered services, but it didn’t.

So it works, but how can I see which services have been discovered?

If the requests are correctly spread across the two versions of the HelloWorld service, it means that the Envoy container running in the Sleep Pod is aware of them.

Let’s have a look at the Envoy clusters.
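One way to do that is to query the Envoy admin interface of the istio-proxy sidecar (which listens on localhost:15000 inside the Pod) and filter on the HelloWorld service; a sketch:

kubectl exec -n sample -c istio-proxy deploy/sleep -- \
  curl -s localhost:15000/clusters | grep helloworld.sample

The two interesting entries are: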

...
outbound|5000||helloworld.sample.svc.cluster.local::192.168.163.201:5000::cx_active::1
...
outbound|5000||helloworld.sample.svc.cluster.local::172.18.0.231:15443::cx_active::1
...

As you can see, I have two different endpoints.

The first endpoint corresponds to the IP and port of the HelloWorld Pod running locally, while the second endpoint corresponds to the IP and port of the Istio EastWest gateway running on the remote cluster.

Using a dedicated gateway (the EastWest gateway) is now recommended for cross-cluster communication, instead of using the Ingress gateway for both external and cross-cluster traffic.
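Cross-cluster traffic enters the EastWest gateway on port 15443 (the port visible in the second endpoint above). Services are exposed through that gateway with a Gateway resource similar to the one shipped in the Istio multicluster samples:

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: cross-network-gateway
  namespace: istio-system
spec:
  selector:
    istio: eastwestgateway
  servers:
    - port:
        number: 15443
        name: tls
        protocol: TLS
      tls:
        mode: AUTO_PASSTHROUGH
      hosts:
        - "*.local"

With AUTO_PASSTHROUGH, the gateway forwards the mTLS traffic to the target Pod based on the SNI, without terminating it.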

In fact, at Solo.io, we have always recommended this approach. And for ingress, you can use a more powerful gateway like Gloo Edge (which provides advanced transformations, OIDC and JWT authentication, a Web Application Firewall, …).

But how can I get more information about the endpoints discovered by the Istio control plane?

I finally found a way to get more information after looking at the Istio source code.

There’s a debug endpoint called `endpointShardz` that returns the information I was looking for.
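The debug endpoints are served by istiod itself; one way to reach them is to port-forward istiod’s debug port (8080 at the time of writing; this may change between versions) and query the endpoint:

kubectl -n istio-system port-forward deploy/istiod 8080 &
curl -s localhost:8080/debug/endpointShardz

Here is the (truncated) output for the HelloWorld service: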

{
  "helloworld.sample.svc.cluster.local": {
   "sample": {
    "Shards": {
     "cluster1": [
      {
       "Labels": {
        "app": "helloworld",
        "istio.io/rev": "default",
        "pod-template-hash": "5b75657f75",
        "security.istio.io/tlsMode": "istio",
        "service.istio.io/canonical-name": "helloworld",
        "service.istio.io/canonical-revision": "v1",
        "topology.istio.io/cluster": "cluster1",
        "topology.istio.io/network": "network1",
        "topology.kubernetes.io/region": "us-east-1",
        "topology.kubernetes.io/zone": "us-east-1c",
        "version": "v1"
       },
       "Address": "192.168.163.201",
       "ServicePortName": "http",
       "EnvoyEndpoint": {
        "HostIdentifier": {
         "Endpoint": {
          "address": {
           "Address": {
            "SocketAddress": {
             "address": "192.168.163.201",
             "PortSpecifier": {
              "PortValue": 5000
             }
            }
           }
          }
         }
        },
        "metadata": {
         "filter_metadata": {
          "envoy.transport_socket_match": {
           "tlsMode": "istio"
          },
          "istio": {
           "network": "network1",
           "workload": "helloworld-v1;sample;helloworld;v1"
          }
         }
        },
        "load_balancing_weight": {
         "value": 1
        }
       },
       "ServiceAccount": "spiffe://cluster.local/ns/sample/sa/default",
       "Network": "network1",
       "Locality": {
        "Label": "us-east-1/us-east-1c/",
        "ClusterID": "cluster1"
       },
       "EndpointPort": 5000,
       "LbWeight": 0,
       "TLSMode": "istio",
       "Namespace": "sample",
       "WorkloadName": "helloworld-v1"
      }
     ],
     "cluster2": [
      {
       "Labels": {
        "app": "helloworld",
        "istio.io/rev": "default",
        "pod-template-hash": "7855866d4f",
        "security.istio.io/tlsMode": "istio",
        "service.istio.io/canonical-name": "helloworld",
        "service.istio.io/canonical-revision": "v2",
        "topology.istio.io/cluster": "cluster2",
        "topology.istio.io/network": "network2",
        "topology.kubernetes.io/region": "us-east-1",
        "topology.kubernetes.io/zone": "us-east-1c",
        "version": "v2"
       },
       "Address": "192.168.233.73",
       "ServicePortName": "http",
       "EnvoyEndpoint": {
        "HostIdentifier": {
         "Endpoint": {
          "address": {
           "Address": {
            "SocketAddress": {
             "address": "192.168.233.73",
             "PortSpecifier": {
              "PortValue": 5000
             }
            }
           }
          }
         }
        },
        "metadata": {
         "filter_metadata": {
          "envoy.transport_socket_match": {
           "tlsMode": "istio"
          },
          "istio": {
           "network": "network2",
           "workload": "helloworld-v2;sample;helloworld;v2"
          }
         }
        },
        "load_balancing_weight": {
         "value": 1
        }
       },
       "ServiceAccount": "spiffe://cluster.local/ns/sample/sa/default",
       "Network": "network2",
       "Locality": {
        "Label": "us-east-1/us-east-1c/",
        "ClusterID": "cluster2"
       },
       "EndpointPort": 5000,
       "LbWeight": 0,
       "TLSMode": "istio",
       "Namespace": "sample",
       "WorkloadName": "helloworld-v2"
      }
     ]
    },
    "ServiceAccounts": {
     "spiffe://cluster.local/ns/sample/sa/default": {}
    }
   }
  },
...

How can Gloo Mesh help?

Gloo Mesh is a management plane that simplifies the operations and workflows of service mesh installations across multiple clusters and deployment footprints. With Gloo Mesh, you can install, discover, and operate service-mesh deployments across your enterprise, whether on premises or in the cloud, even across heterogeneous service-mesh implementations.

Federated trust

First of all, whichever multicluster configuration you choose, you need to establish trust between all the clusters in the mesh.

To achieve that, you need to create a root certificate and intermediate certificates and distribute them across the clusters.

In a previous blog post, I described how Gloo Mesh does that for you when you simply create a VirtualMesh object. It’s easier and more secure (you don’t need to transfer the certificates over the network).
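A minimal sketch of such a VirtualMesh with a generated root certificate authority (the mesh names are the ones Gloo Mesh generates at discovery time, so yours may differ):

apiVersion: networking.mesh.gloo.solo.io/v1alpha2
kind: VirtualMesh
metadata:
  name: virtual-mesh
  namespace: gloo-mesh
spec:
  mtlsConfig:
    autoRestartPods: true
    shared:
      rootCertificateAuthority:
        generated: {}
  federation: {}
  meshes:
    - name: istiod-istio-system-cluster1
      namespace: gloo-mesh
    - name: istiod-istio-system-cluster2
      namespace: gloo-mesh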

Service Discovery

The second step is to discover the workloads running on the remote clusters. As you’ve seen previously, Istio 1.8 can now do that automatically, but Gloo Mesh was already providing this capability. So what is the advantage of using Gloo Mesh discovery?

Imagine you have four Istio control planes and you want to allow cross-cluster communication. With native Istio Endpoint Discovery, you would need to create three secrets on each cluster (to allow the local Istio control plane to discover the workloads of the three remote Kubernetes clusters). In other words, you must share the Kubernetes credentials of every cluster with every other cluster.

Gloo Mesh is a management plane for multiple Istio clusters and discovers the workloads running on the different Kubernetes clusters. So, you only need to share the Kubernetes credentials of all the clusters with Gloo Mesh. It’s more secure and generates less load on the clusters.

Also, Gloo Mesh creates a ServiceEntry for each Service running in another cluster. That’s easier than digging through the Envoy admin API to list the Envoy clusters, as I did above.

Finally, Gloo Mesh creates a global name (for example, helloworld.sample.svc.cluster2.global) that allows you to communicate explicitly with a remote service.
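To give you an idea, the generated objects look roughly like the ServiceEntry below (a simplified sketch, not the exact output: the virtual IP in the 240.0.0.0/4 range is a placeholder, and the endpoint address is the EastWest gateway of cluster2 seen in the Envoy output above):

apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: helloworld.sample.svc.cluster2.global
  namespace: istio-system
spec:
  hosts:
    - helloworld.sample.svc.cluster2.global
  location: MESH_INTERNAL
  ports:
    - number: 5000
      name: http
      protocol: HTTP
  resolution: DNS
  addresses:
    - 240.0.0.2
  endpoints:
    - address: 172.18.0.231
      ports:
        http: 15443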

Traffic Management

Istio VirtualServices can be used to define how to route traffic to different versions (on the same or on different clusters), but a VirtualService is a local resource, and additional objects must be created (for example, DestinationRules to define the subsets corresponding to each version, like the one below).
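For example, shifting traffic between the v1, v2, and v3 versions of the Bookinfo reviews service (as the TrafficPolicy below does) first requires a DestinationRule defining the subsets:

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews
  namespace: default
spec:
  host: reviews
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
    - name: v3
      labels:
        version: v3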

Gloo Mesh has a concept of a TrafficPolicy, which is a global object that allows you to define how to route traffic in a simpler way. Gloo Mesh then creates the VirtualServices and other objects needed.

Gloo Mesh FailoverServices are also available to give you more control over where to route the traffic when a local service becomes unavailable.
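A sketch of a FailoverService that fronts the reviews service in both clusters (assuming the v1alpha2 API; the hostname is arbitrary, and the mesh name follows Gloo Mesh’s discovery naming convention):

apiVersion: networking.mesh.gloo.solo.io/v1alpha2
kind: FailoverService
metadata:
  name: reviews-failover
  namespace: gloo-mesh
spec:
  hostname: reviews-failover.failover.solo.io
  port:
    number: 9080
    protocol: http
  meshes:
    - name: istiod-istio-system-cluster1
      namespace: gloo-mesh
  backingServices:
    - kubeService:
        name: reviews
        namespace: default
        clusterName: cluster1
    - kubeService:
        name: reviews
        namespace: default
        clusterName: cluster2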

Here is an example of a Gloo Mesh TrafficPolicy:

apiVersion: networking.mesh.gloo.solo.io/v1alpha2
kind: TrafficPolicy
metadata:
  namespace: gloo-mesh
  name: simple
spec:
  destinationSelector:
  - kubeServiceRefs:
      services:
        - clusterName: cluster1
          name: reviews
          namespace: default
  trafficShift:
    destinations:
      - kubeService:
          clusterName: cluster2
          name: reviews
          namespace: default
          subset:
            version: v3
        weight: 75
      - kubeService:
          clusterName: cluster1
          name: reviews
          namespace: default
          subset:
            version: v1
        weight: 15
      - kubeService:
          clusterName: cluster1
          name: reviews
          namespace: default
          subset:
            version: v2
        weight: 10

Authorization

Istio AuthorizationPolicies can be used to define which services can communicate with each other, but again, an AuthorizationPolicy is a local object.
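For example, a single-cluster policy allowing the Bookinfo reviews service account to reach the ratings workloads looks like this (a sketch; the selector labels depend on your deployment):

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: ratings-from-reviews
  namespace: default
spec:
  selector:
    matchLabels:
      app: ratings
  action: ALLOW
  rules:
    - from:
        - source:
            principals:
              - cluster.local/ns/default/sa/bookinfo-reviews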

Gloo Mesh has a concept of an AccessPolicy, a global object that allows you to define, across all your clusters, which services can communicate with each other. Gloo Mesh then creates the AuthorizationPolicies needed.

Here is an example of a Gloo Mesh AccessPolicy:

apiVersion: networking.mesh.gloo.solo.io/v1alpha2
kind: AccessPolicy
metadata:
  namespace: gloo-mesh
  name: reviews
spec:
  sourceSelector:
  - kubeServiceAccountRefs:
      serviceAccounts:
        - name: bookinfo-reviews
          namespace: default
          clusterName: cluster1
        - name: bookinfo-reviews
          namespace: default
          clusterName: cluster2
  destinationSelector:
  - kubeServiceMatcher:
      namespaces:
      - default
      labels:
        service: ratings

RBAC

Kubernetes RBAC can be used to define who can create Istio objects (VirtualServices, AuthorizationPolicies, …), but it doesn’t provide enough granularity.
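With plain Kubernetes RBAC, the finest granularity is the resource type: anyone bound to a Role like the one below can create any VirtualService in the namespace, whatever it routes:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: istio-editor
  namespace: default
rules:
  - apiGroups: ["networking.istio.io"]
    resources: ["virtualservices"]
    verbs: ["get", "list", "create", "update", "delete"]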

Gloo Mesh Enterprise includes RBAC that lets you define who can create a TrafficPolicy (for example) and what the scope of the object can be (which source, which target, what kind of operation, …).

Here is an example of a Gloo Mesh Role:

apiVersion: rbac.mesh.gloo.solo.io/v1alpha1
kind: Role
metadata:
  name: default-namespace-admin-role
  namespace: gloo-mesh
spec:
  trafficPolicyScopes:
    - trafficPolicyActions:
        - ALL
      trafficTargetSelectors:
        - kubeServiceMatcher:
            labels:
              "*": "*"
            namespaces:
              - "default"
            clusters:
              - "*"
        - kubeServiceRefs:
            services:
              - name: "*"
                namespace: "default"
                clusterName: "*"
      workloadSelectors:
        - labels:
            "*": "*"
          namespaces:
            - "*"
          clusters:
            - "*"

Management

Gloo Mesh Enterprise also provides a web UI that gives you visibility across all your clusters.

You can use it to easily see which workloads and traffic targets are impacted by each policy.

WebAssembly

Gloo Mesh Enterprise will also make it easy for you to build, push, and deploy WebAssembly filters on any Istio workload.

Conclusion

The way Istio supports multicluster deployments has slightly evolved in version 1.8.

It’s a little bit more flexible, but each Istio control plane remains independent.

A management plane, like Gloo Mesh, can dramatically simplify the way you operate a multicluster deployment.

Get started

We invite you to check out the project and join the community. Solo.io also offers enterprise support for Istio service mesh for those looking to operationalize service mesh environments; request a meeting to learn more here.