Identity Federation for Multi-Cluster Kubernetes and Service Mesh

Denis Jannot | September 22, 2020

In this blog series, we dig into specific challenge areas for multi-cluster Kubernetes and service mesh architectures, along with considerations and approaches for solving them.

The previous blog posts focused on aspects of Failover and Fallback routing from a service mesh perspective, in comparison with (and combined with) multi-cluster API gateway instances.

In this blog post, we start looking at federating identity across multiple clusters for authentication between services. This blog post and the previous Service Discovery post are complementary: together they cover which services exist, where they run, and which ones should be communicating with each other.

To start, there are two different kinds of authentication needed in these environments:

  • Service to Service authentication
  • End user authentication

In this blog post, we’ll focus on service to service authentication.

If you want to learn more about End user authentication, you can have a look at the Gloo documentation.

Service to Service Authentication

By default, the TLS protocol only proves the identity of the server to the client using an X.509 certificate; authentication of the client to the server is left to the application layer.

Mutual TLS authentication refers to two parties authenticating each other at the same time.

In Istio, mutual TLS works as follows:

  • Istio re-routes the outbound traffic from a client to the client’s local sidecar Envoy.
  • The client side Envoy starts a mutual TLS handshake with the server side Envoy. During the handshake, the client side Envoy also does a secure naming check to verify that the service account presented in the server certificate is authorized to run the target service.
  • The client side Envoy and the server side Envoy establish a mutual TLS connection, and Istio forwards the traffic from the client side Envoy to the server side Envoy.
  • After authorization, the server side Envoy forwards the traffic to the server service through local TCP connections.

SPIFFE, the Secure Production Identity Framework for Everyone, is a set of open-source standards for securely identifying software systems in dynamic and heterogeneous environments. Systems that adopt SPIFFE can easily and reliably mutually authenticate wherever they are running.

A SPIFFE ID is a string that uniquely and specifically identifies a workload. SPIFFE IDs are Uniform Resource Identifiers (URIs) which take the following format: spiffe://<trust domain>/<workload identifier>

In the case of Istio, the SPIFFE ID of a workload looks like spiffe://<trust domain>/ns/<namespace>/sa/<service account>

The default trust domain is cluster.local, so the SPIFFE ID corresponding to a Pod started with the service account pod-sa in the default namespace would be spiffe://cluster.local/ns/default/sa/pod-sa.

In a multi-cluster deployment, using the cluster.local trust domain is a problem because there would be no way to differentiate a workload in one cluster from a workload in another cluster if they use the same service account and namespace names.

Istio allows you to use a different trust domain through the trustDomain field of the mesh configuration (MeshConfig).
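For instance, the trust domain can be set at installation time through the IstioOperator API. A minimal sketch, assuming a kind-kind2 kubectl context and an istioctl binary on the path (the resource name is also an assumption):

```shell
# Sketch: install Istio with a custom trust domain.
# The kind-kind2 context and resource name are assumptions for this lab.
cat > /tmp/trust-domain.yaml << EOF
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istio-control-plane
  namespace: istio-system
spec:
  meshConfig:
    trustDomain: kind2
EOF

istioctl --context kind-kind2 install -y -f /tmp/trust-domain.yaml
```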

Local Service to Service Authentication

Let’s start with a simple local example.

I’ve deployed Istio on a cluster using the kind2 trust domain.

When you deploy the bookinfo demo application on Istio, the productpage microservice sends requests to the reviews microservice.

If we raise the logging level of the Envoy sidecar proxy running in the reviews Pod and load the web page, the logs of the reviews Pod show detailed information about each incoming request.
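One way to do this, sketched below under the assumption of a kind-kind2 context and the standard bookinfo labels, is to raise the Envoy rbac logger to debug through the sidecar’s admin API and then search the sidecar logs for the SPIFFE identity of the caller:

```shell
# Sketch (context name and label selector are assumptions for this lab):
# find the reviews Pod, bump the Envoy rbac logger to debug, and tail its logs.
REVIEWS_POD=$(kubectl --context kind-kind2 get pods -l app=reviews \
  -o jsonpath='{.items[0].metadata.name}')

# Envoy exposes its admin API on localhost:15000 inside the sidecar container.
kubectl --context kind-kind2 exec "$REVIEWS_POD" -c istio-proxy -- \
  curl -s -X POST "http://localhost:15000/logging?rbac=debug"

# Reload the web page, then look for the peer's SPIFFE identity in the logs.
kubectl --context kind-kind2 logs "$REVIEWS_POD" -c istio-proxy | grep spiffe
```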

As you can see, the Envoy sidecar proxy running in the reviews Pod is able to determine that the request is coming from a Pod running on the cluster deployed with the trust domain kind2, using the Service Account bookinfo-productpage in the default namespace.

Multi-cluster Service to Service Authentication

First of all, I’ve deployed Istio on a second cluster using the kind3 trust domain.

The lab is composed of three kind clusters.

Service Mesh Hub can help unify the root identity between multiple service mesh installations so any intermediates are signed by the same Root CA and end-to-end mTLS between clusters and services can be established correctly.

Run this command to see how the communication between microservices currently occurs:
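The command below is a sketch of one way to check this, assuming the bookinfo deployments and a kind-kind2 context: open a TLS handshake from the reviews sidecar toward the ratings service and look at the certificates (if any) returned by the peer.

```shell
# Sketch (deployment names, port, and context are assumptions for this lab):
# attempt a TLS handshake against the ratings service from the reviews sidecar.
kubectl --context kind-kind2 exec deploy/reviews-v1 -c istio-proxy -- \
  openssl s_client -showcerts -connect ratings:9080 < /dev/null
```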

You should get something like this:

It means that the traffic is currently not encrypted.

Enable TLS on both clusters:
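A common way to do this is a mesh-wide PeerAuthentication policy in STRICT mode, applied to each cluster. A sketch, assuming kind-kind2 and kind-kind3 kubectl contexts:

```shell
# Sketch: enforce strict mTLS mesh-wide on both clusters
# (context names are assumptions for this lab).
for ctx in kind-kind2 kind-kind3; do
  cat << EOF | kubectl --context "$ctx" apply -f -
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
EOF
done
```

Placing the policy in the Istio root namespace (istio-system by default) makes it apply to the whole mesh rather than a single namespace.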

Run the command again:

Now, the output should look like this:

As you can see, mTLS is now enabled.

Now, run the same command on the second cluster:
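Assuming the same bookinfo deployments exist on that cluster, the check is identical with the kind-kind3 context:

```shell
# Same handshake check, pointed at the second cluster
# (deployment names, port, and context are assumptions for this lab).
kubectl --context kind-kind3 exec deploy/reviews-v1 -c istio-proxy -- \
  openssl s_client -showcerts -connect ratings:9080 < /dev/null
```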

The output should look like this:

The first certificate in the chain is the certificate of the workload and the second one is the Istio CA’s signing (CA) certificate.

As you can see, the Istio CA’s signing (CA) certificates are different in the 2 clusters, so one cluster can’t validate certificates issued by the other cluster.

Creating a Virtual Mesh will unify the root identity.

Run the following command to create the Virtual Mesh:
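A sketch of such a VirtualMesh, based on the Service Mesh Hub v1alpha2 API of the time (the mesh names, namespaces, and the kind-kind1 management-cluster context are assumptions for this lab):

```shell
# Sketch: create a VirtualMesh with a shared, generated root CA.
# All names and the management-cluster context are assumptions.
cat << EOF | kubectl --context kind-kind1 apply -f -
apiVersion: networking.smh.solo.io/v1alpha2
kind: VirtualMesh
metadata:
  name: virtual-mesh
  namespace: service-mesh-hub
spec:
  mtlsConfig:
    autoRestartPods: true
    shared:
      rootCertificateAuthority:
        generated: {}
  federation: {}
  meshes:
  - name: istiod-istio-system-kind2
    namespace: service-mesh-hub
  - name: istiod-istio-system-kind3
    namespace: service-mesh-hub
EOF
```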

When we create the VirtualMesh and set the trust model to shared, Service Mesh Hub will kick off the process to unify the identity to a shared root.

First, Service Mesh Hub will create the Root CA.

Then, Service Mesh Hub will use a Certificate Request (CR) agent on each of the clusters to create a new key/cert pair that will form an intermediate CA used by the mesh on that cluster. It will then create a Certificate Request.

Virtual Mesh Creation

Service Mesh Hub will sign the certificate with the Root CA. At that point, we want Istio to pick up the new intermediate CA and start using that for its workloads.

To do that, Service Mesh Hub creates a Kubernetes secret called cacerts in the istio-system namespace.

You can have a look at the Istio documentation here if you want to get more information about this process.

Check that the new certificate has been created on the first cluster:
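For example (the context name is an assumption for this lab):

```shell
# Inspect the cacerts secret that Service Mesh Hub created on the first cluster.
kubectl --context kind-kind2 get secret cacerts -n istio-system -o yaml
```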

Here is the expected output:

Check that the new certificate has been created on the second cluster:
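Again, with the second cluster’s context (an assumption for this lab):

```shell
# Inspect the cacerts secret on the second cluster.
kubectl --context kind-kind3 get secret cacerts -n istio-system -o yaml
```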

Here is the expected output:

As you can see, the secrets contain the same Root CA (base64 encoded), but different intermediate certs.

Now, let’s check what certificates we get when we run the same commands we ran before we created the Virtual Mesh:

The output should look like this:

And let’s compare with what we get on the second cluster:

The output should look like this:

You can see that the last certificate in the chain is now identical on both clusters. It’s the new root certificate.

The first certificate is the certificate of the service. Let’s decode it.

Copy and paste the content of the certificate (including the BEGIN and END CERTIFICATE lines) into a new file called /tmp/cert and run the following command:
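The standard way to decode a PEM-encoded certificate with OpenSSL:

```shell
# Print the certificate's fields (subject, validity, SAN, ...) in text form.
openssl x509 -in /tmp/cert -text -noout
```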

The output should be as follows:

The Subject Alternative Name (SAN) is the most interesting part. It allows the sidecar proxy of the reviews service to validate that it is talking to the sidecar proxy of the ratings service.


Get started

Service Mesh Hub was updated and open sourced in May, and has recently started community meetings to expand the conversation around service mesh. We invite you to check out the project and join the community. Solo.io also offers enterprise support for Istio service mesh for those looking to operationalize service mesh environments; request a meeting to learn more.
