Identity Federation for Multi-Cluster Kubernetes and Service Mesh

In this blog series, we dig into specific challenge areas for multi-cluster Kubernetes and service mesh architectures, along with considerations and approaches for solving them.

The previous blog posts focused on failover and fallback routing from a service mesh perspective, both compared with and combined with multi-cluster API gateway instances.

In this blog post, we start looking at federating identity across multiple clusters for authentication between services. This post and the previous Service Discovery post are complementary: together they help you understand which services exist, where they exist, and which ones should be communicating with each other.

To start, there are two different kinds of authentication needed in these environments:

  • Service to Service authentication
  • End user authentication

In this blog post, we’ll focus on Service to Service authentication.

If you want to learn more about End user authentication, you can have a look at the Gloo documentation.

Service to Service Authentication

By default, the TLS protocol only proves the identity of the server to the client, using an X.509 certificate; authentication of the client to the server is left to the application layer.

Mutual TLS authentication refers to two parties authenticating each other at the same time.
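
To make the difference concrete, here is a minimal sketch using Python's standard ssl module (it is not Istio or Envoy code). A server-side context does not ask the client for a certificate unless it is told to; setting verify_mode to CERT_REQUIRED is what turns it into the server half of a mutual TLS setup. The function names are hypothetical, and loading the actual certificates and keys is left out.

```python
import ssl


def one_way_tls_server_context() -> ssl.SSLContext:
    """Server context with default settings: the client is not asked for a certificate."""
    return ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)


def mutual_tls_server_context() -> ssl.SSLContext:
    """Server context that refuses clients which do not present a trusted certificate."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.verify_mode = ssl.CERT_REQUIRED
    return context


if __name__ == "__main__":
    print(one_way_tls_server_context().verify_mode)
    print(mutual_tls_server_context().verify_mode)
```

In a real server, the context would also load its own certificate with load_cert_chain and the CA bundle used to verify clients with load_verify_locations.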

In Istio, mutual TLS works as follows:

  • Istio re-routes the outbound traffic from a client to the client’s local sidecar Envoy.
  • The client side Envoy starts a mutual TLS handshake with the server side Envoy. During the handshake, the client side Envoy also does a secure naming check to verify that the service account presented in the server certificate is authorized to run the target service.
  • The client side Envoy and the server side Envoy establish a mutual TLS connection, and Istio forwards the traffic from the client side Envoy to the server side Envoy.
  • After authorization, the server side Envoy forwards the traffic to the server service through local TCP connections.
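
The secure naming check in the second step can be pictured as a lookup from a service name to the identities allowed to run it. The sketch below is only a simplified model of that idea, not how Envoy or Istio implement it; the table contents, the service host name and the function name are assumptions made for illustration.

```python
# Hypothetical secure naming table: service name -> identities allowed to run it.
SECURE_NAMING = {
    "ratings.default.svc.cluster.local": {
        "spiffe://kind2/ns/default/sa/bookinfo-ratings",
    },
}


def secure_naming_check(service: str, presented_identity: str) -> bool:
    """Return True if the identity in the server certificate may run the service."""
    return presented_identity in SECURE_NAMING.get(service, set())


if __name__ == "__main__":
    print(
        secure_naming_check(
            "ratings.default.svc.cluster.local",
            "spiffe://kind2/ns/default/sa/bookinfo-ratings",
        )
    )
```

A server presenting any other identity for that service, or a service that has no entry at all, fails the check, and the client side would then abort the handshake.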

SPIFFE, the Secure Production Identity Framework for Everyone, is a set of open-source standards for securely identifying software systems in dynamic and heterogeneous environments. Systems that adopt SPIFFE can easily and reliably mutually authenticate wherever they are running.

A SPIFFE ID is a string that uniquely and specifically identifies a workload. It is a Uniform Resource Identifier (URI) that takes the following format: spiffe://trust domain/workload identifier

In the case of Istio, the SPIFFE ID of a workload looks like spiffe://<trust domain>/ns/<namespace>/sa/<service account>

The default trust domain is cluster.local, so the SPIFFE ID corresponding to a Pod started with the service account pod-sa in the default namespace would be spiffe://cluster.local/ns/default/sa/pod-sa.
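
As a small illustration of that format, the following sketch splits an Istio-style SPIFFE ID into its three components. The class and function names are hypothetical, and the parser only accepts the ns/<namespace>/sa/<service account> layout described above, not every valid SPIFFE ID.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class IstioIdentity(NamedTuple):
    trust_domain: str
    namespace: str
    service_account: str


def parse_istio_spiffe_id(spiffe_id: str) -> IstioIdentity:
    """Split spiffe://<trust domain>/ns/<namespace>/sa/<service account> into its parts."""
    parsed = urlparse(spiffe_id)
    segments = parsed.path.strip("/").split("/")
    if (
        parsed.scheme != "spiffe"
        or not parsed.netloc
        or len(segments) != 4
        or not all(segments)
        or segments[0] != "ns"
        or segments[2] != "sa"
    ):
        raise ValueError(f"not an Istio-style SPIFFE ID: {spiffe_id!r}")
    return IstioIdentity(parsed.netloc, segments[1], segments[3])


if __name__ == "__main__":
    print(parse_istio_spiffe_id("spiffe://cluster.local/ns/default/sa/pod-sa"))
```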

In a multi-cluster deployment, using the cluster.local trust domain is a problem: there would be no way to differentiate a workload in one cluster from a workload in another if they use the same service account and namespace names.
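
The collision is easy to see by building the identities by hand. In this sketch the helper function is hypothetical; kind2 and kind3 are the per-cluster trust domains used later in this post.

```python
def istio_spiffe_id(trust_domain: str, namespace: str, service_account: str) -> str:
    """Build an Istio-style SPIFFE ID from its three components."""
    return f"spiffe://{trust_domain}/ns/{namespace}/sa/{service_account}"


# Same namespace and service account, default trust domain on both clusters.
first_cluster_default = istio_spiffe_id("cluster.local", "default", "bookinfo-productpage")
second_cluster_default = istio_spiffe_id("cluster.local", "default", "bookinfo-productpage")

# Same workload names, but one trust domain per cluster.
first_cluster_named = istio_spiffe_id("kind2", "default", "bookinfo-productpage")
second_cluster_named = istio_spiffe_id("kind3", "default", "bookinfo-productpage")

if __name__ == "__main__":
    print(first_cluster_default == second_cluster_default)  # True: indistinguishable
    print(first_cluster_named == second_cluster_named)  # False: cluster is visible
```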

Istio lets you set a different trust domain with the trustDomain parameter of MeshConfig.
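
How the clusters in this post were installed is not shown, so the following is only a sketch under the assumption that Istio is installed through the IstioOperator API, where MeshConfig settings live under spec.meshConfig. The function name is hypothetical.

```python
import json


def istio_operator_with_trust_domain(trust_domain: str) -> dict:
    """Build an IstioOperator resource that sets meshConfig.trustDomain."""
    return {
        "apiVersion": "install.istio.io/v1alpha1",
        "kind": "IstioOperator",
        "spec": {"meshConfig": {"trustDomain": trust_domain}},
    }


if __name__ == "__main__":
    # JSON is valid YAML, so the printed document can be saved to a file.
    print(json.dumps(istio_operator_with_trust_domain("kind2"), indent=2))
```

The printed document could then be saved to a file and passed to istioctl install -f, once per cluster with that cluster's trust domain.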

Local Service to Service Authentication

Let’s start with a simple local example.

I’ve deployed Istio on a cluster using the kind2 trust domain.

When you deploy the bookinfo demo application on Istio, the productpage micro service sends requests to the reviews micro service.

If we modify the logging level of the Envoy sidecar proxy running in the reviews Pod and load the web page, we’ll see the information below in the logs of the reviews Pod:

fields {
  key: "source.namespace"
  value {
    string_value: "default"
  }
}
fields {
  key: "source.principal"
  value {
    string_value: "kind2/ns/default/sa/bookinfo-productpage"
  }
}

As you can see, the Envoy sidecar proxy running in the reviews Pod is able to determine that the request is coming from a Pod running on the cluster deployed with the trust domain kind2 and using the service account bookinfo-productpage in the default namespace.
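
If you want to pull these values out of the proxy log programmatically, a regular expression over the excerpt is enough for a quick check. This is a minimal sketch that assumes the log keeps the layout shown above; the pattern and function names are hypothetical.

```python
import re

# Matches one `fields { key: "..." value { string_value: "..." } }` entry.
FIELD_PATTERN = re.compile(
    r'key:\s*"(?P<key>[^"]+)"\s*value\s*\{\s*string_value:\s*"(?P<value>[^"]*)"'
)


def string_fields(log_text: str) -> dict:
    """Collect the string-valued fields of the log excerpt into a dictionary."""
    return {m.group("key"): m.group("value") for m in FIELD_PATTERN.finditer(log_text)}


def split_principal(principal: str) -> tuple:
    """Split <trust domain>/ns/<namespace>/sa/<service account> into its three parts."""
    trust_domain, ns_marker, namespace, sa_marker, service_account = principal.split("/")
    if (ns_marker, sa_marker) != ("ns", "sa"):
        raise ValueError(f"unexpected principal layout: {principal!r}")
    return trust_domain, namespace, service_account


if __name__ == "__main__":
    print(split_principal("kind2/ns/default/sa/bookinfo-productpage"))
```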

Multi-cluster Service to Service Authentication

First of all, I’ve deployed Istio on a second cluster using the kind3 trust domain.

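The lab is composed of three kind clusters: kind1, where the VirtualMesh is applied later in this post, and kind2 and kind3, which each run Istio with their own trust domain.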

Service Mesh Hub can help unify the root identity between multiple service mesh installations so any intermediates are signed by the same Root CA and end-to-end mTLS between clusters and services can be established correctly.

Run this command to see how communication between micro services currently occurs:

kubectl --context kind-kind2 exec -t deploy/reviews-v1 -c istio-proxy \
-- openssl s_client -showcerts -connect ratings:9080

You should get something like this:

CONNECTED(00000005)
139706332271040:error:1408F10B:SSL routines:ssl3_get_record:wrong version number:../ssl/record/ssl3_record.c:332:
---
no peer certificate available
---
No client certificate CA names sent
---
SSL handshake has read 5 bytes and written 309 bytes
Verification: OK
---
New, (NONE), Cipher is (NONE)
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
Early data was not sent
Verify return code: 0 (ok)
---
command terminated with exit code 1

The handshake fails and no peer certificate is returned, which means that the traffic is currently not encrypted.

Enable TLS on both clusters:

kubectl --context kind-kind2 apply -f - <
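
The manifest piped into the command above is cut off in this copy of the post, so what was actually applied is not shown. One common way to require mutual TLS in Istio is a mesh-wide PeerAuthentication resource named default in the root namespace with mode STRICT. The sketch below builds such a resource; it is an assumption about a typical setup, not a reconstruction of the missing manifest, and the function name is hypothetical.

```python
import json


def strict_mtls_policy(root_namespace: str = "istio-system") -> dict:
    """Build a mesh-wide PeerAuthentication that only accepts mutual TLS traffic."""
    return {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "PeerAuthentication",
        "metadata": {"name": "default", "namespace": root_namespace},
        "spec": {"mtls": {"mode": "STRICT"}},
    }


if __name__ == "__main__":
    # kubectl accepts JSON on standard input as well as YAML.
    print(json.dumps(strict_mtls_policy(), indent=2))
```

The output could be piped to kubectl apply -f - once per cluster, using the kind-kind2 and kind-kind3 contexts.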

Run the command again:

kubectl --context kind-kind2 exec -t deploy/reviews-v1 -c istio-proxy \
-- openssl s_client -showcerts -connect ratings:9080

Now, the output should look like this:

...
Certificate chain
 0 s:
   i:O = kind2
-----BEGIN CERTIFICATE-----
MIIDFzCCAf+gAwIBAgIRALsoWlroVcCc1n+VROhATrcwDQYJKoZIhvcNAQELBQAw
...
BPiAYRMH5j0gyBqiZZEwCfzfQe1e6aAgie9T
-----END CERTIFICATE-----
 1 s:O = kind2
   i:O = kind2
-----BEGIN CERTIFICATE-----
MIICzjCCAbagAwIBAgIRAKIx2hzMbAYzM74OC4Lj1FUwDQYJKoZIhvcNAQELBQAw
...
uMTPjt7p/sv74fsLgrx8WMI0pVQ7+2plpjaiIZ8KvEK9ye/0Mx8uyzTG7bpmVVWo
ugY=
-----END CERTIFICATE-----
...

As you can see, mTLS is now enabled.

Now, run the same command on the second cluster:

kubectl --context kind-kind3 exec -t deploy/reviews-v1 -c istio-proxy \
-- openssl s_client -showcerts -connect ratings:9080

The output should look like this:

...
Certificate chain
 0 s:
   i:O = kind3
-----BEGIN CERTIFICATE-----
MIIDFzCCAf+gAwIBAgIRALo1dmnbbP0hs1G82iBa2oAwDQYJKoZIhvcNAQELBQAw
...
YvDrZfKNOKwFWKMKKhCSi2rmCvLKuXXQJGhy
-----END CERTIFICATE-----
 1 s:O = kind3
   i:O = kind3
-----BEGIN CERTIFICATE-----
MIICzjCCAbagAwIBAgIRAIjegnzq/hN/NbMm3dmllnYwDQYJKoZIhvcNAQELBQAw
...
GZRM4zV9BopZg745Tdk2LVoHiBR536QxQv/0h1P0CdN9hNLklAhGN/Yf9SbDgLTw
6Sk=
-----END CERTIFICATE-----
...

The first certificate in the chain is the certificate of the workload and the second one is the Istio CA’s signing (CA) certificate.

As you can see, the Istio CA’s signing (CA) certificates are different in the 2 clusters, so one cluster can’t validate certificates issued by the other cluster.
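
Comparing the chains by eye works for two clusters, but it can also be scripted. The following sketch assumes you have saved the openssl s_client output of each cluster as text; it extracts the PEM blocks and compares the last certificate of each chain. The function names are hypothetical, and the comparison is purely textual.

```python
import re

PEM_PATTERN = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def pem_blocks(s_client_output: str) -> list:
    """Return the PEM certificates in the order openssl printed them."""
    return PEM_PATTERN.findall(s_client_output)


def same_top_certificate(output_a: str, output_b: str) -> bool:
    """Compare the last certificate presented in each chain."""
    chain_a, chain_b = pem_blocks(output_a), pem_blocks(output_b)
    if not chain_a or not chain_b:
        raise ValueError("no certificate found in at least one output")
    return chain_a[-1] == chain_b[-1]
```

With the two outputs above, same_top_certificate would return False, because each cluster still has its own signing certificate.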

Creating a Virtual Mesh will unify the root identity.

Run the following command to create the Virtual Mesh:

cat << EOF | kubectl --context kind-kind1 apply -f -
apiVersion: networking.smh.solo.io/v1alpha2
kind: VirtualMesh
metadata:
  name: virtual-mesh
  namespace: service-mesh-hub
spec:
  mtlsConfig:
    autoRestartPods: true
    shared:
      rootCertificateAuthority:
        generated: null
  federation: {}
  meshes:
  - name: istiod-istio-system-kind2
    namespace: service-mesh-hub
  - name: istiod-istio-system-kind3
    namespace: service-mesh-hub
EOF

When we create the VirtualMesh and set the trust model to shared, Service Mesh Hub will kick off the process to unify the identity to a shared root.

First, Service Mesh Hub will create the Root CA.

Then, Service Mesh Hub will use a Certificate Request (CR) agent on each of the clusters to create a new key/cert pair that will form an intermediate CA used by the mesh on that cluster. It will then create a Certificate Request.

Figure: Virtual Mesh creation

Service Mesh Hub will sign the certificate with the Root CA. At that point, we want Istio to pick up the new intermediate CA and start using that for its workloads.

To do that Service Mesh Hub creates a Kubernetes secret called cacerts in the istio-system namespace.

See the Istio documentation on plugging in CA certificates if you want more information about this process.

Check that the new certificate has been created on the first cluster:

kubectl --context kind-kind2 get secret -n istio-system cacerts -o yaml

Here is the expected output:

apiVersion: v1
data:
  ca-cert.pem: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUZFRENDQXZpZ0F3SUJBZ0lRUG5kRDkwejN4dytYeTBzYzNmcjRmekFOQmdrcWhraUc5dzBCQVFzRkFEQWIKTVJrd0Z3WURWU...
  jFWVlZtSWl3Si8va0NnNGVzWTkvZXdxSGlTMFByWDJmSDVDCmhrWnQ4dz09Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
  ca-key.pem: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlKS0FJQkFBS0NBZ0VBczh6U0ZWcEFxeVNodXpMaHVXUlNFMEJJMXVwbnNBc3VnNjE2TzlKdzBlTmhhc3RtClUvZERZS...
  DT2t1bzBhdTFhb1VsS1NucldpL3kyYUtKbz0KLS0tLS1FTkQgUlNBIFBSSVZBVEUgS0VZLS0tLS0K
  cert-chain.pem: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUZFRENDQXZpZ0F3SUJBZ0lRUG5kRDkwejN4dytYeTBzYzNmcjRmekFOQmdrcWhraUc5dzBCQVFzRkFEQWIKTVJrd0Z3WURWU...
  RBTHpzQUp2ZzFLRUR4T2QwT1JHZFhFbU9CZDBVUDk0KzJCN0tjM2tkNwpzNHYycEV2YVlnPT0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
  key.pem: ""
  root-cert.pem: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUU0ekNDQXN1Z0F3SUJBZ0lRT2lZbXFGdTF6Q3NzR0RFQ3JOdnBMakFOQmdrcWhraUc5dzBCQVFzRkFEQWIKTVJrd0Z3WURWU...
  UNBVEUtLS0tLQo=
kind: Secret
metadata:
  labels:
    agent.certificates.smh.solo.io: service-mesh-hub
    cluster.multicluster.solo.io: ""
  name: cacerts
  namespace: istio-system
type: certificates.smh.solo.io/issued_certificate

Check that the new certificate has been created on the second cluster:

kubectl --context kind-kind3 get secret -n istio-system cacerts -o yaml

Here is the expected output:

apiVersion: v1
data:
  ca-cert.pem: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUZFRENDQXZpZ0F3SUJBZ0lRWXE1V29iWFhGM1gwTjlNL3BYYkNKekFOQmdrcWhraUc5dzBCQVFzRkFEQWIKTVJrd0Z3WURWU...
  XpqQ1RtK2QwNm9YaDI2d1JPSjdQTlNJOTkrR29KUHEraXltCkZIekhVdz09Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
  ca-key.pem: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlKS1FJQkFBS0NBZ0VBMGJPMTdSRklNTnh4K1lMUkEwcFJqRmRvbG1SdW9Oc3gxNUUvb3BMQ1l1RjFwUEptCndhR1U1V...
  MNU9JWk5ObDA4dUE1aE1Ca2gxNCtPKy9HMkoKLS0tLS1FTkQgUlNBIFBSSVZBVEUgS0VZLS0tLS0K
  cert-chain.pem: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUZFRENDQXZpZ0F3SUJBZ0lRWXE1V29iWFhGM1gwTjlNL3BYYkNKekFOQmdrcWhraUc5dzBCQVFzRkFEQWIKTVJrd0Z3WURWU...
  RBTHpzQUp2ZzFLRUR4T2QwT1JHZFhFbU9CZDBVUDk0KzJCN0tjM2tkNwpzNHYycEV2YVlnPT0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
  key.pem: ""
  root-cert.pem: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUU0ekNDQXN1Z0F3SUJBZ0lRT2lZbXFGdTF6Q3NzR0RFQ3JOdnBMakFOQmdrcWhraUc5dzBCQVFzRkFEQWIKTVJrd0Z3WURWU...
  UNBVEUtLS0tLQo=
kind: Secret
metadata:
  labels:
    agent.certificates.smh.solo.io: service-mesh-hub
    cluster.multicluster.solo.io: ""
  name: cacerts
  namespace: istio-system
type: certificates.smh.solo.io/issued_certificate

As you can see, the secrets contain the same Root CA (base64 encoded), but different intermediate certs.
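
Because the values are base64 encoded and truncated on screen, the comparison is easier to trust when it is done in code. This sketch assumes each Secret has been exported with -o json instead of -o yaml and saved to a file; the function names and file handling are hypothetical.

```python
import base64
import json


def load_secret(path: str) -> dict:
    """Load a Secret exported with `kubectl get secret ... -o json`."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def decoded_entry(secret: dict, key: str) -> bytes:
    """Base64-decode one entry of the Secret's data map."""
    return base64.b64decode(secret["data"][key])


def compare_cacerts(secret_a: dict, secret_b: dict) -> dict:
    """Report whether two cacerts Secrets share a root and an intermediate certificate."""
    return {
        "same_root": decoded_entry(secret_a, "root-cert.pem")
        == decoded_entry(secret_b, "root-cert.pem"),
        "same_intermediate": decoded_entry(secret_a, "ca-cert.pem")
        == decoded_entry(secret_b, "ca-cert.pem"),
    }
```

For the two Secrets above, the expected result is a shared root and different intermediates.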

Now, let’s check what certificates we get when we run the same commands we ran before we created the Virtual Mesh:

kubectl --context kind-kind2 exec -t deploy/reviews-v1 -c istio-proxy \
-- openssl s_client -showcerts -connect ratings:9080

The output should look like this:

...
Certificate chain
 0 s:
   i:
-----BEGIN CERTIFICATE-----
MIIEBzCCAe+gAwIBAgIRAK1yjsFkisSjNqm5tzmKQS8wDQYJKoZIhvcNAQELBQAw
...
T77lFKXx0eGtDNtWm/1IPiOutIMlFz/olVuN
-----END CERTIFICATE-----
 1 s:
   i:O = service-mesh-hub
-----BEGIN CERTIFICATE-----
MIIFEDCCAvigAwIBAgIQPndD90z3xw+Xy0sc3fr4fzANBgkqhkiG9w0BAQsFADAb
...
hkZt8w==
-----END CERTIFICATE-----
 2 s:O = service-mesh-hub
   i:O = service-mesh-hub
-----BEGIN CERTIFICATE-----
MIIE4zCCAsugAwIBAgIQOiYmqFu1zCssGDECrNvpLjANBgkqhkiG9w0BAQsFADAb
...
s4v2pEvaYg==
-----END CERTIFICATE-----
 3 s:O = service-mesh-hub
   i:O = service-mesh-hub
-----BEGIN CERTIFICATE-----
MIIE4zCCAsugAwIBAgIQOiYmqFu1zCssGDECrNvpLjANBgkqhkiG9w0BAQsFADAb
...
s4v2pEvaYg==
-----END CERTIFICATE-----
...

And let’s compare with what we get on the second cluster:

kubectl --context kind-kind3 exec -t deploy/reviews-v1 -c istio-proxy \
-- openssl s_client -showcerts -connect ratings:9080

The output should look like this:

...
Certificate chain
 0 s:
   i:
-----BEGIN CERTIFICATE-----
MIIEBjCCAe6gAwIBAgIQfSeujXiz3KsbG01+zEcXGjANBgkqhkiG9w0BAQsFADAA
...
EtTlhPLbyf2GwkUgzXhdcu2G8uf6o16b0qU=
-----END CERTIFICATE-----
 1 s:
   i:O = service-mesh-hub
-----BEGIN CERTIFICATE-----
MIIFEDCCAvigAwIBAgIQYq5WobXXF3X0N9M/pXbCJzANBgkqhkiG9w0BAQsFADAb
...
FHzHUw==
-----END CERTIFICATE-----
 2 s:O = service-mesh-hub
   i:O = service-mesh-hub
-----BEGIN CERTIFICATE-----
MIIE4zCCAsugAwIBAgIQOiYmqFu1zCssGDECrNvpLjANBgkqhkiG9w0BAQsFADAb
...
s4v2pEvaYg==
-----END CERTIFICATE-----
 3 s:O = service-mesh-hub
   i:O = service-mesh-hub
-----BEGIN CERTIFICATE-----
MIIE4zCCAsugAwIBAgIQOiYmqFu1zCssGDECrNvpLjANBgkqhkiG9w0BAQsFADAb
...
s4v2pEvaYg==
-----END CERTIFICATE-----
...

You can see that the last certificate in the chain is now identical on both clusters. It’s the new root certificate.

The first certificate is the certificate of the service. Let’s decode it.

Copy and paste the content of the certificate (including the BEGIN and END CERTIFICATE lines) into a new file called /tmp/cert and run the following command:

openssl x509 -in /tmp/cert -text

The output should be as follows:

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            7d:27:ae:8d:78:b3:dc:ab:1b:1b:4d:7e:cc:47:17:1a
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: 
        Validity
            Not Before: Sep 17 08:21:08 2020 GMT
            Not After : Sep 18 08:21:08 2020 GMT
        Subject: 
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (2048 bit)
                Modulus:
...
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Extended Key Usage: 
                TLS Web Server Authentication, TLS Web Client Authentication
            X509v3 Basic Constraints: critical
                CA:FALSE
            X509v3 Subject Alternative Name: critical
                URI:spiffe://kind3/ns/default/sa/bookinfo-ratings
    Signature Algorithm: sha256WithRSAEncryption
...
-----BEGIN CERTIFICATE-----
MIIEBjCCAe6gAwIBAgIQfSeujXiz3KsbG01+zEcXGjANBgkqhkiG9w0BAQsFADAA
...
EtTlhPLbyf2GwkUgzXhdcu2G8uf6o16b0qU=
-----END CERTIFICATE-----

The Subject Alternative Name (SAN) is the most interesting part. It allows the sidecar proxy of the reviews service to validate that it is talking to the sidecar proxy of the ratings service.
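
To check the SAN without reading the whole dump, you can search the text output of openssl x509 for URI entries. This is a minimal sketch that works on the printed text only; it is not how Envoy validates the peer, which happens on the parsed certificate during the handshake. The pattern and function names are hypothetical.

```python
import re

# URI entries in the Subject Alternative Name section, e.g. "URI:spiffe://...".
URI_SAN_PATTERN = re.compile(r"URI:(spiffe://[^\s,]+)")


def spiffe_ids_from_x509_text(x509_text: str) -> list:
    """Return the SPIFFE URIs listed in `openssl x509 -text` output."""
    return URI_SAN_PATTERN.findall(x509_text)


def presents_identity(x509_text: str, expected_spiffe_id: str) -> bool:
    """Return True if the certificate text lists the expected SPIFFE ID."""
    return expected_spiffe_id in spiffe_ids_from_x509_text(x509_text)
```

Applied to the output above, presents_identity would accept spiffe://kind3/ns/default/sa/bookinfo-ratings and reject any other identity.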

Get started

Service Mesh Hub was updated and open sourced in May and has recently started community meetings to expand the conversation around service mesh. We invite you to check out the project and join the community. Solo.io also offers enterprise support for Istio service mesh for those looking to operationalize service mesh environments; request a meeting to learn more.