Step by Step: Datastax Cassandra with Istio and SNI routing

Cassandra is a very popular “NoSQL” database: a highly distributed, partitioned wide-column store that can tolerate certain types of failures and scale for data-intensive microservices. As Kubernetes has become the de facto container deployment platform for microservices, running stateful workloads like Cassandra on it is a common choice. There are quite a few guides showing how to deploy Cassandra on Kubernetes as a StatefulSet, but far fewer guides for connecting applications running outside of the Kubernetes cluster to the Cassandra database running inside the cluster. At Solo.io we help customers and prospects operationalize microservices networking and routing technology built on Envoy proxy, like Gloo or Istio. In this blog post, we’ll dig into the details of getting a deployment of Datastax Cassandra running on Kubernetes with TLS and SNI through Istio to enable routing from outside of the cluster.

We will take a step-by-step approach to this guide as the components themselves are quite complex. We will build up the architecture piece by piece, explaining the approach and benefits as we go. Since we follow one specific path in this blog, and there are many considerations and tradeoffs at each step, please do reach out to us if you have questions or need help.

The architecture for this blog

There are a few ways to deploy Cassandra to Kubernetes and then use Istio to control the traffic routing. In this blog post, we’re specifically considering the following architecture:

[Architecture diagram]

What we see in this architecture are the following salient points:

  • Cassandra deployed with the Datastax Cassandra Operator as a StatefulSet
  • The Cassandra inter-node communication is secured with TLS using Cassandra’s configuration
  • Istio deployed as the service mesh treating the connections between the nodes as plaintext TCP
  • Istio ingress gateway deployed at the edge with TLS passthrough and SNI routing configured
  • A client living outside the Kubernetes cluster that intends to connect to the database running inside the cluster

In the following steps, we’ll walk through the following sections:

  • Deploy Istio 1.7.x
  • Deploy the Datastax Cassandra Kubernetes operator
  • Deploy a DSE CassandraDatacenter configured for TLS
  • Configure Istio ingress for TCP routing
  • Test with a client
  • Configure Istio for TLS passthrough/SNI routing
  • Verify with a client

Source code for this blog

You can follow along or review the source code for this blog at the following repo: https://github.com/christian-posta/dse-cass-istio-sni

Installing Istio 1.7

Go download Istio 1.7 and follow the instructions for your platform to install it. For example, this blog assumes a simple installation like the following:

$  istioctl install -y
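
You can verify the control plane and ingress gateway came up before continuing:

$  kubectl get po -n istio-system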

We will disable mTLS for Istio in this blog since we’ll be leveraging Cassandra’s built-in TLS to get client-to-server, end-to-end TLS as well as SNI routing:

$  kubectl apply -f dse-peerauth-disable.yaml

You can be more fine-grained about which namespaces don’t use mTLS with Istio (i.e., just the specific Cassandra namespace) instead of the mesh-wide setting we used above.
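
For reference, the dse-peerauth-disable.yaml file in the repo should look something like this minimal mesh-wide PeerAuthentication (the version in the repo is authoritative if it differs):


apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  # applying this to the Istio root namespace makes the setting mesh-wide
  namespace: istio-system
spec:
  mtls:
    mode: DISABLE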

Deploying Datastax Cassandra

As mentioned previously, we’ll be deploying Cassandra using the Datastax Cassandra operator. You can read the official Datastax docs for more details. We will specifically be installing the DSE operator for Kubernetes 1.16. To prepare our environment and deploy the operator, run the following from the root of the source for this blog:

$  kubectl create namespace cass-operator

namespace/cass-operator created 
$  kubectl config set-context --current --namespace=cass-operator

Context "remote-cluster-context" modified.

Let’s label the cass-operator namespace so that our Database pods get the Istio sidecar injected:

$  kubectl label namespace cass-operator istio-injection=enabled

namespace/cass-operator labeled

One thing to consider before deploying the operator is disabling Istio sidecar injection for the operator itself. You can do that by adding the following annotation to the operator’s Kubernetes Deployment:


      annotations:
        sidecar.istio.io/inject: "false"
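
Note that the annotation belongs on the pod template inside the operator’s Deployment, not on the Deployment’s own metadata. A trimmed sketch (not the full manifest):


apiVersion: apps/v1
kind: Deployment
metadata:
  name: cass-operator
spec:
  template:
    metadata:
      annotations:
        # skip sidecar injection for the operator pod itself
        sidecar.istio.io/inject: "false"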

Let’s deploy the operator:

$  kubectl apply -f operator-v1.16-all.yaml

We are running our deployments on GKE/Google Cloud, so we will provision the following StorageClass for the Cassandra database:


apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: server-storage
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
  replication-type: none
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete

Go ahead and apply it from the source code directory:

$  kubectl apply -f dse-storage-class.yaml

storageclass.storage.k8s.io/server-storage configured 

Lastly, let’s deploy the DSE CassandraDatacenter resource:

$  kubectl apply -f dse-moderate-tls.yaml

serviceaccount/svc-tke-cass created
cassandradatacenter.cassandra.datastax.com/dc1 created

We will dig into the details of this resource in the next sections.

At this point, we should see our Cassandra nodes start to come up:

$  kubectl get po
NAME                             READY   STATUS     RESTARTS   AGE
cass-operator-74d7dfc646-76v5z   1/1     Running    0          7m51s
dse-dc1-default-sts-0            0/3     Init:0/2   0          13s
dse-dc1-default-sts-1            0/3     Init:0/2   0          13s
dse-dc1-default-sts-2            0/3     Init:0/2   0          13s

We also see the headless services that were created for this StatefulSet:

$  kubectl get svc
NAME                                  TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                      AGE
cass-operator-metrics                 ClusterIP   10.8.66.157   <none>        8383/TCP,8686/TCP            4h14m
cassandradatacenter-webhook-service   ClusterIP   10.8.74.74    <none>        443/TCP                      4h14m
dse-dc1-all-pods-service              ClusterIP   None          <none>        9042/TCP,8080/TCP,9103/TCP   4h6m
dse-dc1-service                       ClusterIP   None          <none>        9042/TCP,8080/TCP,9103/TCP   4h6m
dse-seed-service                      ClusterIP   None          <none>        <none>    

Following the Kubernetes Service rules for StatefulSets, we can address each of the pods with the following hostnames:

  • dse-dc1-default-sts-0.dse-dc1-service.cass-operator.svc.cluster.local
  • dse-dc1-default-sts-1.dse-dc1-service.cass-operator.svc.cluster.local
  • dse-dc1-default-sts-2.dse-dc1-service.cass-operator.svc.cluster.local
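
Once the pods are ready, these names resolve directly to the pod IPs from inside the cluster; a quick way to check (using a throwaway busybox pod):

$  kubectl run -it --rm dns-test --image=busybox --restart=Never -- nslookup dse-dc1-default-sts-0.dse-dc1-service.cass-operator.svc.cluster.local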

However, Istio does not know about these DNS names. Istio does know about the headless services and can pull the endpoints (EDS) for them, but it does not know about the per-pod DNS names. We can create ServiceEntry resources for those explicitly; they look something like this for each host:


apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: dse-dc1-sts-0-se
  namespace: cass-operator
spec:
  hosts:
  - dse-dc1-default-sts-0.dse-dc1-service.cass-operator.svc.cluster.local
  location: MESH_INTERNAL
  ports:
  - name: cassandra
    number: 9042
    protocol: TLS
  resolution: DNS

Let’s create all of the service entries:

$  kubectl apply -f dse-ingressgateway-serviceentry.yaml
serviceentry.networking.istio.io/dse-dc1-sts-0-se created
serviceentry.networking.istio.io/dse-dc1-sts-1-se created
serviceentry.networking.istio.io/dse-dc1-sts-2-se created
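
You can list them to confirm all three were created:

$  kubectl get serviceentry -n cass-operator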

Understanding what we’ve deployed so far

When we configured the CassandraDatacenter, we gave it the following properties:


metadata:
  name: dc1

spec:
  clusterName: dse

  size: 3

  resources:
    requests:
      memory: 2G
      cpu: 2
    limits:
      memory: 4G
      cpu: 4  

We’ve named the CassandraDatacenter resource dc1, named the cluster dse, specified a size of 3 nodes, and configured resource requests and limits for the StatefulSet that gets created. We are specifically calling out the resources section because if you don’t give Cassandra enough resources, it will struggle to come up fully and appear stuck in a state where not all of the containers reach the Ready state.

Another part of the CassandraDatacenter config file concerns setting up TLS for Cassandra inter-node communication as well as the client TLS settings:


  config:
    # See https://docs.datastax.com/en/dse/6.7/dse-dev/datastax_enterprise/config/configCassandra_yaml.html
    cassandra-yaml:
      authenticator: org.apache.cassandra.auth.PasswordAuthenticator
      authorizer: org.apache.cassandra.auth.CassandraAuthorizer
      role_manager: org.apache.cassandra.auth.CassandraRoleManager
      server_encryption_options:
         internode_encryption: all
         keystore: /etc/encryption/node-keystore.jks
         keystore_password: dc1
         truststore: /etc/encryption/node-keystore.jks
         truststore_password: dc1
      client_encryption_options:
         enabled: true
         optional: true
         keystore: /etc/encryption/node-keystore.jks
         keystore_password: dc1
         truststore: /etc/encryption/node-keystore.jks
         truststore_password: dc1

You can see we turn on the internode_encryption property as well as enabled: true for the client_encryption_options. These settings point to certs that live in the /etc/encryption/node-keystore.jks keystore/truststore, which are created automatically unless you provide them ahead of time. Since optional: true is also set, clients can connect with either plaintext or TLS. We will try both in the next sections.

Let’s take a look at the certificate/keystore as Kubernetes secrets:

$  kubectl get secret
NAME                           TYPE                                  DATA   AGE
cass-operator-token-67vk5      kubernetes.io/service-account-token   3      3h48m
cass-operator-webhook-config   Opaque                                2      3h48m
dc1-ca-keystore                Opaque                                2      3h40m
dc1-keystore                   Opaque                                1      3h40m
default-token-vd869            kubernetes.io/service-account-token   3      3h48m
dse-superuser                  Opaque                                2      3h40m
svc-tke-cass-token-tbpwk       kubernetes.io/service-account-token   3      3h40m

You can see the dc1-ca-keystore and dc1-keystore secrets that have been created. The first contains the root CA that signed the leaf certificates, which live in the second, the JKS keystore.
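
If you’re curious what’s inside the generated keystore, you can extract it from the secret and list it with keytool (the store password dc1 comes from the keystore_password setting we saw earlier). Note the secret’s data key name below is an assumption; check kubectl describe secret dc1-keystore if it differs:

$  kubectl get secret dc1-keystore -o jsonpath="{.data['node-keystore\.jks']}" | base64 --decode > node-keystore.jks
$  keytool -list -keystore node-keystore.jks -storepass dc1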

We will want to use this root CA in the next sections, so let’s save it off to a file called dc1-root-ca.pem:

$ kubectl get secret -n cass-operator dc1-ca-keystore -o jsonpath="{.data['cert']}" | base64 --decode > dc1-root-ca.pem
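
A quick sanity check that the CA certificate decoded correctly:

$  openssl x509 -in dc1-root-ca.pem -noout -subject -dates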

So what we have so far is a default installation of Datastax Cassandra using TLS between the Cassandra ring nodes, with the option to use TLS from the client. The default certs/keystores were created and mounted into the StatefulSet. Let’s try routing traffic to this Cassandra cluster. After a few moments, you should see the Cassandra nodes come up correctly:

$  kubectl get po

NAME                             READY   STATUS    RESTARTS   AGE
cass-operator-74d7dfc646-76v5z   1/1     Running   0          62m
dse-dc1-default-sts-0            3/3     Running   0          18m
dse-dc1-default-sts-1            3/3     Running   0          18m
dse-dc1-default-sts-2            3/3     Running   0          18m
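
You can also ask the operator directly whether it has finished reconciling the datacenter; the CassandraDatacenter status should report Ready once everything is up:

$  kubectl get cassandradatacenter dc1 -o jsonpath='{.status.cassandraOperatorProgress}'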

Routing with Istio using TCP

We want to set up some basic TCP routing with Istio to verify everything works and was set up correctly before we move on to the more complicated TLS/SNI routing. We will create simple Istio Gateway and VirtualService resources that allow traffic to flow from port 9042 to the Cassandra nodes:


apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: cassandra-tcp-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 9042
      name: tcp-cassandra
      protocol: TCP
    hosts:
    - "*"

This Gateway resource just opens port 9042 and expects TCP traffic. The following VirtualService routes traffic from that port to the headless Cassandra service intended for routing:


apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: tcp-echo-vs-from-gw
spec:
  hosts:
  - "*"
  gateways:
  - cassandra-tcp-gateway
  tcp:
  - match:
    - port: 9042
    route:
    - destination:
        host: dse-dc1-service.cass-operator.svc.cluster.local
        port:
          number: 9042

Now let’s actually set this up:

$  kubectl apply -f dse-ingressgateway.yaml 
gateway.networking.istio.io/cassandra-tcp-gateway created
virtualservice.networking.istio.io/tcp-echo-vs-from-gw created
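
To double-check that the ingress gateway picked up the new listener, you can inspect its Envoy configuration with istioctl:

$  INGRESS_POD=$(kubectl get pod -n istio-system -l istio=ingressgateway -o jsonpath='{.items[0].metadata.name}')
$  istioctl proxy-config listeners $INGRESS_POD.istio-system --port 9042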

Connect via simple TCP

Let’s port-forward the Istio ingress gateway so that our client can try to connect on port 9042. Note, if you’re using a cloud provider or load balancer, you’ll want to verify that all firewalls are open for port 9042:

$  kubectl port-forward -n istio-system deploy/istio-ingressgateway 9042
Forwarding from 127.0.0.1:9042 -> 9042
Forwarding from [::1]:9042 -> 9042

Now, using a Cassandra client, we should be able to connect locally. For example, we can use the cqlsh client, passing the superuser credentials pulled from the dse-superuser secret as shown below:

$  USERNAME=$(kubectl get secret -n cass-operator dse-superuser -o jsonpath="{.data['username']}" | base64 --decode)
$  PASSWORD=$(kubectl get secret -n cass-operator dse-superuser -o jsonpath="{.data['password']}" | base64 --decode)
$  cqlsh -u $USERNAME -p $PASSWORD 127.0.0.1 9042

Connected to dse at 127.0.0.1:9042.
[cqlsh 6.8.0 | DSE 6.8.0 | CQL spec 3.4.5 | DSE protocol v2]
Use HELP for help.
dse-superuser@cqlsh> 

Yay! We’ve connected to the DB through the Istio ingress gateway using simple TCP routing. We can run a cqlsh command to verify it works:


dse-superuser@cqlsh> select cluster_name, data_center from system.local;

 cluster_name | data_center
--------------+-------------
          dse |         dc1

(1 rows)

Connect via TLS

We saw in the previous section that we configured Cassandra to accept both plaintext and TLS client connections. Let’s try to connect using cqlsh over TLS:

$  cqlsh --ssl --cqlshrc ./cqlshrc.ssl 

This should connect just like it did in the previous step.

We have configured our cqlshrc file with the correct username/password and trusted cert. Note that validate = false tells cqlsh to skip validating the server certificate, which is convenient here since we’re connecting through the port-forward at 127.0.0.1:

[authentication]
username = dse-superuser
password = W_NYIVMR3TvPUwSFHDdt9r9xfn88QQ5F6LYwWMY_vDyY9WiVwQ5oHw

[connection]
hostname = 127.0.0.1
port = 9042
 
[ssl]
certfile = ./dc1-root-ca.pem
validate = false 

Using Istio SNI to handle the routing

Imagine we have more than just the Cassandra database listening on port 9042, or that we need to secure and address each node individually. We can use SNI for more fine-grained routing with Istio. To do that, we need to configure Istio’s ingress gateway to use TLS passthrough and configure our Istio routing rules to match on specific SNI hostnames. Let’s take a look.

The first thing we need to do is configure the Istio ingress gateway to treat the connections on port 9042 as TLS and use PASSTHROUGH semantics. This means the Istio ingress gateway will NOT try to terminate the TLS connection; instead it will route the connection according to the hostname presented in the Server Name Indication (SNI) TLS extension sent in the ClientHello part of the handshake. We can configure the Istio ingress gateway like this:


apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: cassandra-tcp-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 9042
      name: tcp-cassandra-sni
      protocol: TLS
    hosts:
    - "*"
    tls:
      mode: PASSTHROUGH

Note we are just passing through any hostname, though we could be as selective as required. We will do the matching in an Istio VirtualService:


apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: sni-dsw-vs-from-gw
spec:
  hosts:
  - "dse-dc1-default-sts-0.dse-dc1-service.cass-operator.svc.cluster.local"
  - "dse-dc1-default-sts-1.dse-dc1-service.cass-operator.svc.cluster.local"
  - "dse-dc1-default-sts-2.dse-dc1-service.cass-operator.svc.cluster.local"
  - "8717bdb7-d879-459e-979a-f1fa6795e88a"
  - "c8617bd4-af31-45c5-9d12-01c0247c7f8b"
  - "6eec31c9-af22-4438-a416-8f78fe0693f2"  
  gateways:
  - cassandra-tcp-gateway
  tls:
  - match:
    - port: 9042
      sniHosts:
        - "c8617bd4-af31-45c5-9d12-01c0247c7f8b"
        - dse-dc1-default-sts-0.dse-dc1-service.cass-operator.svc.cluster.local
    route:
    - destination:
        host: dse-dc1-default-sts-0.dse-dc1-service.cass-operator.svc.cluster.local
        port:
          number: 9042
  - match:
    - port: 9042
      sniHosts:
        - "6eec31c9-af22-4438-a416-8f78fe0693f2"
        - dse-dc1-default-sts-1.dse-dc1-service.cass-operator.svc.cluster.local
    route:
    - destination:
        host: dse-dc1-default-sts-1.dse-dc1-service.cass-operator.svc.cluster.local
        port:
          number: 9042
  - match:
    - port: 9042
      sniHosts:
        - dse-dc1-default-sts-2.dse-dc1-service.cass-operator.svc.cluster.local
        - "8717bdb7-d879-459e-979a-f1fa6795e88a"
    route:
    - destination:
        host: dse-dc1-default-sts-2.dse-dc1-service.cass-operator.svc.cluster.local
        port:
          number: 9042   

In this VirtualService we are explicitly matching on the hostnames that make up the Cassandra database. One important piece of this puzzle is that, under the covers, the Cassandra client discovers the Cassandra nodes by their host ID (those are the UUID strings you see above). So to correctly set up our VirtualService, we need to match SNI names on both the Kubernetes DNS names and the internal Cassandra host IDs. You can figure out the host IDs and match them up to their respective Pod IP addresses by running the following:

$  kubectl exec -it -n cass-operator -c cassandra dse-dc1-default-sts-0 -- nodetool status

Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving/Stopped
--  Address     Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.20.1.7   236.97 KiB  1            100.0%            8717bdb7-d879-459e-979a-f1fa6795e88a  default
UN  10.20.0.7   232.38 KiB  1            100.0%            c8617bd4-af31-45c5-9d12-01c0247c7f8b  default
UN  10.20.2.10  171.54 KiB  1            100.0%            6eec31c9-af22-4438-a416-8f78fe0693f2  default

You can match the Pod IP addresses listed here against the pods to see which host ID belongs to which node.
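
For example, the Pod IPs show up in the wide output:

$  kubectl get po -n cass-operator -o wide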

Let’s create and apply our VirtualService:

$  kubectl apply -f dse-ingressgateway-sni.yaml

gateway.networking.istio.io/cassandra-tcp-gateway configured
virtualservice.networking.istio.io/sni-dsw-vs-from-gw created
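
Before reaching for a full client, you can sanity-check the SNI routing with openssl s_client (with the earlier port-forward to the ingress gateway still running). The handshake should complete, and the certificate presented comes from the matching Cassandra node, since the gateway is only passing the TLS connection through:

$  openssl s_client -connect 127.0.0.1:9042 \
     -servername dse-dc1-default-sts-0.dse-dc1-service.cass-operator.svc.cluster.local </dev/null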

At this point, we have Istio configured to use SNI routing for the Cassandra client. In the next section, we’ll try to connect a client with this configuration.

Connecting a TLS client to Cassandra with Istio SNI routing in place

At this point, we have set up the following architecture: the Istio ingress gateway passes TLS connections through untouched and routes them to individual Cassandra nodes based on SNI.

Now we just need a suitable client to test this out. Unfortunately, the default cqlsh CLI client does not send SNI, so we need to use another client. We’ve included a simple Python client with the following configuration:


export CQLSH_HOST=dse-dc1-default-sts-0.dse-dc1-service.cass-operator.svc.cluster.local
export CQL_USER=dse-superuser
export CQL_PASSWORD=W_NYIVMR3TvPUwSFHDdt9r9xfn88QQ5F6LYwWMY_vDyY9WiVwQ5oHw
export CQLSH_PORT=9042 
export SSL_CERTFILE=dc1-root-ca.pem

Let’s source the env file (dse-client.env) and run the Python client:

$  source dse-client.env
$  python dse-client.py

2020-10-12 07:47:07,547 [WARNING] cassandra.cluster: Cluster.__init__ called with contact_points specified, but no load_balancing_policy. In the next major version, this will raise an error; please specify a load-balancing policy. (contact_points = [<SniEndPoint: dse-dc1-default-sts-0.dse-dc1-service.cass-operator.svc.cluster.local:9042:dse-dc1-default-sts-0.dse-dc1-service.cass-operator.svc.cluster.local>], lbp = None)
2020-10-12 07:47:08,357 [INFO] cassandra.policies: Using datacenter 'dc1' for DCAwareRoundRobinPolicy (via host 'dse-dc1-default-sts-0.dse-dc1-service.cass-operator.svc.cluster.local:9042:dse-dc1-default-sts-0.dse-dc1-service.cass-operator.svc.cluster.local'); if incorrect, please specify a local_dc to the constructor, or limit contact points to local cluster nodes
2020-10-12 07:47:08,358 [INFO] cassandra.cluster: New Cassandra host <Host: dse-dc1-default-sts-0.dse-dc1-service.cass-operator.svc.cluster.local:9042:6eec31c9-af22-4438-a416-8f78fe0693f2 dc1> discovered
2020-10-12 07:47:08,359 [INFO] cassandra.cluster: New Cassandra host <Host: dse-dc1-default-sts-0.dse-dc1-service.cass-operator.svc.cluster.local:9042:8717bdb7-d879-459e-979a-f1fa6795e88a dc1> discovered
2020-10-12 07:47:09,723 [INFO] root: key        col1    col2
2020-10-12 07:47:09,723 [INFO] root: ---        ----    ----
2020-10-12 07:47:09,783 [INFO] root: got a row
2020-10-12 07:47:09,783 [INFO] root: dse        dc1

Conclusion

Setting up Cassandra on Kubernetes using StatefulSets and headless services with the Datastax operator, plus Istio for SNI routing, is very powerful but can be complex. In this blog post we saw ONE approach to doing this; there are other options with their own tradeoffs. One part we did not cover in this blog is creating the correct certificates and SANs for each of the nodes. Please reach out to us if you need help with this kind of pattern or with Istio support in general.

Special thanks to Joe Searcy and Vijay Kumar of T-Mobile for their contribution to this effort!