Running Istio with Cilium’s Kube Proxy Replacement

The kube-proxy replacement offered by Cilium’s CNI is a powerful feature that can increase performance for large Kubernetes clusters. It uses an eBPF data plane to replace the kube-proxy implementations shipped with Kubernetes distributions, which are typically built on iptables or IPVS. When other networking infrastructure is layered on top, however, the otherwise hidden eBPF implementation that replaces kube-proxy can bleed through and cause unintended behavior. We see this when trying to use the Istio service mesh with Cilium’s kube-proxy replacement: by default, kube-proxy replacement breaks Istio.

To understand why Cilium’s kube-proxy replacement breaks Istio, we need to understand some basic networking details in Kubernetes and Cilium. In Figure 1 below, kube-proxy is implemented with iptables (the Kubernetes default). When an application wants to communicate with another service in a Kubernetes cluster, it calls the service by the host name/FQDN of the Kubernetes Service. That host name gets resolved by DNS to the Cluster IP of the Service, and kube-proxy (iptables in this case) handles the Cluster IP translation, load balancing, and NATing required to make this work transparently.

kube-proxy with iptables

Figure 1: Using default Kubernetes kube-proxy implementation with iptables
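You can see both halves of this flow yourself. The following is a sketch using a hypothetical httpbin Service and a sleep Deployment (with nslookup available) in the default namespace:

# Resolve the Service FQDN to its Cluster IP from inside a Pod
kubectl exec deploy/sleep -- nslookup httpbin.default.svc.cluster.local

# On a cluster node, kube-proxy has programmed iptables NAT rules that
# translate that Cluster IP to one of the backing Pod IPs
sudo iptables-save -t nat | grep httpbin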

When you use Istio on Kubernetes, you see a different behavior: Istio performs service discovery, health checking, connection pooling, more sophisticated Layer 7 load balancing, and more, inline with a service request. For example, in Figure 2, we see that Istio uses a sidecar (although a sidecarless option is available) to implement this functionality.

sidecar proxy

Figure 2: Istio uses a sidecar proxy to implement service communications

In this example, the application looks up a Kubernetes Service FQDN with a DNS call (Istio can proxy DNS locally to reduce load on the Kubernetes DNS servers and to implement zone-aware load balancing, among other things) and then uses the Cluster IP to make the call. This call goes through the Istio sidecar proxy, which matches it to the appropriate upstream endpoint. By the time the traffic leaves the Pod, all of the routing and load-balancing decisions have already been made and the call targets an actual Pod IP address. Kube-proxy does not need to do anything in this case.
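As an aside, the local DNS proxying mentioned above is opt-in. A minimal sketch for enabling it looks like the following (verify the exact mesh config fields against your Istio version’s documentation):

cat <<EOF | istioctl install -y -f -
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    defaultConfig:
      proxyMetadata:
        # Have the sidecar proxy DNS lookups for the application
        ISTIO_META_DNS_CAPTURE: "true"
EOF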

When you enable Cilium’s kube-proxy replacement in Kubernetes, iptables is no longer used to implement kube-proxy. Cilium implements kube-proxy functionality and uses its eBPF data plane to handle the load balancing across the various Pod IPs that make up a backend service.

Cilium kube-proxy

Figure 3: Cilium kube-proxy replacement transparently does the service to pod translation in the Pod

As you can see, Cilium’s eBPF data plane watches for TCP events within the Pod and, when a connection is initiated, transparently performs the Cluster IP to Pod IP translation. In this case, the traffic leaving the application Pod is already addressed to a destination Pod IP. Because this translation happens directly in the Pod, we run into challenges with Istio: Istio expects to match on a Kubernetes Service’s Cluster IP, and if a Pod IP is targeted directly, most of Istio’s functionality is bypassed.
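If you want to see this behavior on your own cluster, the Cilium agent can report how it is handling Services (output fields vary between Cilium versions):

# Ask the Cilium agent how kube-proxy replacement and socket-level
# load balancing (socket LB) are configured
kubectl -n kube-system exec ds/cilium -- cilium status --verbose | grep -i -A 3 KubeProxyReplacement

# List the Service-to-backend mappings held in the eBPF service map
kubectl -n kube-system exec ds/cilium -- cilium service list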

Cilium implements this kube-proxy replacement as early as possible in the data path, but this doesn’t have to be the case. If we want Istio to work correctly, we can configure Cilium to perform the Kubernetes Service Cluster IP to Pod IP translation outside of the Pod (as the default iptables implementation does, though Cilium will still do it with eBPF).

To do this, we need to configure Cilium’s socket-level load balancing (socket LB) to apply only to traffic seen in the host namespace. We do this with the socketLB.hostNamespaceOnly=true configuration option. For example, installing into a Kind Kubernetes cluster could look like this:

helm upgrade --install cilium cilium/cilium --version 1.13.2 \
  --namespace kube-system \
  --set ipam.mode=kubernetes \
  --set kubeProxyReplacement=strict \
  --set socketLB.hostNamespaceOnly=true \
  --set k8sServiceHost=kind1-control-plane \
  --set k8sServicePort=6443

Please note that the hostNamespaceOnly feature is only available with Linux kernel 5.7 or newer.
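To confirm the option actually took effect, you can inspect the rendered Cilium configuration. This is a sketch; in recent Cilium versions the Helm value surfaces as a bpf-lb-sock-hostns-only entry, but the exact key name may differ between releases:

# The socketLB.hostNamespaceOnly Helm value should appear in the agent config
kubectl -n kube-system get configmap cilium-config -o yaml | grep -i hostns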

When configured like this, Istio’s functionality (DNS capture, service load balancing, circuit breaking, traffic splitting, content-based routing, etc.) will continue to operate as expected.

The “host namespace only” setting simply tells Cilium to do the Cluster IP translation at the host level instead of at the Pod’s socket level, so the sidecar still sees traffic addressed to the Service. From a functionality standpoint, there is no difference.
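For example, a weighted routing rule like the one below (a hypothetical reviews Service whose v1 and v2 subsets are defined in a matching DestinationRule) is still honored by the sidecar, because the request it intercepts is still addressed to the Service’s Cluster IP:

kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    # 90/10 traffic split performed by the sidecar at Layer 7
    - destination:
        host: reviews
        subset: v1
      weight: 90
    - destination:
        host: reviews
        subset: v2
      weight: 10
EOF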

Running Istio and Cilium Together in Harmony

Istio and Cilium are better run together. Beyond the low-level options and configurations to be mindful of, however, users must also understand how policies written for one layer (Layers 3 and 4, the CNI) affect or complement those written for another layer (Layer 7, the service mesh). Cilium’s network policy objects and Istio’s policies should be complementary, providing defense in depth.
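As a rough sketch of that layering (hypothetical workload-a and workload-b labels and service accounts in the default namespace), a CiliumNetworkPolicy can constrain who may connect at Layers 3/4, while an Istio AuthorizationPolicy constrains what they may do at Layer 7:

kubectl apply -f - <<EOF
# Layer 3/4 (CNI): only Pods labeled app=workload-a may reach
# Pods labeled app=workload-b, and only on TCP port 8080
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: workload-b-l4
spec:
  endpointSelector:
    matchLabels:
      app: workload-b
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: workload-a
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
---
# Layer 7 (service mesh): within that allowed connection, only GET
# requests to /api/* from workload-a's identity are permitted
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: workload-b-l7
spec:
  selector:
    matchLabels:
      app: workload-b
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/default/sa/workload-a"]
    to:
    - operation:
        methods: ["GET"]
        paths: ["/api/*"]
EOF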

In my experience, SREs, developers, and platform users don’t want to know about the nuances and implementation details at these layers. Often they just want to think in terms of policy that affects “workload A” or “workload B”. Gloo Network for Cilium is Solo.io’s solution for managing and configuring application networking policies across workloads, clusters, zones, regions, and clouds with a single unified API to specify the policy. Want to continue the conversation? I am always happy to discuss more on Twitter or on Slack (@ceposta).