Cilium Service Mesh in Action
In December 2021, I participated in the Cilium Service Mesh Beta program, so I was curious to try the service mesh features that have been incorporated into Cilium 1.12. It’s interesting to watch these technologies develop, and even better to dive into them to see what capabilities, features, benefits, and challenges they bring at each stage of their development. In this blog, you can follow along as I walk through installing Cilium 1.12 on GKE with the ingress feature enabled, create an ingress object to expose one service, create a second ingress object to expose another service, and then review Layer 7 traffic management to see how it works.
Cilium ingress
We’ll walk through trying this out, and you are welcome to follow along. You can find the documentation for this feature here.
Install Cilium on GKE with the ingress feature enabled
Let’s start by deploying a new GKE Kubernetes cluster to test it out.
gcloud container clusters create cilium \
--node-taints node.cilium.io/agent-not-ready=true:NoExecute \
--zone europe-west1-d
Then, I can deploy Cilium with the ingress feature enabled on GKE using the cilium CLI.
cilium install \
--kube-proxy-replacement=strict \
--helm-set ingressController.enabled=true
Here is the output:
🔮 Auto-detected Kubernetes kind: GKE
ℹ️ Using Cilium version 1.12.0
🔮 Auto-detected cluster name: gke-solo-test-236622-europe-west1-d-cilium
🔮 Auto-detected datapath mode: gke
✅ Detected GKE native routing CIDR: 10.16.0.0/14
ℹ️ helm template --namespace kube-system cilium cilium/cilium --version 1.12.0 --set cluster.id=0,cluster.name=gke-solo-test-236622-europe-west1-d-cilium,cni.binPath=/home/kubernetes/bin,encryption.nodeEncryption=false,gke.disableDefaultSnat=true,gke.enabled=true,ingressController.enabled=true,ipam.mode=kubernetes,ipv4NativeRoutingCIDR=10.16.0.0/14,kubeProxyReplacement=strict,nodeinit.enabled=true,nodeinit.reconfigureKubelet=true,nodeinit.removeCbrBridge=true,operator.replicas=1,serviceAccounts.cilium.name=cilium,serviceAccounts.operator.name=cilium-operator
ℹ️ Storing helm values file in kube-system/cilium-cli-helm-values Secret
🚀 Creating Resource quotas...
🔑 Created CA in secret cilium-ca
🔑 Generating certificates for Hubble...
🚀 Creating Service accounts...
🚀 Creating Cluster roles...
🚀 Creating ConfigMap for Cilium version 1.12.0...
🚀 Creating GKE Node Init DaemonSet...
🚀 Creating Agent DaemonSet...
🚀 Creating Operator Deployment...
⌛ Waiting for Cilium to be installed and ready...
✅ Cilium was successfully installed! Run 'cilium status' to view installation health
Let’s have a look at the different pods running in my cluster:
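A plain kubectl listing across all namespaces is all that’s needed here (nothing Cilium-specific):
kubectl get pods --all-namespaces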
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system cilium-dlcxr 1/1 Running 0 2m43s
kube-system cilium-lb5hw 1/1 Running 0 2m43s
kube-system cilium-node-init-jqp5x 1/1 Running 0 2m43s
kube-system cilium-node-init-r2vn8 1/1 Running 0 2m43s
kube-system cilium-node-init-z799q 1/1 Running 0 2m43s
kube-system cilium-operator-598c495f5f-7w8k9 1/1 Running 0 2m43s
kube-system cilium-vnw5c 1/1 Running 0 2m43s
kube-system event-exporter-gke-5479fd58c8-zglht 2/2 Running 0 3m45s
kube-system fluentbit-gke-fg7nz 2/2 Running 0 3m4s
kube-system fluentbit-gke-m5hrx 2/2 Running 0 3m4s
kube-system fluentbit-gke-msjch 2/2 Running 0 3m5s
kube-system gke-metrics-agent-9gq9b 1/1 Running 0 3m4s
kube-system gke-metrics-agent-lwkqr 1/1 Running 0 3m4s
kube-system gke-metrics-agent-qbp7m 1/1 Running 0 3m5s
kube-system konnectivity-agent-78df777b57-hz9rv 1/1 Running 0 3m39s
kube-system konnectivity-agent-78df777b57-m268w 1/1 Running 0 106s
kube-system konnectivity-agent-78df777b57-zbx6b 1/1 Running 0 106s
kube-system konnectivity-agent-autoscaler-555f599d94-hjvwl 1/1 Running 0 3m37s
kube-system kube-dns-56494768b7-nnxvf 4/4 Running 0 102s
kube-system kube-dns-56494768b7-p54dr 4/4 Running 0 3m50s
kube-system kube-dns-autoscaler-f4d55555-qkcg9 1/1 Running 0 3m50s
kube-system kube-proxy-gke-cilium-default-pool-21a8e3bd-711h 1/1 Running 0 2m30s
kube-system kube-proxy-gke-cilium-default-pool-21a8e3bd-fznt 1/1 Running 0 2m24s
kube-system kube-proxy-gke-cilium-default-pool-21a8e3bd-gz19 1/1 Running 0 2m25s
kube-system l7-default-backend-69fb9fd9f9-j9b9f 1/1 Running 0 3m35s
kube-system metrics-server-v0.4.5-bbb794dcc-27s88 2/2 Running 0 87s
kube-system pdcsi-node-2h6q8 2/2 Running 0 3m4s
kube-system pdcsi-node-985tl 2/2 Running 0 3m4s
kube-system pdcsi-node-bsdh9 2/2 Running 0 3m5s
There’s no difference in terms of what’s being deployed when the ingress feature is enabled.
Next, let’s have a look at the Kubernetes services:
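Again, a simple cluster-wide listing does the job (only the kube-system entries are shown below):
kubectl get svc --all-namespaces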
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-system default-http-backend NodePort 10.118.85.221 <none> 80:32731/TCP 4m37s
kube-system kube-dns ClusterIP 10.118.80.10 <none> 53/UDP,53/TCP 4m53s
kube-system metrics-server ClusterIP 10.118.89.169 <none> 443/TCP 4m14s
Again, nothing is different from a standard Cilium deployment. I was expecting a Kubernetes service for my ingresses, so let’s see what happens when an ingress object is created. But first, let’s deploy the Istio Bookinfo demo application:
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.14/samples/bookinfo/platform/kube/bookinfo.yaml
Create an ingress object to expose one service
I’ll create my first ingress object using the cilium ingress class to expose the details service.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: details-ingress
  namespace: default
spec:
  ingressClassName: cilium
  rules:
  - http:
      paths:
      - backend:
          service:
            name: details
            port:
              number: 9080
        path: /details
        pathType: Prefix
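I save this manifest to a file (the name is arbitrary; I’m using details-ingress.yaml) and apply it:
kubectl apply -f details-ingress.yaml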
I can see that a new Kubernetes service has been created:
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default cilium-ingress-details-ingress LoadBalancer 10.118.93.239 <pending> 80:30463/TCP 30s
It’s a LoadBalancer service, which means that it will trigger the creation of a Google Cloud network load balancer. If I wait a little bit more, I can see that an external IP is now assigned to this service:
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default cilium-ingress-details-ingress LoadBalancer 10.118.93.239 34.140.121.201 80:30463/TCP 5m19s
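If you’re following along, a convenient way to wait for the address is to watch the service until the EXTERNAL-IP column is populated (Ctrl-C to stop watching):
kubectl get svc cilium-ingress-details-ingress -w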
I should now be able to access the details service through this external IP:
export EXTERNAL_IP=$(kubectl get svc cilium-ingress-details-ingress -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
curl "http://${EXTERNAL_IP}/details/1"
So what happened here behind the scenes? First of all, let’s have a look at the Cilium logs:
kubectl -n kube-system logs -l k8s-app=cilium | grep cilium
Here is the output:
level=info msg="[lds: add/update listener 'cilium-ingress-default-details-ingress'" subsys=envoy-upstream threadID=81
level=info msg="Adding new proxy port rules for cilium-ingress-default-details-ingress:13252" proxy port name=cilium-ingress-default-details-ingress subsys=proxy
level=info msg="Adding new proxy port rules for cilium-ingress-default-details-ingress:14878" proxy port name=cilium-ingress-default-details-ingress subsys=proxy
level=info msg="Adding new proxy port rules for cilium-ingress-default-details-ingress:10390" proxy port name=cilium-ingress-default-details-ingress subsys=proxy
It has triggered the creation of a CiliumEnvoyConfig Kubernetes object:
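It can be dumped like any other namespaced resource; the name and namespace below come straight from the object’s metadata:
kubectl -n default get ciliumenvoyconfig cilium-ingress-default-details-ingress -o yaml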
apiVersion: cilium.io/v2
kind: CiliumEnvoyConfig
metadata:
  creationTimestamp: "2022-07-29T12:45:28Z"
  generation: 1
  name: cilium-ingress-default-details-ingress
  namespace: default
  ownerReferences:
  - apiVersion: networking.k8s.io/v1
    kind: Ingress
    name: details-ingress
    uid: 33b25cfa-7ab7-491f-bc25-d5c161d156fc
  resourceVersion: "3426"
  uid: 0e887959-cb8c-45cd-8e99-9486bc50ab6e
spec:
  backendServices:
  - name: details
    namespace: default
    number:
    - "9080"
  resources:
  - '@type': type.googleapis.com/envoy.config.listener.v3.Listener
    filterChains:
    - filterChainMatch:
        transportProtocol: raw_buffer
      filters:
      - name: envoy.filters.network.http_connection_manager
        typedConfig:
          '@type': type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          httpFilters:
          - name: envoy.filters.http.router
          rds:
            routeConfigName: cilium-ingress-default-details-ingress_route
          statPrefix: cilium-ingress-default-details-ingress
    listenerFilters:
    - name: envoy.filters.listener.tls_inspector
    name: cilium-ingress-default-details-ingress
    socketOptions:
    - description: Enable TCP keep-alive, annotation io.cilium/tcp-keep-alive. (default to enabled)
      intValue: "1"
      level: "1"
      name: "9"
      state: STATE_LISTENING
    - description: TCP keep-alive idle time (in seconds). Annotation io.cilium/tcp-keep-alive-idle (defaults to 10s)
      intValue: "10"
      level: "6"
      name: "4"
      state: STATE_LISTENING
    - description: TCP keep-alive probe intervals (in seconds). Annotation io.cilium/tcp-keep-alive-probe-interval (defaults to 5s)
      intValue: "5"
      level: "6"
      name: "5"
      state: STATE_LISTENING
    - description: TCP keep-alive probe max failures. Annotation io.cilium/tcp-keep-alive-probe-max-failures (defaults to 10)
      intValue: "10"
      level: "6"
      name: "6"
      state: STATE_LISTENING
  - '@type': type.googleapis.com/envoy.config.route.v3.RouteConfiguration
    name: cilium-ingress-default-details-ingress_route
    virtualHosts:
    - domains:
      - '*'
      name: '*'
      routes:
      - match:
          safeRegex:
            googleRe2: {}
            regex: /details(/.*)?$
        route:
          cluster: default/details:9080
          maxStreamDuration:
            maxStreamDuration: 0s
  - '@type': type.googleapis.com/envoy.config.cluster.v3.Cluster
    connectTimeout: 5s
    name: default/details:9080
    outlierDetection:
      consecutiveLocalOriginFailure: 2
      splitExternalLocalOriginErrors: true
    type: EDS
    typedExtensionProtocolOptions:
      envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
        '@type': type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
        useDownstreamProtocolConfig:
          http2ProtocolOptions: {}
  services:
  - listener: cilium-ingress-default-details-ingress
    name: cilium-ingress-details-ingress
    namespace: default
This contains a raw Envoy configuration which is then passed to the Envoy process running in the Cilium pods.
We can take a look at the Envoy config dump using the following commands:
cilium=$(kubectl -n kube-system get pods -l k8s-app=cilium -o jsonpath='{.items[0].metadata.name}')
kubectl -n kube-system exec -q $cilium -- apt update
kubectl -n kube-system exec -q $cilium -- apt -y install curl
kubectl -n kube-system exec -q $cilium -- curl -s --unix-socket /var/run/cilium/envoy-admin.sock http://localhost/config_dump
It contains the configuration we’ve seen in the CiliumEnvoyConfig Kubernetes object.
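The full dump is quite verbose. If you have jq available locally, you can narrow it down to, say, the dynamically configured listeners (the field names follow the standard Envoy admin config_dump schema):
kubectl -n kube-system exec -q $cilium -- curl -s --unix-socket /var/run/cilium/envoy-admin.sock http://localhost/config_dump | jq '.configs[] | select(."@type" | endswith("ListenersConfigDump")) | .dynamic_listeners[]?.name'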
Create a second ingress object to expose another service
Now, let’s create another ingress object to expose the reviews service.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: reviews-ingress
  namespace: default
spec:
  ingressClassName: cilium
  rules:
  - http:
      paths:
      - backend:
          service:
            name: reviews
            port:
              number: 9080
        path: /reviews
        pathType: Prefix
I can see that a new Kubernetes service has been created for this ingress:
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default cilium-ingress-details-ingress LoadBalancer 10.118.93.239 34.140.121.201 80:30463/TCP 53m
default cilium-ingress-reviews-ingress LoadBalancer 10.118.90.144 35.187.118.187 80:30680/TCP 93s
I should now be able to access the reviews service through this external IP:
export EXTERNAL_IP=$(kubectl get svc cilium-ingress-reviews-ingress -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
curl "http://${EXTERNAL_IP}/reviews/1"
As you can see, a new Kubernetes service (so a new Cloud load balancer) is created for each ingress resource, which may not be something you’d expect.
L7 traffic management
Another functionality that is part of the Cilium service mesh feature set is L7 traffic management. It is quite straightforward to explain, because it mainly consists of letting a user create CiliumEnvoyConfig objects to apply raw Envoy configuration. Here, what I want to achieve is sending 90% of requests to the reviews-v1 pods and 10% to the reviews-v2 pods whenever any pod sends a request to the reviews Kubernetes service.
To accomplish this, I took a look at the example in this documentation to figure out the right approach.
I started with the following Kubernetes object:
apiVersion: cilium.io/v2
kind: CiliumEnvoyConfig
metadata:
  name: envoy-lb-listener
spec:
  services:
  - name: reviews
    namespace: default
  resources:
  - "@type": type.googleapis.com/envoy.config.listener.v3.Listener
    name: envoy-lb-listener
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: envoy-lb-listener
          rds:
            route_config_name: lb_route
          http_filters:
          - name: envoy.filters.http.router
  - "@type": type.googleapis.com/envoy.config.route.v3.RouteConfiguration
    name: lb_route
    virtual_hosts:
    - name: "lb_route"
      domains: [ "*" ]
      routes:
      - match:
          prefix: "/"
        route:
          weighted_clusters:
            clusters:
            - name: "default/reviews-v1"
              weight: 90
            - name: "default/reviews-v2"
              weight: 10
          retry_policy:
            retry_on: 5xx
            num_retries: 3
            per_try_timeout: 1s
  - "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
    name: "default/reviews-v1"
    connect_timeout: 5s
    lb_policy: ROUND_ROBIN
    type: EDS
    outlier_detection:
      split_external_local_origin_errors: true
      consecutive_local_origin_failure: 2
  - "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
    name: "default/reviews-v2"
    connect_timeout: 3s
    lb_policy: ROUND_ROBIN
    type: EDS
    outlier_detection:
      split_external_local_origin_errors: true
      consecutive_local_origin_failure: 2
Next, let’s check whether it works by calling the reviews service from the ratings pod:
kubectl exec -it deploy/ratings-v1 -- curl reviews:9080/reviews/0
I got the following error:
no healthy upstream
To learn more, let’s have a look at the Envoy clusters. I didn’t find information about how to display the Envoy clusters in the Cilium documentation, but I found a way to get them using the following commands:
pod=$(kubectl -n kube-system get pods -l k8s-app=cilium -o jsonpath='{.items[0].metadata.name}')
kubectl -n kube-system exec -q ${pod} -- curl -s --unix-socket /var/run/cilium/envoy-admin.sock http://localhost/clusters | grep reviews
Here is the output:
default/reviews-v2::observability_name::default/reviews-v2
default/reviews-v2::outlier::success_rate_average::-1
default/reviews-v2::outlier::success_rate_ejection_threshold::-1
default/reviews-v2::outlier::local_origin_success_rate_average::-1
default/reviews-v2::outlier::local_origin_success_rate_ejection_threshold::-1
default/reviews-v2::default_priority::max_connections::1024
default/reviews-v2::default_priority::max_pending_requests::1024
default/reviews-v2::default_priority::max_requests::1024
default/reviews-v2::default_priority::max_retries::3
default/reviews-v2::high_priority::max_connections::1024
default/reviews-v2::high_priority::max_pending_requests::1024
default/reviews-v2::high_priority::max_requests::1024
default/reviews-v2::high_priority::max_retries::3
default/reviews-v2::added_via_api::true
default/reviews-v1::observability_name::default/reviews-v1
default/reviews-v1::outlier::success_rate_average::-1
default/reviews-v1::outlier::success_rate_ejection_threshold::-1
default/reviews-v1::outlier::local_origin_success_rate_average::-1
default/reviews-v1::outlier::local_origin_success_rate_ejection_threshold::-1
default/reviews-v1::default_priority::max_connections::1024
default/reviews-v1::default_priority::max_pending_requests::1024
default/reviews-v1::default_priority::max_requests::1024
default/reviews-v1::default_priority::max_retries::3
default/reviews-v1::high_priority::max_connections::1024
default/reviews-v1::high_priority::max_pending_requests::1024
default/reviews-v1::high_priority::max_requests::1024
default/reviews-v1::high_priority::max_retries::3
default/reviews-v1::added_via_api::true
default/reviews:9080::observability_name::default/reviews:9080
default/reviews:9080::outlier::success_rate_average::-1
default/reviews:9080::outlier::success_rate_ejection_threshold::-1
default/reviews:9080::outlier::local_origin_success_rate_average::-1
default/reviews:9080::outlier::local_origin_success_rate_ejection_threshold::-1
default/reviews:9080::default_priority::max_connections::1024
default/reviews:9080::default_priority::max_pending_requests::1024
default/reviews:9080::default_priority::max_requests::1024
default/reviews:9080::default_priority::max_retries::3
default/reviews:9080::high_priority::max_connections::1024
default/reviews:9080::high_priority::max_pending_requests::1024
default/reviews:9080::high_priority::max_requests::1024
default/reviews:9080::high_priority::max_retries::3
default/reviews:9080::added_via_api::true
default/reviews:9080::10.16.2.100:9080::cx_active::0
default/reviews:9080::10.16.2.100:9080::cx_connect_fail::0
default/reviews:9080::10.16.2.100:9080::cx_total::0
default/reviews:9080::10.16.2.100:9080::rq_active::0
default/reviews:9080::10.16.2.100:9080::rq_error::0
default/reviews:9080::10.16.2.100:9080::rq_success::0
default/reviews:9080::10.16.2.100:9080::rq_timeout::0
default/reviews:9080::10.16.2.100:9080::rq_total::0
default/reviews:9080::10.16.2.100:9080::hostname::
default/reviews:9080::10.16.2.100:9080::health_flags::healthy
default/reviews:9080::10.16.2.100:9080::weight::1
default/reviews:9080::10.16.2.100:9080::region::
default/reviews:9080::10.16.2.100:9080::zone::
default/reviews:9080::10.16.2.100:9080::sub_zone::
default/reviews:9080::10.16.2.100:9080::canary::false
default/reviews:9080::10.16.2.100:9080::priority::0
default/reviews:9080::10.16.2.100:9080::success_rate::-1.0
default/reviews:9080::10.16.2.100:9080::local_origin_success_rate::-1.0
default/reviews:9080::10.16.0.199:9080::cx_active::1
default/reviews:9080::10.16.0.199:9080::cx_connect_fail::0
default/reviews:9080::10.16.0.199:9080::cx_total::1
default/reviews:9080::10.16.0.199:9080::rq_active::0
default/reviews:9080::10.16.0.199:9080::rq_error::0
default/reviews:9080::10.16.0.199:9080::rq_success::1
default/reviews:9080::10.16.0.199:9080::rq_timeout::0
default/reviews:9080::10.16.0.199:9080::rq_total::1
default/reviews:9080::10.16.0.199:9080::hostname::
default/reviews:9080::10.16.0.199:9080::health_flags::healthy
default/reviews:9080::10.16.0.199:9080::weight::1
default/reviews:9080::10.16.0.199:9080::region::
default/reviews:9080::10.16.0.199:9080::zone::
default/reviews:9080::10.16.0.199:9080::sub_zone::
default/reviews:9080::10.16.0.199:9080::canary::false
default/reviews:9080::10.16.0.199:9080::priority::0
default/reviews:9080::10.16.0.199:9080::success_rate::-1.0
default/reviews:9080::10.16.0.199:9080::local_origin_success_rate::-1.0
default/reviews:9080::10.16.0.229:9080::cx_active::0
default/reviews:9080::10.16.0.229:9080::cx_connect_fail::0
default/reviews:9080::10.16.0.229:9080::cx_total::0
default/reviews:9080::10.16.0.229:9080::rq_active::0
default/reviews:9080::10.16.0.229:9080::rq_error::0
default/reviews:9080::10.16.0.229:9080::rq_success::0
default/reviews:9080::10.16.0.229:9080::rq_timeout::0
default/reviews:9080::10.16.0.229:9080::rq_total::0
default/reviews:9080::10.16.0.229:9080::hostname::
default/reviews:9080::10.16.0.229:9080::health_flags::healthy
default/reviews:9080::10.16.0.229:9080::weight::1
default/reviews:9080::10.16.0.229:9080::region::
default/reviews:9080::10.16.0.229:9080::zone::
default/reviews:9080::10.16.0.229:9080::sub_zone::
default/reviews:9080::10.16.0.229:9080::canary::false
default/reviews:9080::10.16.0.229:9080::priority::0
default/reviews:9080::10.16.0.229:9080::success_rate::-1.0
default/reviews:9080::10.16.0.229:9080::local_origin_success_rate::-1.0
From this, I can see that Cilium hasn’t associated the reviews-v1 and reviews-v2 Envoy clusters with any endpoint.
Then I found the following statement in the troubleshooting guide:
The Endpoint Discovery Service (EDS) has a name that follows the convention <namespace>/<service-name>:<port>.
I was expecting it to follow the convention <namespace>/<deployment-name>:<port> instead.
So, the only option I have is to define a different Kubernetes service for each version:
apiVersion: v1
kind: Service
metadata:
  name: reviews-v1
  labels:
    app: reviews
    service: reviews
    version: v1
spec:
  ports:
  - port: 9080
    name: http
  selector:
    app: reviews
    version: v1
---
apiVersion: v1
kind: Service
metadata:
  name: reviews-v2
  labels:
    app: reviews
    service: reviews
    version: v2
spec:
  ports:
  - port: 9080
    name: http
  selector:
    app: reviews
    version: v2
However, I still got the same error, and Envoy still showed no endpoints associated with the new Kubernetes services.
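To double-check, I can re-run the clusters query from earlier and filter on one of the per-version cluster names; configuration entries show up, but no endpoint lines:
kubectl -n kube-system exec -q ${pod} -- curl -s --unix-socket /var/run/cilium/envoy-admin.sock http://localhost/clusters | grep 'reviews-v1::'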
After that, I realized that the CiliumEnvoyConfig object created for the Ingress object was referencing backendServices.
So, I took a look at the CRD definition and found the purpose of this option:
BackendServices specifies Kubernetes services whose backends are automatically synced to Envoy using EDS. Traffic for these services is not forwarded to an Envoy listener. This allows an Envoy listener load balance traffic to these backends while normal Cilium service load balancing takes care of balancing traffic for these services at the same time.
Great, this looks like exactly what I need!
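As an aside, if the installed CRD carries field descriptions (which I’d expect, since the Cilium CRDs are generated from the Go types), the same text can be pulled straight from the cluster with kubectl explain:
kubectl explain ciliumenvoyconfig.spec.backendServices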
Let’s update the CiliumEnvoyConfig object as follows:
apiVersion: cilium.io/v2
kind: CiliumEnvoyConfig
metadata:
  name: envoy-lb-listener
spec:
  services:
  - name: reviews
    namespace: default
  backendServices:
  - name: reviews-v1
    namespace: default
  - name: reviews-v2
    namespace: default
  resources:
  - "@type": type.googleapis.com/envoy.config.listener.v3.Listener
    name: envoy-lb-listener
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: envoy-lb-listener
          rds:
            route_config_name: lb_route
          http_filters:
          - name: envoy.filters.http.router
  - "@type": type.googleapis.com/envoy.config.route.v3.RouteConfiguration
    name: lb_route
    virtual_hosts:
    - name: "lb_route"
      domains: [ "*" ]
      routes:
      - match:
          prefix: "/"
        route:
          weighted_clusters:
            clusters:
            - name: "default/reviews-v1"
              weight: 90
            - name: "default/reviews-v2"
              weight: 10
          retry_policy:
            retry_on: 5xx
            num_retries: 3
            per_try_timeout: 1s
  - "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
    name: "default/reviews-v1"
    connect_timeout: 5s
    lb_policy: ROUND_ROBIN
    type: EDS
    outlier_detection:
      split_external_local_origin_errors: true
      consecutive_local_origin_failure: 2
  - "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
    name: "default/reviews-v2"
    connect_timeout: 3s
    lb_policy: ROUND_ROBIN
    type: EDS
    outlier_detection:
      split_external_local_origin_errors: true
      consecutive_local_origin_failure: 2
And now it works! Ninety percent of the time I get the following output:
{"id": "0","podname": "reviews-v1-55b668fc65-wggz8","clustername": "null","reviews": [{ "reviewer": "Reviewer1", "text": "An extremely entertaining play by Shakespeare. The slapstick humour is refreshing!"},{ "reviewer": "Reviewer2", "text": "Absolutely fun and entertaining. The play lacks thematic depth when compared to other plays by Shakespeare."}]}
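A quick way to eyeball the 90/10 split is to fire a batch of requests from the ratings pod and count which reviews version answers; this is only a rough sanity check, not a statistical one:
for i in $(seq 1 20); do
  kubectl exec deploy/ratings-v1 -- curl -s reviews:9080/reviews/0
done | grep -o 'reviews-v[12]' | sort | uniq -c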
It wasn’t easy, but I’ve finally found the right syntax to achieve my goal!
A couple of other issues I ran into:
- Many components in the Envoy configuration, the listener names and the route names for example, must have unique names across all CiliumEnvoyConfig objects. Otherwise, strange behaviors occur, such as a route from one CiliumEnvoyConfig being used by another.
- When I submitted a new CiliumEnvoyConfig, I didn't know whether it had been accepted or not. I had to check the Cilium Operator logs to see whether the object had been rejected (see the command sketched below).
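For example, something along these lines (the grep pattern is just a guess at what the relevant log lines contain, so adjust it to the name of the object you submitted):
kubectl -n kube-system logs deploy/cilium-operator --since=10m | grep -i envoy-lb-listener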
Summary
Let’s start with the ingress controller.
Because a separate Kubernetes service (and therefore a separate Cloud load balancer) is created for each ingress, multiple application teams creating their own ingress objects would end up with many Cloud load balancers. That would cost a lot of money, and it would become very complicated for the teams to share the same domain name. Also, the ingress controller currently doesn’t really support any Kubernetes annotations, so you can’t do much with it.
All the features you find in other Kubernetes ingress controllers could eventually be implemented in Cilium. However, it took other companies and communities years to get them right. There are many options out there, including powerful Cloud Native API gateways.
What about the L7 traffic management features?
I firmly believe that Envoy isn’t supposed to be configured directly by humans, but by a control plane. Even though I have five years of experience working with Envoy and have built the Envoy UI tool to help people understand Envoy configurations, it took me a lot of time and effort to find the right syntax to achieve my fairly simple goal. So, I don’t think asking users to provide raw Envoy configuration is a good idea: it’s hard to get the syntax right, and having several users submit their own configurations will quickly generate conflicts that are nearly impossible to troubleshoot.
I am also concerned about many applications sharing the same Envoy instance for L7 traffic management (noisy neighbors, scalability, and so on). While a new control plane could be built from scratch, it takes years to get right.
I love Cilium as a CNI, and I love all the eBPF and performance optimizations it provides for L3/L4 capabilities.