SuperGloo to the Rescue! Making it easier to write extensions for Service Mesh

SuperGloo is an open source project for service mesh orchestration at scale. SuperGloo provides an opinionated abstraction layer that simplifies the installation, management, and operation of one or more service meshes like Istio, AWS App Mesh, Linkerd, and HashiCorp Consul, deployed on-prem, in the cloud, or any combination you need.

There is a growing number of articles on why SuperGloo exists, like Christian Posta’s “Solo.io Streamlines Service Mesh and Serverless Adoption for Enterprises in Google Cloud”. This article focuses on how SuperGloo can help software packages like Weaveworks Flagger work with multiple service meshes, such as Istio and AWS App Mesh, that both support traffic shifting.

Flagger is a cool open source project that automates the promotion of Canary deployments of your Kubernetes services. You associate a Flagger Canary Kubernetes custom resource (CRD) with your deployment, and Flagger follows your defined rules for rolling out a new version. It detects when a new version of your service has been deployed, instantiates the new version alongside the existing one, slowly shifts request traffic between the two, and uses your defined Prometheus metric health checks to decide whether to keep moving traffic to the new version or roll back to the old one. Since a Canary CRD is a YAML file, this gives you a declarative way to ensure that all of your service upgrades follow your prescribed rollout strategy, and it complements GitOps pipelines used in Weave Flux and JenkinsX.

More information on Canary deployments and traffic shifting is available in the following posts. Gloo uses the same underlying data plane technology (Envoy) as Istio to provide the traffic shifting capabilities used by Flagger and Knative. Gloo is an API/Function gateway and not a full service mesh, so it can be used in cases that do not require all of the power, and weight, of a full service mesh implementation.

This article quickly runs through setting up the Flagger podinfo example application on SuperGloo with Istio so you can see what’s involved and try it yourself if you like.

Install Kubernetes and Helm

  • Creates a kind cluster with one control plane node and one worker node
  • Configures the KUBECONFIG as kind creates a separate kubeconfig file for each kind cluster
  • Installs Helm and Tiller with a service account for Tiller
# Create Kubernetes cluster using Kubernetes IN Docker (kind)
# Install: `go get -u sigs.k8s.io/kind`
# (the kind config apiVersion below may vary with your kind release)
cat <<EOF > kind-cluster.yaml
kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
nodes:
- role: control-plane
- role: worker
EOF
kind create cluster --config kind-cluster.yaml
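The KUBECONFIG and Tiller steps from the list above look roughly like the following. This is a minimal sketch, assuming kind’s `kind get kubeconfig-path` subcommand (available in kind releases of this era) and the usual cluster-admin service account for Tiller.

# Point kubectl at the new kind cluster (kind writes its own kubeconfig file)
export KUBECONFIG="$(kind get kubeconfig-path)"

# Install Helm and Tiller with a dedicated service account
kubectl --namespace kube-system create serviceaccount tiller
kubectl create clusterrolebinding tiller-cluster-rule \
  --clusterrole=cluster-admin \
  --serviceaccount=kube-system:tiller
helm init --service-account tiller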

Use SuperGloo to install and configure Istio

# Install Solo.io SuperGloo Service Mesh Operator
supergloo init

# Wait for SuperGloo to start
kubectl --namespace supergloo-system rollout status deployment/supergloo --watch=true

# Use SuperGloo to install and configure Istio the easy way
supergloo install istio \
  --name=flagger-test \
  --namespace=supergloo-system \
  --installation-namespace=istio-system \
  --version=1.0.6 \
  --auto-inject=true \
  --grafana=false \
  --jaeger=false \
  --mtls=false \
  --prometheus=true \
  --update=true

The first command, supergloo init, installs SuperGloo into your Kubernetes cluster; it is equivalent to installing SuperGloo with Helm.

helm repo add supergloo http://storage.googleapis.com/supergloo-helm
helm upgrade --install supergloo supergloo/supergloo --namespace=supergloo-system

The second command kubectl --namespace supergloo-system rollout status deployment/supergloo --watch=true is a hack to wait until the SuperGloo deployment is fully deployed and running. It is similar to using the --wait option on a Helm install, as sketched below.
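For example, if you install SuperGloo with Helm directly (using the repo added above), Helm v2’s --wait flag gives you roughly the same behavior:

helm upgrade --install supergloo supergloo/supergloo \
  --namespace=supergloo-system \
  --wait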

The supergloo install istio ... command declares a custom resource and the SuperGloo controller installs and configures Istio as declared. In this case, we are installing Istio version 1.0.6 with Istio’s Prometheus installation and with Istio deploying sidecars in all pods within namespaces with the label istio-injection=enabled, i.e., Istio’s default behavior for auto-injecting sidecars. This imperative supergloo install istio command creates the following manifest that you could kubectl apply if you prefer. Refer to the full Install specification for more details.

apiVersion: supergloo.solo.io/v1
kind: Install
metadata:
  name: flagger-test
  namespace: supergloo-system
spec:
  installationNamespace: istio-system
  mesh:
    istio:
      enableAutoInject: true
      installPrometheus: true
      version: 1.0.6
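Once the manifest is applied (or the imperative command has run), you can check that SuperGloo accepted the Install resource and that Istio came up. This is a rough sketch; the plural resource name installs.supergloo.solo.io is assumed from the Install kind shown above.

# Verify the Install custom resource exists in the SuperGloo namespace
kubectl --namespace supergloo-system get installs.supergloo.solo.io

# Watch the Istio components come up in the installation namespace
kubectl --namespace istio-system get pods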

Install Weaveworks Flagger

  1. Add a reference to Flagger helm repo
  2. Wait for Tiller to be fully running. Only an issue for quick scripts that create Kubernetes clusters from scratch
  3. Create a cluster role binding that allows Flagger to modify SuperGloo/Istio resources
  4. Install core Flagger referencing Istio’s provided Prometheus and telling Flagger that SuperGloo is the mesh controller
  5. Install Flagger’s Grafana dashboards which are not used as part of this demo
  6. Install Flagger’s LoadTester which can help generate test traffic during a Canary deployment if there is not enough user traffic
# Install Flagger and Canary example application
helm repo add flagger https://flagger.app && helm repo update

# Check, and wait, for Tiller to be fully running
kubectl --namespace kube-system rollout status deployment/tiller-deploy --watch=true

# Create a cluster role binding so that Flagger can manipulate SuperGloo custom resources
# The Flagger Helm chart will create the flagger service account
kubectl create clusterrolebinding flagger-supergloo \
  --clusterrole=mesh-discovery \
  --serviceaccount=istio-system:flagger

# Install Flagger
helm upgrade --install flagger flagger/flagger \
  --namespace=istio-system \
  --set metricsServer=http://prometheus.istio-system:9090 \
  --set meshProvider=supergloo:istio.supergloo-system

# Install Flagger Grafana dashboards
helm upgrade --install flagger-grafana flagger/grafana \
  --namespace=istio-system \
  --set url=http://prometheus.istio-system:9090

# Port forward to access Flagger Grafana dashboards
# kubectl --namespace istio-system port-forward service/flagger-grafana 3000:80 &>/dev/null &

# Install Flagger LoadTester
helm upgrade --install flagger-loadtester flagger/loadtester \
  --namespace=test \
  --set cmd.timeout=1h

Install Weaveworks Flagger example application

  1. Install a test namespace, the example Kubernetes Deployment manifest and an (optional) horizontal pod autoscaler
  2. Deploy the Canary policy for the example application. More details on that in a moment
  3. Wait for the Canary controller to report that it’s fully ready, which means Istio and Flagger are fully deployed and running
# Install Flagger test application
export REPO=https://raw.githubusercontent.com/weaveworks/flagger/master
kubectl apply \
  --filename ${REPO}/artifacts/namespaces/test.yaml \
  --filename ${REPO}/artifacts/canaries/deployment.yaml \
  --filename ${REPO}/artifacts/canaries/hpa.yaml

# Configure Flagger Canary Custom Resource that watches for version changes for example application
kubectl --namespace test apply --filename $SCRIPT_DIR/podinfo-canary.yaml

# Wait for Flagger and demo Canary to be fully running
echo "Waiting for Flagger Canary to initialize..."
until [ "$(kubectl --namespace test get canary podinfo -o=jsonpath='{.status.phase}')" == "Initialized" ]; do
  sleep 5
done
echo "Canary Status is $(kubectl --namespace test get canary podinfo -o=jsonpath='{.status.phase}')"

The Canary manifest has a target reference that associates it with the podinfo deployment. The Canary analysis says that every interval (1 minute) Flagger shifts stepWeight (10%) more request traffic to the new version, up to maxWeight (50%), as long as the metrics stay within the defined healthy ranges. If more than threshold (5) health checks fail, Flagger rolls traffic back to 100% on the old version and deletes the new version’s deployment. There is also an optional section that lets the Flagger loadtester generate additional traffic to help validate the new Canary version; it is hard to know whether the new version works if it has not handled any requests. A quick way to watch the traffic weights shift is sketched after the manifest.

apiVersion: flagger.app/v1alpha3
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  # deployment reference
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  # the maximum time in seconds for the canary deployment
  # to make progress before it is rolled back (default 600s)
  progressDeadlineSeconds: 60
  # HPA reference (optional)
  autoscalerRef:
    apiVersion: autoscaling/v2beta1
    kind: HorizontalPodAutoscaler
    name: podinfo
  service:
    # container port
    port: 9898
    # Istio gateways (optional)
    gateways:
    - public-gateway.istio-system.svc.cluster.local
    - mesh
    # Istio virtual service host names (optional)
    hosts:
    - app.example.com
  canaryAnalysis:
    # schedule interval (default 60s)
    interval: 1m
    # max number of failed metric checks before rollback
    threshold: 5
    # max traffic percentage routed to canary
    # percentage (0-100)
    maxWeight: 50
    # canary increment step
    # percentage (0-100)
    stepWeight: 10
    metrics:
    - name: request-success-rate
      # minimum req success rate (non 5xx responses)
      # percentage (0-100)
      threshold: 99
      interval: 1m
    - name: request-duration
      # maximum req duration P99
      # milliseconds
      threshold: 500
      interval: 30s
    # generate traffic during analysis
    webhooks:
      - name: load-test
        url: http://flagger-loadtester.test/
        timeout: 5s
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://podinfo.test:9898/"

Deploy a new image version and watch the canary deployment

# Check current image version
echo "Current image is $(kubectl --namespace test get deployment podinfo -o=jsonpath='{.spec.template.spec.containers[0].image}')"

# Trigger a version change in the background so we can see changes to Kubernetes
( sleep 5; kubectl --namespace test set image deployment/podinfo podinfod=quay.io/stefanprodan/podinfo:1.4.1 &>/dev/null ) &

# Watch Canary event log
until [ "$(kubectl --namespace test get canary podinfo -o=jsonpath='{.status.phase}')" == "Succeeded" ]; do
  output="$(kubectl --namespace test get event --field-selector involvedObject.name=podinfo,type=Normal)"
  clear
  echo -e "$output\n" && sleep 10
done
clear
kubectl --namespace test get event --field-selector involvedObject.name=podinfo,type=Normal

# Check current image version
echo "Current image is $(kubectl --namespace test get deployment podinfo -o=jsonpath='{.spec.template.spec.containers[0].image}')"

Clean up Kubernetes
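Cleanup is just a matter of deleting the kind cluster, as in the last lines of the combined script below:

# Delete Kubernetes cluster
unset KUBECONFIG
kind delete cluster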

See Everything Together

#!/usr/bin/env bash

set -eu -o pipefail

function print_error {
  read line file <<<"$(caller)"
  echo "An error occurred in line $line of file $file:" >&2
  sed "${line}q;d" "$file" >&2
}
trap print_error ERR

# Get directory this script is located in to access script local files
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"

# Turn on Bash verbose mode to show all commands before executing
set -v

# Create Kubernetes cluster using Kubernetes IN Docker (kind)
# Install: `go get -u sigs.k8s.io/kind`
# ... the kind cluster creation, KUBECONFIG/Tiller setup, SuperGloo/Istio install,
# and Flagger/Grafana install steps are the same commands shown in the sections above ...

# Port forward to access Flagger Grafana dashboards
# kubectl --namespace istio-system port-forward service/flagger-grafana 3000:80 &>/dev/null &

# Install Flagger LoadTester
helm upgrade --install flagger-loadtester flagger/loadtester \
  --namespace=test \
  --set cmd.timeout=1h

# Install Flagger test application
export REPO=https://raw.githubusercontent.com/weaveworks/flagger/master
kubectl apply \
  --filename ${REPO}/artifacts/namespaces/test.yaml \
  --filename ${REPO}/artifacts/canaries/deployment.yaml \
  --filename ${REPO}/artifacts/canaries/hpa.yaml

# Configure Flagger Canary Custom Resource that watches for version changes for example application
kubectl --namespace test apply --filename $SCRIPT_DIR/podinfo-canary.yaml

# Wait for Flagger and demo Canary to be fully running
echo "Waiting for Flagger Canary to initialize..."
until [ "$(kubectl --namespace test get canary podinfo -o=jsonpath='{.status.phase}')" == "Initialized" ]; do
  sleep 5
done
echo "Canary Status is $(kubectl --namespace test get canary podinfo -o=jsonpath='{.status.phase}')"

# Check current image version
echo "Current image is $(kubectl --namespace test get deployment podinfo -o=jsonpath='{.spec.template.spec.containers[0].image}')"

# Trigger a version change in the background so we can see changes to Kubernetes
( sleep 5; kubectl --namespace test set image deployment/podinfo podinfod=quay.io/stefanprodan/podinfo:1.4.1 &>/dev/null ) &

# Watch Canary event log
until [ "$(kubectl --namespace test get canary podinfo -o=jsonpath='{.status.phase}')" == "Succeeded" ]; do
  output="$(kubectl --namespace test get event --field-selector involvedObject.name=podinfo,type=Normal)"
  clear
  echo -e "$output\n" && sleep 10
done
clear
kubectl --namespace test get event --field-selector involvedObject.name=podinfo,type=Normal

# Check current image version
echo "Current image is $(kubectl --namespace test get deployment podinfo -o=jsonpath='{.spec.template.spec.containers[0].image}')"

# Delete Kubernetes cluster
unset KUBECONFIG

kind delete cluster

Summary

SuperGloo provides a great abstraction and management layer that helps extensions leverage one or more service meshes without needing to get deep into the weeds of the huge API surface area of any one mesh like Istio or App Mesh. SuperGloo makes it easy for applications to use just what they need from the underlying meshes. That helps with service mesh adoption: based on the feedback I’ve heard, many teams are currently experimenting with Istio, App Mesh, or Linkerd for just one capability, typically traffic shifting or mutual TLS, and they find it difficult to manage and configure the whole mesh even though they aren’t using its other capabilities. SuperGloo to the rescue: it makes it easier to use just the parts of a service mesh that add value today, and lets you add more as you need it, including mixing and matching multiple service meshes to get the biggest return on your investment.

I strongly encourage you to learn more yourself, as it’s fun to learn new technology, especially tech that helps you solve complex challenges and accelerates your ability to deploy larger and more sophisticated systems.