Two-Phased Canary Releases with Gloo, Part 3

Rick Ducott | April 27, 2020

In the first part of this series, we came up with a robust workflow for canary testing and progressive delivery, so that teams can safely deliver new versions of their services to users in a production environment.

In the second part, we looked at how we can scale this to potentially many teams, while maintaining a clean separation between domain and route-level configuration. This helps minimize the amount of configuration dev teams need to maintain to facilitate these workflows, and makes the platform self-service while protecting against misconfiguration.

In this part, we’re going to operationalize this workflow using Helm. This means developers can manage their versioning and rollout strategy by updating helm values, and they can invoke the canary upgrade workflow by performing helm upgrades.

Doing this provides a number of benefits:

  • It lowers the barrier to entry; the workflow is really easy to execute.
  • It drastically reduces the amount of (often) copy-pasted configuration different teams need to maintain.
  • It provides nice guard rails to minimize the potential for misconfiguration.
  • It becomes trivial to integrate into a GitOps / continuous delivery workflow.

As a disclaimer, a lot of services ultimately require fairly extensive configuration, some of which is unique to the particular use case. However, to the extent your teams can follow standard conventions, those conventions can be encoded in a helm chart like we do here. For this part, we’re going to create a minimal set of customizations for our scenario; in the future, we’ll make it more general.

Setting up our chart

Helm includes a tool for creating a chart. After running helm create gloo-app, you’ll get a directory containing a Chart.yaml, a default values.yaml, an empty charts directory, and a templates directory with a number of files.
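For example, the scaffolded layout looks roughly like this (the exact set of generated files varies by Helm version):

helm create gloo-app

gloo-app/
├── Chart.yaml
├── values.yaml
├── charts/
└── templates/
    ├── NOTES.txt
    ├── _helpers.tpl
    ├── deployment.yaml
    ├── ingress.yaml
    ├── service.yaml
    ├── serviceaccount.yaml
    └── tests/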

We’ll update the Chart.yaml, delete the empty charts directory, and delete the contents of our templates directory.
Over the next few steps we’ll start to add in templates to model our services.

Breaking down our requirements

In the last part, we deployed each service — echo and foxtrot — by applying four resources: a deployment, a service, a Gloo upstream, and a Gloo route table. These will become the four templates in our chart. We’ll assume that the namespace for each service is created in advance or by automation, and is not included in the chart itself.

Note that this chart will not contain the Gloo control plane itself, nor will it contain our single virtual service that binds all our route tables to our domain. Those will be set up by a central platform team, and the helm chart is intended to be used by development teams to onboard to the platform.

Designing our initial values

We’re going to extract the bare minimum values so that the chart can deploy either our echo or foxtrot service at
a particular version. There are four values we need to provide to our application (a quick example of supplying them follows the list):

  • Service name (echo): we’ll take this from the name of the Helm release, and in our templates access it with {{ .Release.Name }}.
  • Namespace (echo): this will also be provided by the Helm command, so we’ll access it with {{ .Release.Namespace }}.
  • App version (v1): this will determine the version label to use on the pods for subset routing. For now, we’ll access it with {{ .Values.appVersion }}.
  • API group (example): this will be the label we use on route tables so it matches our virtual service selector; for now we’ll use {{ .Values.apiGroup }}.
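For example, the release name and namespace come from the helm command line, while the other two are ordinary chart values (the release name and chart path here are just an illustration):

helm install echo ./gloo-app \
  --namespace echo \
  --set appVersion=v1 \
  --set apiGroup=example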

Creating our templates

Now we can create our templates.

Deployment

First, let’s create a deployment inside templates/deployment.yaml. Our original echo deployment looked like this:

In order to generalize this into a template that can deploy either echo or foxtrot, at a particular version, let’s extract the values. That results in a template like this:
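For illustration, such a template might look roughly like this (the container image and port are placeholders, not the actual echo values):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-{{ .Values.appVersion }}
  namespace: {{ .Release.Namespace }}
spec:
  replicas: 1
  selector:
    matchLabels:
      app: {{ .Release.Name }}
      version: {{ .Values.appVersion }}
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}
        version: {{ .Values.appVersion }}
    spec:
      containers:
        - name: {{ .Release.Name }}
          # Placeholder image; a real chart would template or hard-code the actual image
          image: example/{{ .Release.Name }}:{{ .Values.appVersion }}
          ports:
            - containerPort: 8080  # placeholder port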

NOTE: Throughout this blog post, you’ll notice bolded instances of View YAML. I’ve added these expand/collapse links to make the blog easier to read, and so you can expand exactly the YAML you want to view at each step.

Service

Let’s do the same with our service. Our original service looked like this:

View YAML

We’ll create a template based on this with our values extracted:

View YAML
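As a sketch (again with a placeholder port), the service template might look like this. Note that the selector only uses the app label, so the service spans every version of the pods; splitting traffic by version is handled by Gloo subsets.

apiVersion: v1
kind: Service
metadata:
  name: {{ .Release.Name }}
  namespace: {{ .Release.Namespace }}
spec:
  selector:
    app: {{ .Release.Name }}   # selects all versions of the app
  ports:
    - port: 8080        # placeholder port
      targetPort: 8080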

 

Upstream

Now that we have our deployment and service, we can add our upstream. Our original upstream looked like this:

View YAML

We’ll extract into the following template:
View YAML
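Here’s a sketch of what that template might look like, based on Gloo’s gloo.solo.io/v1 Upstream (the port is a placeholder):

apiVersion: gloo.solo.io/v1
kind: Upstream
metadata:
  name: {{ .Release.Name }}
  namespace: {{ .Release.Namespace }}
spec:
  kube:
    serviceName: {{ .Release.Name }}
    serviceNamespace: {{ .Release.Namespace }}
    servicePort: 8080   # placeholder port
    # The subset spec on the version key is what enables version-based subset routing
    subsetSpec:
      selectors:
        - keys:
            - version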

Route table

Finally, we need to create the route table containing our initial route. To start, this looks like:
View YAML

We’ll create the following template based on this route table:
View YAML
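Sketched out, and assuming a route prefix based on the release name, the route table template might look roughly like this:

apiVersion: gateway.solo.io/v1
kind: RouteTable
metadata:
  name: {{ .Release.Name }}
  namespace: {{ .Release.Namespace }}
  labels:
    # Matched by the virtual service’s route table selector
    apiGroup: {{ .Values.apiGroup }}
spec:
  routes:
    - matchers:
        - prefix: /{{ .Release.Name }}   # assumed prefix
      routeAction:
        single:
          upstream:
            name: {{ .Release.Name }}
            namespace: {{ .Release.Namespace }}
          subset:
            values:
              version: {{ .Values.appVersion }}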

 

Default Values

We can update the default values.yaml in our chart to have the following:
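For this starting point, a minimal values.yaml might be as simple as:

appVersion: v1
apiGroup: example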

Testing our chart

We should now have the pieces we need to deploy echo and foxtrot with the following commands:
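Assuming the namespaces already exist, something like the following should do it (release names and chart path are illustrative):

helm install echo ./gloo-app --namespace echo
helm install foxtrot ./gloo-app --namespace foxtrot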

Sidebar on testing helm charts

There are different schools of thought on how to test Helm charts. In Gloo, we’ve created some libraries so we can make sure that charts render with the expected resources when different sets of values are provided. Helm 3 also has some built-in starting points for testing. In practice, a chart that is used for production should be well tested, but for this post we’ll skip unit testing.

The other technique we may use when developing these templates is regularly running the helm install command with the --dry-run flag, to ensure the template doesn’t have a syntax error and the resources render as we expect.
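For example, to render the chart without installing anything:

helm install echo ./gloo-app --namespace echo --dry-run --debug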

Updating our chart to support the two-phased upgrade workflow

Now that we have a base chart, we need to expose values that help execute our phased upgrade workflow.
Remember we had the following phases in our rollout strategy:

  • Phase 1: canary test using a special header that routes a targeted, small amount of traffic to the new version for functional testing
  • Phase 2: shift load to the new version while monitoring performance

Designing our helm values

We can start by designing some values that might be able to express different points in the workflow. In particular,
if we think about which resources we need to modify during our workflow, we have two main sections in our values:

  • Deployments: this section will be used to control which versions of the deployment to install. We may want to deploy a single version, or multiple. It may be desirable to deploy a version without routing to it at all, for instance to do shadowing.
    It also may be desirable to configure different versions differently, so we’ll design with that assumption.
  • Routes: this section will be used to configure which stage of the flow we are in. We’ll use routes.primary to configure our main route, and routes.canary to make all the changes necessary for different stages of the canary workflow.

Given these requirements, we’ll make our deployment helm value a map, where the key is the version name and the value is the version-specific configuration. Then, to configure our primary and canary version, we’ll use these keys to specify a version.

We’ll also use a simple approach to phase 2. We’ll add a routes.canary.weight parameter so the canary destination
can be given a weight between 0 and 100. The primary weight will be computed as 100 - routes.canary.weight.
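Putting that together, the values might take roughly the following shape (these exact keys are a sketch of the design, not a fixed schema):

deployment:
  v1: {}   # version name -> version-specific configuration
  v2: {}
routes:
  primary:
    version: v1
  canary:
    version: v2
    headers:
      stage: canary
    weight: 0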

Canary workflow, expressed as Helm values

Now that we’ve settled on a design, let’s actually express this workflow in a series of Helm values.

First, we have our starting state, where we deploy v1 of the application:
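As a sketch, following the shape above:

deployment:
  v1: {}
routes:
  primary:
    version: v1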

Next, we can deploy v2, but not yet set up any canary routes:
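Again sketching with the same keys, v2 is added to the deployment map while the routes stay untouched:

deployment:
  v1: {}
  v2: {}
routes:
  primary:
    version: v1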

Now, we can start Phase 1, and provide a canary configuration:
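Still a sketch: the canary section now appears, specifying the new version and the header used for matching:

deployment:
  v1: {}
  v2: {}
routes:
  primary:
    version: v1
  canary:
    version: v2
    headers:
      stage: canary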

When we are set to begin phase 2, we can start to adjust the weight of the canary route. Note that the previous values are equivalent to specifying a weight of 0; we simply left it out for simplicity:

When we are halfway through the workflow, our values may look like this:
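For example, with half of the traffic shifted to the canary:

deployment:
  v1: {}
  v2: {}
routes:
  primary:
    version: v1
  canary:
    version: v2
    headers:
      stage: canary
    weight: 50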

Finally, we have shifted all the traffic to the canary version:

And the last step is to decommission the old version:
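Sketching the final state, only v2 remains and it becomes the primary version:

deployment:
  v2: {}
routes:
  primary:
    version: v2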

Fantastic – we can now express each step of our workflow as declarative Helm values. This means that once we update the chart, we’ll be able to execute our workflow simply by running helm upgrade.

Updating our templates

Now that we know how we want to express our values, we can update the templates. Since the upstream and service templates did not use anything from the Helm values, we don’t need to change them; we just need to update the deployment and route table.

Deployment

Our deployment template needs to be updated now that the deployment value is a map of version to configuration. We want to create one deployment resource for each version in the map, so we’ll wrap our template in a range over the deployment value. Here’s what that looks like:

View YAML
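As a sketch (the image and port are still placeholders), the range-wrapped template might look like this:

{{- $relname := .Release.Name -}}
{{- $relns := .Release.Namespace -}}
{{- range $version, $config := .Values.deployment }}
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ $relname }}-{{ $version }}
  namespace: {{ $relns }}
spec:
  replicas: 1
  selector:
    matchLabels:
      app: {{ $relname }}
      version: {{ $version }}
  template:
    metadata:
      labels:
        app: {{ $relname }}
        version: {{ $version }}
    spec:
      containers:
        - name: {{ $relname }}
          image: example/{{ $relname }}:{{ $version }}  # placeholder image
          ports:
            - containerPort: 8080                       # placeholder port
{{- end }}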

A few notes about this:

  • Using range affects the scope, so .Release.Name and .Release.Namespace aren’t available inside the range. Instead, we’ll save those to variables that we can access.
  • We added a YAML separator since we’ll sometimes end up generating multiple objects.

Otherwise, this looks very similar to our template from before.

Route table

Now let’s update our route table to start expressing our canary route configuration when those values are specified.

View YAML

Let’s go section by section. The only change we made to the metadata at the top is that we moved the apiGroup to a new
Helm value.

Things start to get interesting in the routes section. First, we’ve added a conditional block that will render
the canary route:
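Sketched roughly with the values we designed (field names approximate), that conditional block might look like:

{{- if .Values.routes.canary }}
    - matchers:
        - prefix: /{{ .Release.Name }}   # assumed prefix
          headers:
          {{- range $name, $value := .Values.routes.canary.headers }}
            - name: {{ $name }}
              value: {{ $value }}
          {{- end }}
      routeAction:
        single:
          upstream:
            name: {{ .Release.Name }}
            namespace: {{ .Release.Namespace }}
          subset:
            values:
              version: {{ .Values.routes.canary.version }}
{{- end }}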

This route will be included as long as canary isn’t nil. It’ll use the version and headers values we
introduced to specify how the matching and subset routing should work.

We also need to customize our route action on our other route, because during Phase 2, we change it from a single to a multi destination and establish weights:
View YAML

To keep it simple, we’ll switch to multi-destination as soon as a canary value is provided. Unless a user explicitly adds a weight to the canary, this will have no effect and all the traffic will continue to be routed to the primary version. This just keeps our template and values minimal. The other interesting note is that we are using Helm arithmetic functions on the weights, so that we always end up with a total weight of 100, and so that a user only needs to specify a single weight in the values.
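Under those assumptions, the primary route’s action might be templated roughly like this (default, int, and sub are standard Helm template functions):

routeAction:
{{- if .Values.routes.canary }}
  multi:
    destinations:
      # Primary weight is computed so the two weights always total 100
      - weight: {{ sub 100 (int (default 0 .Values.routes.canary.weight)) }}
        destination:
          upstream:
            name: {{ .Release.Name }}
            namespace: {{ .Release.Namespace }}
          subset:
            values:
              version: {{ .Values.routes.primary.version }}
      - weight: {{ int (default 0 .Values.routes.canary.weight) }}
        destination:
          upstream:
            name: {{ .Release.Name }}
            namespace: {{ .Release.Namespace }}
          subset:
            values:
              version: {{ .Values.routes.canary.version }}
{{- else }}
  single:
    upstream:
      name: {{ .Release.Name }}
      namespace: {{ .Release.Namespace }}
    subset:
      values:
        version: {{ .Values.routes.primary.version }}
{{- end }}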

Running the workflow with our new chart

We’re now ready to execute our entire canary workflow by performing a helm install and then a series of helm upgrades with changed values. I’ll assume we have Gloo deployed on a Kubernetes cluster and don’t have any existing virtual services. If you had echo and foxtrot installed from before, just run kubectl delete ns echo foxtrot. We’ll start by deploying our generic virtual service for our example API group. Then, as we install new services with our Helm chart, their routes will automatically be picked up.

View YAML
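A sketch of such a virtual service, saved as vs.yaml (the catch-all domain and namespace selector here are assumptions):

apiVersion: gateway.solo.io/v1
kind: VirtualService
metadata:
  name: app
  namespace: gloo-system
spec:
  virtualHost:
    domains:
      - '*'
    routes:
      - matchers:
          - prefix: /
        delegateAction:
          # Delegate to any route table carrying our API group label
          selector:
            labels:
              apiGroup: example
            namespaces:
              - '*'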

We can apply this to the cluster with the following command:

kubectl apply -f vs.yaml

Now we’re ready to start deploying services with our chart.

Install echo

Let’s create a namespace for echo.
kubectl create ns echo

Now we can install with our initial values:

We’ll issue the following commands to install with helm:
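For example, with the starting-state values from earlier saved in a file such as echo-v1.yaml:

helm install echo ./gloo-app --namespace echo -f echo-v1.yaml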

And we can verify things are healthy, and test the route:
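For example (glooctl proxy url prints the address of the Gloo proxy, and the /echo prefix follows the route table sketch above):

kubectl get pods -n echo
curl $(glooctl proxy url)/echo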

Great! Our installation is complete. We were able to bring online a new route to a new service in Gloo by installing
our chart.

For good measure, let’s also install foxtrot:

This is a huge improvement over before, when we were copy-pasting a ton of YAML and needed to do a lot more manual work to bring new services online. Let’s see how this extends to driving our upgrade workflow.

Starting the upgrade to echo-v2

We’ll use the following values for our helm upgrade, to initially deploy echo-v2. Note that this doesn’t yet create any canary routes, it simply adds the new deployment:

With a helm upgrade, we can execute this step:
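For example, with the “deploy v2” values saved in a hypothetical echo-v2.yaml:

helm upgrade echo ./gloo-app --namespace echo -f echo-v2.yaml

The remaining steps follow the same pattern: update the values file and run helm upgrade again.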

We can see that the new deployment has been created and a v2 pod is now running:

And our route is still working:

Entering phase 1

Now we want to set up a route to match on the header stage: canary and route to the new version. All other
requests should continue to route to the old version. We can deploy that with another helm upgrade. We’ll
use these values:

We’ll upgrade with this command:

Now, in our testing, we’ll be able to start using the canary route:
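For example, an ordinary request still lands on the primary version, while a request carrying the canary header lands on v2 (again assuming the /echo prefix and glooctl proxy url):

curl $(glooctl proxy url)/echo                      # routed to the primary version (v1)
curl -H "stage: canary" $(glooctl proxy url)/echo   # routed to the canary version (v2)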

Entering phase 2

As we discussed above, the last set of values will set up our weighted destinations for phase 2, but will set the weight to 0 on the canary route. So now, we can do another helm upgrade to change the weights. If we want to change the weights so 50% of the traffic goes to the canary upstream, we can use these values:

And here is our command to upgrade:

Now the routes are behaving as we expect. The canary route is still there, and we’ve shifted half the traffic on
our primary route to the new version:

Finishing phase 2

Let’s update the routes so 100% of the traffic goes to the new version. We’ll use these values:

And we can deploy with this command:

When we test the routes, we now see all the traffic going to the new version:

Decommissioning v1

The final step in our workflow is to decommission the old version. Our values reduce to this:

We can upgrade with this command:

We can make sure our route is healthy:

And we can make sure our old pod was cleaned up:

Closing thoughts

And that’s it! We’ve installed our service using Helm, and then executed our entire two-phased upgrade workflow
by performing helm upgrade and customizing the values.

From here, we can now look at two different improvements.

First, we’ll need to introduce a lot more values to our chart in order to start supporting more workflows, routing features, and developer use cases. There are potentially many things we’d need to customize, such as the deployment image, resource limits, options enabled, and so forth. We also may need to support multiple routes and alternative paths. Hopefully, at this point you are convinced that we could tackle those as extensions to our work so far, and we may dig into this in a future part.

Second, we’re now ready to integrate fully into a CI/CD process. Many users interact with their production clusters by customizing helm values and using automation, such as the Helm operator or the Flux project, to help deliver those
updates to the cluster when the source of truth (usually a git repo) has changed. Onboarding and upgrading new services to our platform has now become trivial with Helm, and can become automated through GitOps.

Get Involved in the Gloo Community

Gloo has a large and growing community of open source users, in addition to an enterprise customer base. If you’d like to get in touch with me (feedback is always appreciated!), you can find me on Slack.

To learn more about Gloo:

  • Check out the repo, where you can see the code and file issues
  • Check out the docs, which have an extensive collection of guides and examples
  • Join the slack channel and start chatting with the Solo.io engineering team and user community
  • Watch the demo on canary releases with Gloo