Migrating from sidecars to ambient with zero downtime

Learn how to migrate from Istio sidecars to ambient mesh with zero downtime. Step-by-step strategies, best practices, and tools to ensure a safe transition.

Istio is a very popular open source project and one of the most widely adopted service mesh platforms.

The Istio project has not stood still, and its implementation has evolved in a number of ways, both to keep up with changes in the cloud-native landscape, and to further improve and streamline Istio as a service mesh platform.

At the tip of that evolution is Istio's ambient mode, a sidecarless model that represents a departure from the original sidecar-based architecture.

Many organizations today run Istio in sidecar mode.

In this blog, we discuss the concerns associated with migrating from a service mesh running Istio in sidecar mode to ambient mesh.

Simply upgrading Istio will likely not produce the desired result. We must think about what has changed, and how we might adopt a step-wise strategy that ensures a successful migration. The approach I recommend is to assess the gap from current state to desired state, and to move carefully in discrete steps towards the desired state.

This is the same approach that software developers follow to avoid introducing bugs: isolate and test each change individually rather than introducing multiple changes at once. Doing so helps us reason about each change and how it might affect the behavior of the system under test.

Evolution of traffic capture and redirection

Take traffic capture and redirection as an example. It was originally implemented using an init container that wrote iptables rules to redirect inbound and outbound network requests to the Envoy sidecar. Since then, Istio has evolved to use a CNI node agent, an approach that is not only more secure (application pods no longer need the elevated NET_ADMIN and NET_RAW capabilities), but is also a requirement for running Istio in ambient mode.

If today you are still running Istio using the original Init Container approach, a simple first step towards moving closer to being able to run in ambient mode is to switch to using the CNI node agent.

Make a small change: reconfigure sidecar mode to use the Istio CNI agent, and run it in production for a while. Small changes like this one dramatically lower the risk of the overall migration.
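
As a rough sketch of what this change can look like with Helm, the values fragment below tells istiod to stop injecting the init container and rely on the CNI agent instead. It assumes a recent Istio release where the istiod chart exposes this setting as pilot.cni.enabled (older releases used a different value name, so check the documentation for your version), and that the istio/cni chart is installed alongside it. The file name is hypothetical.

```yaml
# values-istiod.yaml (hypothetical file name), passed to the istio/istiod chart.
# Assumption: the chart version in use exposes the setting as pilot.cni.enabled.
pilot:
  cni:
    enabled: true   # injected pods rely on the CNI node agent, not an init container
```

Existing pods keep their init container until they are restarted, so roll your workloads after applying the change.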

Evolution of traffic management API

On Kubernetes, we started out with the Ingress resource, a perhaps under-specified API that was inadequate to completely capture traffic management concerns, not only at ingress, but also in the mesh.  Istio's approach initially was to support the Ingress resource, and in parallel, also introduce its own resources:  Gateway, VirtualService, and DestinationRule, representing both a more complete and a more flexible API for capturing the configuration of the network to route traffic to its destination.

Today the Kubernetes Gateway API is a stable API and ready for production. Istio fully supports this new API, and in ambient mode, the Kubernetes Gateway API is an outright requirement.

If today you are still using Istio's Gateway and VirtualService resources, a simple second step towards moving closer to being able to run in ambient mode is to switch to Kubernetes Gateway API resources.

Make a small change: specify all of your traffic management configuration resources in terms of the new API. Run Istio in sidecar mode in this fashion for a while, to ensure that there are no issues with your "refactored" traffic policy.
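
To give a feel for the refactoring, here is a minimal sketch of an HTTPRoute that expresses a 90/10 traffic split, the kind of rule often written as a VirtualService. All names (the Gateway my-gateway, the hostname, and the Services reviews-v1 and reviews-v2) are hypothetical.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: reviews-split            # hypothetical name
  namespace: default
spec:
  parentRefs:
  - name: my-gateway             # hypothetical Gateway in the same namespace
  hostnames:
  - "reviews.example.com"        # placeholder hostname
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: reviews-v1           # hypothetical Service for version 1
      port: 9080
      weight: 90
    - name: reviews-v2           # hypothetical Service for version 2
      port: 9080
      weight: 10
```

Note that backendRefs point at Services rather than at DestinationRule subsets, so a split between versions typically requires one Service per version.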

Gateway provisioning

Another aspect of the Kubernetes Gateway API is the ability to provision Gateways on-demand. With the new API, you no longer need to pre-install the ingress and egress gateway components of Istio. The provisioning takes place on-demand upon applying the Gateway resource to a cluster.

If today you provision your gateways statically using those Istio ingress and egress components, migrate to using the Kubernetes Gateway API. You won't have to worry about upgrading your gateways in the context of performing a migration to ambient mode.
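
As an illustration, applying a Gateway resource like the one below is what triggers the provisioning. This is a minimal sketch: the name and namespace are hypothetical, and it assumes the Gateway API CRDs are installed in the cluster.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: my-gateway               # hypothetical name
  namespace: default
spec:
  gatewayClassName: istio        # handled by Istio's gateway controller
  listeners:
  - name: http
    port: 80
    protocol: HTTP
    allowedRoutes:
      namespaces:
        from: Same               # only routes in this namespace may attach
```

Istio reacts by creating the Deployment and Service that back the gateway, so there is no separate gateway installation to manage.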

Versions and installation

If today you are running an older version of Istio, upgrade to the latest version first.

If you are currently installing and upgrading Istio using the CLI, switch to Helm. Helm is encouraged for installation and upgrades for production use in ambient mode.

From an installation or upgrade point of view, you will discover that the delta from sidecar to ambient has shrunk: you are already running the istio/base, istio/istiod, and istio/cni Helm charts, so all that remains is to enable the ambient profile and add the istio/ztunnel chart.
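
In Helm terms, the remaining delta can be as small as the following values fragment, which is the file-based equivalent of passing '--set profile=ambient'. This is a sketch that assumes you supply the same values file (the file name is hypothetical) to both the istio/istiod and istio/cni charts.

```yaml
# values-ambient.yaml (hypothetical file name)
# Applied to the istio/istiod and istio/cni charts; istio/ztunnel is
# installed as an additional chart alongside them.
profile: ambient
```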

Leverage the migration assistant

Solo.io offers a free migration assistant. The assistant is a carefully developed tool that inspects your cluster and informs you of the specific steps needed to perform a migration.

The assistant divides the task of performing a migration into multiple phases:

  • pre-reqs:  checks that your environment meets all prerequisites to run Istio in ambient mode. This includes checking that your cluster's CNI implementation is compatible, that the Istio version you are running is compatible, and so on.
  • cluster-setup:  informs you if you are not running the ztunnel component, if the configuration does not enable the ambient profile, or if any required CRDs are not installed.
  • deploy-waypoints:  analyzes your existing mesh configuration and informs you where you will need waypoints and how to deploy them.
  • migrate-policies:  verifies that all mesh policies are compatible, and automatically translates your existing policies to ambient-compatible ones.
  • use-waypoints:  tells you how to label your namespaces and services to use the waypoints you have provisioned.
  • policy-simplification:  detects any redundant policies and removes them.
  • remove-sidecars:  this final step switches from sidecar mode to ambient mode by relabeling your namespaces and restarting your workloads. When this step is complete, your mesh is now running in ambient mode.
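
To make the final phase more concrete, the relabeling in remove-sidecars amounts to something like the Namespace manifest below. The namespace name is hypothetical, and the sketch assumes that sidecar injection was previously enabled through the istio-injection label.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-a                          # hypothetical namespace
  labels:
    istio.io/dataplane-mode: ambient    # enroll the namespace in ambient mode
    # The sidecar-era label "istio-injection: enabled" is intentionally absent,
    # so restarted pods come back without a sidecar.
```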

Leveraging this tool for your migration will pay dividends, and might just save you from overlooking important steps in your migration plan.

Sidecar interoperability

One important aspect of interoperability with sidecars is to ensure that sidecars use the HBONE protocol when communicating with ambient workloads.

The migration assistant will flag any sidecars that do not have 'ENABLE_HBONE' set. The ambient profile enables this flag by default, but sidecars must be restarted to pick it up. You will see a warning from the assistant similar to:

* Sidecar xyz is missing 'ENABLE_HBONE'. Upgrade Istio with '--set profile=ambient' and restart the pod.


As the assistant's feedback indicates, ensuring this interoperability during migration is simple: reconfigure your Helm values to use the ambient profile, then restart each workload. Be sure to run multiple replicas and restart them as a rolling update so that there is no downtime.
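
One way to make restarts safe is to encode those expectations in the Deployment itself. The sketch below is illustrative only: the workload name, image, port, and health endpoint are placeholders, and the exact surge settings are a judgment call for your environment.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: xyz                          # placeholder workload name
spec:
  replicas: 2                        # more than one replica, so one can restart at a time
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0              # never drop below the desired replica count
      maxSurge: 1                    # bring up one new pod before removing an old one
  selector:
    matchLabels:
      app: xyz
  template:
    metadata:
      labels:
        app: xyz
    spec:
      containers:
      - name: app
        image: example.com/xyz:1.0   # placeholder image
        readinessProbe:              # old pods are removed only after new ones are ready
          httpGet:
            path: /healthz           # placeholder health endpoint
            port: 8080
```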

Mesh policy enforcement

An important difference between sidecar and ambient modes is the location of layer 7 policy enforcement.

In sidecar mode, routing and load-balancing rules are applied in the client's sidecar, while authorization policies are enforced in the server's sidecar. In ambient mode, both layer 7 traffic policies and layer 7 authorization policies shift to the waypoint.
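
For example, a layer 7 AuthorizationPolicy that selected workloads by label in sidecar mode would instead attach to a waypoint in ambient mode. The following is a minimal sketch, assuming a waypoint Gateway named waypoint exists in a hypothetical namespace team-a.

```yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-get-only               # hypothetical name
  namespace: team-a                  # hypothetical namespace
spec:
  targetRefs:                        # attach to the waypoint, not to a workload selector
  - kind: Gateway
    group: gateway.networking.k8s.io
    name: waypoint                   # hypothetical waypoint name
  action: ALLOW
  rules:
  - to:
    - operation:
        methods: ["GET"]             # layer 7 rule, enforced at the waypoint
```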

When you provision waypoints and associate them with specific namespaces and services, Istio ensures that requests to those services are routed through the waypoint for policy enforcement.
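
As a sketch of what provisioning and association can look like, the manifests below declare a waypoint and label a namespace to use it. The namespace and waypoint names are hypothetical. The Gateway follows the shape of the waypoint resource that Istio's tooling generates, so compare it against the output for your Istio version.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: waypoint                        # hypothetical waypoint name
  namespace: team-a                     # hypothetical namespace
  labels:
    istio.io/waypoint-for: service      # handle traffic addressed to services
spec:
  gatewayClassName: istio-waypoint
  listeners:
  - name: mesh
    port: 15008
    protocol: HBONE
---
apiVersion: v1
kind: Namespace
metadata:
  name: team-a
  labels:
    istio.io/dataplane-mode: ambient
    istio.io/use-waypoint: waypoint     # route this namespace's services through the waypoint
```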

If you plan to perform a migration to ambient on a team by team, or service by service basis, over an extended time period, you will have transitional phases where a client is still using sidecars while a service has already transitioned to ambient mode.

It is important to realize that with open source Istio, those sidecars do not know to route those requests through the waypoint, which results in those mesh policies being bypassed. Solo.io offers a distribution of Istio that implements this sidecar-to-ambient interoperability and ensures that policies remain enforced even in these transitional phases of migration.

If this scenario applies to your use case, be sure to use Solo.io's distribution of Istio. The switch is simple and straightforward: you only need to update Istio's HUB and TAG environment variables, which specify the target repository and version (respectively) of Istio that you wish to install.

Summary

Solo.io publishes a whitepaper on sidecar to sidecarless migration which includes a concrete example that you can work through, to get a feel for the migration process and what it entails. We invite you to read the whitepaper and delve further into the details.
