If your enterprise has been running Istio, you have very likely looked at ambient mesh and weighed the merits of migrating to it.
Ambient represents an evolution of Istio's implementation, one that strives to be more transparent and less intrusive, with service mesh capabilities embedded deeper into the platform.
There is a lot to like about ambient, but at the same time, any migration of a major component of an enterprise platform is not something that should be taken lightly.
This document is an effort to help you think through the challenges, risks, and benefits associated with migrating to ambient mesh.
Challenges Around Migrating from Sidecars to Sidecarless
Migrations must be planned properly and performed carefully. They require effort and coordination among teams, and a certain degree of operational maturity, meaning that automated processes are already in place. Some of the automation you built around Istio in sidecar mode will have to be revisited and retrofitted.
But ultimately the effort is worthwhile: we land in a better place where, for example, we no longer need to restart applications after rolling out an Istio upgrade.
Perhaps the main challenge of migrating to ambient is ensuring that all configured mesh features continue to function. Let us review some of the differences between Istio in sidecar mode and ambient.
Ambient mesh depends on the Kubernetes Gateway API. The Kubernetes Gateway API is the clear future path for configuring traffic management. It is more flexible, easier to use, and has much more community investment. But not all features of Istio can be expressed with the Kubernetes Gateway API. Fault injection is one example, so you will continue to use Istio APIs for some things.
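To make the split between the two API families concrete, here is a minimal Python sketch that builds one manifest of each kind: a Gateway API HTTPRoute for weighted routing, and an Istio VirtualService for fault injection. The resource names, the `shop` namespace, the `reviews` and `ratings` services, and port 9080 are all illustrative assumptions and do not come from any particular deployment. Whether a VirtualService takes effect in your ambient setup depends on your Istio version and on a waypoint handling the traffic, so treat this as a starting point to check against the Istio documentation rather than a recipe.

```python
import json


def http_route(name, namespace, service, port, backend_weights):
    """Build a Gateway API HTTPRoute that splits a Service's traffic by weight."""
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # A Service parentRef (core API group "") attaches the route to mesh traffic.
            "parentRefs": [
                {"group": "", "kind": "Service", "name": service, "port": port}
            ],
            "rules": [
                {
                    "backendRefs": [
                        {"name": backend, "port": port, "weight": weight}
                        for backend, weight in backend_weights.items()
                    ]
                }
            ],
        },
    }


def fault_delay_virtual_service(name, namespace, host, percent, delay):
    """Build an Istio VirtualService that delays a percentage of requests."""
    return {
        "apiVersion": "networking.istio.io/v1",
        "kind": "VirtualService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "hosts": [host],
            "http": [
                {
                    "fault": {
                        "delay": {
                            "percentage": {"value": float(percent)},
                            "fixedDelay": delay,
                        }
                    },
                    "route": [{"destination": {"host": host}}],
                }
            ],
        },
    }


if __name__ == "__main__":
    # All names below are hypothetical examples.
    manifests = [
        http_route("reviews-split", "shop", "reviews", 9080,
                   {"reviews-v1": 90, "reviews-v2": 10}),
        fault_delay_virtual_service("ratings-delay", "shop", "ratings", 10, "5s"),
    ]
    # kubectl accepts JSON as well as YAML, so the output can be piped to `kubectl apply -f -`.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2))
```

Generating manifests from code like this is only one option; the same two resources can of course be written by hand as YAML.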
More broadly, not all of Istio's features are recommended for adoption. A migration is a good time to review and assess which features you might want to reduce your dependence on. A good reference is John Howard's Opinionated Istio Feature Recommendations.
In sidecar mode you can choose between the istio-init init container and the Istio CNI plugin to set up the iptables rules that redirect pod traffic through the sidecar. In ambient mode the CNI plugin becomes a requirement. One intermediate step that smooths the path to ambient is to switch your sidecar deployment to the Istio CNI plugin first. It is a net benefit in any case, as it removes the need for elevated network privileges on application pods.
You should also be aware that some features are not supported by ambient mode in open-source Istio, including EnvoyFilters, multi-cluster, and zero-downtime migrations. If these are important to you, consider Gloo Mesh, Solo.io's enterprise version, which does support these features.
With open-source Istio, collecting Layer 7 telemetry throughout the mesh requires waypoints. Here too, Gloo Mesh offers an enterprise feature: its ztunnel proxies have been enhanced to collect Layer 7 telemetry.
On the flip side, certain features are improved in ambient mode. For example, egress gateways were completely reworked in ambient mode. Configuring routing of external traffic through an egress gateway with TLS origination in ambient mode is a pleasure. In contrast, in sidecar mode the process was terribly complex.
Potential Migration Risks
Compared to Istio in sidecar mode, ambient mesh is of course newer. Adopting a newer technology is generally considered riskier, as it is not yet deployed as widely as its predecessor. On the other hand, ambient was created as a refinement of the sidecar architecture, and so in some respects can be seen as the more evolved option. It is also important to point out that Istio ambient mesh was promoted to General Availability (GA) in version 1.24; in other words, it is stable and ready for production. The latest stable version of Istio at the time of this writing is 1.26.2.
The saying "with great power comes great responsibility" applies: with Istio ambient mesh, all it takes to add your workloads to the mesh is a Kubernetes label. What are the implications of failing to apply that label? It's an easy mistake to make, and the affected workloads simply stay outside the mesh. Mature environments guard against such issues with automation and automated tests. Mitigating this type of issue goes back to the subject of operational maturity.
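As one example of the kind of automated guard mentioned above, here is a minimal Python sketch that flags namespaces expected to be in the mesh but lacking the ambient enrollment label `istio.io/dataplane-mode=ambient`. It assumes the input is the `items` array from `kubectl get namespaces -o json`, and the namespace names in the sample data are made up. It only checks namespace-level labels; enrollment applied directly to individual pods is not considered.

```python
AMBIENT_LABEL = "istio.io/dataplane-mode"
AMBIENT_VALUE = "ambient"


def is_enrolled(obj):
    """True if a Kubernetes object carries the ambient enrollment label."""
    labels = obj.get("metadata", {}).get("labels") or {}
    return labels.get(AMBIENT_LABEL) == AMBIENT_VALUE


def missing_enrollment(namespaces, expected):
    """Return the sorted names in `expected` that are absent or not labeled for ambient."""
    enrolled = {ns["metadata"]["name"] for ns in namespaces if is_enrolled(ns)}
    return sorted(set(expected) - enrolled)


# Hypothetical sample shaped like the `items` of `kubectl get namespaces -o json`.
sample = [
    {"metadata": {"name": "shop", "labels": {AMBIENT_LABEL: AMBIENT_VALUE}}},
    {"metadata": {"name": "payments", "labels": {"team": "billing"}}},
    {"metadata": {"name": "default"}},
]

for name in missing_enrollment(sample, ["shop", "payments"]):
    print(f"namespace {name} is not labeled {AMBIENT_LABEL}={AMBIENT_VALUE}")
```

A check like this could run in a CI pipeline or as a scheduled job, failing loudly when the list of missing namespaces is not empty.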
Another risk is attempting to perform a migration on your own, perhaps relying only on the Istio open-source Slack channels for support. A much safer approach is to do this with the assistance and support of Solo.io, which brings to bear the engineering talent that helped build ambient and the field engineering experience to guide you through the task. Leveraging the experience of an organization that has already helped other enterprises with their migrations is the safer and more logical approach.
A related risk is underestimating the effort and failing to plan it properly. Allot the right resources, assign engineers with the right expertise, and give them the autonomy to put in place a plan that is reasonable and flexible. A plan that affords application teams the flexibility to migrate their applications on their own schedule is preferable and less risky compared to a one-shot migration.
Rehearsing the plan and vetting it in a lower-level environment is an important mechanism for lowering risk, increasing familiarity with the process, and adjusting the plan as it evolves. Solo.io also offers a migration tool for ambient, which can act as a safety net and ensure that no steps are left out or overlooked during a migration. We want to verify that no traffic flows are altered in undesirable ways, that authorization policies continue to function after the migration, and that telemetry continues to be collected with no regressions in observability. Automated tests should be in place to ensure continued function of all policies.
It's important to run tests in lower-level environments with traffic flowing through the system. Production traffic can be captured and sanitized for the purpose of replaying in test environments.
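One simple way to frame such a regression test is to record the outcome of a fixed set of requests before the migration and compare it with the outcome afterward. The Python sketch below shows only the comparison step. The `Probe` type, the service names, and the status codes are hypothetical, and how the probes are actually executed (for example from test pods inside the cluster) is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """One request to replay: who calls what, and on which path."""
    source: str
    destination: str
    path: str


def regressions(baseline, after):
    """Return {probe: (expected, observed)} for every probe whose result changed."""
    changed = {}
    for probe, expected in baseline.items():
        observed = after.get(probe)  # None means the probe was not run afterward
        if observed != expected:
            changed[probe] = (expected, observed)
    return changed


# Hypothetical results: HTTP status codes recorded before and after the migration.
allowed = Probe("frontend", "reviews", "/reviews/1")
denied = Probe("guest", "ratings", "/ratings/1")

before_migration = {allowed: 200, denied: 403}
after_migration = {allowed: 200, denied: 200}  # the deny policy stopped applying

for probe, (expected, observed) in regressions(before_migration, after_migration).items():
    print(f"{probe.source} -> {probe.destination}{probe.path}: "
          f"expected {expected}, observed {observed}")
```

Including requests that are expected to be denied matters as much as including the ones that should succeed, since a policy that silently stops applying does not show up as an error.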
Benefits with Ambient Mesh
Overall, the reason for migrating to ambient is that we end up in a better place, one where we have all the advantages of the mesh but with more flexibility, lower cost, and higher performance, and where we are poised to take advantage of future advances in the technology.
We are getting rid of all of these mesh "tentacles", the sidecars that we allowed Istio to pry open our pods to insert. We get the same benefits but with a much cleaner, non-invasive implementation of a mesh, one that is separate from your deployed enterprise artifacts.
Ambient completely separates mesh concerns from running workloads. That means your workloads do not need to be disturbed to operate a mesh: no sidecar injection into your pods, which means no app restarts when adding your app to the mesh, when upgrading your data plane with a new version of Envoy, or when security patching your Envoy sidecars.
In his blog Getting Started with Ambient Mesh: From 0 to 100 mph, my colleague Cory Jett articulates the core benefits of ambient mesh as follows:
- Operational Simplicity: Forget about the complexities of sidecar injection, pod restarts for upgrades, and managing a proxy in every single pod. With Ambient Mesh, you simply label a namespace, and your workloads are automatically part of the mesh.
- Significant Cost Savings: By replacing resource-intensive per-pod sidecars with lightweight, node-level ztunnels, Ambient Mesh drastically reduces the CPU and memory footprint of your service mesh infrastructure. This translates directly into lower cloud bills.
- Improved Performance: With a streamlined data plane, particularly for Layer 4 traffic, Ambient Mesh can offer better network performance and reduced latency, especially for workloads that don't need constant Layer 7 processing.
- Incremental Adoption: You don't have to go all-in at once. Start with the foundational Layer 4 security and observability, and then layer on Layer 7 features with waypoint proxies only for the services that genuinely need them. This "pay-as-you-go" model makes service mesh adoption less daunting.
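The incremental model described in the last point comes down to which labels a namespace carries. The Python sketch below computes them: every enrolled namespace gets `istio.io/dataplane-mode=ambient`, and only namespaces that need Layer 7 features also get `istio.io/use-waypoint`. The default waypoint name `waypoint` and the namespace `shop` in the usage note are assumptions, and a waypoint with that name is assumed to already be deployed.

```python
import json


def enrollment_labels(needs_l7, waypoint_name="waypoint"):
    """Labels for a namespace: Layer 4 only, or Layer 4 plus a waypoint for Layer 7."""
    labels = {"istio.io/dataplane-mode": "ambient"}
    if needs_l7:
        # Assumption: a waypoint named `waypoint_name` already exists.
        labels["istio.io/use-waypoint"] = waypoint_name
    return labels


def namespace_patch(needs_l7, waypoint_name="waypoint"):
    """JSON merge patch that applies the enrollment labels to a namespace."""
    return json.dumps({"metadata": {"labels": enrollment_labels(needs_l7, waypoint_name)}})


print(namespace_patch(needs_l7=False))
print(namespace_patch(needs_l7=True))
```

The resulting patch could be applied with, for example, `kubectl patch namespace shop --type merge -p '<patch>'`, starting every namespace at Layer 4 and adding the waypoint label later for the services that need it.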
Learn More About Ambient Mesh
Solo.io makes available a number of resources to support your migration to ambient.
The whitepaper Migrating from Sidecars to Sidecarless provides an end-to-end example and helps you think through approaches for performing a migration.
The ambientmesh.io blogs have a number of posts that discuss migration from the perspective of observability, traffic management, and security.
Try out a lab that walks you through a migration in a sandboxed environment, to give you a sense of the steps involved in performing a migration.