Deciding on Rust-based ztunnel for Istio ambient mesh
When Istio ambient service mesh was announced on Sept 7, 2022, the ztunnel (zero-trust tunnel) was implemented using Envoy proxy. We chose Envoy initially so that we could get Istio ambient mesh into everyone’s hands to install and explore as early as possible. Envoy has been Istio’s default sidecar since the very beginning of the project in 2017, and we have observed massive adoption of Envoy along with its rise to a graduated project in the CNCF. Given that we use Envoy for Layer 7 processing in sidecars, gateways, and waypoint proxies, it was natural for us to start implementing ztunnel using Envoy.
In mid-Oct 2022, John Howard from Google proposed to the Istio community that ztunnel be rewritten in Rust. At Solo we are huge fans of Envoy, so we went through every reason why Envoy is a good choice for ztunnel.
Below are the top few things we love about Envoy-based ztunnel:
Developer efficiency
Most developers at Solo.io and in the Istio community don’t know Rust; it’s a new language we would need to spend time learning. Given that waypoint proxies, sidecars, and gateways will continue to use Envoy (C++), and the Istio control plane uses Golang, developers would have to learn Rust as a third language. Rust has a reputation for a steep learning curve, and newcomers often struggle just to get their code to compile.
It is worth calling out that, in the long term, it is much easier to develop safe and performant code in Rust, especially for async requests, than in Envoy’s current callback-based C++ framework. That advantage is somewhat offset by the simplicity of ztunnel: once it is complete, few changes are anticipated.
CVEs and common code
The Istio community already handles CVEs for Envoy as the default proxy for sidecars, gateways, and waypoint proxies, so an Envoy-based ztunnel adds no net-new CVEs to distribute fixes for. A good portion of common code can be reused easily among sidecars, gateways, waypoint proxies, and ztunnels. Examples include sending Certificate Signing Requests (CSRs) to sign keys and certificates, and upgrading the connection to the HTTP-based (or, soon, QUIC/HTTP3-based) overlay tunnel. Further, an Envoy-based ztunnel lets us keep using the existing CI/CD system, so we can reuse our existing investments.
Better customer onboarding/UX
We believe it is conceptually simpler for users to understand that all proxies in the Istio project use Envoy. For users who are already familiar with Envoy, Envoy-based ztunnels and waypoint proxies will have the access logs, metrics, and tracing that they are comfortable with. Just as developers would need to learn Rust as a new programming language, operators would need to operate Rust-based ztunnels in addition to Envoy-based proxies.
Production-ready faster
With all the Envoy expertise we have at Solo and in the Istio community, we believe we can get an Envoy-based ztunnel production-ready faster than we could rewrite ztunnel in Rust and bring the rewrite to the same point. As a small company, reaching production readiness quickly is one of the most important criteria for Solo.
What about the complex Envoy configuration for ztunnel?
We don’t talk about this often, but if you go through the Envoy configuration for Envoy-based ztunnel, you’ll see tens of thousands of lines for just 2 or 3 services added to your ambient mesh. Envoy wasn’t designed for multi-tenancy. Ztunnel, however, is multi-tenant and manages all the incoming and outgoing traffic for all co-located pods on the same node.
To match a given source, based on its service account, to its destination service, which may or may not support HBONE tunneling, the initial implementation used Envoy internal listeners, which made the ztunnel configuration hugely complex. If you are interested in learning more details about the complexity, the Huawei team gave a deep explanation of Envoy configuration for ztunnel in this blog in Chinese.
Before ambient mesh was initially launched, we realized that only a few developers could debug the complex Envoy-based ztunnel config, so we had to simplify it. Furthermore, we ran numerous benchmark tests comparing sidecars and ambient mesh, and the performance wasn’t as promising as we had hoped; we believe the complex ztunnel configuration contributed to that.
To make Envoy-based ztunnel more performant, we would have to lower its RAM and CPU usage by simplifying what xDS config Envoy needs to remember and traverse while serving requests:
- First of all, we could reduce the number of clusters by using a certificate provider, which enables us to define a single cluster and swap the certificates used for connecting to it based on source IP.
- Then we could write a custom filter to replace the routing filter that leverages a new simple xDS (e.g. Workload DS) that lets us quickly look up whether a workload instance is in the mesh and whether its IP is real or a VIP. This reduces the number of filter chains from scaling with services and pods to a single filter chain.
- There could be a couple more optimizations, for example, on-demand xDS support for the new Workload xDS.
Refer to more details in this great doc from Steven Landow.
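To make the workload lookup idea above concrete, here is a minimal sketch in Rust of a single table keyed by IP address. It answers whether an address is in the mesh and whether it is a real pod IP or a VIP, and it resolves a source IP to the identity whose certificate would be presented. The names `Address`, `WorkloadIndex`, `supports_hbone`, and the sample addresses are all hypothetical and chosen for illustration; this is not the actual Workload xDS schema.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// What an address resolves to. Hypothetical shape, for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub enum Address {
    /// A real pod IP that belongs to a workload in the mesh.
    Workload { service_account: String, supports_hbone: bool },
    /// A virtual IP that fronts a service.
    Vip { service: String },
}

/// One flat lookup table instead of per-service, per-pod filter chains.
#[derive(Default)]
pub struct WorkloadIndex {
    by_ip: HashMap<IpAddr, Address>,
}

impl WorkloadIndex {
    pub fn insert(&mut self, ip: IpAddr, address: Address) {
        self.by_ip.insert(ip, address);
    }

    /// Returns None when the address is not part of the mesh.
    pub fn lookup(&self, ip: IpAddr) -> Option<&Address> {
        self.by_ip.get(&ip)
    }

    /// Picks the identity (and therefore the certificate) for a source IP.
    /// Only real workload IPs have an identity; VIPs and unknown IPs do not.
    pub fn source_identity(&self, source: IpAddr) -> Option<&str> {
        match self.by_ip.get(&source) {
            Some(Address::Workload { service_account, .. }) => Some(service_account.as_str()),
            _ => None,
        }
    }
}

/// Builds a tiny index with one pod and one VIP (made-up sample data).
pub fn sample_index() -> WorkloadIndex {
    let mut index = WorkloadIndex::default();
    index.insert(
        "10.0.0.5".parse().unwrap(),
        Address::Workload { service_account: "sleep".to_string(), supports_hbone: true },
    );
    index.insert(
        "10.96.0.10".parse().unwrap(),
        Address::Vip { service: "httpbin.default".to_string() },
    );
    index
}
```

With a table like this, the per-connection decision is a single hash lookup, so its cost does not grow with the number of services and pods the way the generated filter chains do.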
What did we miss?
While we were almost convinced that Envoy was the right choice for ztunnel, we overlooked one major factor. Envoy supports multiple threads (for example, Istio sidecars default to 2 worker threads, which you can override using the concurrency field in ProxyConfig), but its threading model doesn’t support the work stealing scheduling strategy. With Envoy’s threading model, memory is copied to each thread to avoid locks. Each request is handled by a single thread with its own connection pool; there is no way to pause work on a thread, share that memory, and have another, idle thread pick the work up. Work stealing is predicated on memory being shared across multiple threads, which Envoy’s threading model does not support today.
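As a rough illustration of the scheduling idea (not of how Envoy or Tokio are implemented), the sketch below gives each worker thread its own queue and puts all the work on worker 0 to model an uneven load. With stealing turned off, idle workers exit without helping; with it turned on, an idle worker takes tasks from the back of another worker's queue. The function name `run_workers` and its parameters are hypothetical, and the mutex-guarded queues are a simplification. The sketch assumes at least one worker.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Runs `tasks` units of work on `workers` threads (assumes workers >= 1)
/// and returns how many tasks each thread ended up running.
pub fn run_workers(workers: usize, tasks: usize, steal: bool) -> Vec<usize> {
    // One queue per worker. The Mutex is what makes a queue reachable
    // from other threads; purely thread-local queues cannot be stolen from.
    let queues: Arc<Vec<Mutex<VecDeque<usize>>>> = Arc::new(
        (0..workers).map(|_| Mutex::new(VecDeque::new())).collect(),
    );
    // Uneven load: every task starts on worker 0's queue.
    queues[0].lock().unwrap().extend(0..tasks);

    let handles: Vec<_> = (0..workers)
        .map(|me| {
            let queues = Arc::clone(&queues);
            thread::spawn(move || {
                let mut done = 0usize;
                loop {
                    // Prefer local work, taken from the front of our own queue.
                    let mut task = queues[me].lock().unwrap().pop_front();
                    if task.is_none() && steal {
                        // Idle: try the back of every other worker's queue.
                        for victim in (0..queues.len()).filter(|&v| v != me) {
                            task = queues[victim].lock().unwrap().pop_back();
                            if task.is_some() {
                                break;
                            }
                        }
                    }
                    match task {
                        Some(_) => done += 1, // stand-in for handling a request
                        None => return done,  // nothing left that we may take
                    }
                }
            })
        })
        .collect();

    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Calling `run_workers(4, 1000, false)` leaves all 1000 tasks on the first worker, while `run_workers(4, 1000, true)` lets the other three threads take a share (how much depends on thread timing). Production schedulers are considerably more sophisticated than this toy, but the precondition is the same: the queues have to live in memory that other threads can reach.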
Instead of having one connection pool per thread like Envoy, it would be ideal if ztunnel had one connection pool shared across all worker threads. Then we wouldn’t need to duplicate connections across worker threads and we could reuse more connections. Supporting work stealing in Envoy would require a nearly complete rewrite, while async programming in Rust gets it out of the box from the Tokio runtime, whose multi-threaded scheduler is work-stealing.
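The difference between the two pooling models can be sketched in a few lines of Rust. Below, every worker thread sends requests from the same source identity to the same destination identity. With one pool per thread, each thread opens its own tunnel; with a single shared pool, one tunnel serves them all. `Tunnel`, `Pool`, the two counting functions, and the identity strings are hypothetical names for illustration, and a mutex-guarded map is only the simplest way to share a pool, not a description of ztunnel's actual implementation.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Pool key: (source identity, destination identity).
type PoolKey = (String, String);

/// Stand-in for an established tunnel; a real pool would hold a connection.
#[derive(Clone)]
pub struct Tunnel {
    pub id: usize,
}

pub struct Pool {
    tunnels: HashMap<PoolKey, Tunnel>,
    opened: Arc<AtomicUsize>, // counts how many tunnels were actually opened
}

impl Pool {
    pub fn new(opened: Arc<AtomicUsize>) -> Self {
        Pool { tunnels: HashMap::new(), opened }
    }

    /// Reuses the tunnel for this identity pair, opening one only if missing.
    pub fn get_or_open(&mut self, src: &str, dst: &str) -> Tunnel {
        let opened = &self.opened;
        self.tunnels
            .entry((src.to_string(), dst.to_string()))
            .or_insert_with(|| Tunnel { id: opened.fetch_add(1, Ordering::SeqCst) })
            .clone()
    }
}

/// Envoy-like model: each worker thread owns a private pool.
pub fn tunnels_opened_per_thread(workers: usize, requests: usize) -> usize {
    let opened = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let opened = Arc::clone(&opened);
            thread::spawn(move || {
                let mut pool = Pool::new(opened);
                for _ in 0..requests {
                    let _ = pool.get_or_open("sa/sleep", "sa/httpbin");
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    opened.load(Ordering::SeqCst)
}

/// Shared model: all worker threads use one pool behind a lock.
pub fn tunnels_opened_shared(workers: usize, requests: usize) -> usize {
    let opened = Arc::new(AtomicUsize::new(0));
    let pool = Arc::new(Mutex::new(Pool::new(Arc::clone(&opened))));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let pool = Arc::clone(&pool);
            thread::spawn(move || {
                for _ in 0..requests {
                    let _ = pool.lock().unwrap().get_or_open("sa/sleep", "sa/httpbin");
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    opened.load(Ordering::SeqCst)
}
```

With 4 workers and any positive number of requests, the per-thread model opens 4 tunnels for the same identity pair and the shared model opens 1. The gap widens with the number of worker threads and of identity pairs on the node.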
The best equivalent library for C++ async programming with work stealing would be Boost.Asio, but rewriting Envoy to leverage it is harder than just building a new proxy altogether. While researching this, we found that the Cloudflare team had published an excellent blog on why they built a new proxy instead of reusing their existing Nginx proxy, and on how much work stealing in the new proxy improved performance (the connection reuse ratio rose from 87.1% to 99.92% with much less CPU).
The conclusion we came to is that if we don’t support work stealing, Envoy-based ztunnel won’t be as performant as a more modern proxy that supports work stealing and we can’t do anything about it other than rewriting it.
Why is this not an issue for sidecars?
Interestingly, in our work with various customers adopting Istio service mesh, we haven’t seen Envoy’s lack of work stealing cause problems. We believe that work stealing helps process more requests with less CPU, which is a larger concern for ztunnel than for sidecars or waypoint proxies.
Ztunnel has different request characteristics from sidecars: it is designed to process huge amounts of incoming and outgoing traffic for hundreds of pods co-located on the same node. Its traffic could be more aggressive, and it would need to handle much higher throughput than the various application pods on the same node. Further, ztunnel will primarily connect to other ztunnels, making connection reuse easier and even more important. If you recall, as the picture below illustrates, the source ztunnel will try to reuse the existing HBONE tunnel for each pair of source and destination service accounts to avoid creating new tunnels when possible.
Why is this not an issue for gateways?
Ztunnel is deployed as a Kubernetes DaemonSet, which means you can only run 1 replica per node. With gateways you can always deploy more than 1 replica to balance more connections. Gateways often terminate client connections and establish new connections with the gateway’s identity to destination pods, which will likely far outnumber the ztunnel pod count. While ztunnel has to represent all identities in ambient mesh on the co-located node, it primarily connects to other ztunnels so it can more easily reuse existing connections.
How is work stealing different from simply configuring Envoy to balance connections properly?
Neeraj Poddar wrote two really good blogs about how his team debugged connection imbalance issues in the Envoy sidecar proxy in Istio for inbound and outbound traffic. The resolution described in the blogs is to configure Envoy with the proper connection_balance_config to enable exact balance, which affects how new connections are distributed but doesn’t allow existing connections to be reused.
Connection balancing solves the problem of work not being distributed evenly among worker threads. Work stealing addresses a different problem: reusing existing connections rather than opening more of them. That is how it delivers better performance with less resource usage.
Wrapping up
After weeks of debates, we are happy to conclude that Envoy proxy isn’t the best choice for ztunnel due to the following:
- Ztunnel is multi-tenant yet Envoy wasn’t designed for multi-tenancy.
- Ztunnel provides only the secure overlay layer with much-reduced functionality and attack surface so it is easier to write compared with a full-feature proxy.
- Work stealing is important for ztunnel to reuse connections effectively.
  - Envoy doesn’t support work stealing.
  - Rust supports work stealing natively via its Tokio library.
  - C++ supports work stealing via Boost, but Rust is safer and easier than C++ for a net-new proxy.
It became obvious to us that we should not invest further effort in making the Envoy-based ztunnel better, so we have joined forces with others in the community to build the new Rust-based ztunnel.
Take a look at the video below to watch Lin and Kevin discuss deciding on Rust-based ztunnel, along with live demos:
To learn how to get started with Istio ambient service mesh, download our report Istio Ambient Explained.