HTTP Observability Without Compromises

August 13, 2024
John Howard

In developing Istio ambient mode, we at Solo.io embraced the “Making Easy Things Easy, and Hard Things Possible” mindset.

Ambient mode is designed to offer the easiest possible on-ramp to value, using ztunnel, a purpose-built L4 proxy developed to deliver on that goal. ztunnel is easy to roll out across an entire cluster and automatically provides secure, FIPS-compliant mutual TLS, as well as TCP-level authorization and observability.

The rest of the service mesh’s extensive L7 functionality, including rich HTTP controls, can be enabled on a service-by-service basis with a waypoint proxy. This layered approach allows users to incrementally adopt the capabilities they need without bearing the infrastructure costs of those they don’t. An important design goal for ztunnel is that it should impose as little CPU, latency, and throughput overhead as possible while still delivering mTLS. At Solo.io, we have been pushing the boundaries of what functionality we can move into ztunnel without violating these requirements.

Slicing the layers

When we designed ambient mode, we deliberately split HTTP processing out into the waypoint layer. We did this for a few reasons.

  • Performance: HTTP processing is typically very expensive, often accounting for over 75% of the performance overhead of a service mesh.
  • Safety: HTTP is a complex protocol prone to a variety of attacks, such as protocol exploits, request smuggling, and parsing issues.
  • Compatibility: Proxying HTTP traffic inherently changes the behavior of the application. This includes changes like load balancing at the request level, retries, header manipulation, and more. While many applications handle this without issue, others may not.

Putting all of this HTTP processing on a shared node proxy would complicate operations significantly, which is why Istio does not do so. Instead, Istio intentionally puts all HTTP functionality in the waypoint layer.

With Gloo, we have developed an industry-first solution that pushes HTTP observability into the node proxy without compromising on these principles. This information is emitted in standardized telemetry and log formats, giving you a complete view of your traffic across the cluster in your preferred observability platform:

[Figure: Telemetry information generated by ztunnel, visualized in a variety of tools that integrate with Gloo. Top left: traces visualized by Jaeger. Top right: metrics visualized by Gloo. Bottom left: access logs visualized by Grafana.]

The best part is this comes with nearly zero performance impact!

[Figure: Performance numbers for HTTP observability, showing less than 1% overhead.]

How it works

In Gloo, we built out an HTTP observability system specifically designed to meet the constraints of a node proxy.

  • Performance: In Gloo, HTTP processing adds less than 1% overhead to application latency. This is achieved with high-performance HTTP parsers tuned specifically for telemetry extraction, without the complexity required for a full-fledged HTTP proxy. For instance, a full HTTP proxy may need to buffer requests and create mutable data structures to implement things like retries, whereas Gloo can stream requests as they go by and record aggregated telemetry information. Stay tuned for an upcoming post giving a deep dive into ambient performance.
  • Safety: While any feature comes with some risk, Gloo observability was designed with a “safety first” mindset. Because ztunnel does not modify requests at all, entire classes of vulnerabilities are impossible by design. All processing is written in modern Rust, which rules out memory-safety bugs (one of the largest sources of vulnerabilities) throughout the stack. If something unexpectedly goes wrong, we can degrade gracefully by stopping HTTP parsing for that single connection while continuing to forward its traffic. By contrast, when a full HTTP proxy encounters a state it cannot handle, it is forced to terminate the connection entirely.
  • Compatibility: Unlike a traditional HTTP proxy, Gloo’s ztunnel does not modify HTTP traffic in any way. Instead, it merely observes the traffic and forwards it untouched. This ensures full backward compatibility for any production environment.
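
To make the three properties above concrete, here is a minimal sketch of a passive observer for one direction of an HTTP/1.x connection. It is written in Python for brevity and is not Gloo's or ztunnel's actual implementation (which is in Rust). Every name in it (`PassiveHttpObserver`, `observe`, `MAX_HEAD`, the method list) is hypothetical, and it assumes simple `Content-Length` framing only. It returns each chunk unchanged, keeps only aggregated counters, and switches itself off for the connection when it sees something it cannot parse.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical illustration only; names and limits are assumptions.
METHODS = {b"GET", b"HEAD", b"POST", b"PUT", b"DELETE", b"PATCH", b"OPTIONS"}
MAX_HEAD = 8192  # stop parsing if a header block grows beyond this many bytes


@dataclass
class PassiveHttpObserver:
    """Observes client-to-server bytes of one connection without altering them."""

    requests_by_method: Counter = field(default_factory=Counter)
    parsing_enabled: bool = True
    _buf: bytes = b""    # bytes of a header block that is not complete yet
    _body_left: int = 0  # body bytes still to skip before the next request

    def observe(self, chunk: bytes) -> bytes:
        """Record telemetry for `chunk` and return it unchanged for forwarding."""
        if self.parsing_enabled:
            try:
                self._scan(chunk)
            except ValueError:
                # Graceful degradation: stop parsing this connection, but keep
                # forwarding its bytes exactly as before.
                self.parsing_enabled = False
                self._buf, self._body_left = b"", 0
        return chunk

    def _scan(self, chunk: bytes) -> None:
        data, self._buf = self._buf + chunk, b""
        while data:
            if self._body_left:
                # Skip over the request body; it is never inspected here.
                skip = min(self._body_left, len(data))
                self._body_left -= skip
                data = data[skip:]
                continue
            end = data.find(b"\r\n\r\n")
            if end == -1:
                if len(data) > MAX_HEAD:
                    raise ValueError("header block too large")
                self._buf = data  # wait for the rest of the header block
                return
            head, data = data[:end], data[end + 4:]
            self._record(head)

    def _record(self, head: bytes) -> None:
        lines = head.split(b"\r\n")
        parts = lines[0].split(b" ")
        if len(parts) != 3 or parts[0] not in METHODS or not parts[2].startswith(b"HTTP/1."):
            raise ValueError("not an HTTP/1.x request line")
        for line in lines[1:]:
            name, _, value = line.partition(b":")
            name = name.strip().lower()
            if name == b"transfer-encoding":
                raise ValueError("chunked bodies are not handled in this sketch")
            if name == b"content-length":
                length = int(value.strip())  # raises ValueError on garbage
                if length < 0:
                    raise ValueError("negative Content-Length")
                self._body_left = length
        self.requests_by_method[parts[0].decode("ascii")] += 1
```

A caller would pass every chunk read from the client through `observe` and write the returned bytes to the upstream socket. Because the return value is always the input, a parse failure can only cost telemetry for that connection, never the traffic itself. The only data held back is a copy of an incomplete header block, and forwarding never waits on it.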

All of this comes together to provide full HTTP observability without any compromise.

Learn More About Gloo Mesh

Gloo Mesh delivers the most capable service mesh solution, powered by Istio ambient mode, for your cloud-native workloads. Boost security, resiliency, and observability at your organization. Learn more about Gloo Mesh today.
