Infrastructure Resilience: Handling Invalid Configuration in the Envoy Proxy

As microservice deployments grow in size and scale, managing traffic routing becomes increasingly complex. When configuration goes awry, the consequences can range from control plane downtime to service outages.

This short blog post will cover how Gloo, the Envoy-based API Gateway, handles the challenge of managing large-scale configurations for Envoy to prevent invalid configurations from disrupting service.

Users of API gateways and service meshes often deal with a large number of configuration objects that have to be managed independently (for example, per service) but are ultimately merged into the configuration for a single proxy. These configurations contain many dependencies across objects, as demonstrated in the following config diagram:

Users often use templating techniques such as Helm and Jsonnet to manage large sets of routing configuration (along with their services, deployments, configmaps, and other CRDs). This helps reduce the pain of managing all this config, but does little to reduce the problem of config interdependencies. Note what happens when a single piece of configuration is invalid:

In the provided diagram, a single invalid TLS secret makes both route tables invalid: they route to Service4, and Service4 can no longer be reached because the secret it depends on is invalid.
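To make the propagation concrete, here is a minimal sketch of how one broken resource can invalidate everything that depends on it. This is an illustration only, not Gloo's implementation: the resource names and the dependencies map are hypothetical, loosely modelled on the scenario described above, and the find_invalid helper is invented for this example.

```python
from collections import defaultdict, deque


def find_invalid(dependencies, broken):
    """Return every resource that is invalid, directly or transitively.

    dependencies maps a resource name to the names it depends on.
    broken is the set of resources that fail validation on their own.
    """
    # Invert the edges so we can walk from a broken resource to its dependents.
    dependents = defaultdict(set)
    for resource, deps in dependencies.items():
        for dep in deps:
            dependents[dep].add(resource)

    # Breadth-first walk: anything that depends on an invalid resource is invalid.
    invalid = set(broken)
    queue = deque(broken)
    while queue:
        current = queue.popleft()
        for parent in dependents[current]:
            if parent not in invalid:
                invalid.add(parent)
                queue.append(parent)
    return invalid


# Hypothetical resources: two route tables that both route to service4,
# which in turn depends on a TLS secret.
dependencies = {
    "route-table-a": ["service4"],
    "route-table-b": ["service4", "service1"],
    "service4": ["tls-secret"],
    "service1": [],
    "tls-secret": [],
}

invalid = find_invalid(dependencies, {"tls-secret"})
print(sorted(invalid))
# prints ['route-table-a', 'route-table-b', 'service4', 'tls-secret']
```

Note that service1 stays valid: only resources with a path to the broken secret are affected, which is why the blast radius of a single bad object is hard to predict by eye in a large configuration.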

Syntactic and functional validation of the configuration becomes increasingly difficult for operators, particularly for routing configuration that may span multiple clusters, cloud environments, and workloads running outside Kubernetes. Resources often depend on one another: services, TLS secrets, delegated routes, and Kubernetes services all require coordination between disparate pieces of config.

When managing traffic in large-scale microservice environments, invalid configurations can lead to vulnerabilities and service outages. A robust control plane is therefore responsible for preventing invalid configurations from affecting the data plane (the proxies).

The rest of this article looks at how Gloo detects invalid routing configurations and keeps them from reaching Envoy.

How Gloo Prevents Config Errors

When a user creates or modifies one of Gloo’s Custom Resources, the request is processed by Gloo’s Validating Admission Webhook. Gloo runs an internal translation loop to check that each individual update, applied to the whole cluster configuration, still produces a valid Envoy config. If Gloo detects that the requested change would produce an invalid Envoy configuration, the request is rejected before it is admitted to Kubernetes storage, and the client (kubectl or another tool for interacting with the Kubernetes API) returns an error to the user.
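The flow described above can be sketched as a "validate the proposed snapshot before admitting it" check. The code below is a simplified model under stated assumptions, not Gloo's actual webhook: the snapshot layout, the translate function, and the validate_admission helper are all hypothetical, and the toy translation only checks that references resolve.

```python
from dataclasses import dataclass, field


@dataclass
class AdmissionResponse:
    allowed: bool
    errors: list = field(default_factory=list)


def translate(snapshot):
    """Toy stand-in for translation: return a list of error strings.

    A real control plane would build proxy config here; this sketch only
    checks that every reference points at a resource that exists.
    """
    errors = []
    secrets = snapshot.get("secrets", {})
    services = snapshot.get("services", {})
    for name, service in services.items():
        ref = service.get("tls_secret")
        if ref is not None and ref not in secrets:
            errors.append(f"service {name}: TLS secret {ref} not found")
    for name, route in snapshot.get("routes", {}).items():
        if route["service"] not in services:
            errors.append(f"route {name}: service {route['service']} not found")
    return errors


def validate_admission(current, kind, name, resource):
    """Apply one change to a copy of the snapshot and translate the result."""
    # Copy each per-kind map so the stored snapshot is never modified.
    proposed = {k: dict(v) for k, v in current.items()}
    proposed.setdefault(kind, {})[name] = resource
    errors = translate(proposed)
    # Reject the request if the whole proposed snapshot fails translation.
    return AdmissionResponse(allowed=not errors, errors=errors)


# Hypothetical cluster state: one service backed by one TLS secret.
current = {
    "secrets": {"service4-tls": {}},
    "services": {"service4": {"tls_secret": "service4-tls"}},
    "routes": {},
}

ok = validate_admission(current, "routes", "to-service4", {"service": "service4"})
bad = validate_admission(current, "routes", "to-missing", {"service": "service9"})
print(ok.allowed, bad.allowed, bad.errors)
```

The key design point the sketch tries to capture is that validation runs against the whole proposed configuration rather than the single object in the request, so a change that looks fine in isolation is still rejected if it breaks a dependency elsewhere.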

Administrators can configure the strictness of Gloo’s validation via the Settings API, as well as options for how Gloo should handle invalid config that has been admitted (for example, in the case that validation is disabled).

Try out the validation tutorial to see how Gloo safeguards API Gateway configuration.

Gloo is available in both open source and enterprise editions. We invite you to give it a try and look forward to your feedback.

Thanks for reading, stay tuned for more blogs and exciting open source software from Solo.io!