Webinar Recap – Advanced Rate Limiting with Envoy Proxy and Gloo API Gateway

In last week’s webinar, Rick Ducott covered advanced use cases of rate limiting with Envoy as the edge proxy managed by Gloo API Gateway. Rate limiting is a strategy that can prevent service outages by protecting a service from being overrun with more requests than its resources can process and respond to within the agreed service levels. Rate limits can be configured in a variety of ways: to limit the volume of requests to an endpoint over a time period, by client ID, by HTTP method, with integrated security, and more.

Why Rate Limit?

Applications of all types, including monoliths, microservices, serverless functions, and any combination of them, often need a way for external clients and users to access them safely and securely. Because incoming requests can be numerous and varied, protecting backend services and globally enforcing business limits can become incredibly complex when handled at the application level.

This demo-filled session covered the following rate limiting use cases:

  • Multiple Rate Limits per Client ID
  • Rate Limit Traffic Prioritization by HTTP Method
  • Integrating Rate Limits with JSON Web Tokens (JWT)

Keep reading to view the recording, catch the Q&A highlights, download the slides, and sign up for our next event.

Watch the replay here

Highlights from the Q&A 

Can we deploy Gloo to in-house Kubernetes?

Yes. Gloo can be deployed to any self-hosted or cloud-managed Kubernetes cluster. For non-Kubernetes environments, Gloo can be deployed with the HashiCorp suite of infrastructure tools. Read the platform configuration instructions here.

What filters are being used for this and is it using WebAssembly?

The demo uses Envoy’s rate limiting filter. The filter can be configured in a number of different ways to support both basic and advanced use cases. The configurations are defined and managed by the Gloo control plane for the Envoy proxies deployed at the edge. Learn more about rate limiting.

When there are multiple instances of Envoy deployed in the gateway cluster, do the rate limits apply per proxy or to the entire cluster of proxies?

Gloo’s architecture separates the data plane and control plane so that each can scale independently. The application of rate limit configurations therefore doesn’t depend on how many replicas of the Envoy proxy you have deployed in the cluster; it depends on where you have bound your rate limit configurations and on which listeners those rate limits exist.

As an example, if you have a single virtual service bound to a standard HTTP port, then any request arriving on that port triggers those limits regardless of the number of Envoy proxies, because they all talk to the same rate limit backend. This allows you to scale Envoy up and down without having to worry about configuring each proxy replica.
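To make the "one backend, many proxies" point concrete, here is a minimal Python sketch (not Gloo’s implementation; the class and method names are hypothetical) of a shared fixed-window counter that several proxy replicas consult. Because every replica increments the same counters, the limit holds for the cluster as a whole no matter how many replicas are serving traffic.

```python
import time
from collections import defaultdict

class SharedRateLimitBackend:
    """Toy fixed-window counter shared by all proxy replicas (illustrative only)."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counters = defaultdict(int)  # (descriptor, window_start) -> count

    def should_allow(self, descriptor):
        window_start = int(time.time() // self.window)
        key = (descriptor, window_start)
        self.counters[key] += 1
        return self.counters[key] <= self.limit

class ProxyReplica:
    """Stands in for one Envoy replica; all replicas share the same backend."""

    def __init__(self, backend):
        self.backend = backend

    def handle(self, client_id):
        if not self.backend.should_allow(("client_id", client_id)):
            return 429
        return 200

# Three replicas, one shared limit of 5 requests per minute per client.
backend = SharedRateLimitBackend(limit=5, window_seconds=60)
replicas = [ProxyReplica(backend) for _ in range(3)]

# Requests spread across replicas still count against the same limit.
statuses = [replicas[i % 3].handle("client-a") for i in range(7)]
print(statuses)  # first 5 are 200, the rest are 429
```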

Are these rate limiting use cases mutually exclusive or can I set multiple rate limits together?

You can configure one or many rate limits together; it really just depends on how you want to design and expose your APIs. The two building blocks of Envoy’s rate limiting are descriptors, which define the limit for a particular tuple of values, and actions, which define how to construct those tuples. You can construct the tuples in any way you need to enforce your different traffic and security policies. Read more about tuples in the advanced concepts here.
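As a rough illustration of how descriptors and actions compose (the names and values below are hypothetical, not the actual Envoy or Gloo API), the sketch maps descriptor tuples to limits and uses actions to build those tuples from a request; a request is allowed only if every matching tuple is under its limit, which is how multiple rate limits apply at once.

```python
from collections import defaultdict

# Descriptors: a limit for each tuple of (key, value) pairs (illustrative values).
DESCRIPTOR_LIMITS = {
    (("client_id", "free-tier"),): 10,                     # overall per-plan limit
    (("client_id", "free-tier"), ("method", "POST")): 2,   # tighter limit on writes
}

# Actions: how to construct descriptor tuples from an incoming request.
def build_tuples(request):
    plan = request["headers"].get("x-plan", "free-tier")
    return [
        (("client_id", plan),),
        (("client_id", plan), ("method", request["method"])),
    ]

counters = defaultdict(int)

def allow(request):
    """A request passes only if every descriptor tuple it produces is under its limit."""
    for t in build_tuples(request):
        limit = DESCRIPTOR_LIMITS.get(t)
        if limit is None:
            continue  # no limit configured for this tuple
        counters[t] += 1
        if counters[t] > limit:
            return False
    return True

# Two limits apply at once: 10 requests overall, but only 2 POSTs.
post = {"method": "POST", "headers": {"x-plan": "free-tier"}}
print([allow(post) for _ in range(4)])  # [True, True, False, False]
```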

What’s the performance hit of using rate limits and Envoy filters?

The filters in your filter chain, and how they need to be validated, can affect performance. For example, some filters use more I/O, while others call out to an external server or do processing in memory for validation. Additionally, your environment could have hundreds of routes with a variety of filters configured across them, so ultimately the performance impact depends on how you’ve configured the filters in the chain. In the latest Gloo 1.4 release, we’ve added a feature that collects proxy latency metrics, giving you the exact timing for a request to enter and exit the proxy and a breakdown of where the time is spent. This will help you understand where the performance bottleneck is so you can address it.
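The value of a per-request latency breakdown is easy to see with a toy example. The sketch below is not how Gloo collects its proxy metrics; it simply times each stage of a hypothetical filter chain so you can see which filter dominates a request’s time in the proxy.

```python
import time

def jwt_filter(request):
    time.sleep(0.002)   # stand-in for token validation work
    return request

def rate_limit_filter(request):
    time.sleep(0.001)   # stand-in for a call to the rate limit service
    return request

FILTER_CHAIN = [jwt_filter, rate_limit_filter]

def handle_with_timings(request):
    """Run the filter chain and record how long each filter took."""
    timings = {}
    for f in FILTER_CHAIN:
        start = time.perf_counter()
        request = f(request)
        timings[f.__name__] = time.perf_counter() - start
    return request, timings

_, timings = handle_with_timings({"path": "/api"})
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1000:.2f} ms")
```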

Download the presentation

Learn more