Security at Scale with Gloo Edge
Gloo Edge is our Kubernetes-native API gateway built on Envoy.
It provides authentication (with OAuth, API keys, and JWT), authorization (with OPA or custom approaches), a web application firewall (WAF – based on ModSecurity), function discovery (with OpenAPI and AWS Lambda), advanced transformations, and much more.
In our previous blog post, Envoy at Scale with Gloo Edge, we performed benchmarks to determine how Gloo Edge scales in terms of requests per second (RPS). We also provided information about the throughput you can expect based on the number of CPUs allocated to the gateway-proxy (Envoy) pod.
In this post, we'll measure the impact of enabling different security features (such as HTTPS, JWT, API keys, and WAF). We know from our previous tests that we can get close to 90,000 RPS with standard HTTP requests when we don't set a CPU limit on the gateway-proxy pod.
In the following tests, we'll look at the impact both with a limit of eight CPUs and without any limit. With a limit of eight CPUs, we got more than 16,000 RPS with standard HTTP requests.
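As a point of reference, one way such a limit could be applied is to patch the resources of the Envoy container directly. This is only a sketch: it assumes the container in the gateway-proxy deployment is named gateway-proxy, and in a Helm-managed installation you would more likely set the limit through the Helm values.

# Set an 8-CPU limit on the Envoy container (container name assumed to be gateway-proxy)
kubectl -n gloo-system set resources deployment/gateway-proxy \
  --containers=gateway-proxy --limits=cpu=8

# Setting the limit to zero removes it again
kubectl -n gloo-system set resources deployment/gateway-proxy \
  --containers=gateway-proxy --limits=cpu=0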
HTTPS
Let’s start by assessing the impact of implementing Server Transport Layer Security (TLS) in Gloo Edge.
We first need to create a Kubernetes secret containing the self-signed certificate we want to use:
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -keyout tls.key -out tls.crt -subj "/CN=*"

kubectl create secret tls upstream-tls --key tls.key \
  --cert tls.crt --namespace gloo-system
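The secret is a standard Kubernetes TLS secret, so if you want to double-check it before referencing it, you can simply inspect it:

kubectl get secret upstream-tls -n gloo-system -o yaml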
Then, we will update the VirtualService as follows:
apiVersion: gateway.solo.io/v1
kind: VirtualService
metadata:
  name: wasm
  namespace: gloo-system
spec:
  sslConfig:
    secretRef:
      name: upstream-tls
      namespace: gloo-system
  virtualHost:
    domains:
    - "*"
    routes:
    - matchers:
      - prefix: /
      routeAction:
        single:
          upstream:
            name: default-echoenv-service-1-8080
            namespace: gloo-system
Finally, we update the test plan to use the https protocol. We can now launch the benchmark and check the results using Grafana.
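Before starting JMeter, a quick smoke test can confirm the HTTPS listener behaves as expected. Assuming the same test address used later in this post and our self-signed certificate (hence the -k flag), a request like this should return a 200 from the echoenv upstream:

curl -k -s -o /dev/null -w "%{http_code}\n" https://172.18.1.1/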
Here is the result without a limit.
There’s no significant impact.
Here is the result with a limit of eight CPUs.
Again, the impact is minimal. The throughput decreased by approximately 7%.
The conclusion is that enabling HTTPS doesn't have a significant performance impact (at least for small requests).
Authentication with JWT tokens
JSON Web Tokens (JWTs) are a standard way to carry verifiable identity information, and they can be used for authentication. The advantage of using JWTs is that, since they use a standard format and are cryptographically signed, they can be verified without contacting an external authentication server.
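To make that concrete: a JWT is made of three base64url-encoded, dot-separated segments (header, claims, signature), and Envoy only needs the public key configured below to verify the signature. As a quick illustration (a sketch, not part of the benchmark), you can print the claims of the token we use later in the test plan with plain shell tools:

# TOKEN holds the JWT from the JMeter header below (without the "Bearer " prefix).
# The claims are the second dot-separated segment; they are base64url-encoded and
# unpadded, so you may need to append one or two '=' characters before decoding.
echo "$TOKEN" | cut -d '.' -f2 | base64 -d; echo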
Let's update the VirtualService as follows:
apiVersion: gateway.solo.io/v1
kind: VirtualService
metadata:
  name: wasm
  namespace: gloo-system
spec:
  virtualHost:
    domains:
    - "*"
    routes:
    - matchers:
      - prefix: /
      routeAction:
        single:
          upstream:
            name: default-echoenv-service-1-8080
            namespace: gloo-system
    options:
      jwt:
        providers:
          kube:
            issuer: solo.io
            jwks:
              local:
                key: |
                  -----BEGIN PUBLIC KEY-----
                  MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAyOkdHAkrlpoY7Q7n4gZi
                  m076SoCtBWuVqHJq85xVA6+wLrB/ef4+BwE4fJcu//1aqsqoQT5iQpMBHbOIatXi
                  LOgiRlvx3bAMl/vRpEDFufmRmQJMgVL+/BFb/Wf6KWpjylZzqQa3iIYRg3ZLoAxA
                  /RyA1PMbKkxAXuguRe5KRohggy4/iK/EKAEtqNs6OQMHZLOAsfSdxIDX/SDMqgTC
                  0xrXyqrck4UdZwtstMtHzA4mbio3eiBGYlnN/XdqWhQdvOcjDc4JucNWIqLamo0R
                  XE1waNc84PHtT8r4MJGCMJlGvIRlZYL67pTgvbTHgKEnj8mq1HB5e0TYXxk+fUmv
                  WQIDAQAB
                  -----END PUBLIC KEY-----
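Before running the full benchmark, it's worth checking the behavior by hand. The expected status codes below are assumptions based on the default behavior of the JWT filter (requests without a valid token are rejected):

# Without a token, the request should be denied (typically a 401)
curl -s -o /dev/null -w "%{http_code}\n" http://172.18.1.1/

# With the token used in the JMeter test plan below, it should reach the upstream (200)
curl -s -o /dev/null -w "%{http_code}\n" \
  -H "Authorization: Bearer $TOKEN" http://172.18.1.1/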
We can then add the following section to the test plan:
<hashTree>
  <HeaderManager guiclass="HeaderPanel" testclass="HeaderManager" testname="HTTP Header Manager" enabled="true">
    <collectionProp name="HeaderManager.headers">
      <elementProp name="" elementType="Header">
        <stringProp name="Header.name">Authorization</stringProp>
        <stringProp name="Header.value">Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzb2xvLmlvIiwic3ViIjoiMTIzNDU2Nzg5MCIsInNvbG8uaW8vY29tcGFueSI6InNvbG8ifQ.eeH-Pjzw4l8xdrmZBmzUw5T4jYvo6wkBOIiwYROw8y0fK6RwXamAvfmCmR0l9mo4cdrf834Xqib-XQSVMJIFLdiU2WuBqWeEIT-eSXINRD15WtIq6i7L6B8Kwb8g6PtkoCfkVtWbg__62rePoSNnwW5G11Q5UhEO4dCZcZvGn9tQPPUSxmc75jOZVghmauasCfrp24PI3RFPsXEljW-hZRycinDZJK9henHv-dgXR6hjjfaCPoSvxpzv2j_h_beFPYGDsoxQZm6AXkeTv_PnV9y8ENzLQrsr3JldECf-F5DaoA1YkFQTpYDmDkiyogIVGGqiv0W3sdO61VUB4dNAZQ</stringProp>
      </elementProp>
    </collectionProp>
  </HeaderManager>
  <hashTree/>
</hashTree>
This way, we tell JMeter to add an Authorization header with a valid JWT. We can now launch the benchmark and check the results using Grafana.
Here is the result without a limit:
As you can see, we still get really good performance with JWT. The throughput decreased by approximately 25%.
Here is the result with a limit of eight CPUs:
The impact is higher when we set a CPU limit. The throughput decreased by approximately 55%. This is because JWT validation is performed directly in an Envoy filter, so it consumes the same CPUs covered by the limit.
Authentication with API keys
Sometimes when you need to protect a service, the set of users that will need to access it is known in advance and does not change frequently. For example, these users might be other services or specific persons or teams in your organization.
You might also want to retain direct control over how credentials are generated and when they expire. If one of these conditions applies to your use case, you should consider securing your service using API keys. API keys are secure, long-lived Universally Unique Identifiers (UUIDs) that clients must provide when sending requests to a service that is protected using this method.
We first need to create a Kubernetes secret containing the API key we want to use:
glooctl create secret apikey infra-apikey \
  --apikey N2YwMDIxZTEtNGUzNS1jNzgzLTRkYjAtYjE2YzRkZGVmNjcy \
  --apikey-labels team=infrastructure
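Under the hood, glooctl stores this as a regular Kubernetes secret (carrying the team=infrastructure label) in the gloo-system namespace, so you can inspect it if needed:

kubectl get secret infra-apikey -n gloo-system -o yaml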
Then, we will create an AuthConfig object:
apiVersion: enterprise.gloo.solo.io/v1
kind: AuthConfig
metadata:
  name: apikey-auth
  namespace: gloo-system
spec:
  configs:
  - apiKeyAuth:
      headerName: api-key
      labelSelector:
        team: infrastructure
Next, we will update the VirtualService as follows:
apiVersion: gateway.solo.io/v1
kind: VirtualService
metadata:
  name: wasm
  namespace: gloo-system
spec:
  virtualHost:
    domains:
    - "*"
    routes:
    - matchers:
      - prefix: /
      routeAction:
        single:
          upstream:
            name: default-echoenv-service-1-8080
            namespace: gloo-system
    options:
      extauth:
        configRef:
          name: apikey-auth
          namespace: gloo-system
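As with JWT, a quick manual check (not part of the original test plan) helps confirm the external auth setup before benchmarking; the status codes are assumptions based on the default extauth behavior:

# Without the api-key header, the extauth server should deny the request (401/403)
curl -s -o /dev/null -w "%{http_code}\n" http://172.18.1.1/

# With the API key stored in the secret above, the request should be allowed (200)
curl -s -o /dev/null -w "%{http_code}\n" \
  -H "api-key: N2YwMDIxZTEtNGUzNS1jNzgzLTRkYjAtYjE2YzRkZGVmNjcy" http://172.18.1.1/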
Finally, we will add the following section to the test plan:
<hashTree>
  <HeaderManager guiclass="HeaderPanel" testclass="HeaderManager" testname="HTTP Header Manager" enabled="true">
    <collectionProp name="HeaderManager.headers">
      <elementProp name="" elementType="Header">
        <stringProp name="Header.name">api-key</stringProp>
        <stringProp name="Header.value">N2YwMDIxZTEtNGUzNS1jNzgzLTRkYjAtYjE2YzRkZGVmNjcy</stringProp>
      </elementProp>
    </collectionProp>
  </HeaderManager>
  <hashTree/>
</hashTree>
This way, we tell JMeter to add an api-key header with a valid API key.
To perform this kind of authentication, Envoy leverages an external authentication server that is included in Gloo Edge, so we expect a higher impact on the throughput. The extauth pod needs its own CPU resources, so we'll limit the amount of CPU the gateway-proxy can use to 16 and compare the results with what we obtained in our previous post with 16 CPUs (around 35,000 RPS).
We can now launch the benchmark and check the results on Grafana.
Here is the result with a limit of 16 CPUs:
We still get really good numbers, even if there's one extra hop here (between the gateway-proxy and extauth pods). The throughput decreased by approximately 35%.
Here is the result with a limit of eight CPUs:
With a limit, the throughput decreases in the same proportion. This is because the extauth pod consumes its own CPU. Note that the amount of CPU used by the gateway-proxy and extauth pods is equivalent:
kubectl top pods -n gloo-system

NAME                                                   CPU(cores)   MEMORY(bytes)
discovery-59d777559d-vzxkk                             13m          48Mi
extauth-6979db7474-5hs2g                               16442m       228Mi
gateway-869494ccf5-n8phh                               5m           34Mi
gateway-proxy-58955c979c-xc7kc                         16012m       349Mi
gloo-5977d8b5f6-5b928                                  92m          66Mi
glooe-grafana-78c6f96db-rvqph                          2m           50Mi
glooe-prometheus-kube-state-metrics-5dd77b76fc-lf8tw   2m           18Mi
glooe-prometheus-server-59dcf7bc5b-x469j               42m          438Mi
observability-f54cf5485-qtjtc                          4m           34Mi
rate-limit-7f885f7b4c-tdrz9                            2m           21Mi
redis-55d6dbb6b7-wxckw                                 2m           8Mi
Web Application Firewall (WAF)
A web application firewall (WAF) protects web applications by monitoring, filtering, and blocking potentially harmful traffic and attacks that can take over or exploit the applications. WAFs do this by intercepting and inspecting the network packets and using a set of rules to determine access to the web application. In enterprise security infrastructure, WAFs can be deployed to an application or group of applications to provide a layer of protection between the applications and the end users.
Gloo Edge supports the popular Web Application Firewall framework/ruleset ModSecurity.
To enable WAF, we just need to update the Gateway as follows:
apiVersion: gateway.solo.io/v1
kind: Gateway
metadata:
  labels:
    app: gloo
  name: gateway-proxy
  namespace: gloo-system
spec:
  bindAddress: '::'
  bindPort: 8080
  httpGateway:
    options:
      waf:
        customInterventionMessage: ModSecurity intervention!
        ruleSets:
        - ruleStr: |
            # Turn rule engine on
            SecRuleEngine On
            SecRule REQUEST_HEADERS:User-Agent "scammer" "deny,status:403,id:107,phase:1,msg:'blocked scammer'"
    virtualServices:
    - name: default
      namespace: gloo-system
  proxyNames:
  - gateway-proxy
  useProxyProto: false
This rule will block any HTTP request that includes the header User-Agent with the value scammer.
Note that we apply this rule at the Gateway level to enforce it globally, but we could also set it at the VirtualService level.
Let’s try it out:
curl -H 'User-Agent: scammer' http://172.18.1.1

ModSecurity intervention! Custom message details here..
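Conversely, a request with an ordinary User-Agent should pass straight through to the echoenv upstream (the exact response body depends on the upstream):

curl -H 'User-Agent: jmeter' http://172.18.1.1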
WAF doesn't require any extra hops, so let's remove the CPU limit we set on the gateway-proxy pod in the previous test. We can now launch the benchmark and check the results using Grafana.
Here is the result without a limit:
Again, the results are really good. The throughput decreased by approximately 15%, which is even less than the impact of JWT authentication.
Here is the result with a limit of eight CPUs:
The impact is higher when we set a CPU limit. The throughput decreased by approximately 33%. Again, this is because the WAF checks run directly in an Envoy filter, so they consume the same CPUs covered by the limit.
Note that so far we've only applied rules to the request headers. WAF can also be used to inspect the body of the requests, and the impact would probably be higher. Let's try it!
Let's update the Gateway as follows:
apiVersion: gateway.solo.io/v1
kind: Gateway
metadata:
  labels:
    app: gloo
  name: gateway-proxy
  namespace: gloo-system
spec:
  bindAddress: '::'
  bindPort: 8080
  httpGateway:
    options:
      waf:
        customInterventionMessage: Username should only contain letters
        ruleSets:
        - ruleStr: |
            # Turn rule engine on
            SecRuleEngine On
            SecRule ARGS:/username/ "[^a-zA-Z]" "t:none,phase:2,deny,id:6,log,msg:'allow only letters in username'"
    virtualServices:
    - name: default
      namespace: gloo-system
  proxyNames:
  - gateway-proxy
  useProxyProto: false
This rule will block any HTTP request whose body includes a username containing anything other than letters. Let's try it out:
curl -X POST -F 'username=denis1' 172.18.1.1

Username should only contain letters.
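And a username made only of letters should be accepted and forwarded to the upstream:

curl -X POST -F 'username=denis' 172.18.1.1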
We can now launch the benchmark and check the results using Grafana. Here is the result without a limit:
The difference when WAF needs to parse the body isn’t that high. The throughput decreased by approximately 30%. Here is the result with a limit of eight CPUs:
The impact is higher when we set a CPU limit. The throughput decreased by approximately 37%. Again, this is because the WAF checks run directly in an Envoy filter, so they consume the same CPUs covered by the limit.
Conclusion
The data we obtained when limiting the amount of CPU used by the gateway-proxy (Envoy) pod is the most useful: it lets you estimate the throughput you'll get with eight CPUs depending on the security features you enable on Gloo Edge. Here is a table that summarizes the results:

Security feature        Throughput with a limit of eight CPUs
None (plain HTTP)       more than 16,000 RPS (baseline)
HTTPS                   ~7% decrease
JWT                     ~55% decrease
API keys (extauth)      ~35% decrease
WAF (headers)           ~33% decrease
WAF (body)              ~37% decrease
Keep in mind that 1,000 RPS means more than 86 million requests a day (1,000 requests × 86,400 seconds ≈ 86.4 million). Next in the series, we'll write a blog post about benchmarking WebAssembly (Wasm). Check it out!