Security at Scale with Gloo Edge

Denis Jannot
April 14, 2021

Gloo Edge is our Kubernetes native API gateway based on Envoy.

It provides authentication (with OAuth, API keys, and JWT), authorization (with OPA or custom approaches), a web application firewall (WAF, based on ModSecurity), function discovery (with OpenAPI and AWS Lambda), advanced transformations, and much more.

In our previous blog post, Envoy at Scale with Gloo Edge, we performed benchmarks to determine how Gloo Edge scales in terms of requests per second (RPS). We also provided information about the throughput one can expect based on the number of CPUs allocated to the gateway-proxy (Envoy) pod.

In this post, we’ll measure the impact of enabling different security features (such as HTTPS, JWT, API keys, and WAF). We know from our previous tests that we can get close to 90,000 RPS with standard HTTP requests when we don’t set a CPU limit on the gateway-proxy Pod.

In the following tests, we’ll measure the impact both with a limit of eight CPUs and without a limit. With a limit of eight CPUs, we get more than 16,000 RPS with standard HTTP requests.

HTTPS

Let’s start by assessing the impact of implementing Server Transport Layer Security (TLS) in Gloo Edge.

We first need to create a Kubernetes secret containing the self-signed certificate we want to use:

openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
   -keyout tls.key -out tls.crt -subj "/CN=*"

kubectl create secret tls upstream-tls --key tls.key \
   --cert tls.crt --namespace gloo-system
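Before loading the secret into the cluster, it’s worth confirming that the generated key and certificate actually pair up. A minimal local check (assuming openssl is available and the files from the previous command are in the current directory): the two files match when their public-key moduli are identical.

```shell
# Generate a throwaway self-signed key/cert pair (same command as above)
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
   -keyout tls.key -out tls.crt -subj "/CN=*" 2>/dev/null

# Extract the RSA modulus from both files; they must be identical
cert_mod=$(openssl x509 -noout -modulus -in tls.crt)
key_mod=$(openssl rsa -noout -modulus -in tls.key)
[ "$cert_mod" = "$key_mod" ] && echo "tls.key and tls.crt match"
```

If the moduli differ, the secret would be rejected (or TLS handshakes would fail), so this one-liner saves a debugging round trip.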

Then, we will update the VirtualService as follows:

apiVersion: gateway.solo.io/v1
kind: VirtualService
metadata:
  name: wasm
  namespace: gloo-system
spec:
  sslConfig:
    secretRef:
      name: upstream-tls
      namespace: gloo-system
  virtualHost:
    domains:
      - "*"
    routes:
    - matchers:
      - prefix: /
      routeAction:
        single:
          upstream:
            name: default-echoenv-service-1-8080
            namespace: gloo-system

Finally, we’ve updated the test plan to use the https protocol. We can now launch the benchmark and check the results using Grafana.

Here is the result without a limit.

There’s no significant impact.

Here is the result with a limit of eight CPUs.

Again, the impact is minimal. The throughput decreased by approximately 7%.

The conclusion is that enabling HTTPS doesn’t have a significant performance impact (at least for small requests).

Authentication with JWT tokens

JSON Web Tokens (JWT) are a standard way to carry verifiable identity information, and they can be used for authentication. The advantage of JWTs is that, since they use a standard format and are cryptographically signed, they can be verified without contacting an external authentication server.

Let’s update the VirtualService as follows:

apiVersion: gateway.solo.io/v1
kind: VirtualService
metadata:
  name: wasm
  namespace: gloo-system
spec:
  virtualHost:
    domains:
      - "*"
    routes:
    - matchers:
      - prefix: /
      routeAction:
        single:
          upstream:
            name: default-echoenv-service-1-8080
            namespace: gloo-system
    options:
      jwt:
        providers:
          kube:
            issuer: solo.io
            jwks:
              local:
                key: |
                  -----BEGIN PUBLIC KEY-----
                  MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAyOkdHAkrlpoY7Q7n4gZi
                  m076SoCtBWuVqHJq85xVA6+wLrB/ef4+BwE4fJcu//1aqsqoQT5iQpMBHbOIatXi
                  LOgiRlvx3bAMl/vRpEDFufmRmQJMgVL+/BFb/Wf6KWpjylZzqQa3iIYRg3ZLoAxA
                  /RyA1PMbKkxAXuguRe5KRohggy4/iK/EKAEtqNs6OQMHZLOAsfSdxIDX/SDMqgTC
                  0xrXyqrck4UdZwtstMtHzA4mbio3eiBGYlnN/XdqWhQdvOcjDc4JucNWIqLamo0R
                  XE1waNc84PHtT8r4MJGCMJlGvIRlZYL67pTgvbTHgKEnj8mq1HB5e0TYXxk+fUmv
                  WQIDAQAB
                  -----END PUBLIC KEY-----

And we can add the following section to the test plan:

        <hashTree>
          <HeaderManager guiclass="HeaderPanel" testclass="HeaderManager" testname="HTTP Header Manager" enabled="true">
            <collectionProp name="HeaderManager.headers">
              <elementProp name="" elementType="Header">
                <stringProp name="Header.name">Authorization</stringProp>
                <stringProp name="Header.value">Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzb2xvLmlvIiwic3ViIjoiMTIzNDU2Nzg5MCIsInNvbG8uaW8vY29tcGFueSI6InNvbG8ifQ.eeH-Pjzw4l8xdrmZBmzUw5T4jYvo6wkBOIiwYROw8y0fK6RwXamAvfmCmR0l9mo4cdrf834Xqib-XQSVMJIFLdiU2WuBqWeEIT-eSXINRD15WtIq6i7L6B8Kwb8g6PtkoCfkVtWbg__62rePoSNnwW5G11Q5UhEO4dCZcZvGn9tQPPUSxmc75jOZVghmauasCfrp24PI3RFPsXEljW-hZRycinDZJK9henHv-dgXR6hjjfaCPoSvxpzv2j_h_beFPYGDsoxQZm6AXkeTv_PnV9y8ENzLQrsr3JldECf-F5DaoA1YkFQTpYDmDkiyogIVGGqiv0W3sdO61VUB4dNAZQ</stringProp>
              </elementProp>
            </collectionProp>
          </HeaderManager>
          <hashTree/>
        </hashTree>

This way, we will tell JMeter to add an Authorization header with a valid JWT token. We can now launch the benchmark and check the results using Grafana.
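As a side note, the payload (middle, dot-separated segment) of that token is just base64-encoded JSON, so we can decode it locally to inspect the claims that Envoy will check against the issuer and public key configured above. The appended `==` restores the padding that JWT encoding strips:

```shell
# Payload segment of the JWT used in the test plan (signature verification
# is Envoy's job; this only inspects the claims).
payload="eyJpc3MiOiJzb2xvLmlvIiwic3ViIjoiMTIzNDU2Nzg5MCIsInNvbG8uaW8vY29tcGFueSI6InNvbG8ifQ"
claims=$(echo "${payload}==" | base64 -d)
echo "$claims"
# → {"iss":"solo.io","sub":"1234567890","solo.io/company":"solo"}
```

Note that the `iss` claim matches the `issuer: solo.io` field in the VirtualService, which is why Envoy accepts the token.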

Here is the result without a limit:

As you can see, we still get really good performance with JWT. The throughput decreased by approximately 25%.

Here is the result with a limit of eight CPUs:

The impact is higher when we set a CPU limit. The throughput decreased by approximately 55%. This is because JWT authentication is performed directly in an Envoy filter, so it consumes the same CPUs counted against the limit.

Authentication with API keys

Sometimes when you need to protect a service, the set of users that will need to access it is known in advance and does not change frequently. For example, these users might be other services or specific persons or teams in your organization.

You might also want to retain direct control over how credentials are generated and when they expire. If one of these conditions applies to your use case, you should consider securing your service using API keys. API keys are secure, long-lived Universally Unique Identifiers (UUIDs) that clients must provide when sending requests to a service that is protected using this method.

We first need to create a Kubernetes secret containing the API key we want to use:

glooctl create secret apikey infra-apikey \
    --apikey N2YwMDIxZTEtNGUzNS1jNzgzLTRkYjAtYjE2YzRkZGVmNjcy \
    --apikey-labels team=infrastructure
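The value passed to --apikey is used verbatim as the key (it is the same string the client sends in the api-key header later on). In this case it happens to be a base64-encoded UUID, which we can confirm locally:

```shell
# Decode the API key used above; it is simply a base64-encoded UUID string
key="N2YwMDIxZTEtNGUzNS1jNzgzLTRkYjAtYjE2YzRkZGVmNjcy"
uuid=$(echo "$key" | base64 -d)
echo "$uuid"
# → 7f0021e1-4e35-c783-4db0-b16c4ddef672
```

Any sufficiently random string would work as a key; a UUID is just a convenient way to get one.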

Then, we will create an AuthConfig object:

apiVersion: enterprise.gloo.solo.io/v1
kind: AuthConfig
metadata:
  name: apikey-auth
  namespace: gloo-system
spec:
  configs:
  - apiKeyAuth:
      headerName: api-key
      labelSelector:
        team: infrastructure

Next, we will update the VirtualService as follows:

apiVersion: gateway.solo.io/v1
kind: VirtualService
metadata:
  name: wasm
  namespace: gloo-system
spec:
  virtualHost:
    domains:
      - "*"
    routes:
    - matchers:
      - prefix: /
      routeAction:
        single:
          upstream:
            name: default-echoenv-service-1-8080
            namespace: gloo-system
    options:
      extauth:
        configRef:
          name: apikey-auth
          namespace: gloo-system

Finally, we will add the following section to the test plan:

        <hashTree>
          <HeaderManager guiclass="HeaderPanel" testclass="HeaderManager" testname="HTTP Header Manager" enabled="true">
            <collectionProp name="HeaderManager.headers">
              <elementProp name="" elementType="Header">
                <stringProp name="Header.name">api-key</stringProp>
                <stringProp name="Header.value">N2YwMDIxZTEtNGUzNS1jNzgzLTRkYjAtYjE2YzRkZGVmNjcy</stringProp>
              </elementProp>
            </collectionProp>
          </HeaderManager>
          <hashTree/>
        </hashTree>

This way, we are telling JMeter to add an api-key header with a valid API key.

To perform this kind of authentication, Envoy leverages an external authentication server that is included in Gloo Edge, so we expect a higher impact on throughput. Because the extauth pod needs its own CPU resources, we’ll limit the amount of CPU the gateway-proxy can use to 16, and we’ll compare the results with what we obtained in our previous post with 16 CPUs (around 35,000 RPS).

We can now launch the benchmark and check the results on Grafana.

Here is the result with a limit of 16 CPUs:

We still get really good numbers, even though there’s an extra hop here (between the gateway-proxy and extauth Pods). The throughput decreased by approximately 35%.

Here is the result with a limit of eight CPUs:

With a limit, the throughput decreases in the same way, because the extauth pod consumes its own CPU rather than the gateway-proxy’s. Note that the amount of CPU used by the gateway-proxy and extauth pods is roughly equivalent:

kubectl top pods -n gloo-system 
NAME                                                   CPU(cores)   MEMORY(bytes)   
discovery-59d777559d-vzxkk                             13m          48Mi            
extauth-6979db7474-5hs2g                               16442m       228Mi           
gateway-869494ccf5-n8phh                               5m           34Mi            
gateway-proxy-58955c979c-xc7kc                         16012m       349Mi           
gloo-5977d8b5f6-5b928                                  92m          66Mi            
glooe-grafana-78c6f96db-rvqph                          2m           50Mi            
glooe-prometheus-kube-state-metrics-5dd77b76fc-lf8tw   2m           18Mi            
glooe-prometheus-server-59dcf7bc5b-x469j               42m          438Mi           
observability-f54cf5485-qtjtc                          4m           34Mi            
rate-limit-7f885f7b4c-tdrz9                            2m           21Mi            
redis-55d6dbb6b7-wxckw                                 2m           8Mi

Web Application Firewall (WAF)

A web application firewall (WAF) protects web applications by monitoring, filtering, and blocking potentially harmful traffic and attacks that can take over or exploit the applications. WAFs do this by intercepting and inspecting the network packets, using a set of rules to determine access to the web application. In enterprise security infrastructure, WAFs can be deployed in front of an application or group of applications to provide a layer of protection between the applications and the end users.

Gloo Edge supports the popular Web Application Firewall framework/ruleset ModSecurity.

To enable WAF, we just need to update the Gateway as follows:

apiVersion: gateway.solo.io/v1
kind: Gateway
metadata:
  labels:
    app: gloo
  name: gateway-proxy
  namespace: gloo-system
spec:
  bindAddress: '::'
  bindPort: 8080
  httpGateway:
    options:
      waf:
        customInterventionMessage: ModSecurity intervention!
        ruleSets:
        - ruleStr: |
            # Turn rule engine on
            SecRuleEngine On
            SecRule REQUEST_HEADERS:User-Agent "scammer" "deny,status:403,id:107,phase:1,msg:'blocked scammer'"
    virtualServices:
    - name: default
      namespace: gloo-system
  proxyNames:
  - gateway-proxy
  useProxyProto: false

This rule will block any HTTP request that includes the header User-Agent with the value scammer.

Note that we apply this rule at the Gateway level to enforce it globally, but we could also set it at the VirtualService level.

Let’s try it out:

curl -H 'User-Agent: scammer' http://172.18.1.1
ModSecurity intervention! Custom message details here..

WAF doesn’t require any extra hops, so let’s remove the CPU limit we set on the gateway-proxy pod in the previous test. We can now launch the benchmark and check the results using Grafana.

Here is the result without a limit:

Again, the results are really good. The throughput decreased by approximately 15%, which is even less impact than with JWT authentication.

Here is the result with a limit of eight CPUs:

The impact is higher when we set a CPU limit. The throughput decreased by approximately 33%. Again, this is because the WAF checks are performed directly in an Envoy filter, so they consume the same CPUs counted against the limit.

Note that so far we’ve only applied rules to headers. WAF can also be used to inspect request bodies, where the impact would probably be higher. Let’s try it!

Let’s update the Gateway as follows:

apiVersion: gateway.solo.io/v1
kind: Gateway
metadata:
  labels:
    app: gloo
  name: gateway-proxy
  namespace: gloo-system
spec:
  bindAddress: '::'
  bindPort: 8080
  httpGateway:
    options:
      waf:
        customInterventionMessage: Username should only contain letters
        ruleSets:
        - ruleStr: |
            # Turn rule engine on
            SecRuleEngine On
            SecRule ARGS:/username/ "[^a-zA-Z]" "t:none,phase:2,deny,id:6,log,msg:'allow only letters in username'"
    virtualServices:
    - name: default
      namespace: gloo-system
  proxyNames:
  - gateway-proxy
  useProxyProto: false

This rule will block any HTTP request whose body includes a username parameter containing anything other than letters. Let’s try it out:

curl -X POST -F 'username=denis1' 172.18.1.1
Username should only contain letters.

We can now launch the benchmark and check the results using Grafana. Here is the result without a limit:

The difference when WAF needs to parse the body isn’t that high. The throughput decreased by approximately 30%. Here is the result with a limit of eight CPUs:

The impact is higher when we set a CPU limit. The throughput decreased by approximately 37%. Again, this is because WAF processing is performed directly in an Envoy filter, so it consumes the same CPUs counted against the limit.

Conclusion

The data we obtained when limiting the amount of CPU used by the gateway-proxy (Envoy) pod is the most useful. It can be used to determine the throughput you’ll get with eight CPUs, depending on the security features you enable in Gloo Edge. Here is a table that summarizes the results:

Keep in mind that 1,000 RPS means more than 86 million requests a day. Next in the series, we’ll write a blog post about benchmarking WebAssembly (Wasm). Check it out!
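As a final sanity check on that “86 million requests a day” figure (1,000 requests per second, sustained over the 86,400 seconds in a day):

```shell
# 1,000 RPS sustained for a full day: 1000 * 60 s * 60 min * 24 h
echo $((1000 * 60 * 60 * 24))
# → 86400000
```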
