Guest blog: How Snyk is normalizing authentication strategies with Gloo Edge
This post about authentication strategies with Gloo Edge was originally published on the Snyk blog on July 20, 2021.
Additional authors: David Harrigan, Gareth Visagie, James Bowes
Snyk’s approach to authentication strategies with Gloo Edge
Snyk supports multiple authentication (authN) strategies on its APIs. Historically, API keys have been the primary form of authN, but more recently we introduced support for authN using signed JWTs produced as a result of an OAuth integration. This is currently in use by both our AWS CodePipeline and Bitbucket integrations.
Snyk started out with a hub-and-spoke architecture in which a central monolith made all authN decisions. This worked great while Snyk was finding its market fit and iterating quickly. But as the organisation and its customer base grew, so did the demand to adopt a service-oriented architecture (SOA). Making that architectural change would require each service to make its own authN decisions, meaning each service would need to be aware of multiple authentication strategies. That would become cumbersome and hard to scale over time.
In order to support our drive towards adopting an SOA, we concluded that the most secure and pragmatic approach was to have a single authority normalize authN strategies into a cryptographically assured identity token. This token could then be consumed and validated by services without them needing to handle the complexity of supporting multiple authN strategies. More simply, we would have a single authority make an authN decision and then have it pass on a signed JWT for the rest of the system to consume.
In order to achieve this, we needed to introduce a new layer in our system which could authenticate requests and produce the normalized token. Enter Gloo Edge, a form of API Gateway.
Gloo-ing things together!
What are API gateways?
While we’re not an authority on API gateways, we can give you a basic list of things they can do:
- Request validation
- Rate limiting
- Authentication (or can be extended with it!)
- Much, much more! (We told you we’re not an authority…)
And here’s the kicker… it’s all at the edge! If any of your apps are doing one of the things we mentioned above, you can likely offload that to your API gateway, meaning you get to remove lines of code, which every engineer loves to do!
We looked at various options of API gateways and decided to use Gloo Edge (Gloo for short in this article) to help us along our platform journey. This blog post will not outline how or why we decided on Gloo. That’s a subject for a future blog post!
What is Gloo Edge?
Gloo Edge is an API Gateway developed by Solo.io. It functions as an abstraction on top of Envoy and allows you to use Kubernetes custom resource definitions (CRDs) to produce Envoy configuration, which makes the barrier to entry really low. If you know how to write Kubernetes config, chances are you can write Envoy configuration using Gloo’s own CRDs.
To read more about Gloo, what it is, and how it leverages Envoy, read the Gloo Edge docs (they are genuinely quite great)!
Implementing Gloo Edge at Snyk
The following sections outline some of the work we’ve done to get Gloo fronting the API and consumed by some really interesting partnership integrations!
Identifying a migration path (if your API already is being used)
The Snyk API is consistently receiving traffic, so just fronting it with Gloo wasn’t really an option. As mentioned above, historically the customer API exclusively used API Keys to authenticate requests. But the shift to allowing JWTs and API Keys on the API Gateway created a split between old and new.
Thankfully for us, the group was already excited about changing our API domain from https://snyk.io/api/ to https://api.snyk.io. But we didn’t want to redirect all requests from our original domain to the new one if we were going to front it with the new API Gateway. Instead, we are performing a weighted, percentage-based canary release of the API Gateway across our old API domain.
In other words, we are currently routing some of the traffic on our original domain through the API Gateway. The rest is passed to the monolith for it to make any authN decisions.
Out of the box, Gloo supports canary releases. But because we haven’t fronted all of the API traffic with the Gateway, we are instead using Envoy Proxy’s traffic shifting pattern for this. See the figure below for a very high-level example of how this works and how it could be applicable for you.
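To make the idea of a weighted, percentage-based split concrete, here is a minimal Python sketch. It is not Envoy or Gloo configuration and not how Snyk implements the split; the `pick_backend` helper and the backend labels `api-gateway` and `monolith` are hypothetical names chosen purely for illustration.

```python
import random


def pick_backend(gateway_weight: int, rng: random.Random) -> str:
    """Send roughly gateway_weight percent of requests to the gateway.

    The labels "api-gateway" and "monolith" are illustrative only.
    """
    if not 0 <= gateway_weight <= 100:
        raise ValueError("gateway_weight must be between 0 and 100")
    # randrange(100) yields 0..99, so gateway_weight of those 100 values
    # fall below the threshold and select the gateway.
    if rng.randrange(100) < gateway_weight:
        return "api-gateway"
    return "monolith"


# Simulate 10,000 requests with 10% of traffic shifted to the gateway.
rng = random.Random(42)
sample = [pick_backend(10, rng) for _ in range(10_000)]
share = sample.count("api-gateway") / len(sample)
print(f"share routed to the gateway: {share:.1%}")
```

Raising the weight step by step, while watching error rates and latency, is what turns a split like this into a canary release.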
Moving JWT validation to the edge
Although the aforementioned OAuth PoC implemented OAuth, it was too tightly coupled with our identity hub (and was implemented in our monolith). This new authN library had to deal with validating the JWT sent from the identity hub, which would look something like:
- Validating the JWT payload (sub, iat, aud)
- Validating the JWT signature using a JWKS (JSON Web Key Set) endpoint
Even though this is technically a trivial challenge, it normally involves a fair bit of coding (fewer than 100 lines of code), not including tests. The same can be done by adding fewer than 20 lines of config to your VirtualService in Gloo:
options:
  jwt:
    providers:
      <provider name>:
        issuer: <a valid jwt issuer> # where did the token come from?
        audiences:
        - <a valid jwt audience> # what service expects the token?
        tokenSource:
          headers: # Regex currently not supported
          - header: Authorization
          - header: Authorization
        jwks:
          remote:
            upstreamRef:
              name: <upstream name>
              namespace: <upstream namespace>
            url: <a valid jwks endpoint>
This config will extract the JWT from the header and perform a few basic JWT validation steps before sending the JWT across to an External Auth (ExtAuth) plugin (more on that later!):
- Validate that the current time is within the JWT’s iat (issued at) and exp (expiration) claims
- Validate the audience
- Validate the issuer
- Validate the signature using an external JWKS endpoint (upstream)
We were able to move a lot of validation out of the monolith and onto the edge! Neat, right?
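For a sense of what the claim checks listed above amount to when written by hand, here is a minimal Python sketch. It covers only the issuer, audience, and time-based claims; signature verification against a JWKS endpoint is deliberately left out. The `validate_claims` helper, its parameters, and the example values are assumptions made for illustration, not Gloo’s or Snyk’s implementation.

```python
import time


def validate_claims(claims, expected_issuer, expected_audience, now=None, leeway=0):
    """Return a list of problems; an empty list means the claims passed.

    Hypothetical helper: checks iss, aud, iat and exp only.
    """
    now = time.time() if now is None else now
    problems = []

    if claims.get("iss") != expected_issuer:
        problems.append("unexpected issuer")

    # "aud" may be a single string or a list of strings.
    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    if expected_audience not in audiences:
        problems.append("unexpected audience")

    iat = claims.get("iat")
    if not isinstance(iat, (int, float)) or iat > now + leeway:
        problems.append("iat missing or in the future")

    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now - leeway:
        problems.append("exp missing or in the past")

    return problems


# Example values are made up for the sketch.
example = {"iss": "https://issuer.example", "aud": ["api"], "iat": 1000, "exp": 2000}
print(validate_claims(example, "https://issuer.example", "api", now=1500))
```

Even this partial version needs tests for every branch, which is the kind of code the gateway config lets you avoid owning.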
Unifying authN strategies
Even if we are doing JWT validation at the edge, we have only helped our system by offloading that from the monolith. As you may have guessed based on the title of this article, and the introduction, just doing JWT validation at the edge isn’t nearly enough. We also want to offload any authN decisions onto the edge, enabling Gloo to make authN decisions based on the key material provided by the user.
To enable this, we have introduced an External Auth (ExtAuth) plugin, a unique form of plugin created by Gloo (only available in their Enterprise suite), written in Go and compiled into an Envoy External Authorization filter. The plugin first identifies the type of key material the user has supplied and establishes their identity. When the identity has been established, the plugin cryptographically signs the token and passes it on to the API behind the gateway, which will validate the token’s signature and perform the requested operation on behalf of the user. The figure below provides a simple example of how this could look.
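The flow described above (identify the key material, establish the identity, sign a normalized token) can be sketched in a few lines. The real plugin is written in Go; this Python version is only an illustration under stated assumptions. The `token` and `Bearer` header schemes, the in-memory `API_KEYS` store, the HS256 shared-secret signature, and every function name here are hypothetical, and the sketch assumes an incoming JWT has already had its signature and claims validated by the gateway’s JWT step.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical API key store; a real gateway would query a database or service.
API_KEYS = {"key-123": "user-1"}


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def classify(authorization: str):
    """Identify which kind of key material the Authorization header carries."""
    scheme, _, credential = authorization.partition(" ")
    credential = credential.strip()
    if not credential:
        raise ValueError("missing credential")
    if scheme.lower() == "token":  # assumed scheme for API keys
        return "api_key", credential
    if scheme.lower() == "bearer":  # assumed scheme for JWTs
        return "jwt", credential
    raise ValueError(f"unsupported scheme: {scheme!r}")


def resolve_subject(kind: str, credential: str) -> str:
    """Establish the caller's identity from either kind of credential."""
    if kind == "api_key":
        if credential not in API_KEYS:
            raise PermissionError("unknown API key")
        return API_KEYS[credential]
    # JWT path: the token is assumed to be validated already, so only read "sub".
    parts = credential.split(".")
    if len(parts) != 3:
        raise ValueError("malformed JWT")
    padded = parts[1] + "=" * (-len(parts[1]) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))["sub"]


def mint_internal_token(subject: str, secret: bytes, now: int, ttl: int = 60) -> str:
    """Sign a short-lived internal JWT (HS256 is an assumption of this sketch)."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}, separators=(",", ":")).encode())
    claims = {"sub": subject, "iat": now, "exp": now + ttl}
    payload = b64url(json.dumps(claims, separators=(",", ":")).encode())
    signing_input = f"{header}.{payload}".encode("ascii")
    signature = b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{signature}"


def normalize(authorization: str, secret: bytes, now: int) -> str:
    """Turn any supported credential into the same kind of internal token."""
    kind, credential = classify(authorization)
    return mint_internal_token(resolve_subject(kind, credential), secret, now)
```

Whichever credential the caller presents, the service behind the gateway sees one token format. A production setup would more likely sign with an asymmetric key so that services only need the public half.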
In other words, at this point we have created an internal token that can be used by any service in our cluster, containing some user information that can be mapped to a user in our user database! Services that receive traffic routed through Gloo verify that the JWT has been signed by Gloo. This might not seem like much, but previously each individual service had to explicitly request user data from our monolith to learn anything about a user’s identity.
In addition to that, Gloo enables our services to manage one authN method, while Gloo deals with the complexity of managing multiple authN strategies at the edge.
This is all done relatively pain-free, with few lines of code (most of it is boilerplate created by Solo), compared to having to implement bespoke authN strategies on each service.
In upcoming sprints, the Extensibility team will be focusing on two key areas related to our API Gateway:
- Slowly letting Gloo handle all our API traffic (read: canary release!). We want to start routing “normal”, non-partner-specific traffic through our API Gateway. This means putting in place the necessary observability and alerting, ensuring our on-call engineers can support the new infrastructure. It also means doing the hard work of routing some % of the traffic that the API’s current ingress is receiving to Gloo’s Envoy proxy.
- Extending Gloo to do request validation at the edge. Currently Gloo doesn’t support this out of the box.
We hope you’ve found this post useful if you’re thinking about using Gloo in your environment! In future Snyk blog posts, we’ll be looking at how Gloo has provided our Engineering and Product teams with great baseline observability and how Gloo pushed us to improve our CI/CD practices at Snyk.