
Using Gloo Gateway to Support LLM APIs

LLM providers like OpenAI and Gemini offer advanced natural language processing capabilities, enabling innovative applications. An API gateway like Gloo Gateway is essential to efficiently and securely access these APIs. Gloo Gateway acts as a central hub, providing a unified endpoint and essential functionalities such as request routing, security enforcement, and monitoring.

If you’re building or considering building with LLM/GenAI APIs, please take our 30-second survey — we’d love to talk with you.

This article explores how Gloo Gateway can be configured to support LLM APIs, focusing on accessing the OpenAI API. We’ll cover topics such as enforcing security policies, managing API keys, and optimizing traffic routing. By integrating Gloo Gateway, organizations can streamline access to LLM APIs, enforce security policies, and simplify API management, enabling the development of innovative AI-driven applications.

What Is an API Gateway?

API gateways act as a bridge between clients (API consumers) and backend services, providing a single point of contact.

They offer a range of functionalities, such as request routing, protocol translation, security, monitoring, and API version management. By consolidating these features into a single service, API gateways simplify managing and securing API traffic, enhancing the system’s overall reliability and performance.

What Is an LLM API?

Large Language Model (LLM) APIs, such as OpenAI, Mistral, and Gemini, provide access to advanced natural language processing capabilities offered by language models like GPT. These APIs allow developers to use natural language prompts as inputs to the API, enabling tasks such as text analysis, content generation, translation, and conversational interfaces.

Compliance With Organizational Standards

API gateways play a crucial role in enforcing security policies and ensuring that applications comply with organizational standards established by security and infosec teams. LLM APIs are subject to the same requirements as any other API within the organization when it comes to preventing unauthorized access and maintaining the integrity and confidentiality of the data they process.

As the industry rushes to build new applications that consume LLM APIs, organizations need to carefully establish clear usage policies and high security standards for these public AI models.

LLM API Security

Let’s illustrate this scenario: To mitigate the risk of leaking customer data, the security team mandates that all applications need to use an internal gateway as the endpoint to access LLM APIs rather than connecting directly. By doing so, the gateway team can centralize control over traffic routing, observability, and security policies, enhancing the organization’s overall security posture.

While a security strategy is important, it is also critical to implement one that minimally compromises developer agility. Platform and security teams looking to control, or at a minimum monitor, the outflow of requests to public LLMs should look for solutions that provide benefits back to their consumers: in this case, the developers of these AI-enabled applications.

API gateways such as Gloo Gateway can provide:

  • A unified endpoint to standardize internal API interfaces, with access to multiple LLM backends
  • Authentication strategies to streamline access management and security in accordance with org-wide authentication standards (for example issuing and refreshing individual API keys)
  • Rate limiting strategies at the gateway and route level
  • Traffic monitoring and logging
  • Transformations for request/response and header shaping
  • And many other features

Leveraging an API gateway to support LLM APIs can provide a robust solution for organizations seeking to enhance the security and consistency of their API infrastructure.

Configuring Gloo Gateway for LLMs

Assuming you’ve already set up Gloo Gateway, the next step is to use the ExternalService CRD and configure the LLM APIs you want the Gloo Gateway to handle. In this example, we’ll be using the OpenAI API; however, you can do this with other LLM APIs as well.

Here’s how you would create an ExternalService to represent the OpenAI API:

apiVersion: networking.gloo.solo.io/v2
kind: ExternalService
metadata:
  name: openai-api
spec:
  hosts:
  - api.openai.com
  ports:
  - name: https
    number: 443
    protocol: HTTPS
    clientsideTls: {}

Then we can use a route table to configure Gloo Gateway to route to the external service. For example:

apiVersion: networking.gloo.solo.io/v2
kind: RouteTable
metadata:
  name: direct-to-openai-routetable
spec:
  hosts:
    - '*'
  virtualGateways:
    - name: istio-ingressgateway
      namespace: gloo-mesh-gateways
      cluster: mycluster
  workloadSelectors: []
  http:
    - name: catch-all
      labels:
        route: openai
      matchers:
      - uri:
          prefix: /openai
      forwardTo:
        pathRewrite: /v1/chat/completions
        hostRewrite: api.openai.com
        destinations:
        - kind: EXTERNAL_SERVICE
          port:
            number: 443
          ref:
            name: openai-api
            namespace: default

The above RouteTable is attached to a virtual gateway and listens for all hosts. Note that this is where you’d configure the actual hostname for your gateway, for example, mycompany.domain.com.

spec:
  hosts:
    - mycompany.domain.com
  virtualGateways:
    - name: istio-ingressgateway
      namespace: gloo-mesh-gateways
      cluster: mycluster

The RouteTable matches incoming requests based on the routes specified; in this case, the catch-all route matches on the prefix /openai.

http:
    - name: catch-all
      labels:
        route: openai
      matchers:
      - uri:
          prefix: /openai
      forwardTo:
        pathRewrite: /v1/chat/completions
        hostRewrite: api.openai.com
        destinations:
        - kind: EXTERNAL_SERVICE
          port:
            number: 443
          ref:
            name: openai-api
            namespace: default

Once an incoming request matches the prefix, it is forwarded to the OpenAI external service. Before the request is forwarded, the host is rewritten to api.openai.com and the path to /v1/chat/completions.

This configuration allows traffic destined, for example, for mycompany.domain.com/openai to resolve to the OpenAI Chat Completions endpoint at api.openai.com/v1/chat/completions. Note that specifying /v1/chat/completions in the path rewrite also serves to scope this route down to the Chat Completions endpoint only; however, this is completely configurable by the platform team. It provides precise control over gradually rolling out additional LLM capabilities, such as routing to the image generation endpoint at /v1/images/generations using an organization-specific route path such as /openai/images.
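The match-and-rewrite behavior of the catch-all route can be sketched in a few lines of Python. This is purely a conceptual illustration of what the RouteTable above expresses, not how Gloo Gateway is implemented:

```python
# Conceptual sketch of the RouteTable's catch-all route.
# The prefix and rewrite targets mirror the YAML above.

ROUTE_PREFIX = "/openai"
PATH_REWRITE = "/v1/chat/completions"
HOST_REWRITE = "api.openai.com"

def route(host: str, path: str):
    """Return the upstream (host, path) a matching request is forwarded to."""
    if not path.startswith(ROUTE_PREFIX):
        return None  # no matcher hit: the gateway will not forward to OpenAI
    # pathRewrite replaces the matched path; hostRewrite replaces the Host header
    return HOST_REWRITE, PATH_REWRITE

print(route("mycompany.domain.com", "/openai"))
# ('api.openai.com', '/v1/chat/completions')
```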

At this point, you can send a request to Gloo Gateway and see that it is forwarded to the OpenAI API. Note that $GATEWAY points to the Gloo Gateway external address.

Let’s try this:

curl http://$GATEWAY/openai -H "Content-Type: application/json"   -H "Authorization: Bearer $OPENAI_API_KEY" -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "system",
        "content": "You are a solutions architect for kubernetes networking, skilled in explaining complex technical concepts surrounding API Gateway and LLM Models"
      },
      {
        "role": "user",
        "content": "Write me a 30 second pitch on why I should use an API gateway in front of my LLM backends"
      }
    ]
  }'

It works! We can access the OpenAI API through the Gloo Gateway, with the /openai path routing to the OpenAI chat completions endpoint.

We can already see the benefits of implementing an “LLM Proxy” compared to pointing a curl command at each public LLM directly.

OpenAI requires the API key as a header:

curl https://api.openai.com/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer $OPENAI_API_KEY" -d '{$PROMPT}'

Gemini requires the API key as a query parameter:

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent?key=$GEMINI_API_KEY" -H "Content-Type: application/json" -d '{$PROMPT}'

Instead, we can provide unified consumer endpoints such as:
  • mycompany.domain.com/openai
  • mycompany.domain.com/gemini
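To make the contrast concrete, here is a small Python sketch of the two request shapes. The URLs and credential placements follow the curl examples above; the helper functions themselves are hypothetical illustrations, not part of any SDK:

```python
# Each public LLM API expects credentials in a different place.

def direct_request(provider: str, api_key: str) -> dict:
    """Build a provider-specific request when calling the public API directly."""
    if provider == "openai":
        # OpenAI: key goes in the Authorization header
        return {"url": "https://api.openai.com/v1/chat/completions",
                "headers": {"Authorization": f"Bearer {api_key}"},
                "params": {}}
    if provider == "gemini":
        # Gemini: key goes in a query parameter
        return {"url": "https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent",
                "headers": {},
                "params": {"key": api_key}}
    raise ValueError(f"unknown provider: {provider}")

def gateway_request(provider: str) -> dict:
    """Behind the gateway, every provider is just a path; no key handling at all."""
    return {"url": f"https://mycompany.domain.com/{provider}",
            "headers": {}, "params": {}}
```

With the gateway in place, application code only ever builds the second shape; credential placement becomes the platform team's concern.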

Protecting the Gloo Gateway With an API Key

Let’s explore how we can continue to optimize the security and developer productivity experience. Using an ExtAuthPolicy resource, you can enforce authentication and authorization for the traffic reaching the gateway. Gloo Gateway supports multiple types of external auth policies, but for this example we’ll use a simple API key to authenticate the requests.

For the following example, we will edit our current direct-to-openai-routetable so that the path is rewritten to /, since this example consumes the /v1/models endpoint:

forwardTo:
  pathRewrite: /
  hostRewrite: api.openai.com
  destinations:
  - kind: EXTERNAL_SERVICE
    port:
      number: 443
    ref:
      name: openai-api
      namespace: default

Now create a Kubernetes Secret, in which we specify the API key a caller must provide to be authenticated by the Gloo Gateway:

apiVersion: v1
kind: Secret
metadata:
  name: gw-api-key
  labels:
    api-key: api-gateway
type: extauth.solo.io/apikey
data:
  api-key: bXlzZWNyZXRrZXk= # Base64 encoded value "mysecretkey"

Note the value for the api-key must be base64 encoded.
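You can produce the encoded value with any standard tool; for example, in Python:

```python
import base64

# The api-key value in the Secret must be base64 encoded
encoded = base64.b64encode(b"mysecretkey").decode()
print(encoded)  # bXlzZWNyZXRrZXk=

# Decoding recovers the original key
assert base64.b64decode(encoded) == b"mysecretkey"
```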

Before you can create an ExtAuthPolicy, you must configure the external auth server using the ExtAuthServer resource:

apiVersion: admin.gloo.solo.io/v2
kind: ExtAuthServer
metadata:
  name: ext-auth-server
spec:
  destinationServer:
    port:
      number: 8083
    ref:
      cluster: mycluster
      name: ext-auth-service
      namespace: gloo-mesh-addons

For Gloo Gateway to enforce API key authentication, you'll create an ExtAuthPolicy resource. The resource tells the external auth server you configured previously to use API key authentication with the key provided in the Kubernetes Secret.

apiVersion: security.policy.gloo.solo.io/v2
kind: ExtAuthPolicy
metadata:
  name: gateway-apikey
spec:
  applyToRoutes:
    - route:
        labels:
          route: openai 
  config:
    server:
      name: ext-auth-server
      namespace: default
      cluster: mycluster
    glooAuth:
      configs:
      - apiKeyAuth:
          headerName: api-key
          k8sSecretApikeyStorage:
            labelSelector:
              api-key: api-gateway

The above policy configures apiKeyAuth to find the Kubernetes Secret via label selectors and tells Gloo Gateway to compare the value provided in the api-key header with the value read from the Secret.
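Conceptually, the check the external auth server performs looks like this Python sketch (an illustration of the policy's semantics, not Gloo's actual implementation):

```python
# Secrets as the auth server sees them: selected by label, keyed by data field.
# The label and key value mirror the Secret above.
SECRETS = [
    {"labels": {"api-key": "api-gateway"}, "data": {"api-key": "mysecretkey"}},
]

def authorized(headers: dict) -> bool:
    """Return True if the api-key header matches a label-selected secret."""
    presented = headers.get("api-key")
    if presented is None:
        return False  # 401: API key is missing
    for secret in SECRETS:
        if (secret["labels"].get("api-key") == "api-gateway"
                and secret["data"]["api-key"] == presented):
            return True
    return False  # 401: API key is invalid

assert authorized({"api-key": "mysecretkey"})
assert not authorized({})
```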

This time, if you repeat the same request as before, you'll get an HTTP 401 Unauthorized response:

curl -v http://$GATEWAY/openai/v1/models
...

< HTTP/1.1 401 Unauthorized
< www-authenticate: API key is missing or invalid

The message is telling us that the Gloo Gateway is expecting an API key, but we haven’t provided one. Let’s try again with the API key value we set in the Kubernetes Secret:

curl -v -H "api-key: mysecretkey" http://$GATEWAY/openai/v1/models

Gloo Gateway now lets the request through; however, we still receive an HTTP 401, this time from the OpenAI API.

The OpenAI API expects an OpenAI API key in the authorization header, following this format: Authorization: Bearer OPENAI_API_KEY. Instead of including this header for each request, we could automatically attach it for all requests sent to the OpenAI API at the gateway level.

Let’s modify the original Kubernetes Secret to include the OpenAI API key. Then, we can extract the OpenAI API key from the secret and add it to a header passed to the Gloo Gateway.

apiVersion: v1
kind: Secret
metadata:
  name: gw-api-key
  labels:
    api-key: api-gateway
type: extauth.solo.io/apikey
data:
  api-key: bXlzZWNyZXRrZXk= # Base64 encoded value "mysecretkey"
  openai-api-key: <REPLACE WITH BASE64 ENCODED OPEN AI API KEY>

You can update the ExtAuthPolicy to read the openai-api-key value from the Secret and put it in a header called x-api-key:

apiVersion: security.policy.gloo.solo.io/v2
kind: ExtAuthPolicy
metadata:
  name: gateway-apikey
spec:
  applyToRoutes:
    - route:
        labels:
          route: openai
  config:
    server:
      name: ext-auth-server
      namespace: default
      cluster: mycluster
    glooAuth:
      configs:
      - apiKeyAuth:
          headerName: api-key
          headersFromMetadataEntry:
            x-api-key: 
              name: openai-api-key
          k8sSecretApikeyStorage:
            labelSelector:
              api-key: api-gateway

At this point, the external auth policy assigns the OpenAI API key value to a header called x-api-key. Still, we must tell Gloo Gateway to construct the Authorization header in the format the OpenAI API expects. We can do that with the TransformationPolicy resource.

apiVersion: trafficcontrol.policy.gloo.solo.io/v2
kind: TransformationPolicy
metadata:
  name: openai-transformations
spec:
  applyToRoutes:
  - route:
      labels:
        route: openai
  config:
    request:
      injaTemplate:
        headers:
          Authorization:
            text: 'Bearer {{ openai_api_key }}'
        extractors:
          openai_api_key:
            header: 'x-api-key'
            regex: '.*'

The policy uses an Inja template to extract the value of the x-api-key header and store it in the openai_api_key variable. In the headers field, we then set the Authorization header by concatenating the word Bearer with the value of openai_api_key.
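The extract-then-template step is roughly equivalent to the following Python snippet: a regex capture from a header, then substitution into a template string. Gloo uses Inja templates rather than Python, so treat this as a sketch of the semantics only:

```python
import re

def build_authorization(headers: dict) -> str:
    # extractor: capture the whole x-api-key header value with regex '.*'
    openai_api_key = re.match(r".*", headers["x-api-key"]).group(0)
    # template: 'Bearer {{ openai_api_key }}'
    return f"Bearer {openai_api_key}"

print(build_authorization({"x-api-key": "sk-example"}))  # Bearer sk-example
```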

If you send a request to the gateway, you’ll get back a valid response from the OpenAI API:

curl -v -H "api-key: mysecretkey" $GATEWAY/openai/v1/models
... 
{
  "object": "list",
  "data": [
    {
      "id": "whisper-1",
      "object": "model",
      "created": 1677532384,
      "owned_by": "openai-internal"
    },
    {
      "id": "davinci-002",
      "object": "model",
      "created": 1692634301,
      "owned_by": "system"
    },
...

Let’s also try sending a simple prompt to the /v1/chat/completions endpoint:

curl -H "content-type: application/json" -H "api-key: mysecretkey" -d '{"model": "gpt-3.5-turbo", "messages": [{ "role": "user", "content": "Tell me about Gloo Gateway."}]}' http://$GATEWAY/openai/v1/chat/completions 
...
{
  "id": "chatcmpl-9GwUcUu3WOQNNUoTDXQfKl4W9Mrpq",
  "object": "chat.completion",
  "created": 1713825122,
  "model": "gpt-3.5-turbo-0125",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Gloo Gateway is an open-source API gateway and ingress controller built on top of the Envoy Proxy, which is a high-performance proxy developed by Lyft. It is designed to provide a secure, reliable, and scalable way to manage and control traffic to microservices and serverless applications.\n\nGloo Gateway offers features such as routing, load balancing, caching, and security capabilities like rate limiting, authentication, and authorization. It can be easily integrated with various platforms and services, making it a versatile tool for managing and securing APIs.\n\nGloo Gateway also supports modern cloud-native architectures, such as Kubernetes, Istio, and Knative, making it a good choice for organizations looking to adopt microservices or serverless technologies.\n\nOverall, Gloo Gateway is a powerful and flexible API gateway solution that helps organizations efficiently manage and secure their APIs in a cloud-native environment."
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 14,
    "completion_tokens": 171,
    "total_tokens": 185
  },
  "system_fingerprint": "fp_c2295e73ad"
}

To summarize, in this example we have added an additional layer of security on our API gateway that allows us to mask our LLM API key(s) using custom headers. This shifts the control of what public LLM API keys are being consumed and how they are created and rotated into the platform team’s hands, reducing individual/team API key sprawl.

Customizing LLM API Definition

Notice the requests to the chat completion API are more complex and can include things such as:

  • Prompts (user and/or system)
  • Model
  • Temperature
  • Frequency penalty
  • Logit bias
  • Maximum number of tokens
  • Number of chat completions
  • and many more.

Just like we automatically attach an API key to the requests, we can also use transformation policies to create prompt templates, handle query parameters and API Key substitution, modify the request body before being sent to the OpenAI API, and much more. Take a look at some additional in-depth examples of how to configure Gloo Gateway for LLM backends.

Additionally, not all LLM APIs share the same API definition, which would require developers to learn each API separately. To solve this, you can customize the LLM API definition you want to provide to your applications and let Gloo Gateway transform it into the correct backend LLM API definition.

Here’s an example of a transformation policy that takes a prompt from the client requests and adds a model name and a system message at the API gateway level:

apiVersion: trafficcontrol.policy.gloo.solo.io/v2
kind: TransformationPolicy
metadata:
  name: openai-transformations
spec:
  applyToRoutes:
  - route:
      labels:
        route: openai
  config:
    request:
      injaTemplate:
        body:
          text: |
            {
              "model": "gpt-3.5-turbo",
              "messages": [
                {
                  "role": "system",
                  "content": "You will always respond in JSON that includes the product and pitch attribute."
                },
                {
                  "role": "user",
                  "content": "{{ prompt }}"
                }
              ]
            }

Note that {{ prompt }} is Inja notation that extracts the attribute value (prompt) from the request body. If you send a request to the gateway, you'll see the response is formatted according to the prompt, and we didn't have to follow the API definition of the backing LLM API; we only sent the prompt value.
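In Python terms, the body transformation amounts to rendering the template against the incoming JSON body. Again, this is an illustration of the behavior, not Gloo internals:

```python
import json

# Template mirroring the injaTemplate body above; the user content
# slot is filled from {{ prompt }} at request time.
TEMPLATE = {
    "model": "gpt-3.5-turbo",
    "messages": [
        {"role": "system",
         "content": "You will always respond in JSON that includes the product and pitch attribute."},
        {"role": "user", "content": None},
    ],
}

def transform_request(body: str) -> str:
    """Expand the client's prompt into the full OpenAI request body."""
    prompt = json.loads(body)["prompt"]  # what {{ prompt }} extracts
    rendered = json.loads(json.dumps(TEMPLATE))  # cheap deep copy
    rendered["messages"][1]["content"] = prompt
    return json.dumps(rendered)

out = json.loads(transform_request('{"prompt": "Tell me about Gloo Gateway."}'))
print(out["messages"][1]["content"])  # Tell me about Gloo Gateway.
```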

Here’s the response we passed through the jq tool to retrieve only the actual content of the message:

curl http://$GATEWAY/openai/v1/chat/completions -H "Content-Type: application/json" -H "api-key: mysecretkey" -d '{"prompt":  "Write me a 30 second pitch on why I should use Gloo Gateway in front of my LLM backends"}' | jq -r '.choices[0].message.content'

{
    "product": "Gloo Gateway",
    "pitch": "With Gloo Gateway, you can easily manage traffic, secure communication, and optimize performance for your LLM backends. By using Gloo Gateway, you can ensure reliability, scalability, and flexibility for your architecture. Simplify your deployment process and gain full control over your API traffic with built-in security features. Increase your backend's efficiency and streamline your operations with Gloo Gateway."
}

We can take this even further. Instead of using jq to parse the response at the client, we could update the response from OpenAI and only return the content that the caller cares about:

apiVersion: trafficcontrol.policy.gloo.solo.io/v2
kind: TransformationPolicy
metadata:
  name: openai-transformations
spec:
  applyToRoutes:
  - route:
      labels:
        route: openai
  config:
    request:
      injaTemplate:
        body:
          text: |
            {
              "model": "gpt-3.5-turbo",
              "messages": [
                {
                  "role": "system",
                  "content": "You will always respond in JSON that includes the product and pitch attribute."
                },
                {
                  "role": "user",
                  "content": "{{ prompt }}"
                }
              ]
            }
    response:
      injaTemplate:
        body:
          text: | 
            {{ choices.0.message.content  }}

We added the response field to the configuration and defined another Inja template that extracts the specific field from the response we receive from OpenAI.

Get Started With Gloo Gateway and LLM APIs

Using Gloo Gateway to support LLM APIs offers a robust solution for organizations seeking to enhance their API infrastructure’s security, reliability, and performance.

Organizations can enforce security policies, control traffic routing, and simplify API management by centralizing access through the gateway. Developers benefit from a unified internal endpoint, reducing the complexity of managing individual API keys and ensuring application consistency.

If you’re building or considering building with LLM/GenAI APIs, please complete this 30-second survey.

Contact Solo.io to learn how we can help you leverage Gloo Gateway for your LLM APIs and other API management needs.