
Using Gloo Gateway to Support LLM APIs

LLM providers like OpenAI and Gemini offer advanced natural language processing capabilities, enabling innovative applications. An API gateway like Gloo Gateway is essential to efficiently and securely access these APIs. Gloo Gateway acts as a central hub, providing a unified endpoint and essential functionalities such as request routing, security enforcement, and monitoring.

If you’re building or considering building with LLM/GenAI APIs, please take our 30-second survey — we’d love to talk with you.

This article explores how Gloo Gateway can be configured to support LLM APIs, focusing on accessing the OpenAI API. We’ll cover topics such as enforcing security policies, managing API keys, and optimizing traffic routing. By integrating Gloo Gateway, organizations can streamline access to LLM APIs, enforce security policies, and simplify API management, enabling the development of innovative AI-driven applications.

What Is an API Gateway?

API gateways act as a bridge between clients (API consumers) and backend services, providing a single point of contact.

They offer a range of functionalities, such as request routing, protocol translation, security, monitoring, and API version management. By consolidating these features into a single service, API gateways simplify managing and securing API traffic, enhancing the system’s overall reliability and performance.

What Is an LLM API?

Large Language Model (LLM) APIs, such as OpenAI, Mistral, and Gemini, provide access to advanced natural language processing capabilities offered by language models like GPT. These APIs allow developers to use natural language prompts as inputs to the API, enabling tasks such as text analysis, content generation, translation, and conversational interfaces.

Compliance With Organizational Standards

API gateways play a crucial role in enforcing security policies and ensuring that applications comply with organizational standards established by security and infosec teams. LLM APIs are subject to the same requirements as any other API within the organization when it comes to preventing unauthorized access and maintaining the integrity and confidentiality of the data they process.

As the industry rushes to build new applications that consume LLM APIs, organizations need to carefully establish clear usage policies and high security standards for these public AI models.

LLM API Security

Let’s illustrate this scenario: To mitigate the risk of leaking customer data, the security team mandates that all applications need to use an internal gateway as the endpoint to access LLM APIs rather than connecting directly. By doing so, the gateway team can centralize control over traffic routing, observability, and security policies, enhancing the organization’s overall security posture.

While a security strategy is important, it is also critical to implement one that minimally compromises developer agility. Platform and security teams looking to control, or at a minimum monitor, the outflow of requests to public LLMs should look for solutions that provide benefits back to their consumers: in this case, the developers of these AI-enabled applications.

API gateways such as Gloo Gateway can provide:

  • A unified endpoint to standardize internal API interfaces, with access to multiple LLM backends
  • Authentication strategies to streamline access management and security in accordance with org-wide authentication standards (for example issuing and refreshing individual API keys)
  • Rate limiting strategies at the gateway and route level
  • Traffic monitoring and logging
  • Transformations for request/response and header shaping
  • And many other features

Leveraging an API gateway to support LLM APIs can provide a robust solution for organizations seeking to enhance the security and consistency of their API infrastructure.

Configuring Gloo Gateway for LLMs

Assuming you’ve already set up Gloo Gateway, the next step is to use the ExternalService CRD and configure the LLM APIs you want the Gloo Gateway to handle. In this example, we’ll be using the OpenAI API; however, you can do this with other LLM APIs as well.

Here’s how you would create an ExternalService to represent the OpenAI API:

apiVersion: networking.gloo.solo.io/v2
kind: ExternalService
metadata:
  name: openai-api
spec:
  hosts:
  - api.openai.com
  ports:
  - name: https
    number: 443
    protocol: HTTPS
    clientsideTls: {}

Then we can use a route table to configure Gloo Gateway to route to the external service. For example:

apiVersion: networking.gloo.solo.io/v2
kind: RouteTable
metadata:
  name: direct-to-openai-routetable
spec:
  hosts:
    - '*'
  virtualGateways:
    - name: istio-ingressgateway
      namespace: gloo-mesh-gateways
      cluster: mycluster
  workloadSelectors: []
  http:
    - name: catch-all
      labels:
        route: openai
      matchers:
      - uri:
          prefix: /openai
      forwardTo:
        pathRewrite: /v1/chat/completions
        hostRewrite: api.openai.com
        destinations:
        - kind: EXTERNAL_SERVICE
          port:
            number: 443
          ref:
            name: openai-api
            namespace: default

The above RouteTable is attached to a virtual gateway and listens for all hosts. Note that this is where you’d configure the actual hostname for your gateway, for example, mycompany.domain.com.

spec:
  hosts:
    - mycompany.domain.com
  virtualGateways:
    - name: istio-ingressgateway
      namespace: gloo-mesh-gateways
      cluster: mycluster

The RouteTable matches incoming requests based on the routes specified; in this case, the catch-all route matches on the prefix /openai.

http:
    - name: catch-all
      labels:
        route: openai
      matchers:
      - uri:
          prefix: /openai
      forwardTo:
        pathRewrite: /v1/chat/completions
        hostRewrite: api.openai.com
        destinations:
        - kind: EXTERNAL_SERVICE
          port:
            number: 443
          ref:
            name: openai-api
            namespace: default

Once an incoming request matches the prefix, it is forwarded to the OpenAI external service. Before the request is forwarded, the host is rewritten to api.openai.com and the path to /v1/chat/completions.

This configuration allows traffic destined, for example, for mycompany.domain.com/openai to resolve to the OpenAI Chat Completions endpoint at api.openai.com/v1/chat/completions. Note that specifying /v1/chat/completions in the path rewrite also serves to scope this route down to the Chat Completions endpoint only; however, this is completely configurable by the platform team. It provides precise control over gradually rolling out additional LLM capabilities, such as routing to the image generation endpoint at /v1/images/generations using an organization-specific route path such as /openai/images.
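The match-and-rewrite behavior of the catch-all route can be sketched in a few lines of Python. This is purely a conceptual illustration of what the RouteTable above expresses, not how Gloo Gateway is implemented:

```python
# Conceptual sketch of the RouteTable's catch-all route.
# The prefix and rewrite targets mirror the YAML above.

ROUTE_PREFIX = "/openai"
PATH_REWRITE = "/v1/chat/completions"
HOST_REWRITE = "api.openai.com"

def route(host: str, path: str):
    """Return the upstream (host, path) a matching request is forwarded to."""
    if not path.startswith(ROUTE_PREFIX):
        return None  # no matcher hit: the gateway will not forward to OpenAI
    # pathRewrite replaces the matched path; hostRewrite replaces the Host header
    return HOST_REWRITE, PATH_REWRITE

print(route("mycompany.domain.com", "/openai"))
# ('api.openai.com', '/v1/chat/completions')
```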

At this point, you can send a request to Gloo Gateway and see that it is forwarded to the OpenAI API. Note that $GATEWAY points to the Gloo Gateway external address.

Let’s try this:

curl http://$GATEWAY/openai -H "Content-Type: application/json"   -H "Authorization: Bearer $OPENAI_API_KEY" -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "system",
        "content": "You are a solutions architect for kubernetes networking, skilled in explaining complex technical concepts surrounding API Gateway and LLM Models"
      },
      {
        "role": "user",
        "content": "Write me a 30 second pitch on why I should use an API gateway in front of my LLM backends"
      }
    ]
  }'

It works! We can access the OpenAI API through the Gloo Gateway, with the /openai path routing to the OpenAI chat completions endpoint.

We can already see the benefits of implementing an “LLM Proxy” compared to pointing a curl command at each public LLM directly.

OpenAI requires the API key as a header:

curl https://api.openai.com/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer $OPENAI_API_KEY" -d '{$PROMPT}'

Gemini requires the API key as a query parameter:

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent?key=$GEMINI_API_KEY" -H "Content-Type: application/json" -d '{$PROMPT}'

Instead, we can provide unified consumer endpoints such as:
  • mycompany.domain.com/openai
  • mycompany.domain.com/gemini
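To make the contrast concrete, here is a small Python sketch of the two request shapes. The URLs and credential placements follow the curl examples above; the helper functions themselves are hypothetical illustrations, not part of any SDK:

```python
# Each public LLM API expects credentials in a different place.

def direct_request(provider: str, api_key: str) -> dict:
    """Build a provider-specific request when calling the public API directly."""
    if provider == "openai":
        # OpenAI: key goes in the Authorization header
        return {"url": "https://api.openai.com/v1/chat/completions",
                "headers": {"Authorization": f"Bearer {api_key}"},
                "params": {}}
    if provider == "gemini":
        # Gemini: key goes in a query parameter
        return {"url": "https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent",
                "headers": {},
                "params": {"key": api_key}}
    raise ValueError(f"unknown provider: {provider}")

def gateway_request(provider: str) -> dict:
    """Behind the gateway, every provider is just a path; no key handling at all."""
    return {"url": f"https://mycompany.domain.com/{provider}",
            "headers": {}, "params": {}}
```

With the gateway in place, application code only ever builds the second shape; credential placement becomes the platform team's concern.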

Protecting the Gloo Gateway With an API Key

Let’s explore how we can continue to optimize the security and developer productivity experience. Using an ExtAuthPolicy resource, you can enforce authentication and authorization for the traffic reaching the gateway. Gloo Gateway supports multiple types of external auth policies, but for this example we’ll use a simple API key to authenticate the requests.

For the following example, we will edit our current direct-to-openai-routetable so that the path is rewritten to /, since this example consumes the /v1/models endpoint:

forwardTo:
  pathRewrite: /
  hostRewrite: api.openai.com
  destinations:
  - kind: EXTERNAL_SERVICE
    port:
      number: 443
    ref:
      name: openai-api
      namespace: default

Now create a Kubernetes Secret, in which we specify the API key a caller must provide to be authenticated by the Gloo Gateway:

apiVersion: v1
kind: Secret
metadata:
  name: gw-api-key
  labels:
    api-key: api-gateway
type: extauth.solo.io/apikey
data:
  api-key: bXlzZWNyZXRrZXk= # Base64 encoded value "mysecretkey"

Note the value for the api-key must be base64 encoded.
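You can produce the encoded value with any standard tool; for example, in Python:

```python
import base64

# The api-key value in the Secret must be base64 encoded
encoded = base64.b64encode(b"mysecretkey").decode()
print(encoded)  # bXlzZWNyZXRrZXk=

# Decoding recovers the original key
assert base64.b64decode(encoded) == b"mysecretkey"
```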

Before you can create an ExtAuthPolicy, you must configure the external auth server using the ExtAuthServer resource:

apiVersion: admin.gloo.solo.io/v2
kind: ExtAuthServer
metadata:
  name: ext-auth-server
spec:
  destinationServer:
    port:
      number: 8083
    ref:
      cluster: mycluster
      name: ext-auth-service
      namespace: gloo-mesh-addons

For Gloo Gateway to enforce API key authentication, you'll create an ExtAuthPolicy resource. The resource tells the external auth server you configured previously to use API key authentication with the key provided in the Kubernetes Secret.

apiVersion: security.policy.gloo.solo.io/v2
kind: ExtAuthPolicy
metadata:
  name: gateway-apikey
spec:
  applyToRoutes:
    - route:
        labels:
          route: openai 
  config:
    server:
      name: ext-auth-server
      namespace: default
      cluster: mycluster
    glooAuth:
      configs:
      - apiKeyAuth:
          headerName: api-key
          k8sSecretApikeyStorage:
            labelSelector:
              api-key: api-gateway

The above policy configures apiKeyAuth to find the Kubernetes Secret via label selectors and tells Gloo Gateway to compare the value provided in the api-key header with the value read from the Secret.
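Conceptually, the check the external auth server performs looks like this Python sketch (an illustration of the policy's semantics, not Gloo's actual implementation):

```python
# Secrets as the auth server sees them: selected by label, keyed by data field.
# The label and key value mirror the Secret above.
SECRETS = [
    {"labels": {"api-key": "api-gateway"}, "data": {"api-key": "mysecretkey"}},
]

def authorized(headers: dict) -> bool:
    """Return True if the api-key header matches a label-selected secret."""
    presented = headers.get("api-key")
    if presented is None:
        return False  # 401: API key is missing
    for secret in SECRETS:
        if (secret["labels"].get("api-key") == "api-gateway"
                and secret["data"]["api-key"] == presented):
            return True
    return False  # 401: API key is invalid

assert authorized({"api-key": "mysecretkey"})
assert not authorized({})
```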

This time, if you repeat the same request as before, you'll get an HTTP 401 Unauthorized response:

curl -v http://$GATEWAY/openai/v1/models
...

< HTTP/1.1 401 Unauthorized
< www-authenticate: API key is missing or invalid

The message is telling us that the Gloo Gateway is expecting an API key, but we haven’t provided one. Let’s try again with the API key value we set in the Kubernetes Secret:

curl -v -H "api-key: mysecretkey" http://$GATEWAY/openai/v1/models

Gloo Gateway now lets the request through; however, we still receive an HTTP 401, this time from the OpenAI API.

The OpenAI API expects an OpenAI API key in the authorization header, following this format: Authorization: Bearer OPENAI_API_KEY. Instead of including this header for each request, we could automatically attach it for all requests sent to the OpenAI API at the gateway level.

Let’s modify the original Kubernetes Secret to include the OpenAI API key. Then, we can extract the OpenAI API key from the secret and add it to a header passed to the Gloo Gateway.

apiVersion: v1
kind: Secret
metadata:
  name: gw-api-key
  labels:
    api-key: api-gateway
type: extauth.solo.io/apikey
data:
  api-key: bXlzZWNyZXRrZXk= # Base64 encoded value "mysecretkey"
  openai-api-key: <REPLACE WITH BASE64 ENCODED OPEN AI API KEY>

You can update the ExtAuthPolicy to read the openai-api-key value from the Secret and put it in a header called x-api-key:

apiVersion: security.policy.gloo.solo.io/v2
kind: ExtAuthPolicy
metadata:
  name: gateway-apikey
spec:
  applyToRoutes:
    - route:
        labels:
          route: openai
  config:
    server:
      name: ext-auth-server
      namespace: default
      cluster: mycluster
    glooAuth:
      configs:
      - apiKeyAuth:
          headerName: api-key
          headersFromMetadataEntry:
            x-api-key: 
              name: openai-api-key
          k8sSecretApikeyStorage:
            labelSelector:
              api-key: api-gateway

At this point, the external auth policy assigns the OpenAI API key value to a header called x-api-key. Still, we must tell Gloo Gateway to construct the Authorization header in the format the OpenAI API expects. We can do that with the TransformationPolicy resource.

apiVersion: trafficcontrol.policy.gloo.solo.io/v2
kind: TransformationPolicy
metadata:
  name: openai-transformations
spec:
  applyToRoutes:
  - route:
      labels:
        route: openai
  config:
    request:
      injaTemplate:
        headers:
          Authorization:
            text: 'Bearer {{ openai_api_key }}'
        extractors:
          openai_api_key:
            header: 'x-api-key'
            regex: '.*'

The policy uses an Inja template to extract the value of the x-api-key header and store it in the openai_api_key variable. In the headers field, we then set the Authorization header by concatenating the word Bearer with the value of openai_api_key.
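The extract-then-template step is roughly equivalent to the following Python snippet: a regex capture from a header, then substitution into a template string. Gloo uses Inja templates rather than Python, so treat this as a sketch of the semantics only:

```python
import re

def build_authorization(headers: dict) -> str:
    # extractor: capture the whole x-api-key header value with regex '.*'
    openai_api_key = re.match(r".*", headers["x-api-key"]).group(0)
    # template: 'Bearer {{ openai_api_key }}'
    return f"Bearer {openai_api_key}"

print(build_authorization({"x-api-key": "sk-example"}))  # Bearer sk-example
```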

If you send a request to the gateway, you’ll get back a valid response from the OpenAI API:

curl -v -H "api-key: mysecretkey" $GATEWAY/openai/v1/models
... 
{
  "object": "list",
  "data": [
    {
      "id": "whisper-1",
      "object": "model",
      "created": 1677532384,
      "owned_by": "openai-internal"
    },
    {
      "id": "davinci-002",
      "object": "model",
      "created": 1692634301,
      "owned_by": "system"
    },
...

Let’s also try sending a simple prompt to the /v1/chat/completions endpoint:

curl -H "content-type: application/json" -H "api-key: mysecretkey" -d '{"model": "gpt-3.5-turbo", "messages": [{ "role": "user", "content": "Tell me about Gloo Gateway."}]}' http://$GATEWAY/openai/v1/chat/completions 
...
{
  "id": "chatcmpl-9GwUcUu3WOQNNUoTDXQfKl4W9Mrpq",
  "object": "chat.completion",
  "created": 1713825122,
  "model": "gpt-3.5-turbo-0125",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Gloo Gateway is an open-source API gateway and ingress controller built on top of the Envoy Proxy, which is a high-performance proxy developed by Lyft. It is designed to provide a secure, reliable, and scalable way to manage and control traffic to microservices and serverless applications.\n\nGloo Gateway offers features such as routing, load balancing, caching, and security capabilities like rate limiting, authentication, and authorization. It can be easily integrated with various platforms and services, making it a versatile tool for managing and securing APIs.\n\nGloo Gateway also supports modern cloud-native architectures, such as Kubernetes, Istio, and Knative, making it a good choice for organizations looking to adopt microservices or serverless technologies.\n\nOverall, Gloo Gateway is a powerful and flexible API gateway solution that helps organizations efficiently manage and secure their APIs in a cloud-native environment."
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 14,
    "completion_tokens": 171,
    "total_tokens": 185
  },
  "system_fingerprint": "fp_c2295e73ad"
}

To summarize, in this example we have added an additional layer of security on our API gateway that allows us to mask our LLM API key(s) using custom headers. This shifts the control of what public LLM API keys are being consumed and how they are created and rotated into the platform team’s hands, reducing individual/team API key sprawl.

Customizing LLM API Definition

Notice the requests to the chat completion API are more complex and can include things such as:

  • Prompts (user and/or system)
  • Model
  • Temperature
  • Frequency penalty
  • Logit bias
  • Maximum number of tokens
  • Number of chat completions
  • and many more.

Just like we automatically attach an API key to the requests, we can also use transformation policies to create prompt templates, handle query parameters and API Key substitution, modify the request body before being sent to the OpenAI API, and much more. Take a look at some additional in-depth examples of how to configure Gloo Gateway for LLM backends.

Additionally, not all LLM APIs share the same API definition, which would require developers to learn each API separately. To solve this, you can customize the LLM API definition you want to provide to your applications and let Gloo Gateway transform it into the correct backend LLM API definition.

Here’s an example of a transformation policy that takes a prompt from the client requests and adds a model name and a system message at the API gateway level:

apiVersion: trafficcontrol.policy.gloo.solo.io/v2
kind: TransformationPolicy
metadata:
  name: openai-transformations
spec:
  applyToRoutes:
  - route:
      labels:
        route: openai
  config:
    request:
      injaTemplate:
        body:
          text: |
            {
              "model": "gpt-3.5-turbo",
              "messages": [
                {
                  "role": "system",
                  "content": "You will always respond in JSON that includes the product and pitch attribute."
                },
                {
                  "role": "user",
                  "content": "{{ prompt }}"
                }
              ]
            }

Note that {{ prompt }} is Inja notation that extracts the attribute value (prompt) from the request body. If you send a request to the gateway, you'll see the response is formatted according to the prompt, and we didn't have to follow the API definition of the backing LLM API; we only sent the prompt value.
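In Python terms, the body transformation amounts to rendering the template against the incoming JSON body. Again, this is an illustration of the behavior, not Gloo internals:

```python
import json

# Template mirroring the injaTemplate body above; the user content
# slot is filled from {{ prompt }} at request time.
TEMPLATE = {
    "model": "gpt-3.5-turbo",
    "messages": [
        {"role": "system",
         "content": "You will always respond in JSON that includes the product and pitch attribute."},
        {"role": "user", "content": None},
    ],
}

def transform_request(body: str) -> str:
    """Expand the client's prompt into the full OpenAI request body."""
    prompt = json.loads(body)["prompt"]  # what {{ prompt }} extracts
    rendered = json.loads(json.dumps(TEMPLATE))  # cheap deep copy
    rendered["messages"][1]["content"] = prompt
    return json.dumps(rendered)

out = json.loads(transform_request('{"prompt": "Tell me about Gloo Gateway."}'))
print(out["messages"][1]["content"])  # Tell me about Gloo Gateway.
```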

Here’s the response we passed through the jq tool to retrieve only the actual content of the message:

curl http://$GATEWAY/openai/v1/chat/completions -H "Content-Type: application/json" -H "api-key: mysecretkey" -d '{"prompt":  "Write me a 30 second pitch on why I should use Gloo Gateway in front of my LLM backends"}' | jq -r '.choices[0].message.content'

{
    "product": "Gloo Gateway",
    "pitch": "With Gloo Gateway, you can easily manage traffic, secure communication, and optimize performance for your LLM backends. By using Gloo Gateway, you can ensure reliability, scalability, and flexibility for your architecture. Simplify your deployment process and gain full control over your API traffic with built-in security features. Increase your backend's efficiency and streamline your operations with Gloo Gateway."
}

We can take this even further. Instead of using jq to parse the response at the client, we could update the response from OpenAI and only return the content that the caller cares about:

apiVersion: trafficcontrol.policy.gloo.solo.io/v2
kind: TransformationPolicy
metadata:
  name: openai-transformations
spec:
  applyToRoutes:
  - route:
      labels:
        route: openai
  config:
    request:
      injaTemplate:
        body:
          text: |
            {
              "model": "gpt-3.5-turbo",
              "messages": [
                {
                  "role": "system",
                  "content": "You will always respond in JSON that includes the product and pitch attribute."
                },
                {
                  "role": "user",
                  "content": "{{ prompt }}"
                }
              ]
            }
    response:
      injaTemplate:
        body:
          text: | 
            {{ choices.0.message.content  }}

We added the response field to the configuration and defined another Inja template that extracts the specific field from the response we receive from OpenAI.

Get Started With Gloo Gateway and LLM APIs

Using Gloo Gateway to support LLM APIs offers a robust solution for organizations seeking to enhance their API infrastructure’s security, reliability, and performance.

Organizations can enforce security policies, control traffic routing, and simplify API management by centralizing access through the gateway. Developers benefit from a unified internal endpoint, reducing the complexity of managing individual API keys and ensuring application consistency.

If you’re building or considering building with LLM/GenAI APIs, please complete this 30-second survey.

Contact Solo.io to learn how we can help you leverage Gloo Gateway for your LLM APIs and other API management needs.