Introducing Gloo AI Gateway
The expectations and promise of Generative AI (GenAI) continue to grow at an astounding pace, reflected in a transition from using ChatGPT in our personal lives to incorporating GenAI into the applications and platforms we build at work. A recent State of IT report confirms this transition, revealing that 86% of IT leaders expect GenAI will play a prominent role at their organizations in the near future. Of course, we all know that great power must be balanced with great responsibility, leaving many leaders grappling to understand how GenAI can be integrated with their applications safely, securely, and at scale. AI adoption surveys back this up with the following concerns in the top three areas for GenAI adoption:
- Introduction of new security threats (71%)
- Skills gap to use GenAI successfully (66%)
- Inability to integrate with existing tech stack (60%)
As organizations move past the initial development and proof of concept phases with GenAI projects, these concerns present massive challenges to getting these projects into production. At Solo, we see this regularly within our customer base and have leveraged this experience to build a comprehensive set of AI-enabling capabilities to support security, observability, control, and governance for GenAI integration in the enterprise. Today, we’re very excited to announce the release of these AI-enabling capabilities in Gloo AI Gateway. Building on the foundation of Gloo Gateway, Gloo AI Gateway integrates AI-enabling features natively within Envoy Proxy and Kubernetes Gateway API to fast-track AI application development, consistently enforce safety and security controls, and support advanced AI integration use cases as adoption expands and scales. Read on for some feature highlights in all three of these categories.
AI Safety and Security Controls
Safety and security are consistently the top concern of organizations in GenAI projects. The prevalence of this concern prompted OWASP to produce a new Top 10 for LLM applications. Examples of these controls include secure credential management, fine-grained authorization across LLMs and models, data and model exfiltration protections, and prompt guarding to protect against deliberate and inadvertent model exploits. Gloo AI Gateway implements these controls as native Envoy features configured via Kubernetes Gateway API. Let’s take an example of adding fine-grained authorization to limit access to a Mistral model containing account information and guarding against prompts requesting credit card details. This can be accomplished by configuring two RouteOption policies to Gloo AI Gateway:
Fast-track Your AI Development
The vast majority of development teams integrating LLM APIs with their applications for GenAI use cases are doing so for the first time. These teams want to focus on the primary business value of embedding LLM capabilities, not on undifferentiated heavy lifting associated with LLM provider integrations. Gloo AI Gateway offloads the responsibility of credential management, prompt management, LLM provider integration, rate limiting to ensure fair and balanced use across providers, and usage reporting to track consumption for internal showback/chargeback of LLM costs. Looking at a concrete example, enabling token-weighted rate limiting for any client application of an OpenAI GPT 3.5 model and injecting a system prompt on all requests can be configured independent from applications in Gloo AI Gateway using the following policies:
Expand and Scale with an AI Gateway
After initial GenAI projects are delivered and begin to grow, the focus changes from foundational concerns of security, observability, and governance to enriching, optimizing, and scaling use cases. Whether this occurs in phase one or phase five for your GenAI initiative, Gloo AI Gateway supports a broad range of advanced capabilities for LLMs. Examples include optimizing LLM response latency with semantic caching, A/B testing models for cost and quality, limiting hallucinations due to incomplete or out-of-date information in target models, and failover between models and providers to assure high availability of LLMs. All of these capabilities can be provided directly by Gloo AI Gateway without impact to existing applications or complex management with cloud or LLM provider specific configuration. As an example, the following policies enable retrieval-augmented generation (RAG) for dynamic context injection using OpenAI along with the ability to failover to a GPT 3.5 model if the GPT 4 model is rate limited or unavailable.
This has been a brief overview of a sample of the incredible features in Gloo AI Gateway. If you’re interested in learning more, please check out these resources to dig into more details:
- Gloo AI Gateway product page
- Gloo AI Gateway documentation
- Self-paced workshop: Credentials and Access Control
- Self-paced workshop: Prompt Management and Prompt Guards
- Self-paced workshop: Rate Limiting and Model Failover
- Self-paced workshop: RAG and Semantic Caching
- Reach out to us to get a full demo or trial