About 18 to 20 months ago, layering AI capabilities onto Envoy-based gateways made perfect sense for early LLM APIs. We leaned hard on Envoy's extensibility: external processors, WASM filters, even out-of-process Python for complex logic like token-aware rate limiting and prompt inspection. That approach got production deployments running quickly, and the market followed suit with similar "bolt-on" designs.
But then MCP (Model Context Protocol) changed everything. Its stateful, bidirectional streaming nature exposed fundamental mismatches with Envoy's stateless, unidirectional design roots. Upstream Envoy contributions for MCP stayed surface-level, forcing most solutions into fragile out-of-process workarounds that added latency, complexity, and resource overhead.
Traditional API gateways were also blind to request bodies—treating traffic like opaque envelopes, inspecting headers only. That worked for classic REST APIs. But with LLM and agentic workloads, the critical context lives deep inside the JSON payload: the model name, prompt content for security and guardrails, token counts for rate limiting, MCP tool calls, and structured outputs. This demands a context-aware, body-inspecting gateway.
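To make the body-inspection point concrete, here is a minimal Python sketch of the kind of context a body-aware gateway can pull out of an OpenAI-style chat request before forwarding it. The function name and the rough 4-characters-per-token estimate are illustrative assumptions, not agentgateway's actual implementation (which is Rust, and where token accounting would use a proper tokenizer):

```python
import json

def inspect_chat_body(raw_body: bytes) -> dict:
    """Illustrative sketch: extract routing/policy context from an
    OpenAI-style chat completion request body."""
    body = json.loads(raw_body)
    # The model name drives provider selection and per-model policies.
    model = body.get("model", "unknown")
    # Concatenated prompt text is what guardrails would scan.
    prompt_text = " ".join(
        m.get("content", "")
        for m in body.get("messages", [])
        if isinstance(m.get("content"), str)
    )
    # Crude estimate (~4 chars per token) for rate-limit accounting;
    # a production gateway would use the model's real tokenizer.
    est_tokens = max(1, len(prompt_text) // 4)
    return {"model": model, "prompt": prompt_text, "est_tokens": est_tokens}

req = b'{"model": "gpt-4o", "messages": [{"role": "user", "content": "hello"}]}'
ctx = inspect_chat_body(req)
print(ctx["model"], ctx["est_tokens"])  # gpt-4o 1
```

None of this context (model, prompt, token estimate) is visible to a headers-only proxy, which is the core limitation the paragraph above describes.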
Rather than keep patching around these gaps, we made a different choice: start over with a data plane architected from the ground up for agentic workloads. That's the innovation at the heart of kgateway.

The Innovation: A Rust Data Plane Built for Agentic Traffic
Most AI gateway solutions today follow the same playbook: take an existing proxy, bolt on AI features through plugins or sidecars, and hope it holds together. That approach hits a ceiling fast—especially with stateful protocols like MCP that fundamentally clash with legacy architectures.
Kgateway takes a different path. It evolved from Solo.io's Gloo, in development since 2018, and is already one of the most mature, production-hardened Kubernetes Gateway API implementations, powering billions of requests in enterprise environments. But the real innovation is agentgateway: a pure Rust data plane purpose-built from scratch for stateful, bidirectional agentic protocols.
This wasn't a side project or experiment. It builds directly on our proven Rust expertise from ZTunnel, the ultra-efficient node proxy powering Istio Ambient Mesh at massive scale. By combining deep L7 insights from years of Envoy work with Rust's zero-cost abstractions, memory safety, and predictable performance (no garbage collector means no surprise pauses mid-request), agentgateway delivers capabilities that bolt-on solutions simply can't match:
- Deep body inspection as a first-class feature: Not an afterthought or plugin—kgateway parses OpenAI-compatible JSON bodies to extract model names, token counts, prompts, and tool calls, applying policies before forwarding. This is the foundation that makes everything else possible.
- Native protocol support, not workarounds: Full MCP federation, tool discovery, and A2A (agent-to-agent) security and routing are built into the core—no out-of-process sidecars adding latency and complexity.
- Inference-aware routing: First-class support for the official Gateway API Inference Extension to route based on GPU queue depth, memory utilization, and other signals for self-hosted models. The gateway finally understands what it's routing.
- Secure LLM egress done right: Centralized key management, provider failover, token-based rate limiting, and prompt guarding/enrichment—all in one place.
- Unified architecture: The same gateway handles ingress, mesh waypoint, and AI egress. One system to learn, deploy, and operate.
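Token-based rate limiting, mentioned above, is worth unpacking: the budget is denominated in LLM tokens rather than requests, so one large prompt can consume what dozens of small ones would. A toy token-bucket sketch in Python (the class name, API, and numbers are illustrative, not agentgateway's implementation; it takes an explicit clock value for clarity):

```python
class TokenBudgetLimiter:
    """Illustrative token-bucket limiter measured in LLM tokens per
    minute rather than requests per minute."""

    def __init__(self, tokens_per_minute: int, now: float = 0.0):
        self.capacity = tokens_per_minute
        self.available = float(tokens_per_minute)
        self.refill_rate = tokens_per_minute / 60.0  # tokens per second
        self.last = now

    def allow(self, requested_tokens: int, now: float) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        elapsed = now - self.last
        self.available = min(self.capacity,
                             self.available + elapsed * self.refill_rate)
        self.last = now
        if requested_tokens <= self.available:
            self.available -= requested_tokens
            return True
        return False

limiter = TokenBudgetLimiter(tokens_per_minute=6000)
print(limiter.allow(4000, now=0.0))   # True: the budget starts full
print(limiter.allow(4000, now=0.0))   # False: only 2000 tokens remain
print(limiter.allow(4000, now=30.0))  # True: 30s refills 3000 tokens
```

The point of the sketch is the unit of accounting: without body inspection to estimate `requested_tokens`, a gateway can only count requests, which is a poor proxy for actual LLM cost.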
Why This Innovation Matters Now
The performance gains speak for themselves in production Kubernetes environments: dramatically higher throughput, lower tail latency, and far better CPU/memory utilization under concurrent streaming loads. Head-to-head tests against popular alternatives (including LiteLLM proxies) show significantly higher requests per second, a smaller memory footprint, and better stability, especially as models get faster and agent loops tighten.
But the timing matters as much as the technology. Proxy overhead that was once swallowed by slow LLM response times now directly impacts end-user responsiveness. When agentic flows involve dozens of rapid round-trips between LLMs and tools, every millisecond of gateway latency compounds. The bolt-on approach that worked for simple LLM APIs becomes a bottleneck for real agent architectures.
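The arithmetic behind that compounding is simple but unforgiving. With illustrative numbers (not benchmark figures):

```python
def added_latency_ms(round_trips: int, overhead_per_hop_ms: float) -> float:
    """Total gateway overhead across an agent loop: each LLM or tool
    round-trip pays the per-hop proxy overhead once."""
    return round_trips * overhead_per_hop_ms

# A single slow LLM call hides 5 ms of proxy overhead entirely:
print(added_latency_ms(1, 5))   # 5 ms
# An agent loop with 40 rapid tool/LLM round-trips does not:
print(added_latency_ms(40, 5))  # 200 ms of pure gateway overhead
```

The same per-hop overhead that was invisible behind a multi-second LLM response becomes a user-visible delay once agent loops multiply it.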
This is why we couldn't just keep patching Envoy. The industry needed—and now has—a gateway that treats agentic traffic as a first-class architectural concern, not an edge case to be handled by plugins.
We've open-sourced reproducible benchmarks so teams can validate independently—transparency is key in this fast-moving space. Check them out: https://github.com/howardjohn/gateway-api-bench/tree/v2
Example: Routing to LLM Providers with Agentgateway
Here's how to set up agentgateway to route requests to OpenAI, based on the official kgateway 2.2 documentation.
First, set up the agentgateway proxy:
```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: agentgateway-proxy
  namespace: agentgateway-system
spec:
  gatewayClassName: agentgateway-v2
  listeners:
  - protocol: HTTP
    port: 80
    name: http
    allowedRoutes:
      namespaces:
        from: All
```
Store your API key as a secret:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: openai-secret
  namespace: agentgateway-system
type: Opaque
stringData:
  Authorization: $OPENAI_API_KEY
```
Create an AgentgatewayBackend to configure the LLM provider:
```yaml
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  name: openai
  namespace: agentgateway-system
spec:
  ai:
    provider:
      openai:
        model: gpt-4o
  policies:
    auth:
      secretRef:
        name: openai-secret
```
Route traffic to the backend with an HTTPRoute:
```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: openai
  namespace: agentgateway-system
spec:
  parentRefs:
  - name: agentgateway-proxy
    namespace: agentgateway-system
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /openai
    backendRefs:
    - name: openai
      namespace: agentgateway-system
      group: agentgateway.dev
      kind: AgentgatewayBackend
```
Your agents point to the gateway with standard OpenAI-format requests:
```sh
curl "$INGRESS_GW_ADDRESS/openai" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Explain agent gateways"}]
  }'
```
Agentgateway handles the routing, authentication, and path rewriting automatically. For MCP tool federation, A2A agent connectivity, and inference routing to self-hosted models, see the full documentation.
Closing Thoughts
While the rest of the industry patches legacy proxies to handle AI traffic, kgateway represents a fundamental rethinking of what a gateway should be in the agentic era. That's not incremental improvement—it's the kind of architectural innovation that only happens when you're willing to start from first principles.
To learn more about kgateway, check out these resources:
- Learn all about kgateway through these learning paths
- Kgateway videos - from basic info to advanced use cases
- Guide to getting started with agentgateway in Kubernetes
