Agent Substrate Uses Envoy. Here's Why AI Workloads Need More.

Agent Substrate’s Networking Layer is Built on Envoy — And Why That’s Not Enough

Google chose Envoy to route traffic to actors inside Agent Substrate, and for moving requests to the right pod, it’s exactly the right tool. What it doesn’t do is understand MCP or A2A — and interpreting, acting on, and enforcing policy at the protocol level is where the story gets interesting.

In the first post in this series, we walked through how Agent Substrate works: its actor/worker scheduling model, its 30x oversubscription ratio, and the gVisor isolation layer underneath. This post goes deeper on one specific component that most coverage of Agent Substrate has missed entirely — the networking layer.

Agent Substrate’s networking component is called atenet. It is responsible for DNS resolution, traffic routing, and proxy sidecars within the substrate. And it is built on Envoy.

Why this matters: This detail does not appear in Google’s blog posts. It only surfaces when you read the source code and the system’s installation scripts. Understanding it explains both the design of Agent Substrate’s internal networking and, critically, where the Envoy model hits its limits for AI-native workloads — which is precisely where Solo’s agentgateway begins.

What atenet Does Inside Agent Substrate

Agent Substrate is not just a scheduler. When it activates an actor on a worker pod, something has to make sure that incoming traffic for that actor reaches its current pod — and that outgoing traffic from the actor can reach other actors and tools. That’s atenet’s job.

The atenet component provides three things:

DNS resolution: each actor gets a stable DNS identity (e.g. my-agent-1.actors.resources.substrate.ate.dev) that resolves regardless of which worker pod the actor is currently running on
Envoy-based routing: the atenet-router is an Envoy proxy that receives incoming requests addressed to an actor’s DNS name and forwards them to the correct worker pod in real time, updating routing state as actors move between workers during activate/deactivate cycles
Proxy sidecars: sidecar proxies injected into worker pods handle local traffic interception and forwarding

When you port-forward to the substrate and send an HTTP request with a Host header matching an actor’s identity, atenet is what routes it to the right place:

# Requests are addressed by actor identity, not pod IP
curl -X POST \
  -H "Host: my-agent-1.actors.resources.substrate.ate.dev" \
  http://localhost:8000/increment
 
# atenet-router (Envoy) receives this and forwards to
# whichever worker pod currently has my-agent-1 active

The Envoy choice here is well-suited for this use case. atenet is routing HTTP/gRPC traffic between known endpoints with dynamic host selection. Envoy is exactly the right tool for that: it has mature xDS-based dynamic configuration, handles connection pooling gracefully, and has well-understood operational characteristics on Kubernetes. Google’s own infrastructure runs heavily on Envoy.
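To make "dynamic host selection" concrete, here is a minimal conceptual sketch of the state such a router has to keep. It is not atenet's implementation: the class name, method names, and pod addresses are all hypothetical, and in a real Envoy deployment this state would arrive through xDS rather than direct method calls.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


def _normalize(host: str) -> str:
    # Drop an optional ":port" suffix and ignore case, as DNS names do.
    return host.split(":", 1)[0].lower()


@dataclass
class ActorRouteTable:
    """Maps an actor's stable host name to the worker pod currently hosting it."""

    routes: Dict[str, str] = field(default_factory=dict)

    def activate(self, actor_host: str, worker_addr: str) -> None:
        # Called when an actor is placed on a worker pod.
        self.routes[_normalize(actor_host)] = worker_addr

    def deactivate(self, actor_host: str) -> None:
        # The host name stays valid, but there is nothing to forward to
        # until the actor is activated again.
        self.routes.pop(_normalize(actor_host), None)

    def resolve(self, host_header: str) -> Optional[str]:
        # Look up the upstream for the Host header of an incoming request.
        return self.routes.get(_normalize(host_header))


table = ActorRouteTable()
actor = "my-agent-1.actors.resources.substrate.ate.dev"
table.activate(actor, "10.0.0.12:8080")      # hypothetical worker address
first = table.resolve(actor)
table.deactivate(actor)
table.activate(actor, "10.0.0.47:8080")      # actor moved to another worker
second = table.resolve(actor + ":8000")
```

The caller always addresses the same host name; only the table entry behind it changes as the actor moves.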

One distinction is worth drawing sharply, because it is the crux of everything that follows. atenet does transport-level routing: it takes a request addressed to an actor’s stable identity and gets it to whatever worker pod currently hosts that actor. In the demos, that traffic arrives via kubectl port-forward to the atenet-router service, so atenet-router serves as the entry point — but it is a routing layer, not a full ingress controller, and the repo (still at v0.0.0) is silent on a production ingress story around it. What atenet does not do, wherever the traffic originates, is understand MCP or A2A semantics or apply policy on them. It moves bytes to the correct pod; it does not interpret the protocol inside those bytes. That gap — transport routing versus protocol-aware policy — is true regardless of where the proxy sits in the topology, and it is what the rest of this post is about.

The system also uses mTLS internally: Agent Substrate ships a pod-certificate-controller that auto-issues and auto-rotates short-lived X.509 TLS certificates to all system components. All communication between ate-api-server, atenet, atelet, and the state store is mTLS-secured out of the box, using Kubernetes’ Pod Certificates feature (alpha in v1.34, beta in v1.35).

Where the Envoy Model Hits Its Limits

For routing requests to the actor that should handle them, Envoy is the right choice. But Agent Substrate is also explicitly designed to host MCP servers and expose them as Actors, and to participate in A2A-based agent-to-agent workflows. Serving those protocols well requires more than moving requests to the right pod — it requires interpreting the protocol, acting on it, and enforcing policy at the MCP and A2A level. That is a different job from transport routing, and it is where the architectural picture gets more complicated. The limits below are about protocol-aware policy, not about the transport routing atenet does well.

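MCP and A2A are not plain HTTP. They run over HTTP as a transport, but the operations that matter (which method is invoked, which tool is called, with which arguments) are carried as JSON-RPC in the message body, not in the HTTP method, path, and headers that Envoy routes and enforces policy on.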

What Standard Proxies Handle Fine — And What They Don’t

It is tempting to argue that MCP defeats traditional proxies because MCP sessions are stateful. That argument is mostly wrong, and it is worth being precise about why — because the real limitation is more specific and more defensible.

MCP sessions are stateful: a client initializes a session with an MCP server and that session carries context across multiple requests, so requests in the same session must reach the same backend. But routing on session identity is not a new problem for proxies. MCP carries its session token in an HTTP response header, Mcp-Session-Id, which the client echoes back on every subsequent request. Pinning traffic to a backend based on an HTTP header is one of the most basic things a proxy does — Envoy supports it natively through header-based hash policies, and load balancers have offered sticky sessions for decades. No protocol parser is required, because the session ID lives in a standard transport header, not buried inside the JSON-RPC payload. If MCP had put the session identifier inside the protocol body, a proxy would need to parse and track MCP protocol state to route correctly, and that would be a genuine argument for a purpose-built data plane. It didn’t, so session affinity, on its own, is not that argument.
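The point is easy to see in code. The sketch below pins a session to a backend using nothing but the Mcp-Session-Id header. It illustrates the idea only and is not Envoy's algorithm: the function name and the fallback for requests without a session are assumptions, and the simple modulo step stands in for the consistent-hashing load balancers a real proxy would use.

```python
import hashlib
from typing import Mapping, Sequence


def pick_backend(headers: Mapping[str, str], backends: Sequence[str]) -> str:
    """Choose a backend by hashing the Mcp-Session-Id request header."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    lowered = {name.lower(): value for name, value in headers.items()}
    session_id = lowered.get("mcp-session-id")
    if session_id is None:
        # No session yet (for example the initialize request):
        # any backend will do, so take the first one.
        return backends[0]
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


backends = ["worker-a:8080", "worker-b:8080", "worker-c:8080"]  # hypothetical
request_headers = {"Mcp-Session-Id": "3f2c9a", "Content-Type": "application/json"}
chosen = pick_backend(request_headers, backends)
```

Note that nothing here reads the request body. A plain modulo remaps most sessions whenever the backend list changes, which is why production proxies prefer ring-hash or Maglev-style balancing for this job.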

The first genuine limitation is bidirectional, server-initiated messaging. In standard request/response HTTP, only the client initiates; the server answers. MCP’s Streamable HTTP transport involves the server pushing notifications and messages to the client over a long-lived stream. Routing server-to-client traffic back to the correct originating client connection is not something the request/response proxy model handles natively — it requires the proxy to maintain a model of which client stream maps to which backend session and to move messages in both directions. This is awkward for an architecture built around discrete request handling, and it is a real constraint, not a configuration detail.
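A rough sketch of the bookkeeping this implies is shown below. Everything in it is hypothetical (class, method names, session IDs), the streams are modeled as in-memory queues rather than real connections, and buffering for a disconnected client is a simplification of the resumption a real implementation would need.

```python
from collections import deque
from typing import Deque, Dict, List


class StreamRegistry:
    """Tracks which open client stream belongs to which MCP session."""

    def __init__(self) -> None:
        self._streams: Dict[str, Deque[str]] = {}
        self._pending: Dict[str, List[str]] = {}

    def attach(self, session_id: str) -> Deque[str]:
        # A client opened (or reopened) its long-lived stream for this
        # session; replay anything that arrived while it was away.
        stream: Deque[str] = deque(self._pending.pop(session_id, []))
        self._streams[session_id] = stream
        return stream

    def detach(self, session_id: str) -> None:
        # The client disconnected; keep whatever it had not consumed yet.
        stream = self._streams.pop(session_id, None)
        if stream:
            self._pending.setdefault(session_id, []).extend(stream)

    def push_from_server(self, session_id: str, message: str) -> None:
        # A backend emitted a notification: deliver it to the matching
        # client stream, or hold it if that client is not connected.
        stream = self._streams.get(session_id)
        if stream is not None:
            stream.append(message)
        else:
            self._pending.setdefault(session_id, []).append(message)


registry = StreamRegistry()
stream_a = registry.attach("session-a")
stream_b = registry.attach("session-b")
registry.push_from_server("session-b", "progress: 50%")
registry.detach("session-b")
registry.push_from_server("session-b", "progress: 100%")
stream_b_again = registry.attach("session-b")
```

The proxy is no longer forwarding one request and one response. It is holding per-session state and moving messages in both directions, which is the part a request-centric model does not give you for free.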

What actually matters: Strip away session affinity, which any competent proxy handles, and two real limitations remain. Bidirectional server-initiated streaming is awkward for the request/response model. And — the durable issue — a generic proxy has no understanding of what is inside an MCP message, so it cannot authorize at the tool level, aggregate tools across backends, or apply guardrails. The first is a transport constraint; the second is the one that actually justifies a purpose-built data plane.

The Semantic Gap

Beyond the transport constraints, there is a deeper and more durable issue. atenet routes a request to the right actor — it moves bytes to the correct pod. But it does not look inside those bytes. Serving MCP and A2A well requires acting on what the protocol actually says: which tool is being called, by whom, with what parameters, and whether that is allowed. That is protocol-aware policy, and it is a fundamentally different job from routing. Envoy, as the engine underneath atenet, has no understanding of what is inside MCP or A2A messages.

In a service mesh context, Envoy’s power comes from its ability to inspect traffic and make intelligent decisions based on what it sees: route based on headers, enforce policies based on request paths, collect telemetry keyed to specific operations. For HTTP and gRPC, Envoy has deep protocol awareness that makes all of this possible.

For MCP traffic, an Envoy proxy sees a stream of JSON-RPC messages over HTTP. It knows the HTTP headers. It does not know which tool is being called, what parameters are being passed, whether the tool name matches an approved list, or whether the response contains sensitive data. It cannot enforce tool-level access control, apply policy on agent-to-LLM calls, detect prompt injection attempts in tool descriptions, or produce observability keyed to specific agent-tool interactions. Every one of these requires interpreting and acting on the protocol itself — exactly what transport routing does not do, and exactly the capability that has driven agentgateway’s adoption.

This is not a criticism of Envoy — it was never designed for this use case. There is actually an active design proposal in the Envoy project to add native MCP and A2A protocol support, which would require building a streaming JSON-RPC parser from scratch, because the protocol’s method names are embedded in the payload rather than mapped to HTTP semantics. The proposal is real work, and it is still a proposal.
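To see why the body has to be parsed, consider what it takes just to learn which tool a request is calling. The sketch below assumes a single, fully buffered JSON-RPC message that follows MCP's tools/call shape (the tool name in params.name, its inputs in params.arguments). The function name is hypothetical, and batches and streamed bodies are deliberately ignored.

```python
import json
from typing import Any, Dict, Optional, Tuple


def extract_tool_call(body: bytes) -> Optional[Tuple[str, Dict[str, Any]]]:
    """Return (tool_name, arguments) if the body is an MCP tools/call request."""
    try:
        message = json.loads(body)
    except ValueError:
        # Covers both malformed JSON and undecodable bytes.
        return None
    if not isinstance(message, dict) or message.get("method") != "tools/call":
        return None
    params = message.get("params")
    if not isinstance(params, dict):
        return None
    name = params.get("name")
    if not isinstance(name, str):
        return None
    arguments = params.get("arguments")
    if not isinstance(arguments, dict):
        arguments = {}
    return name, arguments


request_body = json.dumps({
    "jsonrpc": "2.0",
    "id": 7,
    "method": "tools/call",
    "params": {"name": "read_file", "arguments": {"path": "/tmp/report.txt"}},
}).encode("utf-8")

tool_call = extract_tool_call(request_body)
```

Every request looks like the same HTTP POST to the same path from the outside. The tool name and arguments only appear once the payload has been decoded.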

Why agentgateway Is a New Data Plane, Not an Envoy Extension

This gap is exactly why Solo.io built agentgateway from scratch in Rust rather than extending kgateway (Solo’s Envoy-based Kubernetes gateway) or contributing AI extensions to Envoy directly.

The decision deserves explanation, because on the surface it might seem redundant — Solo already has deep Envoy expertise and a mature Envoy-based gateway in kgateway. Why build something new?

The Fundamental Architecture Mismatch

Envoy’s filter chain architecture is designed for processing discrete requests or streams. Adding MCP support to Envoy is not just a matter of writing a new filter. It requires rethinking the fundamental connection model, because MCP’s long-lived, bidirectional sessions and payload-level semantics do not map cleanly onto Envoy’s request processing model.

To handle MCP correctly, a proxy needs to:

Maintain session state across multiple client connections (clients may disconnect and reconnect)
Multiplex multiple logical agent sessions over physical connections
Demultiplex server-initiated messages back to the correct client
Parse and understand JSON-RPC payloads, including extracting method names and parameters that are buried in the message body
Enforce fine-grained policies at the tool call level, not just the request level

Building all of this as Envoy extensions would produce something structurally different from Envoy: you would essentially be writing a new proxy inside Envoy’s process. Rust, with its memory safety guarantees and the Tokio async runtime from its ecosystem, provides a more natural foundation for the session-management complexity that MCP and A2A require, without the overhead of working around Envoy’s request-centric architecture.

The result: agentgateway is a purpose-built Rust data plane that handles MCP and A2A natively — plus HTTP, gRPC, and LLM traffic. It is the next-generation gateway that extends kgateway’s capabilities to cover the full AI-native protocol stack, rather than an incremental extension of the Envoy model.

What agentgateway Adds

When agentgateway sits at the edge of an Agent Substrate deployment, it provides capabilities that atenet cannot:

Virtual MCP tool endpoints: aggregate tools from multiple backend MCP servers behind a single discoverable endpoint, with version management and health failover
Session-aware routing: maintain MCP session affinity correctly across client reconnects and backend rebalancing
Tool-level access control: authorize at the granularity of individual tool calls, combining agent identity, user identity, and tool semantics
Semantic guardrails: inspect tool descriptions and call parameters to detect prompt injection, tool poisoning, and unauthorized data access
LLM traffic management: token-based rate limiting, model failover, semantic caching, and prompt enrichment for agent-to-LLM calls
Agent-native observability: distributed tracing keyed to agent sessions, tool calls, and LLM interactions using semantic OpenTelemetry, not just HTTP request metrics
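As an illustration of the first item, here is one way a virtual endpoint could merge tool lists from several backends and still route each call to its owner. This is a sketch under stated assumptions, not agentgateway's implementation: the backend names, the prefix-based naming scheme, and both function names are hypothetical.

```python
from typing import Dict, List, Tuple


def aggregate_tools(backends: Dict[str, List[str]]) -> Dict[str, Tuple[str, str]]:
    """Build one catalog: exposed tool name -> (backend, original tool name)."""
    catalog: Dict[str, Tuple[str, str]] = {}
    for backend, tools in sorted(backends.items()):
        for tool in tools:
            # Prefixing with the backend name avoids collisions when two
            # servers both expose a tool with the same name.
            catalog[f"{backend}_{tool}"] = (backend, tool)
    return catalog


def route_call(catalog: Dict[str, Tuple[str, str]], exposed_name: str) -> Tuple[str, str]:
    """Translate a call on the virtual endpoint back to its real backend and tool."""
    if exposed_name not in catalog:
        raise KeyError(f"unknown tool: {exposed_name}")
    return catalog[exposed_name]


catalog = aggregate_tools({
    "files": ["read_file", "search"],
    "web": ["search"],
})
target = route_call(catalog, "web_search")
```

A client sees one endpoint with three tools. The gateway needs to understand tools/list and tools/call to do this, which is exactly the protocol awareness a transport router lacks.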

The Complete Networking Architecture

Understanding how atenet, Istio ambient mesh, and agentgateway fit together clarifies why each layer exists and what it is responsible for. They are not redundant — each operates at a different layer of the networking stack.

Istio Ambient Mesh: The Zero-Trust Foundation

Istio ambient mesh (via ztunnel) operates beneath both atenet and agentgateway, providing the cryptographic identity layer for all pod-to-pod traffic. In an Agent Substrate deployment, this means:

Every actor pod and worker pod has a cryptographically verifiable SPIFFE identity
All traffic between pods is mTLS-secured transparently, without requiring library changes or sidecars
L4 authorization policies can enforce deny-all defaults, allowing only explicitly permitted communication paths

This zero-trust foundation is what makes it safer to run multi-tenant agent workloads in the same cluster. Even if an agent pod is compromised, a deny-by-default policy keeps it from reaching services it was never authorized to call: ztunnel, the per-node proxy that runs outside the application container, enforces L4 authorization on every connection.

agentgateway: The AI-Protocol Edge

agentgateway operates above the zero-trust foundation, adding protocol-aware intelligence to the traffic that ztunnel secures. Where ztunnel enforces “is this pod allowed to talk to that pod?”, agentgateway enforces “is this agent allowed to call this tool with these parameters?” — a fundamentally different and complementary level of control.

In practice, this means agentgateway handles:

North-south MCP traffic: external clients connecting to MCP servers running as Substrate Actors
East-west A2A traffic: agent-to-agent calls within a deployment, with routing based on Agent Cards published at runtime
LLM egress: agent calls out to external language models, with token tracking, guardrails, and failover
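Token tracking on LLM egress differs from ordinary rate limiting in that the unit is tokens, not requests. A minimal sketch of that idea follows. It assumes a simple fixed-window budget per agent; the class, its fields, and the numbers are hypothetical and do not describe agentgateway's actual limiter.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class TokenBudget:
    """Fixed-window limiter that counts LLM tokens rather than requests."""

    limit: int                 # tokens allowed per agent per window
    window_seconds: float
    _usage: Dict[str, Tuple[int, int]] = field(default_factory=dict)

    def allow(self, agent: str, tokens: int, now: float) -> bool:
        window = int(now // self.window_seconds)
        used_window, used = self._usage.get(agent, (window, 0))
        if used_window != window:
            used = 0  # a new window started; the old count no longer applies
        if used + tokens > self.limit:
            return False
        self._usage[agent] = (window, used + tokens)
        return True


budget = TokenBudget(limit=1000, window_seconds=60)
decisions = [
    budget.allow("agent/researcher", 600, now=0.0),
    budget.allow("agent/researcher", 500, now=10.0),   # would exceed the budget
    budget.allow("agent/researcher", 400, now=20.0),
    budget.allow("agent/researcher", 500, now=61.0),   # next window
]
```

The caller passes the clock in explicitly, which keeps the sketch deterministic. One large prompt can exhaust a budget that hundreds of small ones would not, which a request-count limiter cannot express.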

atenet: The Substrate-Internal Layer

atenet remains responsible for what it was always designed to do: route HTTP requests to the correct actor within the substrate, regardless of which worker pod the actor currently occupies. It is not involved in AI-protocol concerns. Its Envoy foundation is well-matched to this job.

The key insight is that these three layers address three different problems at three different abstraction levels. Deployed together on top of gVisor’s compute isolation, they give you a complete networking stack for production Agent Substrate workloads: substrate-internal routing (atenet/Envoy), zero-trust identity (Istio ambient), and AI-native protocol governance (agentgateway).

What the Envoy Community Is Doing About This

It is worth noting that the Envoy project itself has recognized the gap. In early 2026, a design proposal was submitted to the Envoy repository proposing native A2A and MCP support. The proposal is detailed and technically serious — it describes building a streaming JSON-RPC parser, a shared library for A2A and MCP payload extraction, and protocol-specific layers for each.

The proposal is still in the design stage. Even if it ships, it will address the stateful session and semantic parsing problems incrementally within Envoy’s existing filter chain model. The question of whether Envoy’s architecture is the right foundation for a protocol as session-heavy as MCP remains open in the community.

agentgateway’s Rust-native approach makes a different architectural bet: that a purpose-built data plane, unconstrained by Envoy’s request-centric model, can handle AI-native protocols more cleanly and perform better at scale. The two approaches will coexist — Envoy AI extensions for teams already deeply invested in the Envoy ecosystem, purpose-built gateways like agentgateway for teams building AI-first infrastructure.

The Practical Implication for Agent Substrate Deployments

If you are deploying Agent Substrate in production, the atenet layer handles everything inside the substrate automatically. You do not configure or interact with atenet directly — it is part of the substrate’s system installation.

What you do need to configure explicitly is the layer above it. For a production deployment that exposes MCP tools or participates in A2A agent workflows, you need a gateway that understands those protocols. Connecting agentgateway to an Agent Substrate deployment involves three steps:

# 1. Install agentgateway in your cluster
helm install agentgateway oci://ghcr.io/agentgateway/charts/agentgateway \
  --namespace agentgateway-system \
  --create-namespace
 
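# 2. Configure an MCP listener pointing at your Substrate actors
# (Substrate actors expose MCP on a stable DNS name via atenet)
# Resource kinds and fields can differ between agentgateway releases;
# check them against the CRDs installed in your cluster before applying.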
kubectl apply -f - <<EOF
apiVersion: gateway.agentgateway.dev/v1alpha1
kind: MCPRoute
metadata:
  name: substrate-tools
spec:
  parentRefs:
    - name: agentgateway
  backends:
    - target:
        host: my-mcp-actor.actors.resources.substrate.ate.dev
        port: 8080
EOF
 
# 3. Apply tool-level access policy
kubectl apply -f - <<EOF
apiVersion: gateway.agentgateway.dev/v1alpha1
kind: MCPPolicy
metadata:
  name: substrate-tool-policy
spec:
  targetRef:
    name: substrate-tools
  rules:
    - tools: ["read_file", "list_directory"]
      principals: ["agent/researcher"]
    - tools: ["write_file", "execute_command"]
      principals: ["agent/executor"]
      requireApproval: true
EOF

With this configuration, agentgateway exposes your Substrate-hosted MCP tools to external agents with authentication, tool-level authorization, and full observability — while atenet continues to handle the internal routing that keeps actors reachable as they move between worker pods.
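The decision that the policy in step 3 describes can be sketched in a few lines. The rules below mirror the tools, principals, and requireApproval fields from that manifest. The evaluation order, the three outcome names, and the deny-by-default fallback are assumptions made for illustration, not documented agentgateway behavior.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Rule:
    tools: Tuple[str, ...]
    principals: Tuple[str, ...]
    require_approval: bool = False


RULES = (
    Rule(("read_file", "list_directory"), ("agent/researcher",)),
    Rule(("write_file", "execute_command"), ("agent/executor",), require_approval=True),
)


def decide(principal: str, tool: str, rules: Sequence[Rule] = RULES) -> str:
    """Return "allow", "needs_approval", or "deny" for one tool call."""
    for rule in rules:
        if tool in rule.tools and principal in rule.principals:
            return "needs_approval" if rule.require_approval else "allow"
    # Assumption: anything not explicitly matched is denied.
    return "deny"


outcomes = {
    ("agent/researcher", "read_file"): decide("agent/researcher", "read_file"),
    ("agent/researcher", "write_file"): decide("agent/researcher", "write_file"),
    ("agent/executor", "execute_command"): decide("agent/executor", "execute_command"),
}
```

The inputs to this decision are the caller's identity and the tool name from the JSON-RPC payload. Neither is something a transport router inspects.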

The Bottom Line

Agent Substrate’s use of Envoy for internal networking (atenet) is the right call for the job it is doing: routing HTTP traffic to dynamically assigned actor pods with low latency and operational simplicity. Envoy’s maturity, xDS-based dynamic configuration, and proven performance at scale make it well-suited for this role.

But Envoy’s architecture was designed before MCP and A2A existed. Session affinity is not the problem, since header-based routing covers it. The mismatch is that server-initiated, bidirectional streaming sits awkwardly on a request-centric model and, more durably, that Envoy cannot see the tool calls and parameters inside the JSON-RPC payload. The gap is real, recognized even within the Envoy project, and not trivially bridged with extensions.

agentgateway fills this gap as a purpose-built Rust data plane — one that extends Solo’s gateway capabilities beyond HTTP and gRPC to cover the full AI-native protocol stack. Deployed alongside Agent Substrate, it provides the protocol-aware, semantically intelligent networking layer that production agent deployments require.

In the next post in this series, we turn this architecture into a working tutorial: deploying MCP servers as Substrate Actors, exposing them through agentgateway, and applying real access policies — end to end.

This Post in the Agent Substrate Series

1. How Google Agent Substrate Works: 250 Agents, 8 Pods

2. Agent Substrate’s Networking Layer is Built on Envoy — And Why That’s Not Enough ← You are here

About Solo.io

Solo.io is reimagining infrastructure for cloud and AI, uniting secure, seamless cloud connectivity with AI-ready, agentic infrastructure. Solo’s open-source projects — including agentgateway, kagent, kgateway, and contributions to Istio and Envoy — are used by Fortune 2000 enterprises running AI workloads on Kubernetes.

Resources referenced in this post:

Agent Substrate repo — github.com/agent-substrate/substrate

Envoy A2A/MCP design proposal — github.com/envoyproxy/envoy/issues/43268

agentgateway — agentgateway.dev

Agent Mesh for Enterprise Agents — solo.io/blog/agent-mesh-for-enterprise-agents

Istio Ambient Mesh — istio.io/latest/docs/ambient

