Open Source LLM Observability: Tracing AI Calls with Agentgateway and Langfuse

February 15, 2026

Sebastian Maniak

A guide to LLM observability with agentgateway and Langfuse for tracing AI agent calls, monitoring token usage, and capturing LLM and MCP tool activity.

Your AI agents are calling LLMs hundreds of times a day. Do you know what they’re sending? What they’re spending? Whether that prompt injection guard actually fired?

This guide shows how to wire Langfuse into agentgateway to get full observability over every LLM and MCP tool call — zero application code changes required.

Why Gateway-Level Observability?

Most teams add tracing inside their application code — wrapping LLM SDK calls with Langfuse decorators or OpenTelemetry spans. This works, but it has gaps:

You only see what the app reports. If a developer forgets to instrument a call, it’s invisible.
Gateway-level policies are opaque. PII redaction, prompt injection blocking, rate limiting — these happen at the gateway. Application-level tracing can’t see them.
Multiple agents, multiple codebases. Every team has to add their own instrumentation. Different languages, different quality.

When tracing happens at the gateway layer, every request is captured automatically. Every agent, every provider, every tool call — same fidelity, same format, zero code changes.

Architecture

┌──────────────┐     ┌────────────────────────┐     ┌─────────────────┐
│  Your App /  │     │   Solo AgentGateway    │     │  LLM Provider   │
│  AI Agent    │────▶│   (Gateway API)        │────▶│  (OpenAI, etc)  │
│              │     │                        │     │                 │
└──────────────┘     └───────────┬────────────┘     └─────────────────┘
                                 │
                        OTLP Traces (gRPC)
                                 │
                     ┌───────────▼────────────┐
                     │   OpenTelemetry        │
                     │   Collector (fan-out)  │
                     │                        │
                     └─────┬───────────┬──────┘
                           │           │
                       OTLP HTTP   OTLP gRPC
                           │           │
                     ┌─────▼──┐  ┌─────▼───────────┐
                     │Langfuse│  │ClickHouse /     │
                     │  UI    │  │ Solo Enterprise │
                     │        │  │ UI (optional)   │
                     └────────┘  └─────────────────┘

Agentgateway natively emits OpenTelemetry traces for every LLM request. A lightweight OTel Collector receives these traces over OTLP gRPC and forwards them to Langfuse over OTLP HTTP. The same collector can fan out traces to additional backends (ClickHouse, Jaeger, Datadog) at the same time.

Quick Start: Kind Cluster + agentgateway 2.1 OSS

Don’t have a cluster? Here’s the fastest path from zero to traced LLM calls.

Create the Cluster

kind create cluster --name agentgateway

Install Gateway API CRDs

kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.4.0/standard-install.yaml

Install agentgateway 2.1 OSS

# Install CRDs
helm upgrade -i agentgateway-crds oci://cr.agentgateway.dev/helm/agentgateway-crds \
  --version 2.1.0 \
  --namespace agentgateway-system \
  --create-namespace

# Install control plane
helm upgrade -i agentgateway oci://cr.agentgateway.dev/helm/agentgateway \
  --version 2.1.0 \
  --namespace agentgateway-system

Verify it’s running:

kubectl get pods -n agentgateway-system
kubectl get gatewayclass agentgateway
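If you are scripting the setup, it can help to block until the control plane is actually ready instead of eyeballing the pod list. A minimal sketch, assuming the chart creates Deployments in the agentgateway-system namespace; the function name is hypothetical:

```shell
# Hypothetical helper: wait until every Deployment in the namespace reports
# the Available condition. $1 = optional timeout (defaults to 120s).
wait_for_agentgateway() {
  kubectl wait --for=condition=Available deployment --all \
    --namespace agentgateway-system --timeout="${1:-120s}"
}

# Usage: wait_for_agentgateway 180s
```

The command exits non-zero on timeout, so it can gate the next step in a script.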

Setting Up Langfuse Tracing

Step 1: Get Langfuse API Keys

Sign up at cloud.langfuse.com (free tier) or use a self-hosted instance. Go to Settings → API Keys and create a key pair.

Base64 encode your credentials:

echo -n "pk-lf-YOUR_PUBLIC_KEY:sk-lf-YOUR_SECRET_KEY" | base64
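One thing to watch: GNU base64 wraps its output at 76 characters by default, and a real key pair is long enough to trigger that. A wrapped value will not work as a header. A small sketch that strips any line breaks, using a hypothetical helper name:

```shell
# Hypothetical helper (not part of Langfuse or agentgateway): prints the
# Base64 value for the "Authorization: Basic ..." header on a single line.
# $1 = Langfuse public key, $2 = Langfuse secret key.
langfuse_basic_auth() {
  # printf emits no trailing newline; tr removes any wrapping added by base64.
  printf '%s:%s' "$1" "$2" | base64 | tr -d '\n'
}

# Usage: langfuse_basic_auth "pk-lf-YOUR_PUBLIC_KEY" "sk-lf-YOUR_SECRET_KEY"
```

Paste the result in place of the credentials placeholder in the collector config in Step 2.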

Step 2: Deploy the OTel Collector

The collector bridges agentgateway’s OTLP gRPC output to Langfuse’s OTLP HTTP input:

apiVersion: v1
kind: ConfigMap
metadata:
  name: langfuse-otel-collector-config
  namespace: agentgateway-system
data:
  config.yaml: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    exporters:
      otlphttp/langfuse:
        endpoint: https://cloud.langfuse.com/api/public/otel
        headers:
          Authorization: "Basic <YOUR_BASE64_CREDENTIALS>"
        retry_on_failure:
          enabled: true
          initial_interval: 5s
          max_interval: 30s
          max_elapsed_time: 300s
    processors:
      batch:
        send_batch_size: 1000
        timeout: 5s
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlphttp/langfuse]
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: langfuse-otel-collector
  namespace: agentgateway-system
  labels:
    app: langfuse-otel-collector
spec:
  replicas: 1
  selector:
    matchLabels:
      app: langfuse-otel-collector
  template:
    metadata:
      labels:
        app: langfuse-otel-collector
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:0.132.1
          args: ["--config=/conf/config.yaml"]
          ports:
            - containerPort: 4317
              name: otlp-grpc
            - containerPort: 4318
              name: otlp-http
          volumeMounts:
            - name: config
              mountPath: /conf
          resources:
            requests:
              cpu: 50m
              memory: 128Mi
            limits:
              cpu: 200m
              memory: 256Mi
      volumes:
        - name: config
          configMap:
            name: langfuse-otel-collector-config
---
apiVersion: v1
kind: Service
metadata:
  name: langfuse-otel-collector
  namespace: agentgateway-system
spec:
  selector:
    app: langfuse-otel-collector
  ports:
    - name: otlp-grpc
      port: 4317
      targetPort: 4317
    - name: otlp-http
      port: 4318
      targetPort: 4318

kubectl apply -f langfuse-collector.yaml
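Before moving on, it is worth confirming the collector rolled out and checking its recent logs, since export problems such as rejected credentials typically surface there. A rough sketch using the Deployment name from the manifest above; the function name is hypothetical:

```shell
# Hypothetical helper: wait for the collector Deployment to finish rolling
# out, then print its most recent log lines.
check_collector() {
  kubectl rollout status deployment/langfuse-otel-collector \
    --namespace agentgateway-system --timeout=120s &&
  kubectl logs deployment/langfuse-otel-collector \
    --namespace agentgateway-system --tail=20
}

# Usage: check_collector
```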

Step 3: Configure AgentGateway Tracing

For agentgateway OSS, enable tracing via Helm values:

# values-tracing.yaml
gateway:
  envs:
    OTEL_EXPORTER_OTLP_ENDPOINT: "http://langfuse-otel-collector.agentgateway-system.svc.cluster.local:4317"
    OTEL_EXPORTER_OTLP_PROTOCOL: "grpc"

helm upgrade agentgateway oci://cr.agentgateway.dev/helm/agentgateway \
  --version 2.1.0 \
  --namespace agentgateway-system \
  -f values-tracing.yaml

For Solo Enterprise for agentgateway, use the EnterpriseAgentgatewayParameters resource instead:

apiVersion: enterpriseagentgateway.solo.io/v1alpha1
kind: EnterpriseAgentgatewayParameters
metadata:
  name: tracing
  namespace: agentgateway-system
spec:
  rawConfig:
    config:
      tracing:
        otlpEndpoint: http://langfuse-otel-collector.agentgateway-system.svc.cluster.local:4317
        otlpProtocol: grpc
        randomSampling: true
        fields:
          add:
            gen_ai.operation.name: '"chat"'
            gen_ai.system: "llm.provider"
            gen_ai.request.model: "llm.requestModel"
            gen_ai.response.model: "llm.responseModel"
            gen_ai.usage.prompt_tokens: "llm.inputTokens"
            gen_ai.usage.completion_tokens: "llm.outputTokens"
            gen_ai.usage.total_tokens: "llm.totalTokens"
            gen_ai.request.temperature: "llm.params.temperature"
            gen_ai.prompt: "llm.prompt"
            gen_ai.completion: "llm.completion"

Step 4: Create a Gateway and Route

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: ai-gateway
  namespace: agentgateway-system
spec:
  gatewayClassName: agentgateway
  listeners:
    - name: llm
      port: 8080
      protocol: HTTP
      allowedRoutes:
        namespaces:
          from: Same
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: openai
  namespace: agentgateway-system
spec:
  parentRefs:
    - name: ai-gateway
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /openai
      backendRefs:
        - group: agentgateway.dev
          kind: AgentgatewayBackend
          name: openai
---
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  name: openai
  namespace: agentgateway-system
spec:
  type: llm
  llm:
    provider:
      openai:
        authToken:
          secretRef:
            name: openai-api-key
            namespace: agentgateway-system

# Create the API key secret
kubectl create secret generic openai-api-key \
  -n agentgateway-system \
  --from-literal=Authorization="Bearer $OPENAI_API_KEY"

# Apply the gateway and route
kubectl apply -f gateway.yaml
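A quick way to confirm the resources were accepted before sending traffic is to check the standard Gateway API status conditions. A sketch, assuming the names from the manifests above; the function name is hypothetical:

```shell
# Hypothetical helper: wait for the Gateway to report the Programmed
# condition, then print the Accepted status of the openai HTTPRoute.
check_gateway() {
  kubectl wait --for=condition=Programmed gateway/ai-gateway \
    --namespace agentgateway-system --timeout=120s &&
  kubectl get httproute openai --namespace agentgateway-system \
    -o jsonpath='{.status.parents[*].conditions[?(@.type=="Accepted")].status}'
}

# Usage: check_gateway
```

If the route is attached correctly, the second command should report an Accepted status of True.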

Step 5: Test and View Traces

# Port-forward the gateway
kubectl port-forward -n agentgateway-system svc/ai-gateway 8080:8080 &

# Send a test request
curl -X POST http://localhost:8080/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1-mini",
    "messages": [{"role": "user", "content": "Hello from AgentGateway!"}]
  }'

Open Langfuse → Traces. You should see a trace with model, tokens, prompt, completion, and gateway metadata.
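To sanity-check the numbers in Langfuse, you can compare them with the usage block the provider returns in the response body. A minimal sketch, assuming jq is installed and the response follows the OpenAI chat completions format; the helper name is hypothetical:

```shell
# Hypothetical helper: reads a chat completion response on stdin and prints
# the model and token counts on one line. Requires jq.
print_usage() {
  jq -r '"\(.model) prompt=\(.usage.prompt_tokens) completion=\(.usage.completion_tokens) total=\(.usage.total_tokens)"'
}

# Usage (pipe the test request from above into the helper):
#   curl -s -X POST http://localhost:8080/openai/v1/chat/completions \
#     -H "Content-Type: application/json" \
#     -d '{"model": "gpt-4.1-mini", "messages": [{"role": "user", "content": "Hello"}]}' \
#     | print_usage
```

The token counts printed here should line up with what the trace shows for the same request.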

What Gets Captured

Trace Attributes (GenAI Semantic Conventions)

Gateway Metadata

Multi-Provider Support

Agentgateway traces all providers through the same pipeline — OpenAI, Anthropic, xAI/Grok, Azure OpenAI, Google Gemini, Ollama, and any OpenAI-compatible API. Add more routes, same observability:

/openai/*     → OpenAI GPT  → traced to Langfuse
/anthropic/*  → Anthropic   → traced to Langfuse
/xai/*        → xAI Grok    → traced to Langfuse

Every provider, same trace format, same dashboard.
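To exercise several routes in one go, a small wrapper around the test request can take the route prefix and model as arguments. This is only a sketch: it assumes each route exposes the OpenAI-compatible /v1/chat/completions path behind its prefix, which this guide only demonstrates for /openai, and the helper name is hypothetical.

```shell
# Hypothetical helper: POST one chat completion through a gateway route.
# $1 = route prefix (e.g. /openai), $2 = model name,
# $3 = gateway base URL (optional, defaults to the port-forward address).
send_chat() {
  curl -sS -X POST "${3:-http://localhost:8080}$1/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"$2\", \"messages\": [{\"role\": \"user\", \"content\": \"ping\"}]}"
}

# Usage:
#   send_chat /openai gpt-4.1-mini
#   send_chat /anthropic <ANTHROPIC_MODEL>   # assumes an /anthropic route exists
```

Each call should then show up in Langfuse as its own trace.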

Fan-Out: Langfuse + Additional Backends

Send traces to multiple backends simultaneously:

exporters:
  otlphttp/langfuse:
    endpoint: https://cloud.langfuse.com/api/public/otel
    headers:
      Authorization: "Basic <CREDENTIALS>"
  otlp/jaeger:
    endpoint: jaeger-collector:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/langfuse, otlp/jaeger]

Common fan-out targets: Langfuse (LLM analytics) + ClickHouse (gateway metrics) + Jaeger (distributed tracing) + Datadog (enterprise monitoring).
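When the exporter list grows, a typo in the config only shows up once the collector pod crash-loops. One option is to validate the file locally first with the collector's validate subcommand. A sketch, assuming Docker is available and that you have saved the config.yaml contents to a local file; the function name is hypothetical:

```shell
# Hypothetical helper: run the collector's "validate" subcommand against a
# local config file, using the same image as the Deployment.
# $1 = path to the collector config file.
validate_collector_config() {
  config_dir=$(cd "$(dirname "$1")" && pwd) || return 1
  docker run --rm -v "$config_dir:/conf:ro" \
    otel/opentelemetry-collector-contrib:0.132.1 \
    validate --config="/conf/$(basename "$1")"
}

# Usage: validate_collector_config ./config.yaml
```

This checks the configuration only; it does not verify that Langfuse or Jaeger are reachable.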

MCP Tool Tracing

Agentgateway doesn’t just trace LLM calls — it also traces MCP (Model Context Protocol) tool interactions:

Tool discovery (tools/list) — which tools are available, how long discovery takes
Tool execution (tools/call) — parameters, results, latency
Backend MCP server performance — per-server latency and error rates

When an agent calls Slack, GitHub, or any MCP tool server through agentgateway, the full tool call chain appears in Langfuse alongside the LLM calls that triggered it.
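For reference, the two MCP methods named above are plain JSON-RPC 2.0 requests. The sketch below only builds the request bodies. It does not cover the initialize handshake an MCP client normally performs first, and this guide does not configure an MCP route, so the endpoint is left out. The helper name and the echo tool are hypothetical.

```shell
# Hypothetical helper: print a JSON-RPC 2.0 request body for an MCP method.
# $1 = request id, $2 = method, $3 = params as a JSON object (optional).
mcp_request() {
  if [ -n "${3:-}" ]; then
    printf '{"jsonrpc":"2.0","id":%s,"method":"%s","params":%s}\n' "$1" "$2" "$3"
  else
    printf '{"jsonrpc":"2.0","id":%s,"method":"%s"}\n' "$1" "$2"
  fi
}

# Tool discovery
mcp_request 1 tools/list

# Tool execution ("echo" is a made-up tool name)
mcp_request 2 tools/call '{"name":"echo","arguments":{"text":"hi"}}'
```

These are the payloads whose parameters, results, and latency end up in the tool spans described above.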

Security Policy Visibility

When agentgateway’s security policies fire, the trace metadata includes what happened:

PII Protection — how many entities were redacted, what types (email, SSN, phone)
Prompt Injection — whether an injection was detected and blocked
Credential Leak — whether secrets were caught in the LLM response
Rate Limiting — remaining quota for the user

This means you can see not just what your agents are doing, but what guardrails are protecting them.