It's not often the CFO and VP of Engineering end up working on the same problem together. But when your AI spend triples month-over-month and keeps climbing, finance and engineering end up in the same room pretty fast.
At the start of this year, our AI usage grew significantly and quickly. The drivers were straightforward:
- Improvements in Claude models drove heavier usage across the team
- Cursor became a standard part of the engineering workflow
- Steady OpenAI usage supported a growing number of internally hosted applications
- Internal experimentation picked up across departments
The increased usage was welcome, given the associated productivity gains. Our leadership team recognized that without AI, we'd need meaningfully more headcount to ship code at our current pace. The spend reflected real output.
But spend that wasn’t understood couldn’t be effectively managed, and we had a visibility gap that made it difficult to properly scope our AI budget.
A significant portion of our Claude usage was routed through Vertex AI. It's a capable platform, but its default reporting wasn't built to answer the operational questions we actually needed:
- Who was using what, and for which workflows?
- What was each person spending? (This was not easy to answer)
- Which workloads were production-grade versus experimental?
Our concerns extended beyond cost. Our policy is clear: sensitive or proprietary data must go through an enterprise license with a provider that doesn’t train on our inputs. These are the same challenges we help our customers solve. To address them internally, we knew we needed to use our own technology.
What We Did
We stood up a control layer using our own open source project, agentgateway, built to connect, secure, and observe AI agent and model traffic across any environment. We shut down direct access to Vertex and made agentgateway the single point of control between our users and the models they were calling.
It took less than a week.
From there:
- All requests were centralized through the gateway
- Usage and cost were tracked via Grafana, where we could see costs by person and model across multiple time scales
- We implemented weekly reviews to identify patterns and adjust
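The weekly reviews boil down to simple attribution arithmetic over the gateway's usage records. The sketch below shows the general shape of that calculation; the record fields, model names, and per-token prices are illustrative assumptions, not agentgateway's actual log schema or real rates.

```python
# Illustrative sketch: attributing spend per (user, model) from
# gateway-style usage records. Field names and prices are hypothetical.
from collections import defaultdict

# Assumed $ per 1M tokens as (input_rate, output_rate) -- placeholder numbers
PRICES = {
    "claude-sonnet": (3.00, 15.00),
    "gpt-4o": (2.50, 10.00),
}

def attribute_costs(records):
    """Sum estimated dollar spend by (user, model)."""
    totals = defaultdict(float)
    for r in records:
        in_rate, out_rate = PRICES[r["model"]]
        cost = (r["input_tokens"] * in_rate
                + r["output_tokens"] * out_rate) / 1_000_000
        totals[(r["user"], r["model"])] += cost
    return dict(totals)

records = [
    {"user": "alice", "model": "claude-sonnet",
     "input_tokens": 400_000, "output_tokens": 50_000},
    {"user": "bob", "model": "gpt-4o",
     "input_tokens": 100_000, "output_tokens": 20_000},
]
print(attribute_costs(records))
# {('alice', 'claude-sonnet'): 1.95, ('bob', 'gpt-4o'): 0.45}
```

Once every request flows through one point, this kind of rollup is trivial to feed into a dashboard; without centralization, the same numbers are scattered across provider consoles.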
What Implementation Actually Looked Like
The timing worked in our favor. The team had just shipped a new configuration API for standalone LLM traffic management, and we jumped in as early adopters. As an added bonus, running agentgateway internally helped us identify and resolve subtle edge cases that had slipped through our CI/CD pipeline.
These are the kinds of issues that only surface in a long-lived production environment, and they highlight why “dogfooding” your own product is so important.
Our final setup exposed three providers (OpenAI, Vertex, and Fireworks.ai) across nine departments, all behind our existing Google Workspace SSO. Before this, IT was managing 100+ users across a growing list of AI providers, each with its own billing model and rate limits. This created a sprawl problem that was only going to get worse.
Now there’s one place to monitor and provision. Employees can try new models without filing a ticket. They log in and it’s available.
The entire configuration came in under 200 lines of YAML. The gateway itself runs on less than 1 vCPU and under 100MB of RAM, with no perceptible latency for users.
What Changed
Visibility came first. Within days, we could see that a small number of users and workflows were driving the bulk of spend. Unsurprisingly, usage was concentrated. But you can’t act on what you can’t see.
With clear attribution, we made targeted adjustments rather than broad cuts:
- We moved high-volume users to flat-rate plans like Claude Teams, converting unpredictable API spend into a fixed monthly line item
- We limited unbounded API access for pure experimentation workloads
- We routed appropriate workloads to more cost-efficient models, often 5–6x cheaper with no meaningful difference in output quality
The goal wasn’t fewer tokens. It was more productive tokens.
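The adjustments above reduce to two simple decision rules: move a user to a flat-rate seat once metered spend exceeds the seat price, and route suitable workloads to a cheaper model. This sketch works through the arithmetic; the seat price, discount factor, and traffic split are hypothetical, chosen only to match the "5–6x cheaper" range mentioned above.

```python
# Hedged sketch of the two cost levers. All dollar figures are assumptions.

SEAT_PRICE = 30.0            # hypothetical flat-rate plan, $/user/month
CHEAP_MODEL_DISCOUNT = 5.0   # cheaper model assumed ~5x less expensive

def monthly_plan(api_spend: float) -> str:
    """Flat rate wins once metered spend exceeds the seat price."""
    return "flat-rate" if api_spend > SEAT_PRICE else "pay-per-token"

def routed_cost(api_spend: float, routable_fraction: float) -> float:
    """Estimated spend after routing a fraction of workloads to the
    cheaper model, with the remainder staying on the premium model."""
    routable = api_spend * routable_fraction
    return (api_spend - routable) + routable / CHEAP_MODEL_DISCOUNT

# A $200/month user, with 60% of traffic suitable for the cheaper model:
print(monthly_plan(200.0))      # -> flat-rate
print(routed_cost(200.0, 0.6))  # -> 104.0 (80 premium + 24 discounted)
```

The point of the sketch is that both levers need per-user attribution as an input: you can only apply them once you know who is spending what, and on which workloads.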
Results
- Vertex AI spend dropped to less than one-third of prior levels
- No measurable impact on developer productivity; teams kept using the tools they relied on
- Every employee now has SSO-enabled access to multiple models and providers
- Real-time visibility into AI usage across the company
More practically, we can now confidently expand AI access because we have the infrastructure to support it.
The Bigger Takeaway
AI spend doesn’t behave like SaaS spend. You’re not buying seats. You’re buying tokens per request, at rates that shift across models and providers. Usage patterns change quickly, and cost implications follow. There’s no procurement cycle to slow it down.
Our initial instinct was to treat this as a budget problem: negotiate rates, set ceilings, cut access. That would have helped at the margins. What actually worked was treating it as an infrastructure problem.
You need a control plane. One that gives you:
- Visibility across every model, tool, and workflow, not just one provider’s dashboard
- Governance that enforces policy without blocking the work your teams need to do
- Routing flexibility to direct workloads to the right model for the job
Spending less wasn’t the goal. Spending well was.
We built agentgateway because we kept seeing this problem in the field: AI usage spreading across tools and providers, no single view, and no way to enforce policy where it matters.
You need to see what’s happening before you can act.
We’re thrilled with what agentgateway has done for Solo internally, and we hope others can realize the same value in their organizations.