Independent buyer side advisory · Anthropic only · New York · London
AI Cost Governance

FinOps for tokens: governing Claude spend.

Buyer side guide · 11 minute read

Token spend has quietly become one of the fastest growing line items in many technology budgets, and most organizations are governing it with none of the discipline they apply to cloud. The pattern is familiar to anyone who lived through the early years of cloud cost: a powerful, usage based service gets adopted quickly by engineering teams, spend climbs faster than anyone planned, and finance discovers a large and rising bill that nobody can fully explain. The answer for cloud was FinOps, a practice that brought visibility, accountability, and continuous optimization to variable spend without slowing teams down. Claude spend needs the same practice, adapted to tokens, and the organizations that build it early avoid the runaway bills and the weak negotiating positions that catch everyone else.

Why token spend behaves like cloud spend

FinOps transfers so well because token spend shares the structural features that made cloud spend hard to govern. It is consumption based, so the bill is a function of usage rather than a fixed license, and usage is driven by engineering decisions made far from the finance team. It is decentralized, because many teams can call the API independently, each making choices that move the cost. It is opaque by default, since a single invoice total tells you almost nothing about which workload, team, or feature drove it. And it is highly variable, swinging with traffic, launches, and the behavior of the applications themselves. Each of these is a condition that made cloud cost ungovernable without a deliberate practice, and token spend reproduces all four.

One feature makes token spend even harder to govern than cloud spend. The same workload can cost wildly different amounts depending on choices that are invisible on the invoice: which model serves the request, how much context is sent, whether that context is cached, whether the work runs in real time or batch, and how long the output is allowed to run. A team can cut the cost of a workload by half or more without changing what the user sees, purely through these engineering choices. Token FinOps is therefore not only about visibility and accountability. It also requires a continuous optimization loop that cloud FinOps needed only in part, because the levers are richer and the savings larger.
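To make those invisible levers concrete, here is a minimal sketch that prices one request under two sets of engineering choices. Everything in it is an assumption for illustration: the per-million-token prices are placeholders rather than Anthropic's rates, the cache and batch multipliers simply mirror the "up to ninety percent" and "roughly half" figures used later in this guide, and any surcharge for writing to the cache is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Hypothetical prices in dollars per million tokens (placeholders, not real rates)."""
    input_per_mtok: float
    output_per_mtok: float


# Assumed multipliers, mirroring the savings figures quoted in this guide.
CACHE_READ_MULTIPLIER = 0.1  # cached input billed at about 10% of normal input
BATCH_MULTIPLIER = 0.5       # batch work billed at about half of real time


def request_cost(price: Price, input_tokens: int, output_tokens: int,
                 cached_input_tokens: int = 0, batch: bool = False) -> float:
    """Estimate the dollar cost of one request.

    cached_input_tokens is the part of input_tokens served from cache.
    """
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_input_tokens
    billable_input = fresh_tokens + cached_input_tokens * CACHE_READ_MULTIPLIER
    dollars = (billable_input * price.input_per_mtok
               + output_tokens * price.output_per_mtok) / 1_000_000
    return dollars * BATCH_MULTIPLIER if batch else dollars


# The same user-facing request, priced two ways (all numbers hypothetical).
large_model = Price(input_per_mtok=10.0, output_per_mtok=50.0)
small_model = Price(input_per_mtok=1.0, output_per_mtok=5.0)

baseline = request_cost(large_model, input_tokens=20_000, output_tokens=1_000)
optimized = request_cost(small_model, input_tokens=20_000, output_tokens=500,
                         cached_input_tokens=18_000)
print(f"baseline ${baseline:.4f} vs optimized ${optimized:.4f}")
```

Neither figure would appear separately on an invoice, which is the point: the gap between them is created entirely by choices made in code.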

The first pillar: visibility

Nothing can be governed that cannot be seen, and the foundational work of token FinOps is making the spend visible at a useful granularity. A single monthly total from the provider is not visibility; it is a symptom. Useful visibility means knowing spend broken down by team, by application or feature, by model, and ideally by the kind of request, so that you can answer the questions that matter: which workload is the largest, which is growing fastest, where the spend concentrates, and which model is consuming the budget. Without this, every optimization is a guess and every budget conversation is a fight over a number nobody understands.

Building visibility means instrumenting the calls so that each one carries enough metadata to attribute it later, and aggregating that into a view that finance and engineering can both read. The detail to capture is straightforward: which team or service made the call, which application or feature it belongs to, which model was used, and the input, output, and cached token counts that drive the cost. With that in place, the opaque invoice becomes a map, and the map is what turns a runaway bill into a set of specific, addressable workloads. Visibility is the prerequisite for everything else, which is why it is the first pillar and the place every token FinOps effort should start.
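The instrumentation described above can be sketched as a tagged usage record plus a small aggregation helper. This is an illustrative shape, not a prescribed schema: the `UsageRecord` fields, the `spend_by` helper, and the team, feature, and model names are all hypothetical, and the cost per call is assumed to be computed elsewhere from your own rates.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One API call, tagged with the attribution metadata needed later."""
    team: str
    feature: str
    model: str
    input_tokens: int
    output_tokens: int
    cached_tokens: int
    cost_usd: float  # assumed to be computed elsewhere from your own rates


def spend_by(records, *dimensions):
    """Sum cost over any combination of record attributes, largest first."""
    totals = defaultdict(float)
    for record in records:
        key = tuple(getattr(record, name) for name in dimensions)
        totals[key] += record.cost_usd
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical records; every name and number here is made up.
records = [
    UsageRecord("search", "query-rewrite", "small-model", 1_200, 80, 1_000, 0.002),
    UsageRecord("support", "ticket-summary", "large-model", 9_000, 600, 0, 0.120),
    UsageRecord("support", "ticket-triage", "small-model", 2_000, 20, 1_500, 0.003),
    UsageRecord("support", "ticket-summary", "large-model", 8_500, 550, 0, 0.110),
]

for (team,), total in spend_by(records, "team"):
    print(f"{team:<10} ${total:.3f}")
```

Calling `spend_by(records, "team", "model")` or `spend_by(records, "feature")` gives the other cuts of the same data, which is what turns the invoice total into a map.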

The second pillar: accountability

Visibility without accountability changes nothing, because seeing the spend does not reduce it unless someone owns it. The second pillar is putting the cost in front of the people who can actually affect it, which means the engineering teams making the model and architecture choices, not just the finance team that receives the invoice. When a team can see its own token spend, understands what drives it, and is accountable for it against a budget or a unit economic target, behavior changes, because the people closest to the levers finally have a reason and the information to pull them.

The mechanism for this is showback or chargeback. Showback means each team sees its share of the spend without being formally billed for it, which creates awareness and gentle pressure. Chargeback goes further and allocates the actual cost back to the team's budget, which creates real ownership. Either can work, and the right choice depends on the organization's culture and maturity, but the principle is the same: the cost has to land with the team that controls it. A central finance team trying to govern token spend whose drivers it cannot see is fighting an unwinnable battle. Distribute the visibility and the accountability to the teams, and the spend starts to govern itself.
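A showback report can be as simple as the sketch below, which assumes you already have spend totals per team and, optionally, a budget for each. The `showback` function, its field names, and the figures are hypothetical. Chargeback would use the same numbers but post them against each team's budget instead of only displaying them.

```python
def showback(spend_by_team: dict[str, float],
             budgets: dict[str, float]) -> list[dict]:
    """Build one report row per team, largest spender first."""
    total = sum(spend_by_team.values())
    ranked = sorted(spend_by_team.items(), key=lambda item: item[1], reverse=True)
    report = []
    for team, spend in ranked:
        budget = budgets.get(team)  # None when no budget has been set
        report.append({
            "team": team,
            "spend": spend,
            "share": spend / total if total else 0.0,
            "budget": budget,
            "over_budget": budget is not None and spend > budget,
        })
    return report


# Hypothetical monthly figures in dollars.
spend = {"support": 6000.0, "search": 3000.0, "internal-tools": 1000.0}
budgets = {"support": 5000.0, "search": 4000.0}

report = showback(spend, budgets)
for row in report:
    flag = "OVER" if row["over_budget"] else ""
    print(f"{row['team']:<15} ${row['spend']:>8.2f} {row['share']:>5.0%} {flag}")
```

Teams without a budget still appear in the report, so unowned spend stays visible rather than disappearing from the view.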

The third pillar: optimization

The third pillar is where token FinOps pays for itself, because the optimization levers on Claude spend are unusually powerful. The largest single lever is model routing. Using the most capable and most expensive model for every request is the most common default and the costliest one. Routing each request to the cheapest model that handles it well, which means reserving Opus for the work that truly needs it, sending the bulk to Sonnet, and pushing the simplest classification and extraction work to Haiku, typically cuts aggregate spend by a large fraction on its own. The second lever is prompt caching, which on shared, repeated context can reduce the cost of that portion by up to ninety percent, transforming the economics of any workload that reuses a large system prompt or document. The third is batch processing, which runs asynchronous work at roughly half the cost of real time calls.
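The routing lever can be sketched as a small rule table. This is a deliberately simple illustration under stated assumptions: the task labels other than classification and extraction are hypothetical, the tier values are family names rather than API model identifiers, and a production router would be tuned against quality measurements, not a static set.

```python
from collections import Counter
from enum import Enum


class Tier(Enum):
    """Model tiers named in this guide; values are labels, not API model IDs."""
    HAIKU = "haiku"
    SONNET = "sonnet"
    OPUS = "opus"


# Classification and extraction come from the text above; the labels in
# HARD_TASKS are hypothetical examples of work that truly needs the top tier.
SIMPLE_TASKS = {"classification", "extraction"}
HARD_TASKS = {"complex-reasoning", "long-horizon-agent"}


def route(task_kind: str) -> Tier:
    """Pick the cheapest tier assumed to handle the task well."""
    if task_kind in SIMPLE_TASKS:
        return Tier.HAIKU
    if task_kind in HARD_TASKS:
        return Tier.OPUS
    return Tier.SONNET  # the bulk of work defaults to the middle tier


# Hypothetical request stream.
requests = ["classification", "extraction", "drafting", "complex-reasoning"]
mix = Counter(route(kind).value for kind in requests)
print(mix)
```

Defaulting unknown work to the middle tier rather than the top one is the design choice that matters here: the expensive model has to be opted into, not fallen into.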

Together, routing, caching, and batch typically cut aggregate Claude spend by forty to seventy percent versus uniform use of the most expensive model in real time. That is the headline number, and it is why optimization is a pillar of token FinOps rather than an afterthought. But optimization is not a one time project. It is a continuous loop, because applications change, traffic shifts, and new workloads appear constantly. The FinOps practice keeps the loop running: visibility surfaces the workloads worth optimizing, accountability puts them in front of the teams that can act, and the teams apply the levers and measure the result. Done continuously, this keeps the bill efficient as the organization grows rather than letting it drift back toward waste.

How FinOps strengthens the negotiation

There is a commercial payoff to token FinOps that is easy to miss: it directly strengthens your position with Anthropic. A buyer who has visibility into spend, accountability across teams, and an optimized consumption base walks into a commitment negotiation knowing exactly what they use and what they need, and that knowledge is leverage. The commitment can be sized against the efficient bill rather than the wasteful one, which means committing to a smaller and more honest number, and the buyer can forecast with confidence rather than guessing. A vendor negotiating against a buyer who does not understand their own usage holds the information advantage. FinOps takes that advantage back.

The sequencing matters: optimize before you commit. A buyer who signs a large commitment based on current, unoptimized usage locks in a forecast far higher than necessary and carries that cost for the whole term. The same buyer who runs the FinOps optimization first, cutting the bill by forty to seventy percent, commits to the lower efficient number and saves the difference twice over, once in the reduced consumption and again in the smaller commitment. This is why governance and negotiation are not separate disciplines. The FinOps practice that governs your spend day to day is the same practice that gives you the smallest honest forecast to negotiate against, and the smallest honest forecast is the strongest negotiating position there is.
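The effect of sequencing on commitment size is simple arithmetic, sketched below. The function and its inputs are hypothetical, and the model is intentionally crude: it assumes the commitment is sized at exactly the forecast annual spend and ignores growth and any discount tiers tied to commitment size.

```python
def commitment_comparison(current_annual_spend: float, savings_fraction: float,
                          term_years: int) -> dict[str, float]:
    """Compare total commitment sized before and after optimization."""
    if not 0 <= savings_fraction < 1:
        raise ValueError("savings_fraction must be in [0, 1)")
    if term_years < 1:
        raise ValueError("term_years must be at least 1")
    efficient_annual = current_annual_spend * (1 - savings_fraction)
    return {
        "commit_on_current": current_annual_spend * term_years,
        "commit_on_efficient": efficient_annual * term_years,
        "commitment_avoided": (current_annual_spend - efficient_annual) * term_years,
    }


# Hypothetical buyer: $1M a year today, the low end of the forty to
# seventy percent range, and a three year term.
result = commitment_comparison(1_000_000, savings_fraction=0.40, term_years=3)
for label, dollars in result.items():
    print(f"{label:<20} ${dollars:,.0f}")
```

Running the same function at the high end of the range, with `savings_fraction=0.70`, shows how quickly the gap between the two commitments widens over a multi-year term.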

Where to start

Building token FinOps does not require a large program on day one. It requires starting the loop: instrument the calls so you can see spend by team, application, and model; put that visibility in front of the teams that drive the cost; and apply the routing, caching, and batch levers to the largest workloads first, measuring the result. From there the practice compounds, because each cycle reveals the next workload worth optimizing and each team that takes ownership reduces the central burden. The organizations that govern token spend well are not the ones with the most sophisticated tools. They are the ones that started the loop early and kept it running. Our token optimization playbook lays out the levers in detail, with the math to size their impact, so you can begin the optimization pillar immediately and bring an efficient, well understood bill to your next negotiation.

Govern the bill before it governs you.

Download the token optimization playbook and see the exact levers we pull to cut aggregate Claude spend 40 to 70 percent.


Not affiliated with Anthropic PBC. Independent buyer side advisory only.