When a Claude bill starts climbing, the instinct inside most enterprises is to negotiate a larger committed spend. A bigger commit earns a deeper discount band, and a deeper discount feels like the obvious answer to a rising invoice. It is the move an account team will happily encourage, because it locks in revenue and it frames the conversation around volume rather than efficiency. But it is usually the wrong first move, and on a real workload it leaves money on the table that model routing would have captured for free.
We sit on the buyer side of these deals, and the pattern repeats. A company looks at six months of usage, projects the trend forward, and reaches for a commit that matches the projection. What almost never happens first is the simpler question: how much of that spend is going to the most expensive model when a cheaper one would do the same job. Until you answer that, you are committing to a number that is inflated by inefficiency, and no discount band makes an inflated number a good deal.
The math is worth sitting with, because it is the heart of the argument. Suppose your workload runs almost entirely on Opus, the top tier model, and your annual run rate lands around two million dollars. An account team offers you a commit at that level with a discount of, say, twenty percent off list. That sounds like four hundred thousand dollars saved, and on paper it is.
Now route that same workload properly. The majority of real traffic, classification, extraction, short summarization, routing, simple formatting, does not need Opus at all. Sonnet handles most of it at a fraction of the price, and Haiku takes the high volume, low complexity work for less again. Across a realistic mix, disciplined routing across Opus, Sonnet, and Haiku typically cuts aggregate spend by forty to seventy percent compared with uniform use of the top model. The two million dollar run rate was never two million dollars of necessary work. It was perhaps eight hundred thousand dollars of necessary work wearing a two million dollar price tag.
Commit to the two million and take the twenty percent, and you have locked yourself into paying one and a half million dollars for something that should cost well under a million. The discount is real, but it is a discount on a baseline that was wrong. Routing first, then committing, means you commit to the real number and still take whatever discount the smaller commit earns. The discount band matters far less than the baseline it applies to.
A larger commit is attractive for reasons that have nothing to do with whether it is the right answer. It is fast, it requires no engineering work, and it produces a headline saving you can show a finance committee. Routing, by contrast, is engineering work. It means classifying traffic, choosing a model per request type, testing that the cheaper model holds the quality bar, and building the plumbing to send each call to the right place. That is real effort, and effort always looks more expensive than a signature.
But the signature is the more expensive choice over the term. A multi year commit at an inflated baseline compounds the error across the whole contract, and it removes your incentive to fix the inefficiency, because you have already paid for the tokens whether you use them well or not. Unused commitment on Anthropic generally does not roll over and does not refund, so an oversized commit is not a safety margin, it is a sunk cost waiting to happen. The bigger commit feels like protection and behaves like a trap.
Routing does not just lower the number. It changes the shape of the negotiation. When you arrive at the table having already optimized, three things shift in your favor.
First, your commit is credible. An account team can tell the difference between a number pulled from a naive projection and a number built from an optimized, instrumented workload. The optimized number is defensible, and a defensible number is far harder to talk you up from.
Second, your leverage improves. A buyer who can run efficiently is a buyer who can walk, scale down, or wait. The dependence that comes from an unoptimized, ever climbing bill is exactly the dependence that weakens you in a renewal. Routing reduces that dependence, and reduced dependence is leverage.
Third, your overage exposure shrinks. When the baseline is lean, the gap between your commit and your real usage is small, which means a usage spike is less likely to blow past the commit into overage at a worse rate. An inflated commit invites both kinds of waste at once: unused commitment if you come in under, and poorly priced overage if a routing inefficiency pushes you over.
The sequence we recommend to clients is simple and it is almost always worth the few weeks it takes. Optimize first, commit second.
Begin by ranking your workloads by total cost, which is cost per inference multiplied by volume. The top of that list is where routing returns the most, because a single change to the most expensive workload moves the bill more than a dozen changes to cheap ones. Move that workload off uniform Opus and onto the cheapest model that clears its quality bar. Layer prompt caching on the repeated context, which bills cached reads at up to ninety percent off, and move any latency tolerant work into batch at roughly half price. Within a few weeks you have a new, lower, instrumented run rate.
Only then do you size the commit. You commit to the optimized baseline, with a sensible buffer for growth, and you negotiate the discount band against that honest number. You also negotiate the terms that protect the optimized position: overage at the committed rate rather than at list, price protection across the term, and treatment of unused commitment that does not punish you for being efficient. That is a deal built on the real workload rather than on the inefficiency you happened to be carrying the day the account team called.
None of this means commitments are bad. A commit is the correct instrument once the baseline is real. If your optimized workload is large and growing predictably, a committed spend at a deep band is exactly how you should buy, because you are buying volume you will genuinely consume at a rate you could not get on demand. The argument is not against commitments. It is against committing before you have done the work that tells you what to commit to.
There is also a timing case. If you are confident the optimization will take months you do not have, and a price increase is looming, a shorter commit at the current baseline with strong price protection can be a reasonable bridge, provided you build in the right to reforecast. But that is a deliberate, eyes open trade, not the default. The default should be to lower the bill first and commit to what remains.
A bigger commit is the easy answer, and that is exactly why it is so often the wrong one. It rewards the account team, it requires nothing of your engineers, and it produces a clean number for the finance deck, all while locking you into a baseline that model routing would have cut by forty to seventy percent. The discount band is the smaller variable. The baseline is the larger one. Fix the baseline with routing, caching, and batch, and then commit to the number that is actually true. You will sign a smaller deal, take a discount on an honest figure, and keep the leverage that an oversized commit quietly hands away.
If you want a second set of eyes before you sign, that is the work we do. We map the routing opportunity against your real Claude workload, tell you what the optimized baseline looks like, and then sit between you and Anthropic to negotiate the commit around that number. You can see how we read Anthropic pricing first, or book a call and we will start with your usage data.
Book a strategy call and we will map the routing opportunity against your real Claude workload before you size a single commit.
Book a Strategy CallWeekly intelligence on Anthropic pricing moves and the buyer side counters that work.