Independent buyer side advisory · Anthropic onlyNew York · London
Home · Blog · Token Optimization
Token Optimization

Cutting Claude spend 40 to 70 percent with routing.

Buyer side guide · 9 minute read

If you do one thing to your Claude bill this quarter, make it routing. Of every optimization lever available, sending each request to the right model instead of running everything on the most capable one is the change that moves the invoice most. The number is not marketing. Across a realistic enterprise workload, disciplined routing across Opus, Sonnet, and Haiku typically cuts aggregate spend by forty to seventy percent compared with uniform use of Opus. This is a buyer side guide to why that range is real and how to capture it.

We negotiate with Anthropic and optimize the spend beneath the contract, so this is written from what actually lands on invoices. The audience is both the engineering leader who will build the routing and the procurement leader who needs to understand why the savings are this large before committing to a spend tier.

Why uniform Opus use is so expensive

The default in most organizations is to send every request to the strongest model. It is the path of least resistance, nobody gets blamed for using the best model, and early in a project the cost is invisible. But the strongest model is also the most expensive per token, often by a wide margin, and most real traffic does not need it. When a workload runs everything on Opus, it pays the top rate for a large volume of work that a cheaper model would have handled at the same quality.

That gap between what the work needs and what it is being charged is the entire saving. The forty to seventy percent range is simply the cost of the requests that were overpaying, returned to you once each one is matched to the model that fits it. The wider end of the range belongs to workloads with a lot of simple, high volume traffic. The narrower end belongs to workloads that genuinely lean on the model's hardest reasoning.

What each model is for

Routing well starts with knowing what each tier is actually good at, so you can match work to model deliberately rather than defaulting upward.

  • Opus is for the genuinely hard work, deep reasoning, complex multi step problems, the tasks where a weaker answer carries real cost. It is worth its price exactly where quality is the binding constraint.
  • Sonnet is the workhorse for the broad middle, most drafting, analysis, summarization, structured extraction, and code assistance, at a fraction of the top rate and quality that is more than sufficient for the great majority of requests.
  • Haiku is for the high volume, low complexity work, classification, routing, simple formatting, short extractions, where speed and cost matter most and the task is well within a smaller model's reach.

The mistake is treating these as a quality ladder where higher is always safer. They are a fit problem, not a ranking. The cheapest model that clears the quality bar for a given task is the correct model for that task, and for most tasks that is not Opus.

How to build the routing

Routing does not require anything exotic. The practical approach is to classify requests by the kind of work they represent and assign each class to a model. Some classes are obvious from the endpoint or the feature they belong to. Others can be decided by a lightweight check on the input before the main call. The point is that the routing decision is cheap and the saving it unlocks is large, so even a simple rule set captures most of the benefit.

A common and effective pattern is to default the bulk of traffic to Sonnet, push the high volume simple work down to Haiku, and reserve Opus for the specific request types that demonstrably need it. Start there, measure quality on each class, and only move work up a tier where the data shows the cheaper model is genuinely falling short. The instinct to route everything to the top model out of caution is exactly the instinct that produced the bill you are trying to cut.

Guard quality with measurement, not fear

The objection to routing is always quality, and it deserves a real answer rather than a reassurance. The answer is measurement. Before you move a class of work to a cheaper model, define what good looks like for that class and test the cheaper model against it. For most classes the cheaper model passes easily, because the work was never hard enough to justify the top tier. For the few classes where it does not, you keep them on the stronger model. Routing decisions made on measured quality, not on nerves, capture the saving without the regression everyone fears.

This is also where the engineering and procurement views meet. The engineering leader owns the quality bar and the tests. The procurement leader owns the spend that the routing returns. When both can see that quality held while cost fell, the routing stops being a risk and becomes simply the correct way to run the workload.

Routing compounds with the other levers

Routing is the largest lever, but it does not work alone. Once a request is on the right model, caching its repeated context drops the input cost again, at up to ninety percent off the cached portion. Sending the latency tolerant share through batch halves it once more. The savings multiply, so a request that is routed to Sonnet, cached, and batched costs a small fraction of the same request run naively on Opus in real time. Routing is where you start because it has the biggest single effect and shapes everything downstream, but the full forty to seventy percent is routing working together with the rest.

Why this matters before you commit

There is a commercial reason to route before you negotiate a spend commitment, not after. A buyer who commits to a tier based on unoptimized, uniform Opus usage locks in a number that routing will then shrink, leaving them committed to spend they no longer have. Routing first means you forecast and commit against the optimized figure, which is often a tier lower and far safer. The lever that cuts your bill also protects your commitment, which is why it belongs at the front of any cost program.

Common objections, answered

Two objections stall most routing programs, and both deserve a direct answer. The first is that maintaining routing logic is a burden. In practice the logic is light, a classification of request types and a mapping to models, and it changes rarely. Set against a forty to seventy percent saving on a major line of spend, the maintenance cost is trivial, and the routing rules tend to stabilize quickly once the quality tests are in place.

The second objection is that a user might occasionally get a slightly weaker answer on a cheaper model. This is where measurement replaces fear. For the request classes you route down, you have already tested that the cheaper model meets the quality bar, so the risk is bounded by your own thresholds rather than left to chance. And for the small set of classes where quality genuinely matters more than cost, you keep them on the stronger model. Routing is not a blanket downgrade, it is a deliberate match of each task to the cheapest model that clears its bar, with the hardest work still going to the top tier.

Routing as a living system

Routing is not a one time project, it is a system that pays dividends as the model lineup evolves. When a new model arrives, or an existing one is repriced, the routing layer is the single place you adjust to capture the change, and every workload behind it benefits at once. A team without routing has to revisit each workload individually whenever the economics shift. A team with routing changes one mapping and the whole estate follows.

This is why routing belongs at the center of a cost program rather than at the edge. It is the control surface for your entire Claude spend, the place where a single decision, which model handles which work, propagates across everything. Build it once, measure the quality on each class, and you have not only captured today's forty to seventy percent saving but also given yourself the lever to capture tomorrow's, automatically, as the market keeps moving in the buyer's favor.

The buyer side takeaway

Model routing is the single most powerful way to cut a Claude bill, returning forty to seventy percent of aggregate spend by matching each request to the cheapest model that meets its quality bar. Build it on measured quality rather than fear, default most traffic to Sonnet with Haiku underneath and Opus reserved for the hard work, and layer caching and batch on top so the savings compound. Do it before you commit to a spend tier, and the same lever that lowers your invoice also right sizes your commitment.

Your Anthropic number is negotiable.

Get a quote for a bounded engagement. Fixed fee or gainshare, no risk to you.

Get a Quote

The Counteroffer

Weekly intelligence on Anthropic pricing moves and the buyer side counters that work.

Get a Quote · Book a Strategy Call · The Counteroffer · Blog · New York · London Not affiliated with Anthropic PBC. Independent buyer side advisory only.