Independent buyer side advisory · Anthropic only · New York · London
Model Selection

Reserving Opus for the work that needs it.

Opus is the most capable Claude model and the most expensive. Running everything on it is the most common source of token waste we find. The discipline that saves the most money is reserving Opus for the work that truly needs it, and routing the rest elsewhere.

Buyer side analysis · 11 min read
34% average reduction in Claude spend
$40M+ Anthropic commitments advised
100% Anthropic focus, no other vendor

Most teams that build on Claude pick a model early, usually the strongest one available, and then never revisit the choice. Opus becomes the default, the safe answer, the model nobody gets criticized for choosing. The application works, quality is high, and the decision feels settled. Then the invoice arrives and keeps climbing, and someone asks why the bill is so large for what the product actually does. The answer, nine times out of ten, is that Opus is running every request, including the ones a much cheaper model would have handled identically. Reserving Opus for the work that needs it is the single highest-leverage change most teams can make to their Claude spend, and it requires no loss of quality where quality matters. It requires only that you stop paying for the most capable model on work that never needed it.

What Opus is actually for

Opus is the model you reach for when the task is genuinely hard: deep multi-step reasoning, complex analysis with many interacting constraints, nuanced judgment where a weaker model would visibly stumble, and work where the cost of a wrong answer is high and the value of the best possible answer justifies the premium. On that class of work, Opus earns its price, because the quality difference is real and it shows up in the output. The mistake is not using Opus. The mistake is using Opus everywhere, including on the large volume of routine work that fills most production applications: classification, extraction, formatting, simple summarization, short factual responses, routing decisions, and the countless small calls that make up the majority of traffic in a mature system. None of that needs the most capable model, and paying Opus rates for it is pure waste.

The three-model lineup and how to think about it

Claude offers a tiered lineup, and each tier exists for a reason. Opus is the most capable and most expensive, built for the hardest work. Sonnet sits in the middle, strong enough for the large majority of production workloads at a meaningfully lower price, and for most teams it is the workhorse that should carry the bulk of traffic. Haiku is the fastest and cheapest, ideal for high-volume, latency-sensitive, or simple tasks where speed and cost matter more than maximum reasoning depth. The error that drives overspending is treating these as a quality ladder where you should always climb as high as the budget allows. They are not a ladder. They are a set of tools matched to different jobs, and the goal is to run each request on the cheapest model that clears the quality bar for that specific task. Done across a real workload, this routing discipline typically cuts aggregate spend by forty to seventy percent against uniform Opus use, with no degradation on the work that mattered.

How to find what actually needs Opus

The way to reserve Opus correctly is not to guess but to measure. Start by segmenting your traffic into task types, because an application rarely does one thing. Pull a representative sample of each type and test it on Sonnet and Haiku alongside Opus, scoring the outputs against the quality bar that task actually requires. You will usually find that a large share of your volume passes on Sonnet with no meaningful difference a user would notice, and a further share passes on Haiku. What remains, the genuinely hard tasks where the cheaper models visibly fall short, is your real Opus workload, and it is almost always far smaller than your current Opus usage. The exercise is empirical, not ideological. You are not trying to use the cheapest model everywhere; you are trying to find the cheapest model that holds quality for each job, and to let the evidence decide.
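The selection step described above can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the task names, pass rates, and quality bars are invented for the example, and `cheapest_passing_tier` is a hypothetical helper. In practice the pass rates would come from scoring your own sampled outputs.

```python
# Tiers ordered from cheapest to most expensive, as described in the article.
TIERS = ["haiku", "sonnet", "opus"]


def cheapest_passing_tier(pass_rates, bar):
    """Return the cheapest tier whose measured pass rate meets the bar.

    Falls back to the strongest tier when no tier clears the bar.
    """
    for tier in TIERS:
        if pass_rates.get(tier, 0.0) >= bar:
            return tier
    return TIERS[-1]


# Hypothetical evaluation results: the share of sampled outputs that met the
# task's quality bar on each tier. These numbers are invented for illustration.
results = {
    "classification": {"haiku": 0.98, "sonnet": 0.99, "opus": 0.99},
    "summarization": {"haiku": 0.88, "sonnet": 0.97, "opus": 0.98},
    "contract_analysis": {"haiku": 0.61, "sonnet": 0.84, "opus": 0.96},
}

# Hypothetical quality bar per task type, set by whoever owns that task.
bars = {"classification": 0.97, "summarization": 0.95, "contract_analysis": 0.95}

assignment = {
    task: cheapest_passing_tier(results[task], bars[task]) for task in results
}
print(assignment)
```

With these invented numbers, classification lands on Haiku, summarization on Sonnet, and only contract analysis stays on Opus. The point of the sketch is the shape of the decision: the evidence picks the tier, one task type at a time.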

Build the routing into the system, not the habit

Reserving Opus only sticks if the routing is built into the application rather than left to individual judgment. Teams that rely on developers choosing the right model per call drift back to the default, because under deadline pressure the safe choice is always the strongest model. The durable answer is a routing layer that classifies each request and sends it to the appropriate model automatically, so the cheap path is the default path and Opus is invoked only when the task profile calls for it. This turns model selection from a habit that decays into a property of the system that holds. It also gives you a control point: when a new model is released or pricing shifts, you adjust the routing rules in one place rather than chasing changes through the codebase.
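A routing layer of this kind can be very small. The sketch below assumes each request already carries a task type (how you classify requests is up to your application) and uses placeholder model identifiers rather than real ones. `ROUTING_RULES`, `DEFAULT_TIER`, `Request`, and `route` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass

# Placeholder identifiers: substitute the model names your account actually uses.
MODEL_IDS = {
    "haiku": "<haiku-model-id>",
    "sonnet": "<sonnet-model-id>",
    "opus": "<opus-model-id>",
}

# The single control point: task type -> tier, filled in from your own testing.
ROUTING_RULES = {
    "classification": "haiku",
    "extraction": "haiku",
    "summarization": "sonnet",
    "contract_analysis": "opus",
}

# Assumption: unrecognized task types go to the mid tier rather than to Opus,
# so the expensive path is never the accidental default.
DEFAULT_TIER = "sonnet"


@dataclass
class Request:
    task_type: str
    prompt: str


def route(request: Request) -> str:
    """Return the model identifier this request should be sent to."""
    tier = ROUTING_RULES.get(request.task_type, DEFAULT_TIER)
    return MODEL_IDS[tier]


print(route(Request(task_type="classification", prompt="Label this ticket.")))
print(route(Request(task_type="unknown_task", prompt="Something new.")))
```

When a new model ships or pricing shifts, only `MODEL_IDS` and `ROUTING_RULES` change. No call site needs to know which model it is using.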

Why this matters before you ever negotiate

Model selection is not only an engineering decision, it is a commercial one, because it directly shapes the baseline you negotiate from. If your usage is inflated by Opus running work that never needed it, every commitment you make to Anthropic is sized against waste, and you commit to and pay for tokens you should never have consumed. Optimizing model routing before you commit means you negotiate from a leaner, truer baseline, and the commit band you land in reflects real demand rather than inefficiency. The buyers who get the best deals are the ones who arrive having already done this work, because they are not asking Anthropic to discount waste, they are committing to optimized consumption and negotiating the rate on that. Reserving Opus for the work that needs it is where token optimization begins, and it pays twice: once on the invoice today, and again on the contract you sign tomorrow.

The objections you will hear, and the answers

Reserving Opus meets resistance inside engineering teams, and the objections are worth taking seriously because each has a real answer.

The first is risk: we cannot afford a quality regression, so we keep the strongest model everywhere. The answer is that reserving Opus is not a blind downgrade. It is an evidence-based routing decision in which each task moves to a cheaper model only after testing proves quality holds for that task. You are not gambling, you are measuring, and the tasks that genuinely need Opus stay on Opus.

The second objection is effort: routing is work, and the team is busy shipping features. The answer is that the saving is usually large enough to fund the effort many times over, and once a routing layer exists it pays continuously with little ongoing cost.

The third is the most human: nobody gets blamed for choosing the best model, so the default is safe. The answer is governance: making cost a visible, owned metric so that overspending on Opus is no longer invisible and the safe default is the right one rather than the expensive one.

A fourth objection is subtler and deserves a careful answer: what if a cheaper model is fine today but a future change in our inputs makes it insufficient? This is a real risk, and the answer is monitoring rather than avoidance. Route to the cheapest sufficient model, instrument quality in production against the same bar you tested with, and you will catch drift before users do, escalating the affected tasks back to a stronger model if and when the evidence calls for it. Staying on Opus everywhere to avoid this risk is paying a large permanent premium to insure against a problem that monitoring handles for a fraction of the cost.
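One way to instrument this is a rolling pass rate per task type, compared against the same bar used in testing. The sketch below is an assumption about how such a monitor might look, not a description of any particular tool. `QualityMonitor`, its parameters, and the `ESCALATION` map are hypothetical, and how each production output gets scored as pass or fail is left to you.

```python
from collections import deque

# Next tier up when a task's quality drifts below its bar.
ESCALATION = {"haiku": "sonnet", "sonnet": "opus", "opus": "opus"}


class QualityMonitor:
    """Tracks a rolling pass rate for one task type on its current tier."""

    def __init__(self, tier, bar, window=200, min_samples=50):
        self.tier = tier
        self.bar = bar                  # same quality bar used in testing
        self.min_samples = min_samples  # avoid reacting to a handful of calls
        self.outcomes = deque(maxlen=window)  # keeps only recent results

    def record(self, passed):
        """Record whether one production output met the quality bar."""
        self.outcomes.append(bool(passed))

    def pass_rate(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def should_escalate(self):
        """True once enough samples exist and the pass rate is below the bar."""
        if len(self.outcomes) < self.min_samples:
            return False
        return self.pass_rate() < self.bar

    def escalate(self):
        """Move the task to the next tier up and start measuring afresh."""
        self.tier = ESCALATION[self.tier]
        self.outcomes.clear()
        return self.tier


# Small window and sample threshold so the example is easy to follow.
monitor = QualityMonitor(tier="haiku", bar=0.9, window=10, min_samples=5)
for passed in [True, True, True, True, True, False, False]:
    monitor.record(passed)

if monitor.should_escalate():
    print("Escalating to", monitor.escalate())
```

Here the pass rate falls to five out of seven, below the 0.9 bar, so the task moves from Haiku to Sonnet. Only the affected task escalates, which is the difference between monitoring and running Opus everywhere.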

What the savings look like across a real workload

The reason reserving Opus matters so much is the shape of real traffic. In most production applications, the genuinely hard tasks that need the strongest model are a minority of volume, while the bulk is routine work that a mid-tier or fast model handles identically. When you run everything on Opus, you pay the top rate on all of it, including the large majority that never needed it. When you route, the top rate applies only to the small core that earns it, and the rest runs far cheaper. Because the cheap majority is so much larger than the expensive minority, the blended cost falls dramatically, which is why disciplined model routing typically cuts aggregate spend by forty to seventy percent against uniform Opus use. The saving is not a rounding adjustment; it is often the single largest line-item improvement available, and it comes with no quality loss on the work that mattered because that work stayed on the model it needed.
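The blended-cost arithmetic is easy to check for your own workload. The sketch below uses invented relative costs and traffic shares, not Anthropic's actual prices, with uniform Opus use normalized to 1.0. Substitute your own measured shares and current rates.

```python
# Hypothetical relative cost per request, with all-Opus use normalized to 1.0.
# These are NOT real prices; plug in current rates for your own calculation.
RELATIVE_COST = {"opus": 1.0, "sonnet": 0.3, "haiku": 0.1}

# Hypothetical share of traffic on each tier after routing. Must sum to 1.
TRAFFIC_SHARE = {"opus": 0.25, "sonnet": 0.50, "haiku": 0.25}


def blended_cost(share, cost):
    """Weighted average cost per request across tiers."""
    if abs(sum(share.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    return sum(share[tier] * cost[tier] for tier in share)


routed = blended_cost(TRAFFIC_SHARE, RELATIVE_COST)
saving = 1.0 - routed  # the baseline is uniform Opus use at 1.0

print(f"Blended cost relative to all-Opus: {routed:.3f}")
print(f"Saving versus all-Opus: {saving:.1%}")
```

With these invented inputs the blended cost is 0.425 of the all-Opus baseline, a saving of roughly 57.5 percent. The result is driven almost entirely by how small the Opus share is, which is why measuring that share comes first.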

Not affiliated with Anthropic PBC. Independent buyer side advisory only.