Independent buyer side advisory · Anthropic only · New York · London
Token Optimization

When Opus is worth the premium and when it is not.

Buyer side guide · 11 minute read

Opus is the most capable model in the Claude family and the most expensive to run. The temptation, once a team has access to it, is to route everything through it on the theory that the best model is always the safe choice. That theory is what produces the largest avoidable Claude bills we see. The honest answer is that Opus is worth its premium for a specific class of work and a waste of money for everything else. The skill is knowing which is which, and building your system so the premium is spent only where it earns its keep.

What you are actually paying for

The Claude lineup spans Opus, Sonnet, and Haiku, and the price differences between them are large, not marginal. Opus carries a substantial premium over Sonnet, which in turn costs more than Haiku. When you route a request to Opus, you are paying that premium on every input and output token, whether or not the task needed the extra capability. For a high volume workload, the difference between running on Opus and running on the right cheaper model is not a few percent; it is frequently the difference between a manageable bill and a runaway one. Routing across the three models, rather than defaulting to Opus, is what typically cuts aggregate spend by forty to seventy percent.
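The arithmetic behind that gap can be sketched in a few lines. This is a minimal illustration, not a pricing reference: the per million token prices below are placeholders we made up to show the shape of the calculation, and the names PRICES and monthly_cost are ours. Substitute the current published rates before drawing any conclusion from the output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """USD per million tokens. Placeholder values, not Anthropic's rates."""
    input_per_mtok: float
    output_per_mtok: float


# Hypothetical prices, chosen only so the tiers differ by a large factor.
PRICES = {
    "opus": Price(input_per_mtok=10.0, output_per_mtok=50.0),
    "sonnet": Price(input_per_mtok=2.0, output_per_mtok=10.0),
    "haiku": Price(input_per_mtok=0.5, output_per_mtok=2.5),
}


def monthly_cost(model: str, requests: int, input_tokens: int, output_tokens: int) -> float:
    """Monthly spend in USD for one task at a fixed token profile per request."""
    price = PRICES[model]
    tokens_cost = input_tokens * price.input_per_mtok + output_tokens * price.output_per_mtok
    return requests * tokens_cost / 1_000_000


if __name__ == "__main__":
    # One million requests a month, 2,000 input and 500 output tokens each.
    for model in ("opus", "sonnet", "haiku"):
        print(model, round(monthly_cost(model, 1_000_000, 2_000, 500), 2))
```

The premium applies to both the input and the output side of every request, so the gap between tiers scales directly with volume.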

So the question is never whether Opus is better. It usually is. The question is whether the increment of capability it adds, for a given task, is worth the increment of cost. For some tasks that increment is decisive and cheap relative to the value. For most tasks it is invisible to the end result and expensive.

When Opus is worth it

There is a clear profile of work where the Opus premium pays for itself many times over. The common thread is that the task is hard, the cost of a wrong answer is high, and the volume is low enough that the per request premium does not dominate.

Genuinely complex reasoning

Tasks that require multi step reasoning, holding many constraints at once, or working through a problem where a lesser model visibly stumbles are where Opus earns its place. If you can observe Sonnet failing or producing materially worse output on a task, and the task matters, Opus is the right call.

High stakes output

When a mistake is expensive, the premium is cheap insurance. Legal analysis, financial reasoning, complex code that will ship to production, and anything where a subtle error propagates downstream all justify the better model, because the cost of getting it wrong dwarfs the cost difference between models.
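One way to make the insurance argument concrete is to fold the cost of errors into the cost per request. The sketch below is an assumption of ours rather than a standard formula: it treats the expected cost as the model cost plus the error rate times the cost of one error, and every number in the example is hypothetical.

```python
def expected_cost(model_cost: float, error_rate: float, cost_of_error: float) -> float:
    """Expected cost per request: what you pay the model plus what errors cost you."""
    return model_cost + error_rate * cost_of_error


def premium_is_worth_it(
    cheap_cost: float,
    cheap_error_rate: float,
    premium_cost: float,
    premium_error_rate: float,
    cost_of_error: float,
) -> bool:
    """True when the pricier model has the lower expected cost per request."""
    cheap = expected_cost(cheap_cost, cheap_error_rate, cost_of_error)
    premium = expected_cost(premium_cost, premium_error_rate, cost_of_error)
    return premium < cheap


if __name__ == "__main__":
    # Hypothetical figures: error rates must come from your own evaluation.
    # High stakes: one bad answer costs 500 USD to catch and fix.
    print(premium_is_worth_it(0.01, 0.02, 0.05, 0.005, cost_of_error=500.0))
    # Low stakes: one bad answer costs ten cents.
    print(premium_is_worth_it(0.01, 0.02, 0.05, 0.005, cost_of_error=0.10))
```

With the same two models and the same error rates, the decision flips purely on what a mistake costs, which is the point of this section.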

Low volume, high value

The premium matters far less when the task runs hundreds of times a day rather than millions. A complex analysis run a few thousand times a month on Opus is a small line on the bill and a large source of value. The same workload at internet scale would be a different calculation entirely.

When Opus is not worth it

The mirror image is just as clear. There is a large category of work where Opus adds cost without adding anything the end user or the system can detect.

Simple, well defined tasks

Classification, extraction, formatting, routing, short summarization, and structured data transformation are tasks where a smaller model performs essentially as well. Paying the Opus premium to label support tickets or pull fields from a document is spending on capability the task cannot use.

High volume workloads

When a task runs millions of times, the per request premium compounds into the dominant cost. Even a modest task, run at that scale on Opus, can swamp a budget. High volume is precisely where moving to Sonnet or Haiku produces the largest absolute saving, because the multiplier is so large.
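The multiplier is easy to see with a rough calculation. In this sketch the two per request costs are hypothetical and the function name monthly_premium is ours; the only thing it demonstrates is that an identical per request premium is negligible at low volume and dominant at high volume.

```python
def monthly_premium(
    requests_per_month: int,
    premium_cost_per_request: float,
    cheaper_cost_per_request: float,
) -> float:
    """Extra monthly spend from running a task on the pricier model."""
    return requests_per_month * (premium_cost_per_request - cheaper_cost_per_request)


if __name__ == "__main__":
    # Hypothetical per request costs in USD for the same task on two tiers.
    pricier, cheaper = 0.045, 0.009
    for volume in (3_000, 30_000_000):
        print(volume, round(monthly_premium(volume, pricier, cheaper), 2))
```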

Tasks where Haiku already passes

The decisive test is empirical. If Haiku produces output that passes your quality bar on a given task, then anything more expensive is pure waste for that task. Many teams never run this test and assume they need the top model when the cheapest one would have passed. The default to Opus is a habit, not a measurement.

The framework: test down, do not default up

The mistake almost everyone makes is starting at Opus and never moving down. The disciplined approach is the reverse. For each distinct task in your system, start with the cheapest model that might plausibly work and test whether its output meets your quality bar. If it passes, you are done, and you are paying the least. If it fails, step up to the next model and test again. Only tasks that genuinely need Opus end up on Opus.
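The test down loop can be sketched as follows. This is a minimal outline under stated assumptions, not a finished evaluation harness: run_task and passes are hypothetical callables you would supply (the first calls the model, the second applies your quality bar), the tier names are plain labels rather than API model identifiers, and the 95 percent pass rate is an arbitrary default.

```python
from typing import Callable, Optional, Sequence

# Tier labels ordered cheapest first. Labels only, not API model identifiers.
TIERS = ("haiku", "sonnet", "opus")


def cheapest_passing_model(
    run_task: Callable[[str, str], str],   # (model, task_input) -> model output
    passes: Callable[[str, str], bool],    # (task_input, output) -> meets the bar?
    eval_inputs: Sequence[str],
    min_pass_rate: float = 0.95,
    tiers: Sequence[str] = TIERS,
) -> Optional[str]:
    """Return the first tier, cheapest first, whose pass rate meets the bar."""
    if not eval_inputs:
        raise ValueError("eval_inputs must contain at least one example")
    for model in tiers:
        passed = sum(passes(x, run_task(model, x)) for x in eval_inputs)
        if passed / len(eval_inputs) >= min_pass_rate:
            return model  # stop at the cheapest sufficient tier
    return None  # no tier passed: revisit the prompt or the quality bar
```

Run it once per distinct task with a representative evaluation set. A return value of None usually points to a problem with the prompt or the quality bar rather than a need for a bigger model.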

This is task by task, not system wide. A single application typically contains a mix: a few hard reasoning steps that belong on Opus, a layer of moderate work that Sonnet handles well, and a high volume base of simple operations that Haiku does for a fraction of the cost. Routing each task to the right tier, rather than running the whole application on one model, is the core of the saving. The work is in classifying the tasks and building the routing, and it is the single highest return engineering effort available on most Claude deployments.
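A routing layer of this kind can be as simple as a lookup table. The sketch below assumes a hypothetical application with four tasks; the task names and their tier assignments are invented for illustration, and in practice each assignment should come out of testing. Unknown tasks raise an error instead of falling back to a default tier, so nothing reaches the most expensive model by accident.

```python
from collections import Counter
from typing import Dict, Mapping

# Hypothetical task-to-tier table. Each entry should be the result of testing.
ROUTES: Dict[str, str] = {
    "classify_ticket": "haiku",
    "extract_invoice_fields": "haiku",
    "draft_customer_reply": "sonnet",
    "contract_risk_analysis": "opus",
}


def model_for(task: str) -> str:
    """Look up the tier for a task; refuse to guess for untested tasks."""
    try:
        return ROUTES[task]
    except KeyError:
        raise KeyError(f"task {task!r} has no tested route; evaluate it before routing") from None


def tier_mix(task_counts: Mapping[str, int]) -> Dict[str, float]:
    """Share of total requests landing on each tier."""
    mix: Counter = Counter()
    for task, count in task_counts.items():
        mix[model_for(task)] += count
    total = sum(mix.values())
    return {tier: n / total for tier, n in mix.items()} if total else {}
```

The tier mix is a useful number to track over time: if the Opus share creeps upward, some task has probably been routed by habit rather than by measurement.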

How model choice interacts with the other levers

Model routing is the largest single driver of aggregate spend, but it stacks with the other two levers: prompt caching and batch processing. Prompt caching cuts the input cost of repeated context by up to ninety percent on whichever model you route to. Batch processing runs eligible asynchronous work at half the standard rate, and because that discount is proportional, the model you batch on still matters. The full optimization program combines all three: route each task to the cheapest sufficient model, cache the stable context, and batch what can tolerate delay. Model choice is where you start because it has the largest aggregate effect, but it is not the whole answer.
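How the three levers combine on a single request can be sketched as below. The two multipliers are taken from the figures in this section (cached context at roughly a tenth of the input price, batch at half the standard rate). The sketch makes two simplifying assumptions of ours: it ignores any cost of writing to the cache, and it treats the caching and batch discounts as multiplying together. The prices in the example are placeholders.

```python
def effective_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
    cached_fraction: float = 0.0,         # share of input tokens served from cache
    batched: bool = False,
    cache_read_multiplier: float = 0.10,  # "up to ninety percent" off repeated context
    batch_multiplier: float = 0.50,       # "half the standard rate"
) -> float:
    """Approximate USD cost of one request after caching and batch discounts."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    input_cost = (fresh + cached * cache_read_multiplier) * input_price_per_mtok
    output_cost = output_tokens * output_price_per_mtok
    total = (input_cost + output_cost) / 1_000_000
    return total * batch_multiplier if batched else total


if __name__ == "__main__":
    # Placeholder prices: 2 USD input and 10 USD output per million tokens.
    base = effective_cost(10_000, 1_000, 2.0, 10.0)
    cached = effective_cost(10_000, 1_000, 2.0, 10.0, cached_fraction=0.8)
    both = effective_cost(10_000, 1_000, 2.0, 10.0, cached_fraction=0.8, batched=True)
    print(base, cached, both)
```

Because both discounts are proportional to the underlying price, they save the most in absolute terms on the most expensive tier, which is another reason to settle the routing first.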

Why this matters before you commit

Model selection is not only an engineering decision; it sets the consumption number you carry into your Anthropic negotiation. A workload that has been routed properly commits to a genuinely lower spend, which protects you from oversizing your commitment and the shortfall that follows. A team that defaults everything to Opus commits to an inflated number, then either overpays against it or scrambles to justify it. Doing the routing work first means the number you commit to is real, and a real number negotiated from disciplined spend is a stronger position than a padded one.

The buyer side takeaway

Opus is worth its premium for complex reasoning, high stakes output, and low volume high value work, where the increment of capability is decisive and the increment of cost is small relative to the value. It is a waste for simple, well defined, high volume tasks where a smaller model passes the quality bar. Do not default up to the best model; test down to the cheapest sufficient one, task by task, and route accordingly. The discipline typically cuts aggregate spend by forty to seventy percent and leaves you committing to a real number rather than a padded one. To see the model mix that fits your workloads, download the token optimization playbook.



Not affiliated with Anthropic PBC. Independent buyer side advisory only.