The single biggest lever on a Claude bill is which model runs each request. Choose by cost first and quality second, rather than defaulting everything to the most capable tier, and aggregate spend typically falls forty to seventy percent. Here is how the three tiers price and where each one earns its keep.
Most teams pick a Claude model the way they pick a default font. Someone chose the most capable option during the prototype, it worked, and it became the standard for everything. That habit is the most expensive mistake in the entire token budget, because the three tiers are not three flavors of the same price. They sit at very different points on the cost curve, and running every request on the top tier means paying premium rates for work that a cheaper model would have done identically. A cost first approach inverts the default. Instead of starting on the most expensive model and never moving down, it asks whether the cheapest model is good enough and moves up only when it fails. This guide walks the three tiers in plain commercial terms, shows where each belongs, and explains why routing across them is the lever that moves a Claude bill more than any contract clause.
Claude comes in three tiers that trade capability against cost. Opus is the most capable and the most expensive per token, built for the hardest reasoning, the longest chains of logic, and the work where a wrong answer is costly. Sonnet sits in the middle, strong on the large majority of production tasks at a fraction of the Opus rate, which is why it is the right default for most workloads rather than the exception. Haiku is the fastest and cheapest, built for high volume, well defined, latency sensitive work where the task is narrow and the cost per call has to be tiny because the call count is enormous. The exact rates move over time and across input versus output, so the number that matters is not the headline price but the ratio: the gap between the tiers is large enough that the model choice, not the prompt, is usually the dominant term in your cost.
The asymmetry worth internalizing is that output tokens cost several times more than input tokens on every tier. That means a verbose model producing long answers is expensive twice over, once for the tier and once for the volume of output, and it is why a cheaper model that answers concisely can beat a pricier one that rambles. Cost first model selection is partly about the tier and partly about controlling how much the model says.
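The arithmetic behind that asymmetry can be sketched in a few lines. This is a rough illustration only: the `Rate` class, the `request_cost` function, and every dollar figure are hypothetical placeholders, not Anthropic's actual prices. Only the shape comes from the text above, with output priced several times above input and the tiers priced well apart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    """Dollars per million tokens. Placeholder values, not real prices."""
    input_per_mtok: float
    output_per_mtok: float


def request_cost(rate: Rate, input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars of one request at the given rate."""
    total = input_tokens * rate.input_per_mtok + output_tokens * rate.output_per_mtok
    return total / 1_000_000


# Assumed for illustration: output costs 5x input on both tiers,
# and the pricier tier costs 3x the cheaper one.
cheaper_tier = Rate(input_per_mtok=1.0, output_per_mtok=5.0)
pricier_tier = Rate(input_per_mtok=3.0, output_per_mtok=15.0)

# Same prompt, different tier and different answer length.
concise = request_cost(cheaper_tier, input_tokens=2_000, output_tokens=300)
verbose = request_cost(pricier_tier, input_tokens=2_000, output_tokens=1_200)

print(f"concise answer on cheaper tier: ${concise:.4f}")
print(f"verbose answer on pricier tier: ${verbose:.4f}")
```

Under these assumed numbers the verbose answer on the pricier tier costs several times the concise answer on the cheaper one, and most of the gap comes from the output term rather than the input term.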
Opus is worth its rate when the task genuinely needs the top of the reasoning curve and the cost of a wrong answer is high. That includes complex multi step reasoning, difficult code generation where a subtle bug is expensive, nuanced analysis of ambiguous material, and the small set of flagship features where quality is the product. The mistake is not using Opus. The mistake is using it everywhere. Reserve it for the work that needs it, measure how often that actually is, and you usually find it is a minority of your traffic carrying a majority of your cost. A buyer who can name the specific workloads that justify Opus, and defend the boundary against scope creep, has already found a large part of the savings.
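One way to measure how much traffic really sits on the top tier is to compare each tier's share of requests with its share of cost. The sketch below assumes a usage log reduced to `(tier, cost)` pairs. The function name `shares_by_tier` and the log entries are made up for illustration and do not reflect real usage or real prices.

```python
from collections import defaultdict
from typing import Iterable


def shares_by_tier(records: Iterable[tuple[str, float]]) -> dict[str, tuple[float, float]]:
    """Return {tier: (share_of_requests, share_of_cost)} from (tier, cost) records."""
    counts: dict[str, int] = defaultdict(int)
    costs: dict[str, float] = defaultdict(float)
    for tier, cost in records:
        counts[tier] += 1
        costs[tier] += cost
    total_requests = sum(counts.values())
    total_cost = sum(costs.values())
    if total_requests == 0 or total_cost == 0:
        raise ValueError("need at least one request with non-zero cost")
    return {
        tier: (counts[tier] / total_requests, costs[tier] / total_cost)
        for tier in counts
    }


# Made-up usage log: one (tier, cost in dollars) entry per request.
usage_log = [("opus", 0.05)] * 2 + [("sonnet", 0.01)] * 6 + [("haiku", 0.002)] * 12

shares = shares_by_tier(usage_log)
opus_traffic, opus_cost = shares["opus"]
print(f"Opus: {opus_traffic:.0%} of requests, {opus_cost:.0%} of cost")
```

In this invented log the top tier handles a small slice of requests but accounts for more than half the cost, which is the pattern worth checking for in real billing data.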
For most production workloads, Sonnet is the model that should run by default. It handles the bulk of summarization, drafting, classification, extraction, standard code, and conversational tasks at a quality most users cannot distinguish from the top tier, at a meaningfully lower rate. The strategic move for most teams is to make Sonnet the default and require a reason to move up to Opus or down to Haiku, rather than making Opus the default and never revisiting it. That single inversion, default to the middle tier and justify the extremes, captures much of the available saving before any other optimization is applied. It also concentrates your Opus spend on the cases that earn it, which makes the eventual measurement honest.
Haiku is the tier teams underuse, usually because they never tested whether it was good enough. For high volume, well scoped tasks such as classification, routing, simple extraction, tagging, short structured responses, and first pass filtering, Haiku often delivers the same results as a pricier model at a fraction of the cost, and faster. When the call count is large, the cost difference compounds into real money, and the latency advantage is a feature in its own right. The discipline is to test Haiku on each candidate workload rather than assuming it cannot cope. Many teams discover that a large slice of their traffic was paying premium rates for work the cheapest model handled perfectly.
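Testing a candidate workload can be as simple as scoring the cheapest tier against a labeled sample. The sketch below is one possible shape for that check, not a prescribed method. The `agreement_rate` and `clears_bar` helpers, the 0.95 threshold, and the sample prompts are all assumptions, and `stub_cheap_model` stands in for a real API call so the example runs on its own.

```python
from typing import Callable, Sequence

Sample = tuple[str, str]  # (prompt, expected label)


def agreement_rate(model: Callable[[str], str], samples: Sequence[Sample]) -> float:
    """Fraction of samples where the model's output matches the expected label."""
    if not samples:
        raise ValueError("need at least one labeled sample")
    hits = sum(
        1
        for prompt, expected in samples
        if model(prompt).strip().lower() == expected.strip().lower()
    )
    return hits / len(samples)


def clears_bar(model: Callable[[str], str], samples: Sequence[Sample],
               threshold: float = 0.95) -> bool:
    """True when the model agrees with the labels at least `threshold` of the time."""
    return agreement_rate(model, samples) >= threshold


def stub_cheap_model(prompt: str) -> str:
    """Stand-in for a call to the cheapest tier; a real harness would call the API here."""
    return "billing" if "invoice" in prompt.lower() else "other"


# Hypothetical labeled sample for a ticket classification workload.
labeled_samples = [
    ("Where is my invoice for March?", "billing"),
    ("Invoice total looks wrong", "billing"),
    ("How do I reset my password?", "other"),
    ("Please cancel my subscription", "other"),
]

rate = agreement_rate(stub_cheap_model, labeled_samples)
print(f"agreement: {rate:.0%}, clears bar: {clears_bar(stub_cheap_model, labeled_samples)}")
```

Exact match scoring only suits tasks with a single correct label. Open ended outputs such as summaries need a different comparison, and the threshold should reflect what a wrong answer costs on that workload.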
The point of understanding the three tiers is to route between them, and routing is where the forty to seventy percent saving comes from. A routing layer classifies each request and sends it to the cheapest model that clears the quality bar for that task: Haiku for the simple and high volume, Sonnet for the standard, Opus for the genuinely hard. Done well, the user sees no difference in quality and the aggregate bill falls sharply, because the expensive tier is now reserved for the fraction of traffic that needs it. This is a far bigger lever than negotiating a few points off the rate, because it changes the mix of what you buy rather than the unit price of a single tier. Pair routing with prompt caching, which takes up to ninety percent off repeated input, and with batch processing, which runs asynchronous work at roughly half the real time rate, and the compounding saving is what turns a runaway Claude bill back into a managed one.
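A routing layer of the kind described above can start as a lookup table plus a default. The sketch below is a minimal illustration under stated assumptions: the task labels, the `ROUTES` table, and the relative costs in `RELATIVE_COST` are hypothetical, and a real table would come from quality testing on each workload. The second half shows how a change in traffic mix moves the blended cost.

```python
from enum import Enum


class Tier(Enum):
    HAIKU = "haiku"
    SONNET = "sonnet"
    OPUS = "opus"


# Hypothetical task labels and assignments, for illustration only.
ROUTES = {
    "classification": Tier.HAIKU,
    "tagging": Tier.HAIKU,
    "simple_extraction": Tier.HAIKU,
    "summarization": Tier.SONNET,
    "drafting": Tier.SONNET,
    "complex_reasoning": Tier.OPUS,
}


def route(task_type: str) -> Tier:
    """Return the assigned tier, falling back to the middle tier for unknown tasks."""
    return ROUTES.get(task_type, Tier.SONNET)


# Placeholder relative cost per request (not real prices), cheapest tier = 1.0.
RELATIVE_COST = {Tier.HAIKU: 1.0, Tier.SONNET: 3.0, Tier.OPUS: 5.0}


def blended_cost(mix: dict[Tier, float]) -> float:
    """Average relative cost per request for a traffic mix whose shares sum to 1."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    return sum(share * RELATIVE_COST[tier] for tier, share in mix.items())


# Assumed mixes: everything on the top tier versus a routed split.
all_opus = blended_cost({Tier.OPUS: 1.0})
routed = blended_cost({Tier.HAIKU: 0.3, Tier.SONNET: 0.5, Tier.OPUS: 0.2})
saving = 1 - routed / all_opus

print(route("tagging"), route("something_new"))
print(f"blended saving under these assumptions: {saving:.0%}")
```

The saving printed here depends entirely on the assumed cost ratios and traffic shares. The useful part is the structure: the router decides the mix, and the mix, not the unit price of any one tier, drives the blended cost.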
Cost first model selection is not only an engineering decision. It is also a negotiation asset. When you commit to Anthropic against an optimized, routed baseline, you commit to a smaller and truer number than the one your unoptimized bill implies, and you negotiate from demonstrated efficiency rather than padded spend. A buyer who has already done the routing work walks into the commit conversation with a lower, defensible forecast and the credibility that comes with it. The teams that overcommit are usually the ones that never routed, sized the commitment against uniform Opus usage, and locked in waste for the length of the term. Get the model mix right first, then commit, and the same workload costs less on the bill and less in the contract.
The full version of this analysis, with the routing patterns we deploy, the quality testing approach that keeps it safe, and the caching and batch levers that stack on top, sits in our token optimization playbook. Download it below and start with the model mix, because it is the largest lever you control without renegotiating anything.
Download the token optimization playbook for the routing, caching, and batch levers that cut Claude spend forty to seventy percent.