Independent buyer side advisory · Anthropic onlyNew York · London
Blog · Token Optimization
Top of funnel · Informational

When to use Haiku instead of Sonnet.

Haiku is a fraction of the price of Sonnet, and for a surprising share of real workloads it is the right model rather than a compromise. The skill is knowing which tasks it handles well and proving the swap before you scale it. Here is how to decide where Haiku belongs and how much it can take off your bill.

Teams building on Claude tend to reach for the most capable model they can and leave it there. Sonnet becomes the default for everything, because it is strong across the board and nobody got fired for using the better model. The problem is that capability has a price, and running every task on a mid tier model when many of those tasks would run just as well on the cheapest one is one of the most common sources of waste we see. Haiku exists for exactly this reason. The question is not whether Haiku is as capable as Sonnet, because it is not. The question is which of your tasks do not need Sonnet's capability in the first place.

What you are actually paying for

The Claude family is tiered by capability and priced accordingly. Opus sits at the top for the hardest reasoning, Sonnet in the middle as a strong general model, and Haiku at the bottom as the fast, inexpensive option. The price gap between Haiku and Sonnet is large, several times over on both input and output. That gap is the size of the prize. Every request you move from Sonnet to Haiku without losing acceptable quality is a request whose cost falls by a multiple, not a few percent. Across a high volume workload, that adds up to a material change in aggregate spend.

The instinct to avoid the cheaper model usually rests on a quiet assumption that more capability is always safer. For a hard task that is true. For a simple, well defined task it is false, because the extra capability sits idle and you pay for it anyway. The discipline is to match the model to the difficulty of the work rather than defaulting to the strong model out of caution.

The right model is the cheapest one that meets your quality bar for that specific task, not the most capable one you can afford. On a high volume endpoint, that distinction is the difference between a large bill and a small one.

The tasks where Haiku is the right call

Haiku is well suited to work that is high in volume and low in reasoning depth, where the task is clearly specified and the answer space is narrow. These are the workloads where its speed and price are an advantage and its lower ceiling is rarely tested.

  • Classification and routing. Deciding which category a message belongs to, which queue a ticket goes in, or which downstream model should handle a request. The task is constrained and the output is short.
  • Extraction. Pulling structured fields out of a document or message, where the information is present and the job is to find and format it rather than to reason about it.
  • Short form transformation. Reformatting, simple rewriting, tagging, and tidying text where the transformation is mechanical.
  • First pass triage. Filtering a large volume of inputs to decide which ones need a more capable model, so Haiku handles the bulk and Sonnet only sees the cases that matter.
  • High frequency, low stakes responses. Suggestions, autocompletions, and lightweight replies where speed matters and the occasional weaker answer carries little cost.

What these share is that the task does not depend on deep reasoning, long chains of inference, or subtle judgment. When the work is well defined and the output is bounded, Haiku frequently matches Sonnet closely enough that the difference does not show up in your outcomes, while the cost difference shows up immediately on your bill.

The tasks where you should stay on Sonnet

The swap is not free of risk, and there are clear cases where reaching for Haiku is a false economy. Work that requires multi step reasoning, that handles ambiguous or adversarial inputs, that produces output a customer reads directly and judges your brand by, or where an error is expensive to recover from, generally belongs on Sonnet or higher. The saving on a single request is never worth a wrong answer in a context where wrong answers carry real cost. The art is drawing the line accurately, and the only reliable way to draw it is to test rather than to guess.

How to test the swap safely

Never move a workload from Sonnet to Haiku on assumption. Prove it. The method is straightforward and it protects you from both the cost of staying on the expensive model and the risk of degrading quality. Run a representative sample of your real traffic through both models, compare the outputs against your actual quality bar, and measure the gap honestly. If Haiku meets the bar on the sample, the swap is safe to scale. If it falls short, you have learned that cheaply rather than discovering it in production.

The comparison has to use your real inputs and your real definition of acceptable, not a generic benchmark. A model that scores well on a public test may still fail on the specific shape of your data, and a model that looks weaker in general may handle your narrow task perfectly. Test on what you actually run, judge against what you actually need, and let the evidence decide rather than the instinct to play safe.

Routing, not replacing

The strongest pattern is rarely to replace Sonnet wholesale with Haiku. It is to route. Different requests within the same application have different difficulty, and a routing layer can send each to the right model. Simple, high volume requests go to Haiku, harder ones go to Sonnet, and the genuinely difficult cases go to Opus. A cheap classification step, often run on Haiku itself, can decide the route. This way you capture the saving on the bulk of traffic that does not need the expensive model while preserving quality on the work that does. Routing is where model selection delivers the often quoted 40 to 70 percent reduction in aggregate spend, because it applies the right rate to each request rather than the highest rate to all of them.

The hidden saving in latency

Haiku is not only cheaper, it is faster, and for some workloads the speed is worth as much as the price. A high frequency interactive feature that feels sluggish on a larger model can feel immediate on Haiku, which improves the product as well as the bill. When you are deciding where Haiku belongs, weigh latency alongside cost, because the model that is cheap enough to use freely and fast enough to feel instant can change what you are able to build, not just what you pay for it.

Where this fits the wider playbook

Choosing Haiku over Sonnet is one move within model routing, which is itself one lever among several including prompt caching, batch processing, and output reduction. The largest savings come from combining them, and from sequencing the work so you capture the easy wins first. Our token optimization playbook lays out how model selection fits with the other levers and how to run the testing that makes a model swap safe. Haiku is often the fastest place to start, because the price gap is so large that even a partial migration of suitable traffic returns quickly.

Helping Haiku punch above its tier

The gap between Haiku and Sonnet on a given task is not fixed. It narrows when you give the cheaper model the support it needs to succeed. A task that Haiku handles poorly with a vague prompt can become a task it handles well with a clear one. Precise instructions, a few good examples of the output you want, and a tightly defined response format all lift the floor of what the smaller model can do. Before you conclude that a workload needs Sonnet, make sure you have given Haiku a fair chance with a well constructed prompt, because the difference between the two models often shrinks to nothing once the prompt does its job.

This is where prompt design and model selection meet. The same engineering that makes a prompt clearer and cheaper also makes the cheaper model viable on more of your traffic. A team that invests in good prompts widens the set of tasks where Haiku is good enough, which widens the share of traffic that can move to the cheaper rate. The investment pays twice, once in quality and once in the model tier it unlocks.

Watch for quality drift after the swap

A model swap that passes its test on day one can still degrade later if your inputs change. The distribution of traffic a model sees in production drifts over time as your product evolves, your users change, and new cases appear that were not in your original sample. A workload that Haiku handled well on last quarter's traffic may struggle on this quarter's if the nature of the requests has shifted. The protection is to keep a sample of production traffic flowing through a quality check, so you notice degradation before it shows up as a complaint rather than after.

This does not mean second guessing every swap. It means treating model selection as a decision you revisit when the inputs change, not a setting you fix once and forget. The teams that get the most from Haiku are the ones that move suitable work to it confidently and watch the quality signal, ready to route a workload back up a tier if its difficulty has genuinely increased. That combination of confidence and monitoring is what lets you capture the saving without carrying hidden quality risk.

The cost of leaving everything on Sonnet

It is worth naming the cost of doing nothing, because the default of running everything on the safer model feels free and is not. Every high volume, low complexity request that runs on Sonnet when Haiku would have sufficed is overpaying by a multiple, every day, at scale. That waste does not announce itself. It sits quietly inside an invoice that looks reasonable because nobody has separated the work that needed Sonnet from the work that did not. The exercise of finding the Haiku eligible traffic is, in effect, the exercise of finding money you are already spending for no return. For most teams running real volume, that is a larger number than they expect.

Your Anthropic number is negotiable.

Get a quote for a bounded engagement. Fixed fee or gainshare, no risk to you.

Get a Quote

The Counteroffer

Weekly intelligence on Anthropic pricing moves and the buyer side counters that work.

Get a Quote · Book a Strategy Call · The Counteroffer · Blog · New York · London Not affiliated with Anthropic PBC. Independent buyer side advisory only.