Independent buyer side advisory · Anthropic onlyNew York · London
Blog · Model Selection
Top of funnel · Informational

When Haiku is good enough.

Most teams reach for the biggest Claude model out of caution and pay for capability the task never uses. Haiku is good enough for a surprising share of production work. Here is how to tell which workloads, how to prove it without guessing, and what moving them is actually worth.

The most common cause of an oversized Claude bill is not heavy usage. It is using a heavier model than the work requires. Teams standardize on Opus or Sonnet early, often during a pilot, and then never revisit the choice as the workload settles into routine, predictable tasks. Haiku, the fastest and least expensive model in the family, sits unused while it would handle a meaningful share of that traffic perfectly well. The question worth asking is not whether Haiku is as capable as Opus, because it is not. The question is whether a given task actually needs Opus level capability at all. For a large category of production work, the honest answer is no, and that gap is where the savings live.

The cost difference is not marginal

Moving a workload from a top tier model to Haiku is not a small efficiency. The per token price difference across the model tiers is large, and because cost scales directly with token volume, sending high volume routine traffic to Haiku rather than Opus can change the economics of an entire product line. This is why model selection routinely drives forty to seventy percent of aggregate spend. It is the single biggest lever most teams have, and it costs nothing to pull beyond the engineering time to route correctly. The mistake is treating model choice as a quality decision when it is really a fit decision, matching the model to what each task genuinely demands.

The right question is never which model is best. It is which is the smallest model that clears the quality bar for this specific task. For a lot of tasks, that model is Haiku.

Where Haiku is genuinely good enough

Haiku excels at high volume, well defined tasks where the output is structured or bounded and the reasoning required is shallow. These are exactly the tasks that often run on a larger model purely out of habit. Common examples include the following.

  • Classification and routing. Tagging tickets, sorting messages by intent, labeling content, and deciding which downstream path a request should take are pattern tasks that Haiku handles reliably and cheaply.
  • Extraction. Pulling structured fields out of documents, parsing entities from text, and turning unstructured input into clean records are well within Haiku's range.
  • Short form generation. Drafting brief, templated responses, generating summaries of modest length, and producing simple rewrites rarely need a frontier model.
  • First pass filtering. Screening, deduplication, and triage steps that feed a more capable model later are ideal for Haiku, because Haiku does the cheap bulk work and the expensive model only sees what survives.
  • High frequency, low stakes calls. Anything that runs constantly and where an occasional imperfect result is tolerable is a candidate, because the volume is exactly what makes the cost difference matter.

Where it is not

Haiku is the wrong choice when the task needs deep reasoning, long multi step logic, nuanced judgment, or careful handling of ambiguity. Complex analysis, intricate code generation, sensitive customer facing reasoning, and work where a wrong answer is costly should stay on Sonnet or Opus. The goal is not to push everything down to the cheapest model, which would trade savings for quality you cannot afford to lose. The goal is to stop running shallow tasks on deep models. Knowing the line between the two is the whole skill, and it is workload specific rather than universal.

How to prove it rather than guess

You do not have to decide on instinct. The reliable way to find Haiku candidates is to test them against the model you run today and measure the quality difference on your own data, not on a benchmark. The method is straightforward.

  • Pick a candidate workload that looks shallow and high volume. Pull a representative sample of real production inputs for it.
  • Run that sample through both your current model and Haiku, and compare the outputs on the quality metric that actually matters for that task, whether that is accuracy, format compliance, or human rated usefulness.
  • If Haiku clears your quality bar, route the workload to it. If it falls short, you have learned the task needs more capability and you keep it where it is. Either way you have replaced a guess with evidence.
  • Re test periodically, because workloads drift and a task that needed Sonnet last year may sit comfortably on Haiku after the inputs stabilized.

Routing, not replacing

The strongest setups do not pick one model for everything. They route each request to the smallest model that fits, with cheaper models handling the bulk and escalating to larger models only when a request genuinely needs it. Haiku becomes the default for the shallow majority, Sonnet handles the middle, and Opus is reserved for the work that truly demands it. That tiered routing is what produces the largest aggregate savings, because it matches spend to need request by request rather than applying one expensive default to all of it. Haiku being good enough for part of the traffic is what makes the whole routing strategy pay off.

Where this fits the wider optimization picture

Model selection is the largest of several levers that move Claude spend, and it compounds with the others. Caching reduces the cost of repeated context on whichever model you choose, batch halves the cost of asynchronous work, and disciplined prompts cut waste everywhere. Our token optimization playbook brings these together into a single method for taking cost out of a Claude deployment without losing quality, with model routing as the foundation. Getting Haiku into the workloads it fits is usually the fastest first win on that path.

The takeaway

Haiku is good enough for a large share of real production work, specifically the high volume, well defined, shallow reasoning tasks that teams too often run on a heavier model out of habit. Because cost scales with volume and the price gap between tiers is wide, moving those tasks to Haiku is one of the biggest savings available, and it can be proven rather than guessed by testing candidates against your current model on your own data. Reserve the larger models for the work that needs them, route everything else down, and let Haiku carry the bulk. Download the token optimization playbook to map which of your workloads belong on Haiku and what moving them is worth.

Find the workloads Haiku can carry.

We test your traffic model by model on your own data and route each task to the smallest model that clears the bar. Download the playbook to see the method.

Download playbook
Start here
Get the spend in your favor.

The Counteroffer

Weekly intelligence on Anthropic pricing moves and the buyer side counters that work.

Get a Quote · Book a Strategy Call · The Counteroffer · New York · London Not affiliated with Anthropic PBC. Independent buyer side advisory only.