Building a Consumption Model for Claude

Almost every committed spend mistake we see starts the same way: a buyer agrees to a number before anyone has built a model that explains where the number came from. The account team proposes a commit band, the figure sounds reasonable against the current run rate, and the deal closes on a forecast that is really just a guess dressed up as a plan. A consumption model is the cure. It is the document that turns your actual Claude usage into a forecast you can defend line by line, and it is the single most valuable thing you can build before you sit down to negotiate. This piece walks through how to build one that holds up, because the quality of your commit is never better than the quality of the model underneath it.

What a consumption model actually is

A consumption model is a structured forecast of how many tokens your workloads will consume over the term of an agreement, broken down by the things that actually drive cost. It is not a single number. It is a build-up: each workload, the volume of requests it generates, the input and output tokens per request, the model each request runs on, and the optimizations that change the effective rate. When you assemble those pieces you get a forecast you can interrogate, and interrogation is the point. A number you cannot break apart is a number you cannot defend, and a forecast that cannot be defended is one the other side can move at will. The model gives you the opposite: a forecast where every assumption is visible, sourced, and adjustable.

Start from real usage, not from ambition

The foundation of a credible model is your own usage data. Pull the consumption logs for every workload that touches Claude and establish a true baseline: requests per day, tokens in and tokens out, the model each call uses, and how those figures move across a typical month. This is the part teams most often skip, because it is tedious and because ambition is more fun than arithmetic. But a model built on what you hope to do rather than what you actually do is worthless at the table, since the moment Anthropic asks how you arrived at a figure you have no answer. Real logs give you an answer for every line. They also surface the surprises that always exist in a mature application: the workload nobody remembered, the retry loop quietly doubling a call volume, the batch job that runs heavier than anyone assumed. Find those now, in your own model, rather than discovering them in an invoice after you have committed.
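As a minimal sketch of that baseline step, the snippet below rolls raw usage records up into requests and tokens per workload and model. The record layout and field names (workload, model, input_tokens, output_tokens) are assumptions for illustration, not a real export schema, so map them to whatever your own logs contain.

```python
from collections import defaultdict

# Hypothetical log records. Field names and values are illustrative assumptions.
SAMPLE_LOGS = [
    {"workload": "support-bot", "model": "sonnet", "input_tokens": 1200, "output_tokens": 300},
    {"workload": "support-bot", "model": "sonnet", "input_tokens": 1000, "output_tokens": 250},
    {"workload": "support-bot", "model": "opus", "input_tokens": 1500, "output_tokens": 400},
    {"workload": "doc-pipeline", "model": "haiku", "input_tokens": 8000, "output_tokens": 500},
]


def build_baseline(records):
    """Sum requests and tokens for each (workload, model) pair."""
    totals = defaultdict(lambda: {"requests": 0, "input_tokens": 0, "output_tokens": 0})
    for rec in records:
        key = (rec["workload"], rec["model"])
        totals[key]["requests"] += 1
        totals[key]["input_tokens"] += rec["input_tokens"]
        totals[key]["output_tokens"] += rec["output_tokens"]
    return dict(totals)


baseline = build_baseline(SAMPLE_LOGS)
for (workload, model), row in sorted(baseline.items()):
    print(workload, model, row)
```

Keying on the workload and model together is deliberate: a workload that quietly calls two models shows up as two lines rather than one blended figure.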

Separate the drivers that move cost

Token spend is the product of several independent drivers, and a good model keeps them separate so you can reason about each one. The first is volume, the number of requests each workload generates, which is usually tied to a business metric such as users, documents processed, or tickets handled. The second is the token weight of each request, the input you send and the output you receive, where output matters disproportionately because it is billed at a much higher rate than input. The third is model choice, since the same request is priced very differently on Opus, Sonnet, and Haiku, and the mix across those models is one of the largest levers in the whole forecast. The fourth is the set of optimizations that change the effective rate, principally prompt caching, which can take up to ninety percent off repeated input, and batch processing, which runs asynchronous work at roughly half the real-time rate. Keep these four drivers in separate columns and your model becomes a tool you can steer, because you can change one assumption and watch the forecast respond rather than guessing at a blended number.

The four drivers to model separately

Volume, tied to a business metric so growth scales the forecast naturally.
Token weight per request, with input and output split because output is billed at a far higher rate.
Model mix across Opus, Sonnet, and Haiku, one of the largest levers in the forecast.
Optimization effect, from prompt caching at up to ninety percent off repeated input and batch processing at roughly half the real-time rate.
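The four drivers can be sketched as a small cost function, so that changing one assumption changes the forecast in a traceable way. Everything named here is hypothetical: the Workload fields, the period_cost function, and above all the RATES table, which holds placeholder dollar figures per million tokens rather than real prices. The sketch also simplifies by ignoring any cache-write cost and by applying the batch discount to the whole request.

```python
from dataclasses import dataclass

# PLACEHOLDER rates in dollars per million tokens. These are not real prices;
# substitute the rates from your own agreement.
RATES = {
    "opus": {"input": 10.0, "output": 50.0},
    "sonnet": {"input": 2.0, "output": 10.0},
    "haiku": {"input": 0.5, "output": 2.5},
}
CACHE_DISCOUNT = 0.90  # "up to ninety percent" off repeated input, per the text
BATCH_DISCOUNT = 0.50  # batch at roughly half the real-time rate, per the text


@dataclass
class Workload:
    name: str
    requests: float             # driver 1: request volume for the period
    input_tokens: float         # driver 2: input tokens per request
    output_tokens: float        # driver 2: output tokens per request
    model: str                  # driver 3: model choice
    cached_share: float = 0.0   # driver 4: fraction of input served from cache
    batch: bool = False         # driver 4: runs through batch processing


def period_cost(w):
    """Cost of one workload for the period, in dollars."""
    rate = RATES[w.model]
    # Only the cached share of the input receives the caching discount.
    effective_input = w.input_tokens * (1 - w.cached_share * CACHE_DISCOUNT)
    per_request = (effective_input * rate["input"] + w.output_tokens * rate["output"]) / 1_000_000
    if w.batch:
        per_request *= 1 - BATCH_DISCOUNT
    return w.requests * per_request


plain = Workload("support-bot", 1_000_000, 1000, 200, "sonnet")
cached = Workload("support-bot", 1_000_000, 1000, 200, "sonnet", cached_share=0.5)
print(round(period_cost(plain), 2), round(period_cost(cached), 2))
```

Because each driver is its own field, you can move a workload to another model or raise its cached share and see the effect on that one line without touching the rest.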

Tie volume to a business metric

The mistake that wrecks most forecasts is projecting token volume directly, as a raw number that grows by some assumed percentage. That hides the real relationship, which is that token volume rides on a business driver. Tie each workload to the metric that actually generates its requests, then forecast that metric and let the tokens follow. A support automation workload scales with ticket volume, a document pipeline scales with documents ingested, a customer-facing feature scales with active users and their engagement. When you model it this way, your forecast becomes legible to the people who have to approve it, because finance can challenge the growth rate of a business metric they understand rather than arguing about an abstract token count nobody can sanity-check. It also makes the model self-correcting: if the business metric comes in below plan, the token forecast moves with it, and you have an early signal that your commit may be sized too high.
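A minimal sketch of that idea: forecast the business metric first, then derive tokens from it. The function names and every figure below (ticket counts, growth rate, requests per ticket, tokens per request) are illustrative assumptions rather than benchmarks.

```python
def project_metric(start_value, monthly_growth, months):
    """Project a business metric forward with compound monthly growth."""
    return [start_value * (1 + monthly_growth) ** m for m in range(months)]


def tokens_from_metric(metric_series, requests_per_unit, tokens_per_request):
    """Derive token volume from the metric instead of projecting tokens directly."""
    return [units * requests_per_unit * tokens_per_request for units in metric_series]


# Hypothetical support workload: tickets drive requests, requests drive tokens.
tickets = project_metric(start_value=10_000, monthly_growth=0.05, months=3)
tokens = tokens_from_metric(tickets, requests_per_unit=3, tokens_per_request=1_500)

for month, (t, tok) in enumerate(zip(tickets, tokens), start=1):
    print(f"month {month}: {t:,.0f} tickets, {tok:,.0f} tokens")
```

If actual ticket volume comes in under the projected series, rerunning the second function on the real numbers shows immediately how far the token forecast should drop.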

Model the optimized state, not the current mess

A consumption model should reflect where your application will be after optimization, not where it sits today, because committing to your current unoptimized usage means committing to waste. Before you finalize the forecast, run the optimization analysis: which workloads can move to a cheaper model without losing quality, where prompt caching applies to repeated context, which asynchronous jobs belong in batch. Then build the model around the optimized state. This typically pulls the forecast down substantially, because disciplined model routing alone often cuts aggregate spend by forty to seventy percent against uniform Opus use, and caching and batch stack further savings on top. The reason to do this before you commit rather than after is leverage. If you commit to inflated usage and optimize later, you have locked yourself into paying for tokens you no longer consume, and you have handed Anthropic the value of efficiencies you created. Model the optimized state, commit to that leaner number, and the savings stay yours.
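To make the routing effect concrete, here is a rough sketch that compares a uniform Opus mix with a routed mix. The RELATIVE_COST weights are placeholders chosen for illustration, not real price ratios, and the two mixes are hypothetical, so the resulting percentage says nothing about what your own workloads will achieve.

```python
# Relative cost per request with Opus set to 1.0. PLACEHOLDER values only;
# derive real weights from your own rates and token profiles.
RELATIVE_COST = {"opus": 1.0, "sonnet": 0.6, "haiku": 0.2}


def blended_cost(mix):
    """Weighted average cost of a model mix whose shares sum to 1."""
    total_share = sum(mix.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError(f"mix shares must sum to 1, got {total_share}")
    return sum(share * RELATIVE_COST[model] for model, share in mix.items())


current_mix = {"opus": 1.0}                                 # everything on Opus today
optimized_mix = {"opus": 0.2, "sonnet": 0.5, "haiku": 0.3}  # hypothetical routed state

reduction = 1 - blended_cost(optimized_mix) / blended_cost(current_mix)
print(f"blended cost reduction from routing: {reduction:.0%}")
```

The forecast you carry into the negotiation should be built on the optimized mix, with the current mix kept alongside it only as a reference point.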

Build a range, not a point

No forecast is exact, and a model that pretends to be invites trouble. Build a range instead: a conservative case that reflects slower growth and heavier optimization, a base case that reflects your honest expectation, and a higher case that reflects faster adoption. The range does two things. It tells you how much risk sits in the commit, since a wide range means the number is uncertain and you should commit cautiously toward the lower end. And it arms you for the negotiation, because when the account team pushes a high commit you can show exactly what would have to be true for that figure to make sense, and let the conversation turn on assumptions rather than on willpower. A buyer who arrives with a single number is negotiating a guess. A buyer who arrives with a modeled range is negotiating from evidence, and evidence is what moves a deal.
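One way to sketch the three cases is to run the same calculation under different assumptions and compare the totals. The scenario parameters below are hypothetical: monthly_growth is the assumed compound growth in demand, and optimization_factor is the assumed share of current spend that remains after optimization.

```python
# Hypothetical scenario assumptions. Replace them with figures from your own model.
SCENARIOS = {
    "conservative": {"monthly_growth": 0.01, "optimization_factor": 0.50},
    "base": {"monthly_growth": 0.03, "optimization_factor": 0.60},
    "high": {"monthly_growth": 0.06, "optimization_factor": 0.70},
}


def term_spend(baseline_monthly_spend, monthly_growth, optimization_factor, months=12):
    """Total spend over the term with compound growth on the optimized baseline."""
    total = 0.0
    for m in range(months):
        total += baseline_monthly_spend * optimization_factor * (1 + monthly_growth) ** m
    return total


def forecast_range(baseline_monthly_spend, months=12):
    """Term spend for every scenario, keyed by scenario name."""
    return {
        name: term_spend(baseline_monthly_spend, months=months, **params)
        for name, params in SCENARIOS.items()
    }


result = forecast_range(baseline_monthly_spend=100_000)
spread = result["high"] / result["conservative"]
for name, value in result.items():
    print(f"{name}: {value:,.0f}")
print(f"high case is {spread:.2f}x the conservative case")
```

The spread between the high and conservative cases is a rough measure of how uncertain the number is, and the high case's parameters spell out what would have to be true for a larger commit to make sense.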

Use the model to size the commit, then keep it live

The output of the model is the input to your commitment strategy. With a defensible range in hand, you size the commit toward the lower end of what you are confident you will consume, because undercommitting and paying a little overage is almost always cheaper than overcommitting and forfeiting unused commitment, which on most Anthropic agreements simply disappears at the end of the period. The model tells you where that lower bound sits. It also keeps working after you sign. Update it against actual consumption through the term, and it becomes your early warning system: it tells you when you are tracking ahead and should plan for the next band, or behind and should prepare to renegotiate rather than renew into a number you will not use. A consumption model is not a one-time artifact for a single negotiation. It is the instrument you run your whole Anthropic relationship from, and the buyers who keep it live are the ones who never get surprised by their own invoice.
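The asymmetry between overage and forfeited commitment can be sketched with simple arithmetic. This is an illustration under stated assumptions, not the terms of any actual agreement: the overage_premium of ten percent is hypothetical, commit and consumption are both expressed in dollars at the committed rate, and any discount that comes with a larger commit band is ignored.

```python
def total_cost(commit, actual, overage_premium=0.10):
    """What you pay: the full commit plus any overage at a premium rate."""
    overage = max(0.0, actual - commit)
    return commit + overage * (1 + overage_premium)


def forfeited(commit, actual):
    """Committed spend that goes unused and is lost at the end of the period."""
    return max(0.0, commit - actual)


actual_consumption = 800_000  # hypothetical actual usage for the period

for commit in (700_000, 1_000_000):
    paid = total_cost(commit, actual_consumption)
    lost = forfeited(commit, actual_consumption)
    print(f"commit {commit:,}: paid {paid:,.0f}, forfeited {lost:,.0f}")
```

Rerunning the same two functions against actual consumption during the term is a simple way to see early whether you are tracking toward overage or toward forfeited commitment.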

Where consumption models go wrong

It is worth naming the failure modes, because most weak models share the same handful of flaws. The first is the single point estimate, a model that produces one number with no range, which hides all the uncertainty and gives the negotiation nothing to push on. The second is the stale baseline, a model built once and never refreshed, so that by the time it reaches the table it describes a workload that has already moved. The third is the optimistic growth curve with no business driver, where token volume simply rises by a chosen percentage each period because that felt reasonable, untethered from anything that could be checked. The fourth is double counting, where a workload appears twice under different names because nobody reconciled the model against the actual billing, inflating the forecast in a way that survives until someone reads the invoice. The fifth, and the most expensive, is modeling the current unoptimized state and committing to it, which locks in waste the team was about to remove anyway. Each of these is avoidable, and the discipline that avoids them is the same: build from real data, separate the drivers, tie growth to a metric, reconcile against billing, and model the optimized state. A consumption model that does those five things holds up under scrutiny from both Anthropic and your own finance team, and a model that skips any of them tends to fail at exactly the moment you need it most, when the account team asks you to defend the number you walked in with.
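Reconciling against billing, the check that catches double counting, can be sketched as follows. The assumption is that every model line can be mapped to a billing identifier such as an API key or workspace. The billing_id field, the reconcile function, and the ten percent tolerance are all hypothetical choices for illustration.

```python
def reconcile(model_lines, billed_tokens, tolerance=0.10):
    """Flag billing ids where modeled tokens deviate from billed tokens."""
    modeled = {}
    for line in model_lines:
        modeled[line["billing_id"]] = modeled.get(line["billing_id"], 0) + line["tokens"]

    flags = {}
    for billing_id in sorted(set(modeled) | set(billed_tokens)):
        m = modeled.get(billing_id, 0)
        b = billed_tokens.get(billing_id, 0)
        # Flag ids with no billed usage, and ids outside the tolerance band.
        if b == 0 or abs(m - b) / b > tolerance:
            flags[billing_id] = {"modeled": m, "billed": b}
    return flags


# Hypothetical data: one workload appears twice under different names.
model_lines = [
    {"name": "support-bot", "billing_id": "key-a", "tokens": 5_000_000},
    {"name": "helpdesk-assistant", "billing_id": "key-a", "tokens": 5_000_000},
    {"name": "doc-pipeline", "billing_id": "key-b", "tokens": 2_000_000},
]
billed_tokens = {"key-a": 5_000_000, "key-b": 2_100_000}

flags = reconcile(model_lines, billed_tokens)
print(flags)
```

In this example the two differently named lines both map to the same billing identifier, so the modeled total is double the billed total and the identifier is flagged, while the small gap on the other line stays inside the tolerance.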

The buyer checklist

Build the model from real consumption logs, not from ambition or a current run rate.
Keep volume, token weight, model mix, and optimization effect as separate drivers you can steer.
Tie each workload to the business metric that generates its requests, so the forecast is legible and self-correcting.
Model the optimized state before you commit, so you never lock in waste.
Forecast a range, size the commit toward the lower bound, and keep the model live through the term.

A consumption model is where a sound Anthropic commitment begins, because the commit is only ever as good as the forecast beneath it. We build these models with clients and then carry the optimized baseline into the negotiation so the commit reflects real demand rather than a guess. For the full framework on routing, caching, and batch that feeds the model, read the pillar guide and download the token optimization playbook.
