How to forecast Claude token consumption

Buyer-side guide · 12-minute read · By Morten Andersen · Published May 29, 2026 · Updated June 12, 2026

Every committed spend decision rests on a forecast, and most forecasts are guesses dressed up as numbers. A buyer takes a few months of usage, draws a line, multiplies by twelve, and signs a commitment against it. Then reality diverges: usage ramps slower or faster than the line, the model mix shifts, or a new workload appears, and the commitment is either overshot into overage or undershot into unused spend that expires. A good token forecast is not a straight line. It is a model built workload by workload, with the ramp, the model mix, and the optimization layer made explicit. This guide shows how to build one that holds up well enough to commit against, and where the common forecasts go wrong.

Forecast by workload, not by trend line

The first principle is to stop forecasting at the aggregate level. Total spend is the sum of distinct workloads that behave differently, and a trend line through the total hides all of it. Break the forecast into its parts: the interactive seat usage, the production API workloads, the agent and coding usage, the batch and bulk jobs, and any dedicated capacity. Each has its own driver, its own growth pattern, and its own model mix, and each should be modeled on its own terms before you add them up.

For each workload, the unit of the forecast is tokens, not dollars. Dollars are an output of tokens times rate, weighted by model mix, and conflating the two hides where the cost actually comes from. Estimate the volume driver first (requests, tasks, documents, or sessions), then the tokens per unit, then the model that handles it. Build the dollar figure last, from those pieces. A forecast assembled this way tells you not just how much but why, which is what makes it defensible and adjustable.
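As a rough illustration of building dollars last, here is a minimal Python sketch. The workload names, volumes, and per-model rates are all hypothetical placeholders, not Anthropic prices, and a single blended rate per model is a simplification, since input and output tokens are priced differently in practice.

```python
from dataclasses import dataclass

# Hypothetical blended rates in dollars per million tokens.
# Substitute your own contracted rates; these are NOT real prices.
RATE_PER_MTOK = {"opus": 30.0, "sonnet": 6.0, "haiku": 1.0}


@dataclass
class Workload:
    name: str
    units_per_month: float        # volume driver: requests, tasks, documents, sessions
    tokens_per_unit: float        # average tokens consumed per unit
    model_mix: dict               # share of tokens by model, summing to 1.0

    def monthly_tokens(self) -> float:
        return self.units_per_month * self.tokens_per_unit

    def monthly_dollars(self) -> float:
        # Dollars come last: tokens, split by model share, times that model's rate.
        tokens = self.monthly_tokens()
        return sum(
            tokens * share / 1_000_000 * RATE_PER_MTOK[model]
            for model, share in self.model_mix.items()
        )


# Illustrative workloads only.
workloads = [
    Workload("support_api", 200_000, 3_000, {"sonnet": 0.5, "haiku": 0.5}),
    Workload("coding_agents", 10_000, 50_000, {"opus": 0.2, "sonnet": 0.8}),
]

total_tokens = sum(w.monthly_tokens() for w in workloads)
total_dollars = sum(w.monthly_dollars() for w in workloads)

for w in workloads:
    print(f"{w.name}: {w.monthly_tokens():,.0f} tokens, ${w.monthly_dollars():,.0f}")
print(f"total: {total_tokens:,.0f} tokens, ${total_dollars:,.0f}")
```

Because each workload carries its own driver and mix, a change in any one assumption can be traced to the dollar figure it moves.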

Model the ramp, because adoption is not linear

The second principle is that adoption ramps in steps, not a smooth line, and the ramp is where most forecasts break. A new workload pilots small, gets validated or approved, then jumps as it goes to production and spreads. Agent and coding usage in particular grows in bursts as champions drive adoption. A forecast that assumes smooth linear growth will be wrong in both directions: too high in the early months when the ramp has not started, too low later when it accelerates.

Model the ramp explicitly for each workload. When does it start consuming meaningfully? What gates it: a pilot, an approval, a model risk sign-off? How fast does it climb once the gate clears? In regulated environments especially, the gate is an approval and the ramp waits on it, which makes the curve slower and lumpier than the business case assumes. A forecast that builds the approval timeline into the ramp is far closer to reality than one that draws a confident line from a three-month sample.
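One way to make the gate explicit is a small ramp function. The sketch below is an assumption about shape, not a prescribed method: the workload sits at pilot volume until a hypothetical gate month, then climbs linearly to steady state over a fixed number of months. All figures are illustrative.

```python
def ramp_tokens(month, gate_month, pilot_tokens, steady_tokens, climb_months):
    """Monthly tokens for one workload under a gated ramp (months are 0-indexed)."""
    if month < gate_month:
        # Still in pilot, waiting on the approval or sign-off.
        return pilot_tokens
    # Once the gate clears, climb linearly to steady state over climb_months.
    progress = min(1.0, (month - gate_month + 1) / climb_months)
    return pilot_tokens + progress * (steady_tokens - pilot_tokens)


def annual_curve(gate_month):
    # Illustrative figures: 5M tokens/month in pilot, 105M at steady state.
    return [
        ramp_tokens(m, gate_month, pilot_tokens=5e6, steady_tokens=105e6, climb_months=4)
        for m in range(12)
    ]


curve = annual_curve(gate_month=4)      # approval lands as planned
slipped = annual_curve(gate_month=7)    # approval slips by three months

print(f"on-time gate: {sum(curve) / 1e6:,.0f}M tokens for the year")
print(f"slipped gate: {sum(slipped) / 1e6:,.0f}M tokens for the year")
```

With these inputs, a three-month slip in the approval removes a large share of the year's tokens, which is the kind of sensitivity a trend line cannot show.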

Forecast the model mix, because it dominates cost

The third principle is that the model mix drives the dollar figure as much as the token volume, and a forecast that ignores it is forecasting the wrong thing. The same number of tokens costs very differently depending on whether it runs on Opus, Sonnet, or Haiku. A forecast that assumes everything runs on the top model overstates cost dramatically if you intend to route, and a forecast that assumes aggressive routing you have not actually implemented understates it.

So forecast the mix you will actually run. For each workload, estimate the share of tokens by model based on the routing you have built or plan to build. This is also where the forecast meets the optimization: a workload you intend to route across Opus, Sonnet, and Haiku will typically land forty to seventy percent below a uniform top-model assumption, and a workload with a heavy shared prefix that you will cache can see its input cost drop by up to ninety percent on that shared portion. A forecast that bakes in the optimization you will deploy is the forecast you should size a commitment against. One that uses list assumptions overstates the cost and pushes you to commit too large.
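The routing arithmetic can be sketched as a blended rate per million tokens. The rates and mixes below are hypothetical and chosen only to show the mechanics; the actual gap between a routed and a uniform top-model forecast depends on your real rates and the routing you actually ship.

```python
# Hypothetical blended rates in dollars per million tokens, not real prices.
RATE_PER_MTOK = {"opus": 30.0, "sonnet": 6.0, "haiku": 1.0}


def blended_rate(mix):
    """Dollars per million tokens for a given share of tokens by model."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("model mix shares must sum to 1.0")
    return sum(share * RATE_PER_MTOK[model] for model, share in mix.items())


uniform_top = blended_rate({"opus": 1.0})
planned_routing = blended_rate({"opus": 0.3, "sonnet": 0.5, "haiku": 0.2})
# What runs if the planned routing is only partly implemented.
actual_routing = blended_rate({"opus": 0.7, "sonnet": 0.3})

saving_vs_top = 1 - planned_routing / uniform_top
understatement = actual_routing / planned_routing - 1

print(f"uniform top model: ${uniform_top:.2f} per million tokens")
print(f"planned routing:   ${planned_routing:.2f} ({saving_vs_top:.0%} below uniform)")
print(f"actual routing:    ${actual_routing:.2f} ({understatement:.0%} above plan)")
```

The same function covers both failure modes: assuming the top model everywhere overstates cost, and assuming routing you have not built understates it.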

Build the bands, not a single number

The fourth principle is to forecast a range, not a point. A single number invites a single commitment, and a single commitment is brittle. Build a conservative case, an expected case, and an aggressive case, driven by the variables you are least sure of: the ramp timing, the adoption rate, the model mix. The spread between the cases tells you how much uncertainty you are carrying, and that uncertainty is exactly what the commitment structure has to absorb.

This is where the forecast turns into a negotiating position. If your conservative and aggressive cases are far apart, you do not want a large fixed commitment sized to the middle, because either tail leaves you exposed: overage if you run hot, expiring commitment if you run cold. You want a commitment sized near the conservative case, a ramp that steps up as usage proves out, a protected overage rate for the upside, and negotiated unused commitment treatment for the downside. The forecast does not just tell you how much to commit. It tells you how to structure the commitment so neither tail hurts.
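To see why the middle of the range is a risky place to commit, the sketch below compares two commitment sizes against three hypothetical annual spend cases. The dollar figures are invented for illustration, and the model ignores rate differences between committed and overage spend.

```python
# Hypothetical annual spend by scenario, in dollars at committed rates.
scenarios = {"conservative": 600_000, "expected": 1_000_000, "aggressive": 1_600_000}


def exposure(commitment, actual_spend):
    """Return (unused commitment, spend above commitment) for one scenario."""
    unused = max(0, commitment - actual_spend)
    overage = max(0, actual_spend - commitment)
    return unused, overage


def exposure_table(commitment):
    return {name: exposure(commitment, spend) for name, spend in scenarios.items()}


sized_to_middle = exposure_table(scenarios["expected"])
sized_to_conservative = exposure_table(scenarios["conservative"])

for label, table in [("middle", sized_to_middle), ("conservative", sized_to_conservative)]:
    for name, (unused, overage) in table.items():
        print(f"commit at {label:12s} | {name:12s} | unused ${unused:>9,} | overage ${overage:>9,}")
```

Sizing to the conservative case removes the stranded spend but moves all the exposure into overage, which is why the protected overage rate and the stepped ramp belong in the same ask.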

Common forecasting mistakes and how to avoid them

A few forecasting mistakes show up again and again, and naming them is the fastest way to avoid them. The first is the trend line: taking a few months of total spend, drawing a straight line, and committing against it. This hides the workload mix, ignores the step shape of adoption, and bakes in whatever model mix happened to be running during the sample. The fix is to forecast bottom-up, workload by workload, in tokens before dollars.

The second mistake is forecasting at list price rather than optimized cost. A team models everything on the top model, produces a frightening number, and either overcommits to it or panics. If you intend to route across Opus, Sonnet, and Haiku, to cache shared context, and to use batch for bulk jobs, the forecast must reflect that, or it is forecasting a deployment you will not actually run. The third is forecasting a single number instead of a range, which invites a brittle single commitment. Build a conservative, expected, and aggressive case so the uncertainty is visible and the commitment can be structured to absorb it.

The fourth, and the most damaging in regulated settings, is ignoring the approval gate on the ramp. Adoption that waits on a model risk sign-off or a compliance approval ramps later and more unevenly than the business case assumes, and a forecast that draws a confident curve from an early sample will overshoot. Build the approval timeline into the ramp and the forecast comes back to earth.

Revisiting the forecast as reality comes in

A forecast is not a one-time artifact. It is a model you update as real consumption arrives, and the discipline of revisiting it is what keeps a committed spend safe over the life of the term. Each period, compare actual tokens by workload against the forecast, and look not just at whether the total matched but at why any gap opened. A workload ramping slower than projected usually means an approval took longer than planned, which tells you the rest of the curve will shift too. A workload running hot tells you the optimization is lagging the growth, or that adoption outran the plan.
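A per-workload variance check can be very small. The sketch below uses invented forecast and actual figures and a hypothetical 15 percent tolerance; the point is that a total that looks close can hide workloads that are well off plan in opposite directions.

```python
# Hypothetical monthly tokens by workload: forecast versus what actually ran.
forecast = {"support_api": 600e6, "coding_agents": 500e6, "batch_jobs": 200e6}
actual = {"support_api": 610e6, "coding_agents": 320e6, "batch_jobs": 290e6}


def variance_report(forecast, actual, tolerance=0.15):
    """Classify each workload by its relative gap to forecast."""
    report = {}
    for name, planned in forecast.items():
        gap = (actual.get(name, 0.0) - planned) / planned
        if abs(gap) <= tolerance:
            status = "on_track"
        elif gap < 0:
            status = "running_cold"   # check whether an approval or gate slipped
        else:
            status = "running_hot"    # check optimization coverage and adoption
        report[name] = (gap, status)
    return report


report = variance_report(forecast, actual)
total_gap = (sum(actual.values()) - sum(forecast.values())) / sum(forecast.values())

for name, (gap, status) in report.items():
    print(f"{name}: {gap:+.0%} ({status})")
print(f"total: {total_gap:+.0%}")
```

In this example the total is only a few percent under forecast, yet one workload is running far cold and another far hot, and each calls for a different response.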

Updating the forecast this way turns the commitment from a bet into a managed position. If usage is tracking below the conservative case, you have early warning that unused commitment is at risk and time to act, by accelerating a workload or by exercising whatever flexibility the unused commitment treatment gives you. If it is tracking above the expected case, you have early warning of overage and time to push the optimization harder or to revisit the commitment at the next opportunity. The buyers who get blindsided are the ones who forecast once and never look again. The ones who stay ahead treat the forecast as a living instrument.

This is also where the forecast and the negotiation reconnect. A forecast you maintain gives you evidence, and evidence is leverage. When you go back to Anthropic to adjust a commitment, restructure a ramp, or defend an overage rate, a maintained, workload-level forecast that has tracked reality is far more persuasive than a number you pulled together once at signing. The forecast is not just how you size the deal. It is how you keep negotiating it well for the whole term.

Turning the forecast into a negotiating position

A forecast built well does more than tell you how much to commit. It tells you how to structure the commitment, and that structure is itself a negotiating position. Because you have modeled a conservative, expected, and aggressive case, you know how much uncertainty you are carrying, and you can ask for the structure that absorbs it rather than accepting a single brittle number. That means a commitment sized near the conservative case rather than the optimistic one, a ramp that steps up as workloads prove out instead of assuming day-one volume, a protected overage rate so the aggressive case does not punish you, and negotiated unused commitment treatment so the conservative case does not strand spend.

Each of these asks is backed by your forecast, which is what makes them credible at the table. A buyer who arrives with a workload-level model, an explicit ramp, an honest model mix, and a defined range negotiates from evidence. A buyer who arrives with a single number pulled from a trend line negotiates from a guess, and the vendor can tell the difference. The forecast is the difference between asking for protections you can justify and asking for them on faith.

This is also why the forecast and the optimization are the same project. You cannot size a commitment safely until you have forecast the optimized consumption it should cover, and you cannot forecast the optimized consumption until you know which levers you will pull: routing across Opus, Sonnet, and Haiku, caching the shared context at up to ninety percent off on the repeated portion, and batch at half rate on bulk jobs, which together typically take aggregate spend forty to seventy percent below a uniform top model assumption. Forecast the optimized number, size the commitment to it with the right structure, and you have turned a budgeting exercise into a negotiating advantage. Our token optimization playbook includes this forecasting method alongside the optimization levers it depends on, with the numbers behind each.
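The caching and batch levers can be layered onto an already routed cost in a few lines. This is a simplified sketch under stated assumptions: the ninety percent and half-rate figures come from the text above, the input and output dollar amounts are hypothetical, the cost of writing to the cache is ignored, and the two discounts are treated as independent, which may not match how they combine on a real invoice.

```python
def optimized_cost(input_cost, output_cost, cached_share, batch_share,
                   cache_discount=0.9, batch_discount=0.5):
    """Apply caching to input cost, then a batch discount to the batched share.

    cached_share: fraction of input cost that is a repeated, cacheable prefix.
    batch_share:  fraction of total spend that runs as bulk batch jobs.
    """
    # Caching only reduces the repeated portion of the input side.
    input_after_cache = input_cost * (1 - cached_share * cache_discount)
    subtotal = input_after_cache + output_cost
    # Batch pricing applies only to the share of work that can run as bulk jobs.
    return subtotal * (1 - batch_share * batch_discount)


# Hypothetical monthly dollars after routing, before caching and batch.
routed_input, routed_output = 6_000.0, 4_000.0

optimized = optimized_cost(routed_input, routed_output, cached_share=0.5, batch_share=0.3)
reduction = 1 - optimized / (routed_input + routed_output)

print(f"routed only: ${routed_input + routed_output:,.0f}")
print(f"with caching and batch: ${optimized:,.0f} ({reduction:.0%} lower)")
```

Running the forecast through a function like this for each workload yields the optimized figure to size the commitment against, rather than the list-rate figure.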

From forecast to committed spend

A token forecast built this way, by workload, with an explicit ramp, an honest model mix, the optimization baked in, and expressed as a range, is the foundation of a committed spend you can sign without fear. It tells you the number, the shape, and the protections you need, and it gives you the evidence to defend all three to Anthropic and to your own finance team. The buyers who get committed spend wrong are almost always the ones who forecast with a trend line and a single number. The ones who get it right model it the way described here.


Not affiliated with Anthropic PBC. Independent buyer side advisory only.