Independent buyer side advisory · Anthropic only · New York · London
Committed Spend Math

Confidence intervals on token forecasts.

A single number on a slide looks decisive and hides everything that matters. When you forecast Claude token usage for a committed deal, the honest output is a range with a probability attached. Here is how to build that range, and how to commit against it without paying for headroom you will never use.

Buyer side guide · 9 min read
34% · Average reduction in Claude spend
$40M+ · Anthropic commitments advised
100% · Anthropic focus, no other vendor

Every committed Claude deal starts with a forecast, and most forecasts start with a single number. Someone takes last quarter's token usage, applies a growth assumption, and writes down one figure for the year ahead. That number then becomes the commit, the budget line, and the thing the finance team holds you to. The problem is that the number is a guess wearing a suit. Token consumption depends on product adoption, prompt design, model choice, and a dozen behaviors you do not yet control, and pretending all of that resolves to one value is how buyers end up either overcommitted and writing off unused dollars, or undercommitted and paying overage on the part they missed. A confidence interval fixes the framing. Instead of asking what usage will be, you ask what range usage will fall inside, and how sure you are. That shift is the difference between a forecast you can defend in a negotiation and one that collapses the first time someone pushes on it.

Why a point estimate is the wrong tool

A point estimate carries no information about its own reliability. Tell an account team that you expect to consume two million dollars of Claude usage and they hear a commitment, not a midpoint. They will size the deal to that figure, and any discount they offer is priced against it. But your real distribution might be wide: a slow adoption quarter could land you at one and a half million, a viral feature could push you past three. The point estimate threw all of that away the moment it was written. Worse, the single number invites a specific failure. Teams tend to forecast optimistically because optimism is rewarded internally, then commit to the optimistic number to capture a better rate, and then spend the year explaining the shortfall. The unused commitment does not roll over on most Anthropic agreements, so the gap between your confident forecast and your actual usage is money that simply disappears. A range would have told you that risk was there before you signed.

Build the forecast from drivers, not from history alone

A credible interval is built from the things that actually move token usage, not from a trend line drawn through past invoices. Start by decomposing usage into its drivers: how many active workloads you run, how many requests each serves, how many input and output tokens an average request consumes, and which model handles it. Each of those is a variable with its own uncertainty. Adoption might double or stall. Average prompt length might fall as your engineers tighten templates, or rise as features get richer. Output tokens, which cost several times what input tokens cost, might balloon if responses get verbose. When you model usage as the product of these drivers rather than as a single extrapolated line, you can reason about each one separately and see which uncertainties dominate. Usually one or two drivers carry most of the risk, and naming them is the first step to controlling them.
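The decomposition above can be sketched in a few lines. Treat this as a minimal illustration, not a pricing tool: the `Workload` class, its field names, the request volumes, and the per-million-token prices are all hypothetical placeholders, so substitute your own measured drivers and contracted rates.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Workload:
    """Usage drivers for one workload. Every value used below is hypothetical."""
    name: str
    requests_per_month: float
    input_tokens_per_request: float
    output_tokens_per_request: float
    input_price_per_mtok: float   # dollars per million input tokens (placeholder)
    output_price_per_mtok: float  # dollars per million output tokens (placeholder)

    def monthly_cost(self) -> float:
        """Cost as the product of the drivers, split into input and output."""
        input_cost = (self.requests_per_month * self.input_tokens_per_request
                      * self.input_price_per_mtok / 1_000_000)
        output_cost = (self.requests_per_month * self.output_tokens_per_request
                       * self.output_price_per_mtok / 1_000_000)
        return input_cost + output_cost


def sensitivity(workload: Workload, driver: str, pct_change: float) -> float:
    """Change in monthly cost (dollars) if one driver moves by pct_change."""
    bumped = replace(workload, **{driver: getattr(workload, driver) * (1 + pct_change)})
    return bumped.monthly_cost() - workload.monthly_cost()


support_bot = Workload(
    name="support_bot",
    requests_per_month=2_000_000,
    input_tokens_per_request=1_500,
    output_tokens_per_request=400,
    input_price_per_mtok=3.0,
    output_price_per_mtok=15.0,
)

base_monthly = support_bot.monthly_cost()
# Compare how much a 25% swing in each driver moves the monthly bill.
swings = {
    driver: sensitivity(support_bot, driver, 0.25)
    for driver in ("requests_per_month",
                   "input_tokens_per_request",
                   "output_tokens_per_request")
}
print(f"base monthly cost: ${base_monthly:,.0f}")
for driver, delta in sorted(swings.items(), key=lambda kv: -kv[1]):
    print(f"{driver}: +${delta:,.0f} per month for a 25% increase")
```

In this made-up workload, a 25% rise in output tokens moves the bill more than the same rise in input tokens, even though output tokens are far fewer per request. That is the kind of dominant driver worth naming before you size anything.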

Run the scenarios, then attach probabilities

Once you have the drivers, build three honest scenarios. A low case where adoption is slow and optimization lands well, a base case that reflects your genuine expectation, and a high case where adoption runs hot and prompts stay heavy. Resist the urge to make the low case a disaster and the high case a fantasy. They should be plausible boundaries, the tenth and ninetieth percentiles of what you actually believe. Then attach rough probabilities. You do not need a statistics degree here. If you think there is a sixty percent chance you land in the base range, a twenty percent chance you fall short into the low range, and a twenty percent chance you overshoot, you have just built a usable distribution. The commit decision is now a question about that distribution rather than a bet on a single point, and that is a far better question to be answering when real dollars are on the table.
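As a rough sketch of what that distribution looks like in practice, the snippet below treats each scenario as a single representative point with a probability attached. The dollar figures reuse the illustrative one and a half, two, and three million from earlier, and the `Scenario` class and both helper functions are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    annual_spend: float  # dollars, illustrative only
    probability: float


def expected_spend(scenarios):
    """Probability-weighted mean of the scenario spends."""
    total_p = sum(s.probability for s in scenarios)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError(f"probabilities sum to {total_p}, expected 1.0")
    return sum(s.annual_spend * s.probability for s in scenarios)


def probability_below(scenarios, commit):
    """Chance that spend lands under a given commit, on this coarse distribution."""
    return sum(s.probability for s in scenarios if s.annual_spend < commit)


scenarios = [
    Scenario("low", 1_500_000, 0.20),
    Scenario("base", 2_000_000, 0.60),
    Scenario("high", 3_000_000, 0.20),
]

mean = expected_spend(scenarios)
shortfall_risk = probability_below(scenarios, commit=2_000_000)
print(f"expected spend: ${mean:,.0f}")
print(f"chance of landing under a $2.0M commit: {shortfall_risk:.0%}")
```

Note that the weighted mean lands above the base case, because the high case sits further from the base than the low case does. A point estimate hides that skew entirely.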

What the interval tells you about the commit

  • If your low case falls well short of the commit, you are oversized and leaving money in unused commitment that will not return.
  • If your base case sits near the top of the commit, you have built in almost no buffer and overage is likely.
  • If the range is very wide, the right move is often a phased commit ramp rather than one large number locked for the full term.
  • If one driver dominates the width, optimize or cap that driver before you size anything.
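The four checks above can be written down as a simple screening function. This is a sketch under stated assumptions: it reads "oversized" as a commit that sits well above the low case, and every threshold (`oversize_margin`, `buffer_margin`, `wide_ratio`, `dominant_share`) is an illustrative default chosen for this example, not a standard. The function name and the `variance_share` input are hypothetical.

```python
def commit_flags(low, base, high, commit, variance_share,
                 oversize_margin=0.15, buffer_margin=0.05,
                 wide_ratio=2.0, dominant_share=0.5):
    """Return the names of the warning conditions that apply.

    variance_share maps a driver name to its rough share of the forecast
    width (shares should sum to about 1). All thresholds are illustrative.
    """
    flags = []
    # Commit sits well above even the low case: forfeiture is likely.
    if commit > low * (1 + oversize_margin):
        flags.append("oversized")
    # Base case is at, above, or barely below the commit: overage is likely.
    if base >= commit * (1 - buffer_margin):
        flags.append("thin_buffer")
    # High case is a large multiple of the low case: consider a phased ramp.
    if high / low >= wide_ratio:
        flags.append("wide_range")
    # One driver accounts for most of the width: optimize or cap it first.
    top_driver = max(variance_share, key=variance_share.get)
    if variance_share[top_driver] >= dominant_share:
        flags.append(f"dominant_driver:{top_driver}")
    return flags


shares = {"output_tokens": 0.6, "adoption": 0.3, "model_mix": 0.1}

# A commit at the base case, against a 1.5M to 3.0M range.
at_base = commit_flags(1_500_000, 2_000_000, 3_000_000,
                       commit=2_000_000, variance_share=shares)
# The same range with the commit pulled down to the low case.
at_low = commit_flags(1_500_000, 2_000_000, 3_000_000,
                      commit=1_500_000, variance_share=shares)
print(at_base)
print(at_low)
```

Pulling the commit down to the low case clears the oversized flag but leaves the thin buffer flag in place, which is the signal to make sure overage is priced at the committed rate.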

Commit on the conservative end of the range

The instinct on a committed deal is to commit high, because a larger commit usually earns a deeper discount per token. That instinct is exactly backward when your forecast is uncertain. Unused commitment is generally forfeited, so every dollar you commit above your actual usage is a dollar you paid for nothing, and no discount rate makes a forfeited dollar a good deal. The disciplined approach is to commit near the lower portion of your range, where you are highly confident the usage will materialize, and to negotiate the right to consume above the commit at the committed rate rather than at list. That way the part of the range you are sure about gets the discount, and the part you are unsure about gets handled as protected overage instead of as a prepaid bet. You capture the rate without buying the risk. The interval is what lets you draw that line in the right place.
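A small expected cost comparison makes the tradeoff concrete. The sketch below assumes unused commitment is forfeited, measures usage at list price, and uses invented discount tiers (10% for the smaller commit, 20% for the larger one). The function names, the tiers, and the scenario figures are all hypothetical.

```python
def cost_under_commit(list_usage, commit, discount, overage_at_committed_rate=True):
    """Dollars paid for the year, given usage valued at list price.

    commit   : dollars committed, forfeited if unused (assumption from the text)
    discount : fractional discount off list earned by this commit (hypothetical)
    """
    committed_rate = 1.0 - discount
    covered_list_usage = commit / committed_rate  # list-value usage the commit buys
    overage_list_usage = max(0.0, list_usage - covered_list_usage)
    overage_rate = committed_rate if overage_at_committed_rate else 1.0
    return commit + overage_list_usage * overage_rate


def expected_cost(scenarios, commit, discount, overage_at_committed_rate=True):
    """Probability-weighted cost across (list_usage, probability) scenarios."""
    return sum(p * cost_under_commit(u, commit, discount, overage_at_committed_rate)
               for u, p in scenarios)


# (list-price usage in dollars, probability): illustrative figures only
scenarios = [(1_500_000, 0.20), (2_000_000, 0.60), (3_000_000, 0.20)]

# Commit sized to the low case, smaller discount, overage at the committed rate.
conservative = expected_cost(scenarios, commit=1_350_000, discount=0.10)
# Commit sized to the high case to chase a deeper discount.
aggressive = expected_cost(scenarios, commit=2_400_000, discount=0.20)
# The conservative commit again, but with overage billed at list.
unprotected = expected_cost(scenarios, commit=1_350_000, discount=0.10,
                            overage_at_committed_rate=False)

print(f"conservative, protected overage: ${conservative:,.0f}")
print(f"aggressive, deeper discount:     ${aggressive:,.0f}")
print(f"conservative, overage at list:   ${unprotected:,.0f}")
```

On these made-up inputs the conservative commit costs about $1.89M in expectation against $2.40M for the aggressive one, and losing the overage protection adds roughly $60,000. The ranking depends on the tiers and probabilities you plug in, so run it on your own numbers before drawing the line.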

Use the interval as negotiation leverage

A forecast expressed as a range is also a stronger position at the table. When an account team pushes you toward a larger commit, a confident range lets you answer with specifics: the low end is firm, the base is likely, and the high end is contingent on adoption you are not willing to prepay for. That is a far more persuasive stance than either a single number you defend by stubbornness or a vague reluctance you cannot quantify. It tells the seller you understand your own consumption better than they do, which changes the tenor of the whole conversation. It also sets up the structural asks that protect you: a ramp that grows the commit as usage proves out, overage priced at the committed rate, and a midterm reforecast right if the range turns out to be wrong. Each of those is easier to win when you have already shown your homework in the form of a defensible interval.

Tighten the range before you sign

The width of your interval is not fixed. Much of it comes from drivers you can influence, and the cheapest savings often come from narrowing the forecast rather than negotiating the rate. If output token verbosity is a major source of uncertainty, tightening prompts and response formats shrinks both the expected usage and the variance around it. If model choice is uncertain, routing predictable work to Sonnet and Haiku rather than running everything on Opus pulls the whole distribution down and makes it more stable. Prompt caching on repeated context, which can cut the cost of that context by up to ninety percent, and batch processing on asynchronous work at roughly half rate, both reduce the number of expensive tokens you are forecasting in the first place. Do this work before you commit and you are sizing against a smaller, tighter, more honest distribution, which means a smaller commit, a smaller buffer, and far less money at risk of being forfeited.
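To see how those two levers change the number you are forecasting, here is a simplified per request cost model. It is a sketch, not a billing calculator: it assumes no surcharge for writing to the cache, treats every cached token as a hit at the full ninety percent saving, and applies the batch rate to the whole request. The prices are placeholders and `request_cost` is a hypothetical helper, so check current pricing terms before relying on any of it.

```python
def request_cost(input_tokens, output_tokens, cached_tokens=0,
                 input_price_per_mtok=3.0, output_price_per_mtok=15.0,
                 cache_discount=0.90, batch=False, batch_multiplier=0.5):
    """Dollar cost of one request under simplified caching and batch discounts.

    Simplifications (assumptions): cache writes carry no surcharge, every
    cached token is a cache hit, and the batch rate applies to the whole request.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    # Cached context is billed at a fraction of the normal input price.
    billable_input = fresh_tokens + cached_tokens * (1 - cache_discount)
    input_cost = billable_input * input_price_per_mtok / 1_000_000
    output_cost = output_tokens * output_price_per_mtok / 1_000_000
    total = input_cost + output_cost
    return total * batch_multiplier if batch else total


# A request with 10,000 input tokens, 8,000 of them repeated context.
baseline = request_cost(10_000, 500)
with_cache = request_cost(10_000, 500, cached_tokens=8_000)
with_cache_and_batch = request_cost(10_000, 500, cached_tokens=8_000, batch=True)

for label, cost in [("baseline", baseline),
                    ("with caching", with_cache),
                    ("caching + batch", with_cache_and_batch)]:
    print(f"{label}: ${cost:.5f} per request ({cost / baseline:.0%} of baseline)")
```

Because caching only touches input tokens, output cost quickly becomes the larger share of what remains, which is another reason verbosity deserves attention before you size the commit.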

Update the interval as real usage arrives

A confidence interval is not a one time exercise you complete before signing and then forget. It is a living estimate that should tighten as real consumption data comes in, and the contract should give you room to act on what you learn. In the first weeks of a deal you have the least information and the widest range, and as months of actual usage accumulate the distribution narrows around what is really happening. The buyer who keeps updating the interval can see early whether they are tracking toward the low end or the high end of their original range, and can adjust behavior, optimization effort, or the next ramp step accordingly. This is why a midterm reforecast right is so valuable: it lets you reset the commit to the interval you can now measure rather than the one you had to guess at. A forecast you revisit is worth far more than one you file away, because the whole point of expressing usage as a range was to manage it as the uncertainty resolves.
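One deliberately simple way to see the range narrow is to lock in actual spend to date and keep the original low and high monthly run rates only for the months that remain. The `reforecast` function and all of the figures below are hypothetical, and a fuller version would also re-estimate the monthly range from the observed run rate.

```python
def reforecast(actuals, low_monthly, high_monthly, term_months=12):
    """Year-end spend range: locked-in actuals plus a range for remaining months."""
    remaining = term_months - len(actuals)
    if remaining < 0:
        raise ValueError("more months of actuals than months in the term")
    spent = sum(actuals)
    return spent + remaining * low_monthly, spent + remaining * high_monthly


# Original range of 1.5M to 3.0M for the year, expressed per month.
LOW_MONTHLY, HIGH_MONTHLY = 125_000, 250_000

at_signing = reforecast([], LOW_MONTHLY, HIGH_MONTHLY)
# Six months of illustrative actual spend.
six_months = [140_000, 145_000, 150_000, 150_000, 155_000, 160_000]
at_midterm = reforecast(six_months, LOW_MONTHLY, HIGH_MONTHLY)

for label, (low, high) in [("at signing", at_signing), ("at midterm", at_midterm)]:
    print(f"{label}: ${low:,.0f} to ${high:,.0f} (width ${high - low:,.0f})")
```

Halfway through the term the width of the range has halved, and the midterm range is the one a reforecast right lets you commit against.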

Watch the drivers that widen the range midterm

Even a well built interval can widen during the term if a driver you did not weight heavily starts to move. A new feature that ships richer responses can push output tokens up across the board. A change in how your product is used can shift the model mix toward Opus without anyone deciding to. A customer win can pull adoption forward faster than your base case assumed. The discipline is to keep watching the handful of drivers that carry most of the variance, so that a shift shows up in your numbers before it shows up in an overage invoice. Monitoring is cheap and surprise is expensive. A buyer who tracks the drivers can route the new verbose feature through a tighter prompt, cap the model that is drifting upward, or trigger the reforecast right before the gap becomes a problem. The interval told you which drivers to watch, and watching them is what keeps the forecast honest for the life of the deal.
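Monitoring of this kind can be as light as comparing each driver's observed value with the value the forecast assumed. The sketch below flags any driver that has moved by more than a tolerance. The driver names, the values, and the 15% tolerance are illustrative assumptions, and `drifting_drivers` is a hypothetical helper.

```python
def drifting_drivers(assumed, observed, tolerance=0.15):
    """Return {driver: relative_change} for drivers that moved past tolerance."""
    drift = {}
    for name, baseline in assumed.items():
        if name not in observed or baseline == 0:
            continue  # no data yet, or a zero baseline that cannot be compared
        change = (observed[name] - baseline) / baseline
        if abs(change) > tolerance:
            drift[name] = change
    return drift


# Values the forecast assumed versus values measured this month (all made up).
assumed = {
    "output_tokens_per_request": 400,
    "opus_share": 0.20,
    "requests_per_month": 2_000_000,
}
observed = {
    "output_tokens_per_request": 520,
    "opus_share": 0.22,
    "requests_per_month": 2_100_000,
}

alerts = drifting_drivers(assumed, observed)
for name, change in alerts.items():
    print(f"{name} has moved {change:+.0%} against the forecast assumption")
```

Here only output tokens per request trips the alert, which points at a verbose feature rather than at adoption or model mix.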

How we handle it on the buyer side

We sit between you and Anthropic and we build the forecast as a range, not a guess. That means decomposing your usage into its real drivers, running the scenarios, and pricing the commit against the part of the distribution you are genuinely confident in, while structuring overage, ramp, and reforecast rights to cover the rest. At the same time we apply the optimization levers that narrow the forecast itself, model routing across Opus, Sonnet, and Haiku, caching at up to ninety percent on repeated context, and batch at roughly half rate, so the number you commit to is smaller and steadier than the one you would have signed alone. The result is a commitment sized to reality with the risk handled by structure rather than by hope.

If you are about to put a token forecast in front of Anthropic and you want it built as a defensible range, the playbook below walks through the consumption levers that tighten it. Our pricing is simple: a Fixed Fee from $18,000, or Gainshare, which is a share of verified savings with zero retainer and no risk to you.

Your Anthropic number is negotiable.

Not affiliated with Anthropic PBC. Independent buyer side advisory only.