Forecasting token consumption before you commit.

A committed spend number is only as good as the forecast behind it. Commit too low and you leave the discount on the table. Commit too high and you pay for tokens you never use. This is the buyer side method we use to model Claude consumption, build the right buffer, and turn a defensible forecast into leverage at the table.

By Fredrik Filipsson · Published May 29, 2026 · Updated June 12, 2026

Every committed Anthropic agreement rests on a single number that you are asked to promise before you can fully know it: how many tokens will your organization actually consume over the next year, or the next three? Get that number right and the commitment earns you a real discount on spend you would have made anyway. Get it wrong in either direction and you pay for the mistake, either through forfeited unused commitment or through a discount you failed to unlock. Forecasting consumption is therefore not a finance exercise you do after the negotiation. It is the foundation that the whole negotiation stands on, and it deserves to be built carefully.

Why the forecast is the hardest part

Token consumption is difficult to forecast because it is driven by variables that interact. The number of requests your applications make, the size of the prompts and the context attached to them, the length of the responses, the models you route to, and the efficiency techniques you have or have not deployed all multiply together. A small change in any one of them moves the total meaningfully. A product that doubles its user base while also lengthening its average prompt does not double its token spend. It more than doubles it, and a forecast that treats usage as a single growth rate will miss that.
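To see why a single growth rate misses the interaction, it helps to write the multiplication out. The sketch below is illustrative only: the function name and the growth figures (a doubled user base and a 30 percent increase in tokens per request) are assumptions picked for the example, not measurements from any real workload.

```python
def consumption_multiplier(request_growth: float, tokens_per_request_growth: float) -> float:
    """Factor by which total tokens change when both drivers change.

    Growth values are multipliers: 2.0 means doubling, 1.3 means plus 30 percent.
    """
    return request_growth * tokens_per_request_growth


# Hypothetical scenario: requests double and tokens per request grow by 30 percent.
combined = consumption_multiplier(2.0, 1.3)

# What a forecast built on user growth alone would have predicted.
single_rate_guess = 2.0

print(f"combined multiplier: {combined:.2f}x")
print(f"single-rate forecast understates consumption by {combined / single_rate_guess - 1:.0%}")
```

The gap widens with every additional driver that moves in the same direction, which is the case for modeling them separately rather than folding them into one rate.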

The vendor has an interest in your forecast being high, because a higher forecast supports a higher commitment and a larger booked number. That does not make the account team dishonest, but it does mean the optimistic projection will rarely be challenged from their side. The discipline has to come from you. A forecast you can defend line by line is worth far more than one that simply matches the number the account team would like to see.

Start from measured usage, not estimates

The strongest forecast begins with real data. If you are already running on Claude, even at a small scale, your usage logs are the most valuable input you have. They tell you the actual distribution of prompt sizes, the real ratio of input to output tokens, the models your traffic actually hits, and the variance from day to day and month to month. A forecast grounded in measured usage is credible in a way that a top down estimate never is, and credibility is leverage when you sit across from the account team.

If you are not yet running at scale, build the forecast from a representative pilot rather than from assumptions. Run the real workloads through the real models for long enough to capture the patterns, then extrapolate from observed behavior. A two week pilot that measures actual token consumption per transaction is worth more than a spreadsheet of guesses, because it anchors every later projection to something you have actually seen.

The most defensible forecast is the one built from your own logs. Measured tokens per transaction, the real input to output ratio, and observed variance beat any top down estimate, and they hold up when the account team pushes back.
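As a minimal sketch of what "built from your own logs" can look like, the snippet below summarizes a handful of request records into the figures named above. The record layout and field names are assumptions made for illustration. Map them to whatever your own usage export actually contains, and run it over a far larger sample than four rows.

```python
from statistics import mean, pstdev

# Hypothetical log records, one per request. The field names are assumptions.
usage_log = [
    {"model": "sonnet", "input_tokens": 1200, "output_tokens": 300},
    {"model": "sonnet", "input_tokens": 900, "output_tokens": 450},
    {"model": "haiku", "input_tokens": 400, "output_tokens": 100},
    {"model": "opus", "input_tokens": 3000, "output_tokens": 900},
]


def summarize(log):
    """Reduce raw request records to the inputs a forecast needs."""
    inputs = [r["input_tokens"] for r in log]
    outputs = [r["output_tokens"] for r in log]
    totals = [i + o for i, o in zip(inputs, outputs)]

    # Count requests per model to get the observed model mix.
    counts = {}
    for r in log:
        counts[r["model"]] = counts.get(r["model"], 0) + 1

    return {
        "avg_input_tokens": mean(inputs),
        "avg_output_tokens": mean(outputs),
        "input_to_output_ratio": sum(inputs) / sum(outputs),
        "tokens_per_request_stdev": pstdev(totals),
        "model_share": {m: c / len(log) for m, c in counts.items()},
    }


summary = summarize(usage_log)
for key, value in summary.items():
    print(key, value)
```

The same summary computed per day or per month gives you the variance figures that matter later when you size the buffer.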

Break consumption into its drivers

A useful forecast does not project a single total. It models the drivers separately and lets them combine. The cleanest way to do this is to express consumption as a small number of components you can each estimate and defend.

Transaction volume. How many requests per day, and how is that volume expected to grow over the term.
Tokens per transaction. The average input tokens, including any context and system prompt, plus the average output tokens, kept separate because output tends to cost several times more than input.
Model mix. The share of traffic routed to Opus, to Sonnet, and to Haiku, since the per token rate differs sharply between them.
Efficiency factor. The effect of prompt caching, batch processing, and prompt design, which can reduce billed tokens substantially without reducing real work.

Modeling these separately lets you see which lever actually drives your spend. For most workloads the output token volume and the model mix dominate, which is why a forecast that ignores them and projects a flat cost per request is almost always wrong. It also lets you run scenarios, because you can flex one driver at a time and see how the total responds.
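The four drivers above can be combined in a few lines. This is a sketch under stated assumptions, not a pricing calculator: the per million token rates are placeholders invented to make the arithmetic visible and are not Anthropic's prices, and the function and parameter names are hypothetical. Substitute the rates from your own quote.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    input_per_mtok: float   # dollars per million input tokens
    output_per_mtok: float  # dollars per million output tokens


# PLACEHOLDER rates invented for this example. They are not Anthropic prices.
RATES = {
    "opus": Rate(10.0, 50.0),
    "sonnet": Rate(2.0, 10.0),
    "haiku": Rate(0.5, 2.5),
}


def monthly_cost(requests_per_day, avg_input_tokens, avg_output_tokens,
                 model_mix, efficiency_factor=1.0, days=30):
    """Combine volume, tokens per transaction, model mix, and efficiency."""
    if abs(sum(model_mix.values()) - 1.0) > 1e-9:
        raise ValueError("model mix shares must sum to 1")
    requests = requests_per_day * days
    cost = 0.0
    for model, share in model_mix.items():
        rate = RATES[model]
        # Input and output are priced separately, so they are kept separate here.
        cost += requests * share * avg_input_tokens / 1e6 * rate.input_per_mtok
        cost += requests * share * avg_output_tokens / 1e6 * rate.output_per_mtok
    return cost * efficiency_factor


# Hypothetical workload: 10,000 requests a day, 1,000 input and 200 output tokens each.
base = monthly_cost(10_000, 1000, 200, {"opus": 0.2, "sonnet": 0.5, "haiku": 0.3})

# Flex one driver at a time to see which lever moves the total.
all_opus = monthly_cost(10_000, 1000, 200, {"opus": 1.0})
longer_output = monthly_cost(10_000, 1000, 400, {"opus": 0.2, "sonnet": 0.5, "haiku": 0.3})

print(f"base mix:        {base:,.0f}")
print(f"everything Opus: {all_opus:,.0f}")
print(f"output doubled:  {longer_output:,.0f}")
```

Keeping each driver as its own argument is the design choice that matters. It is what allows a scenario to change one assumption while holding the others fixed.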

Account for the efficiency you have not built yet

A common forecasting mistake is to project your current, unoptimized consumption forward and commit to it. That locks in spend that better engineering could have avoided. Before you set a commitment, ask what your consumption would look like after the optimization work you can realistically do. Routing the right share of traffic to Sonnet and Haiku instead of running everything on Opus can cut aggregate spend substantially. Prompt caching can remove a large fraction of repeated input cost on workloads with stable context. Batch processing can halve the cost of work that does not need an immediate response.

The point is not to assume perfect efficiency you will never reach. It is to forecast the consumption you will actually have after the improvements you intend to make, so you commit to the efficient number rather than the wasteful one. Committing to your unoptimized usage and then optimizing means you spend the term paying for a commitment your own engineering has made unnecessary, which is the opposite of what the discount was supposed to buy you.
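One way to estimate the efficient number is to apply the planned optimizations to your current baseline before you size the commitment. The sketch below assumes a simplified model: the cached share of input is billed at a reduced multiplier, batch eligible work gets a flat discount, and the two stack. The function name, the shares, and the 0.1 cache multiplier are assumptions for illustration, and the sketch ignores any premium for writing to the cache. Check the actual terms that apply to your account before relying on any of them.

```python
def optimized_cost(input_cost, output_cost, cached_input_share,
                   cache_read_multiplier, batch_share, batch_discount=0.5):
    """Estimate spend after caching and batching, from a measured baseline.

    cached_input_share:    fraction of input tokens served from cache
    cache_read_multiplier: price of a cached input token relative to a normal one
    batch_share:           fraction of the workload that can wait for batch processing
    batch_discount:        discount on batched work (0.5 halves its cost)
    """
    # Caching only touches the input side of the bill.
    input_after_cache = input_cost * (
        (1 - cached_input_share) + cached_input_share * cache_read_multiplier
    )
    subtotal = input_after_cache + output_cost
    # Batch discount applies to the share of work that does not need an immediate response.
    return subtotal * ((1 - batch_share) + batch_share * (1 - batch_discount))


# Hypothetical baseline: 1,000 of input cost and 1,000 of output cost per month.
baseline = 1000.0 + 1000.0
efficient = optimized_cost(
    input_cost=1000.0,
    output_cost=1000.0,
    cached_input_share=0.6,      # assumption: 60 percent of input is stable context
    cache_read_multiplier=0.1,   # assumption: cached reads cost a tenth of normal input
    batch_share=0.25,            # assumption: a quarter of the work can be batched
)

print(f"unoptimized: {baseline:,.2f}")
print(f"after planned optimizations: {efficient:,.2f}")
print(f"reduction: {1 - efficient / baseline:.1%}")
```

Only include optimizations you have a realistic plan to ship during the term. Anything more speculative belongs in a scenario, not in the committed number.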

Build the right buffer

No forecast is exact, so the question is how much margin to leave. The instinct to commit to your expected number exactly is dangerous in both directions. Commit at the mean and, if your forecast errors are roughly symmetric, about half your outcomes leave you short of the commitment, forfeiting the unused portion. Commit at the optimistic high and you risk overcommitting to spend that never arrives. The right buffer depends on how the agreement handles the edges.

If you have negotiated overage at your committed rate, you can commit conservatively, below your expected usage, and let overage cover the upside without losing the discount on it. This is usually the strongest structure, because it protects you on both sides. If overage is billed at list, you face pressure to commit higher to protect the rate on your growth, which raises the risk of forfeiting unused commitment. The buffer and the overage terms are two halves of the same decision, and they should be negotiated together rather than separately.
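The interaction between buffer and overage terms can be checked with a small expected cost calculation. Everything in this sketch is hypothetical: the three usage outcomes, their probabilities, the 25 percent discount, and the simplifying rule that unused commitment is forfeited in full. It is meant to show the shape of the decision, not to predict your numbers.

```python
def term_cost(usage_at_list, commit, discount, overage_at_committed_rate):
    """Total paid for the term under one usage outcome."""
    discounted_usage = usage_at_list * (1 - discount)
    if discounted_usage <= commit:
        return float(commit)  # the unused part of the commitment is forfeited
    if overage_at_committed_rate:
        return discounted_usage  # all usage keeps the discount
    covered_at_list = commit / (1 - discount)  # list value the commitment pays for
    return commit + (usage_at_list - covered_at_list)  # the remainder is billed at list


def expected_cost(outcomes, commit, discount, overage_at_committed_rate):
    """Probability weighted cost across usage outcomes."""
    return sum(p * term_cost(u, commit, discount, overage_at_committed_rate)
               for p, u in outcomes)


# Hypothetical outcomes: (probability, usage valued at list price).
OUTCOMES = [(0.2, 800_000), (0.5, 1_000_000), (0.3, 1_200_000)]
DISCOUNT = 0.25  # assumption

results = {}
for commit in (600_000, 750_000):
    for at_committed_rate in (True, False):
        results[(commit, at_committed_rate)] = expected_cost(
            OUTCOMES, commit, DISCOUNT, at_committed_rate
        )
        label = "committed rate" if at_committed_rate else "list"
        print(f"commit {commit:>9,} | overage at {label:<14} | "
              f"expected cost {results[(commit, at_committed_rate)]:,.0f}")
```

With these invented inputs, the lower commitment is cheaper when overage keeps the committed rate, and the higher commitment becomes cheaper when overage reverts to list. That reversal is the reason the two terms belong in the same conversation.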

Model the term, not just the year

A multi year commitment compounds the forecasting challenge, because you are projecting consumption further into a future you can see less clearly. The right approach is to phase the commitment with a ramp, where the committed number rises over the term in line with your expected adoption curve rather than sitting flat at a level you only reach in the final year. A ramp protects you from committing to mature usage in an immature year, and it is one of the most reasonable structures to ask for. The account team books the growth they want, and you avoid paying for it before it arrives.

When you model the term, run at least three scenarios. A conservative case where growth disappoints, a base case that matches your best estimate, and an aggressive case where adoption exceeds plan. Each scenario should produce a commitment recommendation, and the gap between them tells you how much uncertainty you are carrying. If the conservative and aggressive cases are far apart, that is a signal to commit lower and rely on overage protection rather than to bet on the optimistic path.
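A compact way to run the three scenarios and compare a ramp with a flat commitment is sketched below. The year one figure and the growth multipliers are assumptions chosen for illustration, and the helper name is hypothetical.

```python
def project_years(year_one, growth_multipliers):
    """Yearly consumption from a year one figure and year over year multipliers."""
    years = [float(year_one)]
    for g in growth_multipliers:
        years.append(years[-1] * g)
    return years


# Hypothetical three year scenarios, all starting from the same measured year one.
SCENARIOS = {
    "conservative": project_years(500_000, [1.2, 1.1]),
    "base": project_years(500_000, [1.5, 1.3]),
    "aggressive": project_years(500_000, [2.0, 1.5]),
}

totals = {name: sum(path) for name, path in SCENARIOS.items()}
# The ratio between the extremes is a rough measure of the uncertainty you carry.
spread = totals["aggressive"] / totals["conservative"]

# Ramp: the commitment follows the base case year by year.
# Flat: the commitment sits at the final year level for the whole term.
base = SCENARIOS["base"]
ramp_total = sum(base)
flat_total = base[-1] * len(base)
paid_before_it_arrives = flat_total - ramp_total

for name, path in SCENARIOS.items():
    print(name, [round(y) for y in path], f"total {totals[name]:,.0f}")
print(f"aggressive to conservative spread: {spread:.2f}x")
print(f"extra commitment under a flat structure: {paid_before_it_arrives:,.0f}")
```

A wide spread is the signal described above: it argues for anchoring the ramp nearer the conservative path and relying on overage protection for the rest.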

Turn the forecast into leverage

A defensible forecast is not only a planning tool. It is a negotiating instrument. When you arrive with a forecast built from measured usage, broken into drivers, adjusted for efficiency, and tested across scenarios, you change the character of the conversation. The account team is used to buyers who accept the proposed commitment number. A buyer who arrives with their own model, who can explain exactly where their consumption comes from and how confident they are in it, negotiates from a position the vendor rarely sees.

That position lets you do several things. You can commit at the level your forecast actually supports rather than the level proposed to you. You can ask where the next discount band begins relative to your efficient number, and decide whether a modest increase in commitment is worth crossing into it. You can insist on overage at the committed rate, because your forecast shows exactly why you need it. And you can resist pressure to overcommit, because you can show the work behind a lower number. The forecast converts your uncertainty into a controlled, evidenced position instead of a guess the vendor can shape.
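Whether crossing a band is worth it comes down to a breakeven you can compute in advance. The sketch below assumes a two band structure that is entirely invented for the example (10 percent discount below a 1,000,000 commitment, 15 percent at or above it) and assumes overage is billed at the committed rate. Real band structures vary, so treat the names and numbers as placeholders.

```python
def cost(usage_at_list, commit, discount):
    """Spend for the term, assuming overage is billed at the committed rate."""
    return max(float(commit), usage_at_list * (1 - discount))


def breakeven_usage(lower_discount, upper_commit):
    """List value of usage above which the higher band becomes the cheaper choice.

    Below this level, the higher band means paying its full commitment for
    usage that the lower band would have billed for less.
    """
    return upper_commit / (1 - lower_discount)


# Hypothetical bands and forecast.
LOWER_DISCOUNT, UPPER_DISCOUNT = 0.10, 0.15
UPPER_COMMIT = 1_000_000
efficient_usage = 1_050_000           # forecast usage valued at list
lower_commit = 945_000                # the efficient forecast at the lower discount

stay = cost(efficient_usage, lower_commit, LOWER_DISCOUNT)
cross = cost(efficient_usage, UPPER_COMMIT, UPPER_DISCOUNT)
threshold = breakeven_usage(LOWER_DISCOUNT, UPPER_COMMIT)

print(f"stay in lower band:  {stay:,.0f}")
print(f"cross to upper band: {cross:,.0f}")
print(f"crossing pays off above {threshold:,.0f} of usage at list")
```

If your base case sits well below the breakeven and only the aggressive case clears it, the higher band is a bet on the optimistic path rather than a discount.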

Where the forecast fits the wider deal

Forecasting is the front end of a committed agreement, but it connects to everything that follows. The number you forecast feeds the commitment band you target, the overage terms you need, the buffer you build, and the price protections that defend the rate. Treating the forecast in isolation, as a finance task separate from the negotiation, is how buyers end up with a precise number attached to a poorly structured deal. The forecast should be built with the negotiation in mind from the start.

Our Claude API commitment guide lays out how the forecast connects to the band, the overage treatment, the unused commitment terms, and the protections that hold the rate. The forecast is where it begins, but the value comes from carrying that defensible number through the whole structure of the deal rather than handing it over and accepting whatever commitment the account team builds around it.

The variance you have to plan for

An average is a dangerous thing to commit against, because consumption is rarely smooth. Real workloads have spikes, seasonal peaks, launch surges, and quiet periods, and a commitment sized to the average can be comfortably met in a busy month and badly missed in a slow one. The forecast has to capture not just the central estimate but the shape of the distribution around it. A workload that averages a given level of consumption but swings widely month to month carries more risk than one that holds steady at the same average, and the commitment structure should reflect that.

The practical response to variance is to commit against a level you clear in an ordinary month, not the level you reach only in a peak, and to let overage at the committed rate absorb the busy periods. This keeps you from forfeiting unused commitment in the quiet stretches while still capturing the discount on the peaks. A forecast that ignores variance and commits to the average leaves you exposed in exactly the months when usage dips, which are also the months when an unused commitment hurts most.
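The difference between committing at the average and committing at a level you usually clear can be measured directly from monthly history. This sketch assumes the commitment is measured month by month and that any shortfall in a month is forfeited, which may not match your agreement. The twelve monthly figures are invented.

```python
from statistics import mean, median, quantiles

# Hypothetical monthly spend at the committed rate, in thousands of dollars.
monthly = [62, 58, 71, 90, 66, 60, 55, 120, 64, 59, 68, 75]


def forfeited(monthly_spend, monthly_commit):
    """Total unused commitment across months that fall short of the commit level."""
    return sum(max(0.0, monthly_commit - m) for m in monthly_spend)


average = mean(monthly)
# First quartile: a level cleared in roughly three months out of four.
p25 = quantiles(monthly, n=4)[0]

lost_at_average = forfeited(monthly, average)
lost_at_p25 = forfeited(monthly, p25)

print(f"average month: {average:.1f}, median month: {median(monthly):.1f}, 25th percentile: {p25:.2f}")
print(f"forfeited when committing at the average: {lost_at_average:.1f}")
print(f"forfeited when committing at the 25th percentile: {lost_at_p25:.1f}")
```

The two peak months pull the average above the median here, so a commitment sized to the average is missed in most months. The percentile you choose is a judgment call that depends on how expensive a shortfall is relative to overage.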

Keep the forecast alive after you sign

A forecast is not a document you produce once for the negotiation and file away. The most valuable thing you can do after signing is to track actual consumption against the forecast continuously, so you know early whether you are tracking ahead, behind, or on plan. A commitment that is running short with months still on the clock can sometimes be addressed through usage shifts or a mid term conversation, but only if you see it coming. A commitment you discover you have missed at the end of the term offers no room to react. Live tracking turns the forecast from a one time bet into a managed position.
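Live tracking does not need much machinery. A minimal sketch, assuming you have monthly actuals and a single commitment for the term: project the remaining months from the recent run rate and compare the result with the commitment. The three month window, the 5 percent tolerance, and the figures are all assumptions, and the function names are hypothetical.

```python
def projected_term_spend(actuals_to_date, months_in_term, window=3):
    """Actuals so far plus the recent run rate extended over the remaining months."""
    if not actuals_to_date:
        raise ValueError("need at least one month of actuals")
    recent = actuals_to_date[-window:]
    run_rate = sum(recent) / len(recent)
    remaining = months_in_term - len(actuals_to_date)
    return sum(actuals_to_date) + run_rate * remaining


def status(projected, commitment, tolerance=0.05):
    """Classify the position relative to the commitment."""
    ratio = projected / commitment
    if ratio < 1 - tolerance:
        return "behind"
    if ratio > 1 + tolerance:
        return "ahead"
    return "on plan"


# Hypothetical position six months into a twelve month term, in thousands of dollars.
actuals = [50, 55, 60, 58, 62, 60]
commitment = 800

projection = projected_term_spend(actuals, months_in_term=12)
position = status(projection, commitment)
shortfall = max(0.0, commitment - projection)

print(f"projected term spend: {projection:.0f}")
print(f"status: {position}")
print(f"projected unused commitment: {shortfall:.0f}")
```

Run monthly, a check like this surfaces a shortfall while there are still months left to shift usage or open a mid term conversation.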

It also compounds into your next negotiation. The buyer who tracked actuals against forecast for a full term arrives at renewal with the most credible evidence anyone can bring: a demonstrated ability to predict their own consumption. That track record changes how the account team treats your numbers. It is far harder to push a buyer toward a higher commitment when that buyer can show, with their own data, exactly where their usage has gone and where it is heading. The forecast you maintain becomes the foundation of every commitment that follows.

Not affiliated with Anthropic PBC. Independent buyer side advisory only.
