Measuring the Real Saving From Batch

Everyone quotes the same figure for batch: fifty percent off. It is the right starting point and the wrong number to commit against, because the headline rate is a per request discount and your bill is not made of one request. The saving you actually capture is the per request discount multiplied by the share of your volume that genuinely moves to batch, set against what that volume would truly have cost on the real time path. Two companies with identical Claude bills can capture wildly different batch savings depending on how much of their work is asynchronous, and the only way to know your number is to measure it rather than assume the headline applies across the board.

The formula that matters

The real saving has three inputs and they are all knowable. The first is your batch eligible share, the fraction of your total token volume that runs on work no user is waiting on. The second is the per request discount, which is roughly half. The third is the real time baseline, the cost that eligible volume would have carried on the live path. Apply the discount to that baseline and you have the saving you actually capture in dollars. Multiply the eligible share by the discount and you have the same saving as a fraction of your bill, provided the eligible volume is priced like the rest. The headline fifty percent only equals your total saving in the rare case where one hundred percent of your volume is asynchronous. For real businesses the eligible share is the variable that moves the answer, and it is the number worth measuring first.

Batch eligible share: the fraction of volume where no user waits on the result this instant.
Per request discount: roughly half the real time rate, applied only to the eligible share.
Real time baseline: what that eligible volume would have cost on the live path.
Real saving: the baseline multiplied by the discount, or as a fraction of the bill, the eligible share multiplied by the discount.
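
The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a billing tool: it assumes the baseline for the eligible volume can be derived from the total real time bill and the eligible share (that is, eligible tokens are priced like the rest), and the function name, parameter names and dollar figures are all hypothetical.

```python
def batch_saving(total_realtime_cost: float,
                 eligible_share: float,
                 discount: float = 0.5) -> dict:
    """Estimate the batch saving from the three inputs.

    Assumption: eligible volume is priced like the rest of the bill, so the
    baseline for the eligible share is total_realtime_cost * eligible_share.
    """
    if not 0.0 <= eligible_share <= 1.0:
        raise ValueError("eligible_share must be between 0 and 1")
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")

    # What the eligible volume would have cost on the live path.
    eligible_baseline = total_realtime_cost * eligible_share
    # The discount applies only to that baseline, never to the whole bill.
    saving = eligible_baseline * discount
    fraction = saving / total_realtime_cost if total_realtime_cost else 0.0
    return {
        "eligible_baseline": eligible_baseline,
        "saving": saving,
        "saving_fraction": fraction,
    }


# Hypothetical figures: a 100,000 dollar real time bill, 40% of it eligible.
result = batch_saving(total_realtime_cost=100_000.0, eligible_share=0.4)
print(f"saving: {result['saving']:,.0f} ({result['saving_fraction']:.0%} of the bill)")
```

With a forty percent eligible share, the fifty percent headline becomes roughly twenty percent of the bill, which is the gap between the quoted number and the captured one.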

Measuring your eligible share honestly

The eligible share is where most teams either over or under count, and both errors are expensive. Over counting moves interactive work to batch and breaks a user experience to chase a discount that workload never qualified for. Under counting leaves genuinely asynchronous work on the real time path out of habit and pays the premium for speed nobody uses. The honest measure comes from applying one question to every workload: is anyone actually waiting on this result the instant it is produced? Tag each stream of volume against that single test and the eligible share falls out of the tagging. In practice the answer tends to surprise teams in the upward direction, because a large amount of work that runs in real time was simply built that way by default and was never truly interactive.
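
The tagging exercise can be sketched as follows. This is one possible shape for it, assuming you can attribute token volume to named workloads; the `Workload` type, the workload names and the token counts are hypothetical and only illustrate the single-question test.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    tokens: int          # token volume over the measurement period
    user_waiting: bool   # True if someone waits on the result the instant it is produced


def eligible_share(workloads: list[Workload]) -> float:
    """Fraction of token volume on workloads where no user is waiting."""
    total = sum(w.tokens for w in workloads)
    if total == 0:
        return 0.0
    eligible = sum(w.tokens for w in workloads if not w.user_waiting)
    return eligible / total


# Hypothetical tagged volume for one month.
workloads = [
    Workload("chat_assistant", 600_000_000, user_waiting=True),
    Workload("nightly_summaries", 250_000_000, user_waiting=False),
    Workload("document_classification", 150_000_000, user_waiting=False),
]

print(f"batch eligible share: {eligible_share(workloads):.0%}")
```

The boolean is deliberately the only tag. Adding softer categories such as "mostly asynchronous" invites the over counting the test is meant to prevent.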

Separating saving from shifting

A common measurement error is comparing totals without checking what actually changed. If you move volume to batch but total volume grows over the same period, the unit rate drops while the absolute bill can still rise, and a naive before and after comparison hides the win inside the growth. The clean way to measure is per unit, not in total. Track the blended cost per million tokens before and after the batch shift, on a comparable workload mix, and the real saving shows up as a drop in the unit rate that growth cannot disguise. This is the same discipline that protects you at renewal, because a unit rate is defensible in a way that a total never is.
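
A small worked example shows how a total can rise while the unit rate falls. The figures below are hypothetical, chosen so that volume grows fifty percent over the period while forty percent of it moves to batch at a fifty percent discount; the function name is an illustration, not part of any billing API.

```python
def unit_rate(total_cost: float, total_tokens: int) -> float:
    """Blended cost per million tokens."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return total_cost / (total_tokens / 1_000_000)


# Hypothetical before and after periods.
before_cost, before_tokens = 10_000.0, 1_000_000_000
after_cost, after_tokens = 12_000.0, 1_500_000_000

rate_before = unit_rate(before_cost, before_tokens)
rate_after = unit_rate(after_cost, after_tokens)

# The naive comparison: the bill went up.
bill_change = after_cost / before_cost - 1.0
# The per unit comparison: the rate went down.
rate_change = rate_after / rate_before - 1.0

# What the current volume would have cost at the old unit rate.
counterfactual_cost = rate_before * after_tokens / 1_000_000
saving_at_current_volume = counterfactual_cost - after_cost

print(f"bill change: {bill_change:+.0%}, unit rate change: {rate_change:+.0%}")
print(f"saving at current volume: {saving_at_current_volume:,.0f}")
```

The counterfactual line is the useful one for a budget conversation: it prices today's volume at yesterday's rate, so the saving is stated against what the bill would have been rather than against what it used to be.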

Why the measured number drives the deal

The reason to measure precisely rather than quote the headline is that the measured number is what you commit against. Your committed spend to Anthropic should reflect the real optimized cost of your workloads, and a business that has moved its eligible share to batch has a materially lower true cost base than the same business before the shift. Commit against the unoptimized bill and you lock the real time premium into the term and pay it whether or not the work needed it. Commit against the measured, batch adjusted number and you carry the saving into the contract instead of stranding it. The measurement is not an accounting exercise; it is the evidence that sets the commitment at the right level and defends it when the renewal pressure arrives.

Batch also stacks with the other levers, so the measured saving compounds. The eligible volume can run on a cheaper model tier through routing and can reuse cached prefixes at the same time, which means the batch saving lands on top of the routing and caching savings rather than competing with them. When you measure, measure the combined effect on the unit rate, because that blended number is the one that matters to your budget and to the deal.

We do this measurement on the buyer side and price our work against it. We will quantify your batch eligible share, model the saving it captures against your real time baseline, and turn that into either a fixed fee engagement from eighteen thousand dollars or a gainshare engagement where we take a share of the verified savings and you carry zero retainer and no risk. If you want the real batch number on paper before you commit anything to Anthropic, get a quote and we will measure it with you.

Avoiding the double count and the mirage

Two measurement errors recur often enough to name. The first is the double count, where the batch saving is claimed alongside a routing saving and a caching saving as if they were independent line items summing to a total, when in fact they apply to overlapping volume and the honest combined figure is smaller than the naive sum. The fix is to measure the blended unit rate across all three levers at once rather than crediting each lever with its own slice of a total it shares. The second is the mirage, where a saving is reported because a rate fell while the bill rose on growing volume, so the business feels no benefit even though the unit economics genuinely improved. The fix is the same per unit discipline: report the cost per million tokens, not the monthly total, so the saving is visible regardless of how volume moves underneath it.
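
The double count is easiest to see with numbers. The sketch below assumes, purely for illustration, that each lever can be modelled as a fractional reduction in the unit rate and that all three apply to the same slice of volume; the percentages for routing and caching are hypothetical, and real caching savings in particular depend on hit rates rather than a flat fraction.

```python
from math import prod


def combined_reduction(reductions) -> float:
    """Combined fractional reduction when each lever applies to what the
    previous ones left behind (multiplicative, not additive)."""
    return 1.0 - prod(1.0 - r for r in reductions)


# Hypothetical per lever reductions on one fully overlapping slice of volume.
levers = {"routing": 0.30, "caching": 0.20, "batch": 0.50}

naive_sum = sum(levers.values())
honest = combined_reduction(levers.values())

print(f"naive sum of savings: {naive_sum:.0%}")
print(f"combined reduction:   {honest:.0%}")
```

Here the naive sum claims the entire cost has been saved, which is plainly impossible, while the compounded figure is about seventy two percent. The levers do stack, but each one discounts only what the others have left.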

Both errors share a root: measuring totals instead of unit rates. A total mixes price and quantity into one number that hides which one moved, and for a saving that is a price effect on a growing quantity, the total is exactly the wrong lens. The unit rate isolates the price effect and makes the batch saving legible, which is why every serious measurement of an optimization saving should be expressed per unit first and only then translated into a total for the budget conversation.

Proving the number to finance and to Anthropic

A measured saving is only useful if it survives scrutiny, and it faces scrutiny twice: once from your own finance function and once across the table from Anthropic. Finance will want to know the saving is real and durable rather than a one time effect or an accounting artifact, and a per unit number with a clear before and after on comparable volume is what satisfies them. Anthropic will, reasonably, want your commitment sized to demand they consider credible, and a buyer who can show a measured, optimized unit rate negotiates from evidence rather than assertion. The same measurement serves both audiences, which is another reason to do it rigorously: the work you do to convince your CFO is the work that strengthens your position in the deal.

The practical artifact is a short, defensible model: eligible share measured from tagged volume, the per request discount applied only to that share, the real time baseline it displaces, and the resulting blended unit rate before and after. Keep it simple enough that anyone can follow it and rigorous enough that no one can dismiss it. That model is the asset, because it both proves the saving internally and anchors the commitment externally, and it is exactly the kind of thing we build on the buyer side as part of pricing an engagement against the real number rather than the headline.

