Independent buyer-side advisory · Anthropic only · New York · London

Token optimization before you sign the commit.

Optimize the workload first, then size the commitment. Do it in that order and you sign a smaller, truer number with Anthropic, and you keep the savings instead of committing them away.

Buyer-side analysis · 11 min read
34% average reduction in Claude spend · $40M+ in Anthropic commitments advised · 100% Anthropic focus, no other vendor

There is a sequence to a good Anthropic deal, and most buyers get it backward. They negotiate the commitment first, against their current spend, and tell themselves they will optimize the workload later to make the most of it. That order quietly costs them a great deal of money. The right sequence is the opposite: optimize the workload first, measure what your consumption actually becomes, and only then size the commitment to that lower, truer number. This article is about why the order matters so much, and what optimizing before you sign actually looks like in practice.

The number you commit to should be your real cost

A committed spend is a promise to consume a level of usage over a term. On most Anthropic agreements, the commitment you do not use does not roll forward and is not refunded, so it should reflect the consumption you will actually have, not the consumption you have today before any savings. The problem is that your consumption today is almost always inflated by waste: tasks running on Opus that a cheaper model would handle, repeated context that is not cached, asynchronous jobs running in real time, verbose output nobody reads. If you commit against that inflated number, you commit to a level you will never reach once you optimize. The gap becomes dead capacity you paid for and cannot use.

Optimization can move the number a long way

This matters because the savings available are large. Routing each task to the cheapest model that handles it well, caching stable context so repeated reads are billed at up to ninety percent off the normal input rate, moving asynchronous jobs to batch at fifty percent off, and trimming output to what the system actually consumes can together cut aggregate spend by forty to seventy percent versus an unoptimized, Opus-heavy baseline. A commitment sized to the unoptimized bill and a commitment sized to the optimized bill can therefore differ by enough to turn a good deal into a bad one. Optimizing first is not a refinement at the margins. It often halves the number you need to commit.
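To see how these levers combine on a single call, here is a minimal cost model for one request. It is a sketch, not a billing engine: the multipliers reflect Anthropic's published pricing structure at the time of writing (five-minute cache writes at 1.25x the base input rate, cache reads at 0.1x, and the Batch API at half price), and the per-million-token prices passed in are placeholders to replace with your own contract rates. The `Request` type and `request_cost` function are hypothetical names for illustration, and the assumption that the batch discount stacks on top of caching is labeled in the code.

```python
from dataclasses import dataclass

# Multipliers on the base input price, per Anthropic's published pricing
# structure at the time of writing (verify against the current pricing page):
# five-minute cache writes cost 1.25x, cache reads 0.1x, and the Batch API
# halves the price of a request.
CACHE_WRITE_MULT = 1.25
CACHE_READ_MULT = 0.10
BATCH_MULT = 0.50


@dataclass
class Request:
    """Token counts for one request (hypothetical type for illustration)."""
    uncached_input_tokens: int
    cache_write_tokens: int
    cache_read_tokens: int
    output_tokens: int
    batch: bool = False


def request_cost(req: Request, input_price: float, output_price: float) -> float:
    """Dollar cost of one request, given prices per million tokens."""
    in_rate = input_price / 1_000_000
    out_rate = output_price / 1_000_000
    cost = (
        req.uncached_input_tokens * in_rate
        + req.cache_write_tokens * in_rate * CACHE_WRITE_MULT
        + req.cache_read_tokens * in_rate * CACHE_READ_MULT
        + req.output_tokens * out_rate
    )
    # Assumption: the batch discount multiplies whatever the request would
    # otherwise cost, including its caching discounts.
    return cost * BATCH_MULT if req.batch else cost


# Placeholder prices ($10 input, $50 output per million tokens), not list rates.
baseline = Request(uncached_input_tokens=20_000, cache_write_tokens=0,
                   cache_read_tokens=0, output_tokens=2_000)
tuned = Request(uncached_input_tokens=5_000, cache_write_tokens=0,
                cache_read_tokens=15_000, output_tokens=1_200)
print(request_cost(baseline, 10.0, 50.0), request_cost(tuned, 10.0, 50.0))
```

With those placeholder prices, a request with 20,000 uncached input tokens and 2,000 output tokens costs $0.30. Reading 15,000 of those input tokens from a warm cache and capping output at 1,200 tokens brings it to $0.125, and sending it through batch halves that to $0.0625. A single warm request overstates the aggregate saving, because not every call caches, batches, or shrinks, which is why portfolio-level savings land in a lower range.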

Why optimizing after you sign loses the savings

Buyers who intend to optimize later run into a trap. Once the commitment is signed against the high number, the savings from optimization do not flow to them. They flow into the gap between the commitment and their now lower consumption, and that gap is the capacity they already paid for and will lose. In effect, they committed their savings away before they captured them. The optimization still improves the engineering, but the financial benefit is trapped inside a commitment that is now too big. The only way to keep the savings is to optimize before the number is fixed, so the lower consumption shows up as a lower commitment.

What to optimize before you sign

  • Model mix: route each task to the cheapest model that meets its quality bar, reserving Opus for the work that needs it.
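  • Caching: restructure prompts so stable context is read from cache at up to ninety percent off instead of being billed at full price on every call. Cache writes cost somewhat more than normal input, so the saving comes from context that is reused across many calls.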
  • Batch: move asynchronous jobs that do not need an immediate response to batch at fifty percent off.
  • Output: trim verbose responses and cap output length so you stop paying the premium output rate for content nobody reads.
  • Retries and waste: reduce avoidable retries and redundant calls that inflate consumption without adding value.
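The model-mix rule in the first bullet is simple enough to write down. The sketch below picks, for one task type, the cheapest tier whose measured quality score clears the bar, and falls back to the most expensive tier when nothing does. The tier names, per-task costs, and scores are hypothetical; in practice the scores would come from your own evaluation of each model on a labeled sample of that task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str              # hypothetical label, e.g. "large" for an Opus-class model
    cost_per_task: float   # placeholder average dollar cost per task on this tier


def route(scores: dict[str, float], tiers: list[ModelTier],
          quality_bar: float) -> ModelTier:
    """Return the cheapest tier whose quality score meets the bar.

    `scores` maps tier name to a score from your own evaluation (for example,
    the pass rate on a labeled sample of this task). If no tier clears the bar,
    fall back to the most expensive tier rather than downgrade on cost alone.
    """
    passing = [t for t in tiers if scores.get(t.name, 0.0) >= quality_bar]
    if passing:
        return min(passing, key=lambda t: t.cost_per_task)
    return max(tiers, key=lambda t: t.cost_per_task)


tiers = [ModelTier("small", 0.002), ModelTier("mid", 0.01), ModelTier("large", 0.05)]
print(route({"small": 0.97, "mid": 0.98, "large": 0.99}, tiers, 0.95).name)  # small
print(route({"small": 0.70, "mid": 0.88, "large": 0.96}, tiers, 0.95).name)  # large
```

Falling back to the top tier is a deliberately conservative choice: a task that no cheaper model handles well stays on Opus-class capacity instead of being downgraded on price.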

You do not need to finish optimizing to benefit

A reasonable objection is that full optimization takes time, and the deal has a deadline. The answer is that you do not need to have shipped every optimization before you sign. You need to have modeled the optimized state credibly. If you can show, with a sample of real traffic, that a workload will move to a cheaper model without quality loss, you can size the commitment to that future state with confidence and implement the change after signing. The discipline is to base the commitment on where your consumption is heading under a concrete optimization plan, not on where it sits today and not on a vague intention to improve later.
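One way to model the optimized state from real traffic is to replay a sample of requests on the cheaper model, grade each output against the quality bar, and scale the result up to the full workload. The sketch below assumes the sample mirrors the workload's cost mix; `SampledRequest` and `projected_monthly_spend` are hypothetical names, and the grading step stands in for whatever evaluation your team already trusts.

```python
from dataclasses import dataclass


@dataclass
class SampledRequest:
    """One real request replayed on a cheaper model (hypothetical type)."""
    current_cost: float    # what it cost on the model it runs on today
    candidate_cost: float  # what it cost when replayed on the cheaper model
    passed: bool           # whether the cheaper model's output met the quality bar


def projected_monthly_spend(sample: list[SampledRequest],
                            current_monthly_spend: float) -> float:
    """Scale a replayed sample up to a projected spend for the whole workload.

    Requests that passed are assumed to move to the cheaper model; the rest
    stay where they are. Assumes the sample mirrors the workload's cost mix.
    """
    sample_current = sum(r.current_cost for r in sample)
    if sample_current <= 0:
        raise ValueError("sample has no spend to scale from")
    sample_projected = sum(r.candidate_cost if r.passed else r.current_cost
                           for r in sample)
    return current_monthly_spend * sample_projected / sample_current


# Placeholder sample: three of four requests pass on a model at a fifth the cost.
sample = [SampledRequest(0.05, 0.01, True)] * 3 + [SampledRequest(0.05, 0.01, False)]
print(projected_monthly_spend(sample, 100_000.0))
```

Under these placeholder figures, a $100,000 monthly bill projects to $40,000. The projection is only as credible as the sample behind it, so the size and representativeness of the sample are what you defend, not the arithmetic.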

The audit that produces the number

What makes this concrete is an audit of the workload before the negotiation. The audit answers a few questions: which tasks run on which model and which could move down, how much of the context is repeated and cacheable, which jobs are asynchronous and could batch, and where output is longer than the downstream system needs. Each answer produces an estimate of the saving and a plan to capture it. Subtract the combined saving from your current bill and you have your optimized consumption, the floor you will commit to. The audit turns optimization from a hope into a number you can defend to your CFO and use to size the commitment.
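The audit arithmetic can be sketched as follows. Each finding names a lever, the share of spend it touches, and the price reduction on that share. The findings are applied in sequence, so each saving is measured against the spend left after the previous ones; that keeps caching from being credited on tokens that routing has already made cheaper. The `Finding` type, `optimized_floor` function, and all figures are hypothetical placeholders, not benchmarks.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One audit finding (hypothetical type; figures are placeholders)."""
    lever: str        # e.g. "routing", "caching", "batch", "output"
    share: float      # fraction of the remaining spend this lever touches, 0..1
    reduction: float  # fractional price reduction on that share, 0..1


def optimized_floor(current_spend: float,
                    findings: list[Finding]) -> tuple[float, list[tuple[str, float]]]:
    """Apply findings in order; return the floor and each lever's saving.

    Each saving is measured on the spend left after the levers before it,
    so nothing is counted twice and the savings sum to the total reduction.
    """
    remaining = current_spend
    savings: list[tuple[str, float]] = []
    for f in findings:
        saved = remaining * f.share * f.reduction
        savings.append((f.lever, saved))
        remaining -= saved
    return remaining, savings


findings = [
    Finding("routing", share=0.50, reduction=0.60),
    Finding("caching", share=0.30, reduction=0.90),
    Finding("batch", share=0.20, reduction=0.50),
    Finding("output", share=0.25, reduction=0.20),
]
floor, savings = optimized_floor(1_000_000.0, findings)
for lever, saved in savings:
    print(f"{lever:<8} saves {saved:>12,.0f}")
print(f"{'floor':<8}       {floor:>12,.0f}")
```

With these placeholder findings on a $1,000,000 bill, the floor comes out at $436,905, a 56 percent reduction. Because each saving is measured on what is left, the per-lever savings add up exactly to the gap between the current bill and the floor, which is what lets you present them line by line.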

How the optimized number changes the negotiation

Walking into a negotiation with an optimized consumption figure changes your position in three ways. First, you commit to a smaller number, which directly reduces your exposure to unused commitment. Second, your effective cost per unit of work is already lower because you are buying cheaper work, independent of any discount Anthropic grants, so the discount becomes a bonus on top of a saving you already secured. Third, a smaller, more accurate commit is easier to defend internally and harder for the account team to inflate, because it is grounded in a workload analysis rather than a projection the seller helped you build. Optimization is leverage as much as it is saving.

Pair the optimized commit with protective terms

Optimizing first works best when the commitment is also wrapped in the right terms. Size the commit to your optimized floor, then protect the overage rate so that growth above the commit is charged at your committed rate rather than at a penalty rate. Ask how unused commitment is treated, and push for rollover if it is available. Use a ramp if your usage is climbing, so the commitment steps up with your adoption rather than starting at its peak. These terms and the optimized sizing reinforce each other: a small, accurate commit with a protected overage rate and a ramp is a contract that cannot punish you for optimizing or for growing.

The common counterargument, answered

Sometimes the account team will encourage a larger commitment by pointing to a deeper discount band. The optimized buyer can evaluate that offer honestly rather than emotionally. The question is always whether the discount on the spend you will actually use exceeds the value of the commitment you would lose to nonuse. In most cases we model, reaching for a higher band by committing past your optimized floor loses more to dead capacity than it gains in rate. The optimized number is what lets you make that comparison with real figures instead of being talked into a band by the promise of a better headline rate.
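The comparison in the paragraph above reduces to a few lines of arithmetic. The sketch assumes a commit expressed in dollars, usage billed at list price less the discount and drawn against the commit, overage charged at the same discounted rate, and unused commit lost at term end. The function names, discount bands, and dollar figures are hypothetical placeholders, not Anthropic terms.

```python
def term_cost(commit: float, list_usage: float, discount: float) -> float:
    """Dollars paid over the term: at least the commit, more if usage exceeds it."""
    return max(commit, list_usage * (1 - discount))


def dead_capacity(commit: float, list_usage: float, discount: float) -> float:
    """Commit paid for but never consumed (assumes no rollover)."""
    return max(0.0, commit - list_usage * (1 - discount))


def deeper_band_pays(list_usage: float,
                     floor_commit: float, floor_discount: float,
                     band_commit: float, band_discount: float) -> bool:
    """True only if the larger commit at the deeper discount costs less overall."""
    return (term_cost(band_commit, list_usage, band_discount)
            < term_cost(floor_commit, list_usage, floor_discount))


# Hypothetical offer: commit the floor ($1M at list, 10% off, so $900k),
# or stretch to a $1.5M commit to reach a 20% band.
for usage in (1_000_000.0, 2_000_000.0):
    print(usage,
          deeper_band_pays(usage, 900_000.0, 0.10, 1_500_000.0, 0.20),
          dead_capacity(1_500_000.0, usage, 0.20))
```

At an optimized floor of $1,000,000 of list-price usage, the $1.5M commit is paid in full against $900k for the floor commit, and $700k of it goes unused. The deeper band only comes out ahead if usage grows well past the floor: under these numbers the break-even is about $1.67M, and at $2,000,000 the stretch commit costs $1.6M against $1.8M.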

The metrics that prove the optimized number

To size a commit to your optimized floor, you need a small set of metrics that turn the optimization plan into a defensible figure. The first is your model mix today and your target mix after routing, expressed as the share of spend on each model, because shifting that mix is usually the largest single saving. The second is your cache hit rate today and the rate you expect after restructuring the high-volume prompts, because the gap is money you are currently leaving on the table. The third is the share of your workload that is asynchronous and could move to batch at fifty percent off. The fourth is your average output length against what the downstream systems actually consume. Each metric produces a saving estimate; measure each one on the spend left after the others so nothing is counted twice, and the sum of the estimates is the difference between your current bill and your optimized floor. With those numbers in hand, the commit you propose is not a guess. It is a figure you can defend line by line to your CFO and hold against the account team.

Who needs to be in the room

Optimizing before you sign is partly a sequencing problem and partly an organizational one, because the people who control consumption and the people who sign the contract are usually not the same. The engineering team owns the model mix, the caching, the batch decisions, and the output discipline. Procurement and finance own the commitment and the negotiation. If those two groups work in sequence rather than together, the contract gets signed against the bill engineering has not yet optimized, and the savings are lost into unused commitment. The fix is to bring them into one conversation before the deal is sized, so the optimization plan and the commitment are built together. The engineering leader and the procurement leader should agree on the optimized floor as a single number, and that number becomes the commit. When the two functions align early, the deal reflects the company's real, optimized consumption rather than two disconnected views of it.

The cost of doing it in the wrong order, quantified

It is worth being concrete about what the wrong order costs. Suppose a buyer's current unoptimized spend runs at a level that places them in an attractive discount band, and they commit there. Then they optimize, and routing, caching, batch, and output control bring their real consumption down by half. They are now consuming far below their commitment, and on a standard agreement the unused half disappears at period end. The discount they negotiated applies only to the spend they actually use, so the realized cost of their optimized work plus the wasted commitment can easily exceed what they would have paid on a smaller, accurate commit at a shallower discount. The optimization improved the engineering and worsened the contract, because the contract was fixed before the optimization landed. Reversing the order, optimizing first, would have let the same savings flow to a smaller commitment and into the buyer's pocket. The order is not a detail. It is the difference between capturing the savings and committing them away.
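A month-by-month sketch makes the same point with numbers. It assumes a twelve-month commit with no rollover, overage at the committed rate, and an optimization that lands in month four and halves a $200,000 monthly run rate at list price; the discount bands and the `paid_and_unused` helper are hypothetical. The only difference between the two runs is which run rate the commit was sized to.

```python
def paid_and_unused(commit: float, monthly_list_usage: list[float],
                    discount: float) -> tuple[float, float]:
    """Return (dollars paid, unused commit) for one term with no rollover.

    Usage is billed at list less the discount and drawn against the commit;
    overage is assumed to be charged at the same discounted rate.
    """
    used = sum(monthly_list_usage) * (1 - discount)
    return max(commit, used), max(0.0, commit - used)


# Hypothetical workload: $200k/month at list, halved from month 4 onward.
months = [200_000.0] * 3 + [100_000.0] * 9

# Sign, then optimize: commit sized to the unoptimized run rate at a deeper band.
wrong_paid, wrong_unused = paid_and_unused(200_000.0 * 12 * (1 - 0.20), months, 0.20)

# Optimize (or model it), then sign: commit sized to the optimized run rate
# at a shallower band; the first three months run over it at the committed rate.
right_paid, right_unused = paid_and_unused(100_000.0 * 12 * (1 - 0.10), months, 0.10)

print(f"sign then optimize: paid {wrong_paid:,.0f}, unused {wrong_unused:,.0f}")
print(f"optimize then sign: paid {right_paid:,.0f}, unused {right_unused:,.0f}")
```

Under these placeholder numbers, signing first costs $1,920,000, of which $720,000 is unused commitment; sizing to the optimized run rate costs $1,350,000 with nothing unused, even though its discount is half as deep.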

Your Anthropic number is negotiable.

Get a quote for a bounded engagement. Fixed fee or gainshare, no risk to you.

Not affiliated with Anthropic PBC. Independent buyer-side advisory only.