A Claude API commitment is a promise to spend. Promise too much and you forfeit the difference. Promise too little and you lose the discount band you could have held. We set the commit at the number your real usage supports, protect the overage rate, and stop unused commitment from quietly disappearing.
Anthropic offers better unit rates as your committed spend rises through the bands, from the 250K entry tier up through the 1M band and into the 5M plus range. The headline discount looks attractive, so buyers reach for a higher band to capture it. The trap is that the discount only pays off if you actually consume what you promised. Most enterprise contracts treat unused commitment as forfeited at the end of the term. There is no refund and usually no carryover. A 1M commit that you only burn 700K against is not a 1M deal at a discount. It is a 700K deal where you also paid 300K for nothing.
The job is to find the commit level where the unit discount and the consumption you can defend actually meet. That is a forecasting problem first and a negotiation problem second. We do both, in that order.
We translate your usage logs into a defensible consumption range across Opus, Sonnet, and Haiku, then set the commit to the floor you can stand behind, not the ceiling you hope for.
We position you at the band where the marginal discount is real, and we resist the pull toward a higher band that you cannot fill.
We negotiate overage at the committed unit rate, so usage above the commit does not snap back to list pricing and erase the discount.
We press for rollover, carryover, or a true down on unused commitment, so a soft quarter does not become forfeited money.
For new workloads we phase the commit across the term, so you are not paying for peak usage during the months you are still ramping toward it.
We lock the unit rate across the term and cap any renewal uplift, so the number you negotiate is the number you keep.
A Claude API commitment is drawn down as you consume tokens. Input tokens and output tokens bill at different rates, and output tokens typically cost several times what input tokens cost. That single fact changes how you forecast. A workload that returns long responses burns the commit far faster than its request count suggests. We model input and output separately, by model, because a commit that assumes uniform Opus use will look very different once routing sends the easy traffic to Haiku and Sonnet.
Underneath the commit sit the optimization levers that change how fast you draw it down. Prompt caching can take up to 90 percent off the cost of repeated context. Batch processing takes 50 percent off work that does not need a real time answer. Model routing across Opus, Sonnet, and Haiku typically cuts aggregate spend 40 to 70 percent versus sending everything to Opus. A commit set before those levers are in place is almost always too high. We sequence the optimization first so the commit reflects the bill you will actually run, not the one you run today.
If you commit first and optimize second, you have promised Anthropic a number you no longer need to spend, and you are back to forfeiting the difference. If you optimize first and commit second, you negotiate from a smaller, truer baseline and you keep the savings instead of handing them to the vendor as a commitment you cannot use. We always run the engineering review before we set the number.
"We were about to sign a 1M commit. They showed us our real draw down was closer to 650K once caching and routing were in. We committed to that instead and protected the overage rate."
Fixed fee or gainshare. We sit between you and Anthropic and we never take vendor money.
Get a QuoteWeekly intelligence on Anthropic pricing moves and the buyer side counters that work.