How caching and routing shrink the commit.

Buyer side guide · 13 minute read · By Morten Andersen · Published May 29, 2026 · Updated June 12, 2026

Most buyers think of caching and routing as ways to lower the monthly Claude bill, and they are. But the bigger effect, the one that changes the contract you sign, is that they shrink the committed spend you need with Anthropic in the first place. A commitment is sized against forecast consumption. If you optimize before you forecast, the number you commit against is smaller, the discount band you target is one your real consumption will actually fill, and you carry far less risk of stranding committed dollars. This is the difference between negotiating from your optimized run rate and negotiating from your wasteful one, and it is worth both a smaller commitment and several points of effective discount. Here is the math, and how to put it into the deal.

The commit is a function of the forecast, and the forecast is yours to shape

Anthropic sizes a committed spend offer against your expected consumption. The account team looks at your current run rate, your growth, and your pipeline, and proposes a commitment that captures most of it, because their incentive is to lock in as much spend as they can at a discount you will accept. The number they propose is built on your consumption as it stands, which for most buyers means consumption that has never been optimized. If your application routes everything to the top model, never caches its shared context, and runs bulk jobs in real time, your run rate is inflated, and the commitment sized against it is inflated too.

This is the central insight: the forecast is not a fixed fact about your business. It is a number you can lower before anyone sizes a commitment against it. Every token you remove through optimization is a token you do not have to commit to buy. A buyer who optimizes first walks into the commitment conversation with a smaller, leaner forecast and asks Anthropic to size the deal against that. A buyer who has not optimized hands over an inflated run rate and gets locked into it for a year or more. The sequence matters enormously, and most buyers get it backwards by signing first and optimizing later, after the commitment is already set.

What routing removes from the forecast

Model routing is the largest lever, and it works by matching each request to the cheapest model that can handle it well. Most applications send everything to the top model out of caution, but the majority of real requests (classification, extraction, intent routing, short answers, simple drafting) are handled just as well by Sonnet or Haiku at a fraction of the cost. When you route across Opus, Sonnet, and Haiku based on the actual difficulty of each request, aggregate spend typically falls forty to seventy percent versus uniform top model use. That is not a marginal saving. It is a different forecast.
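To make the routing idea concrete, here is a minimal Python sketch, not a production router. It assumes a hypothetical lookup table (`DIFFICULTY`) standing in for whatever classifier or heuristics you would actually use to grade requests, and it uses placeholder blended prices per million tokens rather than Anthropic's current rate card, so read the output for the shape of the saving, not the dollar figures. `route` picks the cheapest tier whose capability covers the request; `workload_cost` prices a sample workload with and without routing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    capability: int         # higher means it can handle harder requests
    price_per_mtok: float   # HYPOTHETICAL blended USD price per million tokens


# Ordered cheapest first. Prices are placeholders, not Anthropic's rate card.
TIERS = [
    Tier("haiku", capability=1, price_per_mtok=1.0),
    Tier("sonnet", capability=2, price_per_mtok=3.0),
    Tier("opus", capability=3, price_per_mtok=15.0),
]

# HYPOTHETICAL difficulty grades; in practice a classifier or heuristics
# would assign these per request.
DIFFICULTY = {
    "classification": 1,
    "extraction": 1,
    "short_answer": 1,
    "simple_drafting": 2,
    "analysis": 2,
    "complex_reasoning": 3,
}


def route(task_type: str) -> Tier:
    """Return the cheapest tier whose capability covers the task's difficulty."""
    needed = DIFFICULTY.get(task_type, 3)  # unknown tasks default to the top tier
    for tier in TIERS:
        if tier.capability >= needed:
            return tier
    return TIERS[-1]


def workload_cost(requests: list[tuple[str, float]], routed: bool) -> float:
    """Price (task_type, million_tokens) pairs, either routed or all on the top tier."""
    total = 0.0
    for task_type, mtok in requests:
        tier = route(task_type) if routed else TIERS[-1]
        total += mtok * tier.price_per_mtok
    return total


# HYPOTHETICAL workload: 100 million tokens spread across request types.
sample = [
    ("classification", 20), ("extraction", 15), ("short_answer", 10),
    ("simple_drafting", 15), ("analysis", 10), ("complex_reasoning", 30),
]
baseline = workload_cost(sample, routed=False)
optimized = workload_cost(sample, routed=True)
savings = 1 - optimized / baseline
print(f"uniform ${baseline:,.0f}  routed ${optimized:,.0f}  savings {savings:.0%}")
```

With this made-up mix, where thirty percent of tokens genuinely need the top tier, routing cuts spend by about sixty-two percent. Substitute your own request mix and current prices before drawing any conclusion about your forecast.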

Consider what that does to the commit. Suppose your unoptimized run rate is four million dollars a year and the account team proposes a commitment around that, with a discount that improves at higher bands. If routing takes real consumption down to two million, committing to four million means committing to twice the spend you will actually generate, and the two million gap expires as unused commitment. Optimize first and the right commitment sits near two million, a level you reach comfortably, with headroom for growth above it. You have not lost the discount. You have stopped committing to waste. The routing did not just cut the bill. It cut the size of the obligation.
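The arithmetic in that example fits in a few lines. This sketch uses the example's own figures (a four million run rate that routing halves) and assumes unused commitment simply expires at term end with no rollover; `stranded` is an illustrative helper, not a term from any contract.

```python
def stranded(commit: float, consumption: float) -> float:
    """Committed dollars left unused at term end (assumes no rollover)."""
    return max(0.0, commit - consumption)


unoptimized_run_rate = 4_000_000
routing_reduction = 0.50  # from the example: routing halves real consumption
optimized_run_rate = unoptimized_run_rate * (1 - routing_reduction)

# Commit sized against the unoptimized run rate vs. against the optimized one.
for label, commit in [("inflated", 4_000_000), ("optimized", optimized_run_rate)]:
    left_over = stranded(commit, optimized_run_rate)
    print(f"{label:<10} commit ${commit:,.0f}  stranded ${left_over:,.0f}")
```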

What caching removes from the forecast

Prompt caching attacks a different part of the bill: the repeated input. Many workloads send the same large block of context on every request (a system prompt, a knowledge base, a long set of instructions, a codebase) and pay full input price for it every time. Caching lets Anthropic store that shared prefix and serve later reads of it at up to ninety percent off the base input price; the initial write into the cache costs somewhat more than a normal input token, so the saving comes from reuse. For a workload with a heavy shared prefix and high request volume, this can remove a large share of input cost, and input is often the dominant cost in retrieval heavy and agentic workloads.
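A rough way to see why the shared prefix dominates is to price one cache lifetime's worth of requests both ways. This Python sketch assumes cache reads are billed at 0.1x the base input price (the ninety percent discount) and that writing the prefix into the cache costs 1.25x, which matches Anthropic's published multipliers for the default five-minute cache at the time of writing but should be checked against the current pricing page. It also assumes every request lands within a single cache lifetime, so only the first call pays the write. The token counts and the three dollar per million input price are hypothetical.

```python
def input_cost(prefix_tokens: int, suffix_tokens: int, requests: int,
               price_per_mtok: float, cached: bool,
               write_mult: float = 1.25, read_mult: float = 0.10) -> float:
    """Total input cost in USD for `requests` calls that share one prefix.

    Assumes the first cached call writes the prefix and every later call
    reads it, i.e. all calls fall within a single cache lifetime.
    write_mult and read_mult are ASSUMED multipliers on the base input price.
    """
    per_token = price_per_mtok / 1_000_000
    suffix_cost = requests * suffix_tokens * per_token  # unique part of each call
    if not cached:
        return requests * prefix_tokens * per_token + suffix_cost
    write_cost = prefix_tokens * per_token * write_mult
    read_cost = (requests - 1) * prefix_tokens * per_token * read_mult
    return write_cost + read_cost + suffix_cost


# HYPOTHETICAL workload: 50k-token shared prefix, 2k unique tokens, 1,000 calls.
shape = dict(prefix_tokens=50_000, suffix_tokens=2_000, requests=1_000,
             price_per_mtok=3.0)
without_cache = input_cost(cached=False, **shape)
with_cache = input_cost(cached=True, **shape)
print(f"uncached ${without_cache:.2f}  cached ${with_cache:.2f}  "
      f"saving {1 - with_cache / without_cache:.0%}")
```

Note the single-call case: a prefix written once and never reread costs more than sending it uncached, so the saving depends on enough requests arriving within the cache lifetime.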

For the commitment, caching matters most on your highest volume workloads, because that is where the repeated input piles up. A retrieval pipeline, a code review system, and a customer assistant with a long instruction set all carry shared context on every call. Caching that context shrinks the per request cost, which shrinks the forecast for the whole workload, which shrinks the commitment it contributes to. Caching and routing stack: routing lowers the per token price of each request, caching lowers the cost of the shared input portion, and together they compound into a forecast materially smaller than the unoptimized one.
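Stacking can be checked per request. This sketch assumes two hypothetical price points, a top model and a mid model at one fifth of its per token price, a 0.1x cache-read multiplier (the ninety percent discount described above), and a steady state in which the occasional cache write is amortized away and ignored. The token counts are illustrative. Routing changes the price of every token and caching changes how many prefix tokens are paid at full price, so on the input side the two effects multiply.

```python
# HYPOTHETICAL (input, output) prices in USD per million tokens; placeholders only.
PRICES = {"top": (15.0, 75.0), "mid": (3.0, 15.0)}
CACHE_READ_MULT = 0.10  # ASSUMED cache-read price as a fraction of base input


def request_cost(model: str, prefix: int, suffix: int, output: int,
                 cached: bool) -> float:
    """Steady-state USD cost of one request; ignores the amortized cache write."""
    in_price, out_price = (p / 1_000_000 for p in PRICES[model])
    prefix_price = in_price * (CACHE_READ_MULT if cached else 1.0)
    return prefix * prefix_price + suffix * in_price + output * out_price


# HYPOTHETICAL request shape: 20k shared prefix, 1k unique input, 500 output.
shape = dict(prefix=20_000, suffix=1_000, output=500)
costs = {
    label: request_cost(model, cached=cached, **shape)
    for label, model, cached in [
        ("baseline", "top", False),
        ("routing only", "mid", False),
        ("caching only", "top", True),
        ("routing + caching", "mid", True),
    ]
}
for label, cost in costs.items():
    print(f"{label:<18} ${cost:.4f}")
```

These figures are for a single request that can safely move to the cheaper model. A real workload mixes routable and non-routable calls and prefixes of different sizes, so the aggregate reduction will be smaller than the per request one.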

What batch removes from the forecast

The third lever is batch, which applies to any workload that does not need a real time answer. Evaluation runs, bulk document processing, overnight summarization, content generation pipelines, and dataset labeling can all run asynchronously, and Anthropic prices batch at roughly half the rate of real time calls. For a business with meaningful asynchronous volume, moving that volume to batch removes another large slice from the forecast at no quality cost, because the work was never latency sensitive to begin with.
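Moving eligible work to batch (on Anthropic's side, the Message Batches API) is easy to model once each job is tagged as latency sensitive or not. The sketch assumes the roughly fifty percent batch rate described above; the job names and annual spend figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    annual_spend: float      # HYPOTHETICAL spend at real time rates, USD
    latency_sensitive: bool  # True means it must stay on real time calls


def batch_forecast(jobs: list[Job], batch_mult: float = 0.5) -> float:
    """Annual spend after moving every latency-insensitive job to batch."""
    total = 0.0
    for job in jobs:
        rate = 1.0 if job.latency_sensitive else batch_mult
        total += job.annual_spend * rate
    return total


jobs = [
    Job("customer assistant", 1_200_000, latency_sensitive=True),
    Job("evaluation runs", 200_000, latency_sensitive=False),
    Job("bulk document processing", 400_000, latency_sensitive=False),
    Job("overnight summarization", 200_000, latency_sensitive=False),
]
realtime_total = sum(job.annual_spend for job in jobs)
print(f"all real time ${realtime_total:,.0f}  with batch ${batch_forecast(jobs):,.0f}")
```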

The commitment effect is the same as with the other two levers. Batch volume billed at half rate contributes half as much to the forecast, so the commitment built on that forecast is smaller. And because batch volume is often the most predictable part of a workload, optimizing it first gives you a firmer, smaller base to commit against. Together, the three levers (routing, caching, and batch) are why an optimized forecast can land far below an unoptimized run rate, and why the order of operations, optimize then commit, is the single most valuable move a buyer can make before sizing a deal.

The discount band still works in your favor

A common worry is that a smaller commitment means a worse discount, because Anthropic's commit bands reward larger spend. This is true in isolation, but it misreads the trade. The discount on a commitment you overshoot is a discount on money you would not have spent, which is no saving at all. A modest discount on a right sized commitment beats a deeper discount on an inflated one, because you actually consume the right sized number and you do not strand the excess. The goal is not the deepest headline discount. It is the lowest total cost, and the lowest total cost comes from optimizing the consumption, sizing the commitment to the optimized number, and negotiating the best rate on that.
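One way to compare a deep discount on an inflated commit with a modest discount on a right sized one is to ask what you pay per list-price dollar you actually consume. The sketch reuses the four million and two million figures from the routing example; the twenty and ten percent discounts are hypothetical, and it assumes the full commit is paid at the discounted rate whether or not it is used, with consumption at or below the commit.

```python
def effective_rate(commit: float, discount: float, consumption: float) -> float:
    """Dollars paid per list-price dollar actually consumed.

    Assumes the whole commit is paid at (1 - discount) whether used or not,
    and that consumption does not exceed the commit (no overage modeled).
    """
    if consumption > commit:
        raise ValueError("this sketch does not model overage")
    return commit * (1 - discount) / consumption


consumption = 2_000_000  # optimized real consumption from the routing example
inflated = effective_rate(commit=4_000_000, discount=0.20, consumption=consumption)
right_sized = effective_rate(commit=2_000_000, discount=0.10, consumption=consumption)
print(f"inflated commit:    {inflated:.2f} per list dollar consumed")
print(f"right sized commit: {right_sized:.2f} per list dollar consumed")
```

In this hypothetical, the deeper discount still leaves you paying 1.6 times list for what you use, while the modest one gets you 0.9 times list.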

There is also a leverage benefit. When you arrive with an optimized forecast and can show the account team the levers behind it, you signal that you understand your own consumption better than most buyers do. That changes the conversation. You are no longer accepting a number sized against your waste. You are presenting a number you have engineered and asking for a fair rate on it. That posture, informed and unhurried, is exactly what earns the protections and the rate that an uninformed buyer never gets.

Sequencing the optimization and the negotiation

The practical sequence is straightforward but rarely followed. First, audit the application for routing, caching, and batch opportunities, and quantify what each would remove from the run rate. Second, build the forecast on the optimized consumption, expressed as a range across conservative, expected, and aggressive cases. Third, size the commitment near the conservative case of the optimized forecast, with a ramp for growth, a protected overage rate for the upside, and negotiated treatment of unused commitment for the downside. Fourth, negotiate the rate on that structure. Each step depends on the one before it, and skipping the optimization at the front contaminates everything after it.
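The second and third steps can be sketched together: take an optimized forecast range, size the commit at the conservative case, and check what each outcome costs. Every figure here is hypothetical, including the commit discount, the protected overage discount, the share of unused commitment that rolls forward, and the added shortfall case below the conservative forecast. The sketch also leaves out the growth ramp to stay short, and `Terms` and `annual_outcome` are illustrative names, not contract terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Terms:
    commit: float            # annual commit in list-price dollars
    discount: float          # discount on committed spend
    overage_discount: float  # protected discount on spend above the commit
    rollover_share: float    # share of unused commit carried into the next term


def annual_outcome(terms: Terms, consumption: float) -> tuple[float, float]:
    """Return (dollars paid this year, committed dollars stranded)."""
    paid_commit = terms.commit * (1 - terms.discount)
    overage = max(0.0, consumption - terms.commit) * (1 - terms.overage_discount)
    unused = max(0.0, terms.commit - consumption)
    stranded = unused * (1 - terms.rollover_share)
    return paid_commit + overage, stranded


# HYPOTHETICAL optimized forecast range, plus a shortfall stress case.
scenarios = {
    "shortfall": 1_500_000,
    "conservative": 1_800_000,
    "expected": 2_100_000,
    "aggressive": 2_600_000,
}
terms = Terms(commit=scenarios["conservative"], discount=0.10,
              overage_discount=0.10, rollover_share=0.5)

results = {case: annual_outcome(terms, usage) for case, usage in scenarios.items()}
for case, (cost, stranded) in results.items():
    print(f"{case:<13} paid ${cost:,.0f}  stranded ${stranded:,.0f}")
```

Sizing at the conservative case means upside lands on the protected overage rate instead of on an inflated commit, and only a genuine shortfall leaves dollars stranded, which is where negotiated rollover treatment earns its keep.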

This is why caching and routing are not just engineering tactics. They are negotiating tools, because they change the size of the thing you are negotiating. A buyer who treats optimization and negotiation as the same project, sequenced correctly, commits to less, pays a fair rate on it, and keeps the savings instead of handing them to a discount on spend they never needed. If you want to see the routing and caching opportunity mapped against your real Claude workload before you size a single commit, that is exactly the work we do on a strategy call, and it is the fastest way to find out how much smaller your commitment could be.

Not affiliated with Anthropic PBC. Independent buyer side advisory only.