Independent buyer side advisory · Anthropic only · New York · London
Anthropic Pricing Intelligence

How token growth outpaces seat growth.

Seat counts grow with headcount, which is slow and predictable. Token consumption grows with usage, features, and ambition, which is neither. Here is why the API bill is where Claude spend runs away from you.

Buyer side analysis · 10 min read

When enterprises budget for Claude, they tend to anchor on seats, because seats are familiar. You know your headcount, you know roughly how many people need a license, and you can forecast that line with confidence because it moves with hiring. The problem is that seats are increasingly the smaller and slower part of the bill. The part that runs away is token consumption on the API, and it grows on a completely different curve. Seats grow with people. Tokens grow with usage, with new features, with deeper integration, and with the simple fact that successful AI applications get used more over time. Understanding why these two curves diverge is the key to not being surprised by an invoice that triples while your headcount barely moves.

Two different growth curves

Seat growth is linear and bounded. You add seats as you add people who need them, and the ceiling is your headcount. A company that grows staff ten percent in a year adds roughly ten percent more seats. That is a curve finance can plan around. Token growth has no such ceiling. It is driven by how much each application is used, how many applications you build, how much context each call consumes, and how ambitious the features become. A single successful internal tool can multiply its own consumption several times over as adoption spreads and as the team adds capability. The result is that token spend routinely grows several times faster than seat spend, and the gap widens as the deployment matures.
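The difference between the two curves can be sketched in a few lines. This is a minimal illustration, not a forecasting model: the function names and every growth rate below are hypothetical, and it assumes the token drivers (adoption, context per call, number of workloads) multiply together each year.

```python
def project_seats(initial_seats, headcount_growth, years):
    """Seats track headcount: one growth rate, bounded by hiring."""
    return [initial_seats * (1 + headcount_growth) ** y for y in range(years + 1)]


def project_tokens(initial_tokens, adoption_growth, context_growth,
                   workload_growth, years):
    """Tokens track usage: several drivers multiply together each year.

    Assumption: the drivers are independent and compound multiplicatively.
    """
    yearly_multiplier = (
        (1 + adoption_growth) * (1 + context_growth) * (1 + workload_growth)
    )
    return [initial_tokens * yearly_multiplier ** y for y in range(years + 1)]


if __name__ == "__main__":
    # Hypothetical inputs: 10% headcount growth versus 50% adoption,
    # 30% context, and 20% workload growth per year.
    seats = project_seats(1000, 0.10, 3)
    tokens = project_tokens(1.0, 0.50, 0.30, 0.20, 3)
    for year, (s, t) in enumerate(zip(seats, tokens)):
        print(f"year {year}: seats x{s / seats[0]:.2f}, tokens x{t / tokens[0]:.2f}")
```

Even with modest individual rates, the token multiplier pulls away from the seat multiplier because three drivers compound at once while seats follow only one.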

Why tokens compound

Token consumption compounds for reasons that have nothing to do with headcount. Adoption is the first. A tool that a few people pilot becomes a tool the whole department relies on, and usage rises with it. Feature depth is the second. Early versions send short prompts and get short answers, while mature versions send long context, reference documents, conversation history, and tool outputs, each of which adds tokens to every call. Ambition is the third. As teams trust the model, they push it at harder problems that need more context and longer reasoning. And the number of applications is the fourth. The first Claude use case proves the value, and a dozen more follow, each adding its own consumption. None of these track headcount, and all of them push the token curve up steeply.

The context window multiplier

One driver deserves special attention because it is so easy to miss. The size of the context you send on each call has a direct, multiplying effect on cost, and it tends to grow silently. A workload that begins by sending a short prompt may, over a few iterations, end up sending a long system prompt, several retrieved documents, and a full conversation history on every single call. Each addition feels reasonable in isolation, but together they can multiply the per call token count many times over, and that multiplier applies to every request the workload makes. Token spend can therefore rise sharply even when the number of requests is flat, simply because each request got heavier. This is why watching tokens per call matters as much as watching call volume.
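A rough sketch of the arithmetic, with made-up token counts, shows how the per call weight can grow while request volume stays flat. The `CallShape` class and all numbers are illustrative assumptions, not measurements from any real workload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallShape:
    """Input tokens sent on a single call, split by component (hypothetical)."""
    user_prompt: int
    system_prompt: int = 0
    retrieved_docs: int = 0
    history: int = 0

    def input_tokens(self) -> int:
        return (self.user_prompt + self.system_prompt
                + self.retrieved_docs + self.history)


def monthly_input_tokens(shape: CallShape, calls_per_month: int) -> int:
    """Total input tokens for a month at a fixed call volume."""
    return shape.input_tokens() * calls_per_month


# Illustrative numbers only: the same workload before and after context creep.
early = CallShape(user_prompt=300)
mature = CallShape(user_prompt=300, system_prompt=1500,
                   retrieved_docs=4000, history=2000)

calls = 100_000  # request volume is flat in both cases
multiplier = monthly_input_tokens(mature, calls) / monthly_input_tokens(early, calls)

if __name__ == "__main__":
    print(f"per call: {early.input_tokens()} -> {mature.input_tokens()} tokens")
    print(f"input token multiplier at flat volume: x{multiplier:.1f}")
```

Because the multiplier is applied to every request, it does not depend on the call volume at all; the same ratio appears at any traffic level.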

Why this breaks naive budgets

A budget built on seats plus a flat assumption about API spend breaks because it models the wrong driver. Finance plans the seat line accurately and then bolts on a token estimate that assumes roughly steady usage, and within a couple of quarters the token line has outgrown the estimate by a wide margin. The failure is not that anyone forecast badly. It is that they forecast token spend as if it behaved like seats, linear and headcount bound, when it behaves like usage, compounding and unbounded. The fix is to forecast the two lines separately and to model the token line on the drivers that actually move it (adoption, context size, feature depth, and the number of workloads) rather than on headcount.

The seat trap that hides the token bill

There is a particular failure that follows from anchoring on seats, and it is worth naming because it catches sophisticated buyers. Because seats are the familiar line, finance pours its attention into right sizing them, negotiating the seat count down to real usage, trimming idle licenses, and protecting the per seat price. That work is worthwhile, but it can absorb the whole governance budget while the token line, which is larger and growing faster, goes unmanaged. The company ends up with a beautifully optimized seat bill and a runaway API bill, having spent its energy on the smaller, slower curve. The corrective is to allocate governance attention in proportion to where the money is and where it is heading, which in a maturing deployment means the token line gets more scrutiny than the seat line, not less.

Forecasting the token line on its real drivers

If tokens grow on usage rather than headcount, then the forecast has to model usage. That means projecting the drivers directly: how adoption of each workload is trending, how the context size per call is changing as features deepen, how many new workloads are likely to ship, and how ambitious those workloads will be. A forecast built this way looks nothing like a headcount projection, because it has to account for compounding rather than linear growth. The output is a range rather than a single line, because the drivers are uncertain, and the range itself is useful, since it tells finance how wide the band of plausible outcomes is. A buyer who forecasts the token line on its real drivers is rarely surprised by an invoice, because the surprise was already inside the range they planned for.
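One way to turn those drivers into a range is to run the same simple model under a low and a high scenario. The sketch below is an assumption-laden illustration: the `DriverScenario` fields, the functional form, and the simplification that new workloads look like existing ones are all hypothetical choices, not a prescribed method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriverScenario:
    """Quarterly assumptions for one scenario (all values hypothetical)."""
    adoption_growth: float            # growth in calls per workload
    context_growth: float             # growth in tokens per call
    new_workloads_per_quarter: float  # workloads shipped each quarter


def forecast_tokens(workloads, calls_per_workload, tokens_per_call,
                    scenario, quarters):
    """Tokens per quarter = workloads x calls per workload x tokens per call.

    Simplification: new workloads are assumed to behave like existing ones.
    """
    out = []
    for q in range(quarters + 1):
        n = workloads + scenario.new_workloads_per_quarter * q
        calls = calls_per_workload * (1 + scenario.adoption_growth) ** q
        per_call = tokens_per_call * (1 + scenario.context_growth) ** q
        out.append(n * calls * per_call)
    return out


def forecast_band(workloads, calls_per_workload, tokens_per_call,
                  low, high, quarters):
    """Return a (low, high) pair of token totals for each quarter."""
    lo = forecast_tokens(workloads, calls_per_workload, tokens_per_call, low, quarters)
    hi = forecast_tokens(workloads, calls_per_workload, tokens_per_call, high, quarters)
    return list(zip(lo, hi))


if __name__ == "__main__":
    low = DriverScenario(0.05, 0.02, 0)
    high = DriverScenario(0.20, 0.10, 1)
    for q, (lo, hi) in enumerate(forecast_band(3, 10_000, 2_000, low, high, 4)):
        print(f"Q{q}: {lo / 1e6:,.0f}M to {hi / 1e6:,.0f}M tokens")
```

The width of the band at each quarter is the useful output: it shows how much of the uncertainty comes from the drivers rather than from the starting point, which is identical in both scenarios.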

Watch tokens per call, not just call volume

One discipline deserves to be a standing metric: tokens per call, tracked per workload. Because context size grows silently, a workload can double or triple its cost without any change in how often it is called, simply because each call got heavier with added system prompt, retrieved documents, and history. A team that watches only call volume will miss this entirely and be baffled when the bill rises against flat traffic. Watching tokens per call surfaces the creep early, points at the workloads where context discipline has lapsed, and often reveals an easy saving, because much of the added context turns out to be unnecessary or cacheable. It is one of the cleanest early warning signs that the token curve is steepening for reasons other than genuine growth.
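As a sketch of what that standing metric could look like, the snippet below computes tokens per call by workload and period from usage records and flags workloads where the ratio has crept up. The record layout, the workload names, and the 1.25 threshold are all hypothetical; real usage exports will have their own fields.

```python
from collections import defaultdict


def tokens_per_call(records):
    """Aggregate (workload, period, calls, tokens) rows into tokens per call.

    Returns {workload: {period: tokens_per_call}}.
    """
    totals = defaultdict(lambda: [0, 0])  # (workload, period) -> [calls, tokens]
    for workload, period, calls, tokens in records:
        totals[(workload, period)][0] += calls
        totals[(workload, period)][1] += tokens

    series = defaultdict(dict)
    for (workload, period), (calls, tokens) in totals.items():
        if calls > 0:  # skip periods with no traffic
            series[workload][period] = tokens / calls
    return dict(series)


def flag_creep(series, threshold=1.25):
    """Flag workloads whose latest tokens per call is at least `threshold`
    times the earliest. Periods are sorted as strings, so use a sortable
    format such as YYYY-MM.
    """
    flagged = {}
    for workload, by_period in series.items():
        periods = sorted(by_period)
        if len(periods) < 2:
            continue
        first, last = by_period[periods[0]], by_period[periods[-1]]
        if first > 0 and last / first >= threshold:
            flagged[workload] = last / first
    return flagged


# Hypothetical records: (workload, period, calls, total tokens).
records = [
    ("support-bot", "2025-01", 1000, 2_000_000),
    ("support-bot", "2025-02", 1000, 5_000_000),  # flat volume, heavier calls
    ("search", "2025-01", 500, 500_000),
    ("search", "2025-02", 1000, 1_000_000),       # volume doubled, same weight
]

if __name__ == "__main__":
    print(flag_creep(tokens_per_call(records)))
```

In this made-up data, only the workload with flat traffic is flagged, which is exactly the case a volume-only dashboard would miss.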

What it means for the commitment

The divergence of the two curves has a direct commercial consequence. A committed spend negotiated against today's token usage will be wrong within a year if the usage is on a steep curve, and wrong in a way that matters, because commitment terms are where money is locked in. Commit too low against a rising curve and you face overage, often at a worse rate than the committed one. Commit too high to be safe and you risk unused commitment, which in most agreements is simply lost. The way through is to forecast the token curve honestly, size the commit to a realistic projection rather than to last quarter, and structure the agreement so that growth past the commit is priced at the committed rate rather than penalized. Getting the shape of the commit right depends entirely on understanding that tokens, not seats, are the line that moves.
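The trade-off can be made concrete with a small cost function. This is a simplified sketch under stated assumptions: usage and commit are expressed in dollars at the committed rate, unused commitment is forfeited, and overage is priced by a single multiplier. The function names and the 1.2 multiplier are hypothetical and do not describe any actual contract terms.

```python
def total_cost(actual_usage, commit, overage_multiplier=1.0):
    """Amount paid for the term, given usage valued at the committed rate.

    Assumptions: unused commitment is lost, and usage above the commit is
    charged at `overage_multiplier` times the committed rate. A multiplier
    of 1.0 models growth priced at the committed rate.
    """
    if actual_usage <= commit:
        return float(commit)
    return commit + (actual_usage - commit) * overage_multiplier


def unused_commit(actual_usage, commit):
    """Committed spend that was paid for but never consumed."""
    return max(0.0, commit - actual_usage)


if __name__ == "__main__":
    # Hypothetical figures in thousands of dollars.
    for usage in (80, 100, 150):
        penalized = total_cost(usage, commit=100, overage_multiplier=1.2)
        at_rate = total_cost(usage, commit=100, overage_multiplier=1.0)
        print(f"usage {usage}: pay {penalized:.0f} with penalty, "
              f"{at_rate:.0f} at committed rate, "
              f"unused {unused_commit(usage, 100):.0f}")
```

Running the function across the low and high ends of a usage forecast shows which mistake is more expensive for a given set of terms, which is the comparison that should drive the commit size.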

Optimize before you extrapolate

There is one more step that changes the whole picture. A steep token curve is partly real growth and partly waste, and the two should be separated before any commitment is sized. Routing work to the cheapest capable model, caching stable context, moving async work to batch, and controlling output length can take a large share off the curve, often enough to flatten it materially. The right sequence is to optimize first, establish a clean lower baseline, project the growth from there, and only then commit. A commit sized to an unoptimized, steeply rising curve locks in waste. A commit sized to an optimized, realistically projected curve is the one that holds up.
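The sequencing argument can be illustrated with two short functions. The savings fractions and growth rate below are placeholders, not benchmarks, and the sketch assumes each optimization applies to whatever spend remains after the previous one, which overstates or understates the effect when levers overlap.

```python
def optimized_baseline(baseline, savings):
    """Apply each fractional saving in turn to the remaining spend.

    Assumption: savings are independent and applied sequentially.
    """
    result = float(baseline)
    for s in savings:
        if not 0 <= s < 1:
            raise ValueError("each saving must be a fraction in [0, 1)")
        result *= 1 - s
    return result


def project(baseline, growth_per_period, periods):
    """Compound a baseline forward at a fixed growth rate."""
    return baseline * (1 + growth_per_period) ** periods


if __name__ == "__main__":
    # Hypothetical: monthly spend of 100, with placeholder savings from
    # model routing (20%) and caching plus batch (50% of the remainder).
    raw = 100.0
    clean = optimized_baseline(raw, [0.20, 0.50])
    print(f"commit basis, unoptimized: {project(raw, 0.5, 2):.0f}")
    print(f"commit basis, optimized:   {project(clean, 0.5, 2):.0f}")
```

The same growth rate applied to the two baselines produces very different commit sizes, which is why the order of operations matters.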

Where this fits

Understanding the token curve is core to reading your Claude costs correctly. For the rate bands, commit mechanics, and forecasting detail behind it, read the pillar guide on Anthropic and Claude pricing in 2026, and bring us your usage so we can project the curve and size the commitment to where you are actually heading.


Not affiliated with Anthropic PBC. Independent buyer side advisory only.