Independent buyer side advisory · Anthropic only · New York · London

Unit economics of a Claude powered feature.

If you ship a feature built on Claude, every request has a cost and, ideally, a value. Knowing the unit economics turns an unpredictable API bill into a number you can plan, defend, and use to size a sane commitment with Anthropic.

The moment a feature built on Claude reaches real users, it stops being a line item in an engineering budget and becomes a unit of production with its own economics. Each time a user triggers that feature, your application sends tokens to the model and receives tokens back, and each of those tokens has a price. Multiply by the volume of requests and you have a cost of goods that scales directly with usage. The buyers who control their Anthropic spend are the ones who treat this like any other unit cost in the business, measure it precisely, and tie it to the value the feature delivers. The ones who lose control treat the API bill as a mysterious monthly surprise.

Unit economics is simply the discipline of knowing what one unit costs and what one unit earns. For a Claude powered feature, the unit is usually a request or a completed task, and the goal is to know the cost of that unit well enough to forecast spend, price the feature if it is external, and size a committed spend deal that matches reality. This article lays out how to build that picture from the ground up.

Define the unit before you measure anything

The first decision is what counts as one unit. It should map to something meaningful in the product. For a customer support assistant, the unit might be one resolved conversation. For a document summarizer, one document processed. For a coding assistant, one task completed. The unit needs to be the thing that grows when the product succeeds, because that is the thing whose cost you most need to understand. Pick the wrong unit, like raw API calls when a single user task triggers several, and your cost per unit will mislead you.

Once the unit is defined, every cost and every value gets expressed per unit. This sounds obvious, but it is the step most teams skip. They look at total monthly spend and total monthly active users and divide, which buries all the variation that actually drives cost. A proper unit model keeps the components visible so you can see what to optimize.
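To make the difference between raw API calls and real units concrete, here is a minimal sketch. It assumes each logged call carries a unit identifier and a cost; the `ApiCall` record, its field names, and the dollar figures are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ApiCall:
    unit_id: str     # hypothetical ID linking a call to one user task
    cost_usd: float  # what this single call cost


def cost_by_unit(calls):
    """Sum call costs per unit so a multi-call task counts once."""
    totals = defaultdict(float)
    for call in calls:
        totals[call.unit_id] += call.cost_usd
    return dict(totals)


# One support conversation needed three calls, another needed one.
calls = [
    ApiCall("conversation-1", 0.02),
    ApiCall("conversation-1", 0.03),
    ApiCall("conversation-1", 0.01),
    ApiCall("conversation-2", 0.02),
]

per_unit = cost_by_unit(calls)
avg_per_call = sum(c.cost_usd for c in calls) / len(calls)
avg_per_unit = sum(per_unit.values()) / len(per_unit)

print(f"average per API call:     ${avg_per_call:.3f}")
print(f"average per conversation: ${avg_per_unit:.3f}")
```

In this made-up log the average per call is half the average per conversation, which is exactly the kind of gap that makes a call-based unit misleading.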

The cost side of the unit

The cost of one unit on Claude breaks down into a handful of components, and each is a lever.

Input tokens

Every request carries input, which includes the system prompt, any context or retrieved documents, and the user's content. Input tokens are priced lower than output, but they are often the larger volume, especially in features that stuff a lot of context into each call. Long system prompts and large retrieved passages drive this number up on every single request.

Output tokens

Output is what the model generates, and it is the expensive part, priced several times higher than input on every Claude model. A feature that produces long responses pays for that length on every unit. Controlling output length is one of the most direct ways to cut cost per unit without touching quality where it matters.
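The arithmetic behind input and output cost is simple enough to sketch. The prices below are placeholders chosen for illustration, not a real price list, and the token counts are invented; substitute the figures from your own contract and logs.

```python
TOKENS_PER_MTOK = 1_000_000


def request_cost(input_tokens, output_tokens, input_price, output_price):
    """Dollar cost of one request; prices are dollars per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / TOKENS_PER_MTOK


# Placeholder prices (assumption): output costs five times input here.
INPUT_PRICE = 3.0
OUTPUT_PRICE = 15.0

# Same 8,000-token input, with a long answer and with a trimmed one.
verbose = request_cost(8_000, 500, INPUT_PRICE, OUTPUT_PRICE)
trimmed = request_cost(8_000, 250, INPUT_PRICE, OUTPUT_PRICE)

# Share of the verbose request's cost that comes from input tokens.
input_share = (8_000 * INPUT_PRICE / TOKENS_PER_MTOK) / verbose

print(f"verbose: ${verbose:.5f}  trimmed: ${trimmed:.5f}")
print(f"input share of verbose cost: {input_share:.0%}")
```

Even with output priced well above input, a context-heavy request like this one spends most of its money on input, which is why both levers deserve attention.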

Model choice

The same unit costs wildly different amounts depending on whether it runs on Opus, Sonnet, or Haiku. Opus is the premium model and the right tool for genuinely hard reasoning. Sonnet handles the broad middle of work at a fraction of the cost. Haiku is cheaper still and fast, and for classification, extraction, and simple responses it is often all the unit needs. Routing each unit to the cheapest model that does the job well is the single biggest swing in unit cost, and it is why aggregate spend typically falls forty to seventy percent when buyers move off uniform Opus use.

Most features do not need one model. They need a router that sends easy units to Haiku, the bulk to Sonnet, and only the genuinely hard units to Opus. The unit cost of a feature is mostly a function of this routing decision.
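A router of this kind can start out very small. The sketch below is one possible shape, not a recommended policy: the task labels, the `needs_deep_reasoning` flag, and the per-unit costs in cents are all assumptions made up for the example.

```python
from enum import Enum


class Tier(Enum):
    HAIKU = "haiku"
    SONNET = "sonnet"
    OPUS = "opus"


# Hypothetical task labels that the cheapest tier handles well.
SIMPLE_TASKS = {"classification", "extraction", "short_reply"}


def route(task_type, needs_deep_reasoning):
    """Send each unit to the cheapest tier assumed to do the job."""
    if needs_deep_reasoning:
        return Tier.OPUS
    if task_type in SIMPLE_TASKS:
        return Tier.HAIKU
    return Tier.SONNET


# Placeholder cost per unit in cents for each tier (not real prices).
CENTS_PER_UNIT = {Tier.HAIKU: 1, Tier.SONNET: 4, Tier.OPUS: 10}


def total_cents(units_by_tier):
    """Total cost in cents for a given count of units on each tier."""
    return sum(CENTS_PER_UNIT[tier] * n for tier, n in units_by_tier.items())


# 1,000 units: routed across tiers versus all sent to the premium tier.
routed_total = total_cents({Tier.HAIKU: 300, Tier.SONNET: 600, Tier.OPUS: 100})
uniform_total = total_cents({Tier.OPUS: 1_000})

print(f"routed: {routed_total} cents, uniform premium: {uniform_total} cents")
```

In practice the routing signal would come from your own evaluation of which tier meets the quality bar for each unit type; the rule above only illustrates where that decision plugs in.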

Caching and batch

Two mechanics change the unit cost further. Prompt caching lets you reuse a stable block of context across many requests at a steep discount, up to ninety percent off on the cached portion, which is transformative for any feature that sends the same large system prompt or document set on every call. Batch processing runs requests asynchronously at half price, which is ideal for any unit that does not need an instant response. A feature designed with these in mind has a structurally lower unit cost than one that ignores them.
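The effect of each mechanic on a unit can be sketched with the discounts quoted above. This is a simplified model under stated assumptions: it applies the full ninety percent discount to the stable block, ignores any cost of writing to the cache, and treats batch as a flat fifty percent off. The input price and token counts are placeholders.

```python
def input_cost(stable_tokens, variable_tokens, input_price, cache_discount=0.0):
    """Dollar input cost; the stable block gets the cache discount, if any."""
    stable_part = stable_tokens * input_price * (1 - cache_discount)
    variable_part = variable_tokens * input_price
    return (stable_part + variable_part) / 1_000_000


def batch_price(realtime_cost, batch_discount=0.50):
    """Cost of the same work when run asynchronously through batch."""
    return realtime_cost * (1 - batch_discount)


INPUT_PRICE = 3.0  # placeholder, dollars per million input tokens

# 6,000 tokens of system prompt repeated on every call, 2,000 that vary.
uncached = input_cost(6_000, 2_000, INPUT_PRICE)
cached = input_cost(6_000, 2_000, INPUT_PRICE, cache_discount=0.90)
batched = batch_price(uncached)

print(f"uncached ${uncached:.4f}  cached ${cached:.4f}  batched ${batched:.4f}")
```

The larger the stable share of the prompt, the more caching matters; batch helps regardless of prompt shape but only for work that can wait.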

The value side of the unit

Cost is only half of unit economics. The other half is what the unit is worth. For an external feature you charge for, the value is the revenue or margin per unit, and the question is whether cost per unit leaves room for a healthy margin. For an internal feature, the value is the time saved or the outcome improved, which you can usually express in money even if roughly. A unit that costs a few cents and saves an employee twenty minutes is a spectacular trade. A unit that costs a dollar and saves nothing measurable is a problem regardless of how cheap a dollar sounds.
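For an internal feature, the rough conversion from time saved to money looks like the sketch below. The loaded hourly rate is an assumption you would replace with your own figure, and the two units mirror the examples in the paragraph above.

```python
def value_of_time_saved(minutes_saved, loaded_hourly_rate):
    """Dollar value of employee time saved by one unit."""
    return minutes_saved * loaded_hourly_rate / 60


def net_value_per_unit(value_per_unit, cost_per_unit):
    """What one unit is worth after paying for it."""
    return value_per_unit - cost_per_unit


HOURLY_RATE = 60.0  # hypothetical loaded cost of an employee hour

# A unit that costs a few cents and saves twenty minutes.
good_trade = net_value_per_unit(value_of_time_saved(20, HOURLY_RATE), 0.05)

# A unit that costs a dollar and saves nothing measurable.
bad_trade = net_value_per_unit(value_of_time_saved(0, HOURLY_RATE), 1.00)

print(f"good trade nets ${good_trade:.2f}, bad trade nets ${bad_trade:.2f}")
```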

Putting cost and value side by side is what makes the model useful for decisions. It tells you which features deserve more investment, which need their costs cut before they scale, and which should not ship at all. It also gives you the language to defend the spend internally, because a finance leader who sees cost per unit against value per unit understands the feature in terms they already use.

From unit economics to a commitment

This is where the unit model pays off in your Anthropic deal. Once you know cost per unit and you can forecast unit volume, you can forecast total spend with far more confidence than a top down guess. That forecast is the foundation of a sane committed spend deal. You commit to the spend your unit model supports, with a conservative volume assumption, rather than to a round number the account team proposed.

The unit model also protects you from a trap that catches many buyers. If you plan to drive unit cost down through routing, caching, and batch over the term, that reduction has to be built into the commitment before you sign. Otherwise your own optimization pushes actual usage below the committed amount, and you end up paying for spend you no longer need. A unit model that projects both volume growth and cost per unit decline gives you a commitment number that survives contact with reality.

A credible commitment is unit cost times forecast volume, with optimization already priced in. Bring that to the table and you negotiate from data. Bring a round number and you negotiate from hope.
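One way to turn that sentence into a number is a month-by-month projection. The sketch below assumes smooth compound growth in volume and a smooth monthly decline in unit cost, which real features rarely follow exactly; every input shown is a made-up planning figure.

```python
def forecast_spend(start_units, monthly_growth, start_unit_cost,
                   monthly_cost_decline, months, volume_haircut=0.0):
    """Projected spend over the term.

    volume_haircut trims the starting volume to keep the forecast
    conservative; monthly_cost_decline prices in planned optimization.
    """
    units = start_units * (1 - volume_haircut)
    unit_cost = start_unit_cost
    total = 0.0
    for _ in range(months):
        total += units * unit_cost
        units *= 1 + monthly_growth
        unit_cost *= 1 - monthly_cost_decline
    return total


# Same feature, twelve-month term, with and without optimization priced in.
unoptimized = forecast_spend(10_000, 0.05, 0.10, 0.00, 12)
commitment = forecast_spend(10_000, 0.05, 0.10, 0.04, 12, volume_haircut=0.10)

print(f"forecast ignoring optimization: ${unoptimized:,.0f}")
print(f"forecast to commit against:     ${commitment:,.0f}")
```

The gap between the two figures is the amount you would be stranded with if you committed to the first number and then delivered the optimization anyway.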

Keeping the model alive

Unit economics is not a one time exercise. Unit cost drifts as prompts grow, as new context gets added, as volume shifts across models, and as Anthropic updates its pricing. The features that stay economical are the ones where someone watches cost per unit as a live metric and treats a rising number as a signal to investigate. A monthly view of cost per unit by feature catches problems while they are small, long before they show up as a shocking total at renewal.
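A monthly check does not need much machinery. The sketch below flags any feature whose cost per unit rose by more than a chosen threshold since the previous month; the ten percent threshold, the feature names, and the figures are assumptions for illustration.

```python
def flag_drift(history, threshold=0.10):
    """Return features whose latest cost per unit rose more than threshold.

    history maps a feature name to its monthly cost per unit, oldest first.
    """
    flagged = {}
    for feature, series in history.items():
        if len(series) < 2 or series[-2] <= 0:
            continue  # not enough data to compare
        change = (series[-1] - series[-2]) / series[-2]
        if change > threshold:
            flagged[feature] = change
    return flagged


# Hypothetical cost per unit in dollars over three months.
history = {
    "summarizer": [0.040, 0.041, 0.050],
    "support_assistant": [0.120, 0.118, 0.119],
}

flagged = flag_drift(history)
for feature, change in flagged.items():
    print(f"{feature}: cost per unit up {change:.0%} month over month")
```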

A worked unit, end to end

Walk one unit through the model to see how the components combine. Take a document summarization feature where the unit is one document processed. Each unit sends a long system prompt, the document itself, and a short instruction, then receives a summary back. In a naive build, every unit runs on a premium model, sends the full system prompt fresh each time, and produces a verbose summary. The unit cost is high, driven by expensive output, premium model pricing, and the same large system prompt paid for on every single request. Multiply by volume and the feature looks alarmingly expensive.

Now optimize the same unit. Route it to a mid tier model that summarizes perfectly well, since summarization rarely needs the most powerful reasoning. Cache the stable system prompt so that subsequent units read it back at a steep discount instead of paying the full input price every time. Constrain the output to a tight length that delivers the summary without padding. And if the summaries do not need to be instant, run them through batch at half price. The same unit now costs a fraction of the naive version, often a reduction of well over half, with no loss of quality that a user would notice. Nothing about the feature changed except the engineering decisions around it, and those decisions are entirely within your control.
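To put rough numbers on this walk-through, here is a sketch that prices the naive and the optimized build of the same unit. Everything in it is an assumption: the prices per million tokens are placeholders rather than a real price list, the token counts are invented, cache writes are ignored, and the batch discount is simply applied on top of the cached cost.

```python
from dataclasses import dataclass

CACHE_DISCOUNT = 0.90  # discount on the cached system prompt (assumed maximum)
BATCH_DISCOUNT = 0.50  # discount for asynchronous batch processing


@dataclass
class UnitBuild:
    input_price: float   # placeholder, dollars per million input tokens
    output_price: float  # placeholder, dollars per million output tokens
    system_tokens: int
    document_tokens: int
    output_tokens: int
    cache_system_prompt: bool = False
    use_batch: bool = False


def unit_cost(build):
    """Dollar cost of one summarized document under the given build."""
    system_rate = build.input_price
    if build.cache_system_prompt:
        system_rate *= 1 - CACHE_DISCOUNT
    dollars = (
        build.system_tokens * system_rate
        + build.document_tokens * build.input_price
        + build.output_tokens * build.output_price
    ) / 1_000_000
    if build.use_batch:
        dollars *= 1 - BATCH_DISCOUNT
    return dollars


# Naive: premium pricing, fresh system prompt, verbose summary.
naive = unit_cost(UnitBuild(15.0, 75.0, 4_000, 6_000, 1_000))

# Optimized: mid tier pricing, cached prompt, tight summary, batch.
optimized = unit_cost(UnitBuild(3.0, 15.0, 4_000, 6_000, 400,
                                cache_system_prompt=True, use_batch=True))

print(f"naive ${naive:.4f}  optimized ${optimized:.4f}")
print(f"reduction: {1 - optimized / naive:.0%}")
```

The size of the reduction depends entirely on the placeholder inputs, so treat the output as a template for your own figures rather than a benchmark.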

The trap of averaging

One warning sits at the center of unit economics, and it catches even careful teams. An average cost per unit hides enormous variation. A feature might average a modest cost while a small share of units (the long documents, the complex queries, the verbose edge cases) cost many times the average and quietly drive most of the bill. If you optimize against the average you will miss the units that actually matter. The discipline is to look at the distribution of unit cost, not just the mean, and to attack the expensive tail specifically. Often a handful of unit types account for the majority of spend, and fixing those is where the real money is.
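Looking at the distribution rather than the mean takes only a few lines. The sketch below uses an invented set of unit costs in which one unit in ten is an expensive outlier, and reports how much of the bill the top tenth accounts for.

```python
import statistics


def tail_share(costs, top_fraction=0.10):
    """Share of total spend that comes from the most expensive units."""
    ordered = sorted(costs, reverse=True)
    k = max(1, round(len(ordered) * top_fraction))
    return sum(ordered[:k]) / sum(ordered)


# Hypothetical unit costs in dollars: 90 ordinary units, 10 expensive ones.
costs = [0.02] * 90 + [0.50] * 10

mean_cost = statistics.mean(costs)
median_cost = statistics.median(costs)
top_decile_share = tail_share(costs)

print(f"mean ${mean_cost:.3f}, median ${median_cost:.3f}")
print(f"top 10% of units drive {top_decile_share:.0%} of spend")
```

A mean that sits well above the median is usually the first visible sign of an expensive tail worth investigating.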

This is also why a single headline cost per unit, while useful for planning, is not enough for optimization. You need to see cost per unit broken down by the components and by the type of unit, so you can tell whether a rising bill is being driven by volume, by model mix, by output length, or by a particular expensive workload. That granularity turns the unit model from a reporting exercise into a tool for action.
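One simple way to see what is driving a rising bill is to split the month over month change for each unit type into a volume effect and a cost per unit effect. The decomposition below is one common convention among several, and the unit types and figures are hypothetical.

```python
def decompose_change(prev_units, prev_unit_cost, cur_units, cur_unit_cost):
    """Split a spend change into (volume effect, unit cost effect).

    The two effects add up exactly to the change in total spend.
    """
    volume_effect = (cur_units - prev_units) * prev_unit_cost
    cost_effect = (cur_unit_cost - prev_unit_cost) * cur_units
    return volume_effect, cost_effect


# Hypothetical (units, cost per unit) for last month and this month.
by_type = {
    "short_document": ((9_000, 0.02), (10_000, 0.02)),
    "long_document": ((1_000, 0.30), (1_000, 0.45)),
}

effects = {}
for unit_type, ((pu, pc), (cu, cc)) in by_type.items():
    effects[unit_type] = decompose_change(pu, pc, cu, cc)
    volume, cost = effects[unit_type]
    print(f"{unit_type}: volume {volume:+.2f}, unit cost {cost:+.2f}")
```

In this invented example most of the increase comes from long documents getting more expensive per unit, not from overall growth, which points the investigation at prompt size or output length for that one workload.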

Not affiliated with Anthropic PBC. Independent buyer side advisory only.