Token Budgets per Feature: A Practical Framework

Most companies track Claude spend at the top, as a single line that goes up. They know the total, they feel the climb, and they have no idea which feature is responsible. That is the core problem with how AI cost is managed: it is measured in aggregate and owned by nobody, so it grows without anyone making a decision to let it. The fix is a token budget per feature, a number assigned to each thing your product does with Claude, so cost becomes a design constraint at the feature level instead of a surprise at the company level. This is a practical buyer side framework for setting those budgets and holding to them.

We negotiate Claude contracts for enterprise buyers and optimize the spend underneath them. The single biggest difference between companies that control their Claude cost and companies that do not is whether they budget at the feature level. The ones that do can answer what each feature costs, justify it against the value it returns, and catch a runaway before it reaches the invoice. The ones that do not are perpetually surprised.

Why the unit is the feature

Cost has to be owned somewhere specific enough to act on, and the feature is that unit. A feature has an owner, a purpose, and a measurable value, which means its cost can be judged against something. The company total cannot: it is the sum of every decision and the responsibility of no single person. By pushing the budget down to the feature, you give cost an owner who can see the tradeoff between what the feature spends and what it returns, and who has the authority to change the feature if the math is wrong.

This also matches how cost is actually created. Tokens are consumed by features, one call at a time, shaped by how that feature prompts, which model it uses, how long its responses run, and how often it is invoked. Every one of those is a feature level decision. Budgeting at the feature aligns the number with the decisions that move it, which is the only level at which a budget can do real work.

Cost you measure only in aggregate is cost nobody owns. Push the budget down to the feature and it gets an owner who can change it.

Setting the budget: cost per successful action

The right way to express a feature budget is not total monthly spend but cost per successful action. What does it cost, in tokens, for this feature to do its job once? A support feature has a cost per resolved ticket. A summarization feature has a cost per document. A classification feature has a cost per item classified. Expressing the budget this way makes it independent of volume, so growth does not break the budget, and it ties cost directly to value, because a successful action is the thing the feature exists to produce.

To set the number, measure the current cost per action, then ask what it should be given the value the action delivers. A feature that resolves a high value transaction can justify a larger per action budget than one that performs a trivial lookup. The budget is the cost per action you are willing to pay for the value that action returns, and it gives the feature owner a clear target: keep each successful action at or under this token cost.

The levers a feature owner controls

A budget is only useful if the owner has levers to hit it, and on Claude they have several powerful ones. The first is model choice. Running a feature on Opus when Sonnet or Haiku would serve is the most common reason a per action cost is too high, and routing the feature to the right model is often the single largest saving available. Reserve Opus for the actions that genuinely need its capability and the per action cost on everything else falls sharply.

The second lever is prompt caching, which returns up to 90 percent on the stable parts of a prompt, ideal for features that send the same large context on every call. The third is output discipline, capping and constraining responses so the expensive output half of each call stays lean. The fourth, for features that do not need an immediate answer, is batch processing at 50 percent of the real time rate. A feature owner with a budget and these four levers can almost always bring a per action cost down to where it belongs, and the combination typically moves aggregate spend 40 to 70 percent.

A feature budget worksheet

Define the successful action. The single unit of value the feature produces: a resolved ticket, a summarized document, a classified item.
Measure current cost per action. Total feature tokens divided by successful actions, at current rates.
Set the target. The per action cost justified by the value the action returns.
Apply the levers. Model choice, caching, output discipline, and batch, until the measured cost meets the target.
Alert on breach. Monitor cost per action and flag the moment a feature drifts above its budget.

Budgets catch runaways early

The quiet value of feature budgets is early detection. Without them, a feature whose cost per action doubles, because a prompt grew, a model was swapped up, or responses got longer, is invisible until the monthly total climbs enough to notice, by which point you have paid for weeks of the regression. With a per action budget and an alert on breach, the same regression surfaces the day it happens, while it is cheap to fix. Budgets turn cost control from a quarterly autopsy into a daily signal.

Free download

The Token Optimization Field Guide

Feature budgets work best inside a full optimization program. Our field guide covers model routing, caching, output discipline, and batch, with the numbers that show what each lever returns.

Get the Token Optimization Field Guide

From feature budgets to a credible commitment

Feature budgets do not just control spend. They build the forecast you take into an Anthropic negotiation. When every feature has a cost per action and a projected volume, the sum is a bottom up forecast of your total Claude consumption, grounded in real units rather than a guess. That forecast is the strongest possible basis for sizing a commitment, because it is defensible feature by feature and it reflects optimized consumption rather than today's waste.

This matters because the alternative, forecasting from aggregate history and padding for safety, leads straight to an oversized commitment, and unused commitment on Anthropic generally does not roll over. A buyer who walks in with feature level budgets can size the commitment to a number they can defend and will actually consume, which is both cheaper and stronger at the table. The discipline that controls spend internally also produces the leverage that controls price externally.

A worked example of a feature budget

Take a support assistant that drafts replies to customer tickets. The successful action is a drafted reply that an agent sends with minimal editing. Measure the current cost: total tokens the feature consumes divided by the number of replies actually used. Suppose that comes out higher than you expected, because the feature runs on Opus, sends the entire knowledge base as context on every call, and returns a long reply with reasoning the agent skims past. Each of those is a lever waiting to be pulled.

Route the drafting to Sonnet, which handles the task well, and the rate on every call drops. Cache the knowledge base context, which is identical across calls, and the repeated input falls by up to ninety percent. Constrain the output to the reply itself without the reasoning narrative, and the expensive output half shrinks. The cost per drafted reply falls to a fraction of where it started, and now it sits under a budget you set against the value of a resolved ticket. The feature owner can see the number, defend it, and hold it, because the budget is expressed in the unit the feature actually produces.

The discipline scales because every feature gets the same treatment. Each has a defined successful action, a measured cost per action, a target justified by value, and the same four levers to hit it. The company total stops being a mysterious climbing line and becomes the sum of numbers that each have an owner and a justification, which is the only state from which cost can actually be controlled.

Making the framework stick

A budget that lives in a spreadsheet nobody opens is not a budget. For the framework to hold, the cost per action has to be visible to the feature owner continuously, the breach alert has to reach them quickly, and the budget has to be revisited when the feature's value or volume changes. The goal is not a one time exercise but a standing discipline, where every feature carries a cost target the way it carries a latency target or an error budget, as a normal part of how it is built and run.

The companies that get this right treat Claude cost as an engineering metric, not just a finance line. Engineering owns the levers, so engineering owns the budget, with finance providing the value side of the equation. That shared ownership, an engineering leader holding the per action cost and a finance or procurement leader holding the value it returns, is what turns a framework on paper into spend that actually stays where you put it.

The buyer side summary

Token budgets per feature move cost control from the company total, which nobody owns, to the feature, which someone does. Express each budget as a cost per successful action, set it against the value the action returns, and give the owner the levers to hit it: model choice, caching, output discipline, and batch. Alert on breach so runaways surface early. Then roll the feature budgets up into a bottom up forecast that sizes your Anthropic commitment honestly. The result is spend you can explain, control, and defend, both in your own product and at the negotiating table.

For the full set of levers behind every feature budget, the Token Optimization Field Guide shows what each one returns and how they combine.