Blog · Token Optimization

Middle of funnel · Commercial investigation

The engineering review that pays for itself.

A focused review of how your team actually calls Claude almost always uncovers more savings than its cost, often many times over. The work is not glamorous, but it is the single highest return day on any AI spend program. Here is what the review looks for, how we size what it finds, and why it funds the rest of the optimization with room to spare.

By Fredrik Filipsson · Published May 29, 2026 · Updated June 12, 2026

Most enterprises that run a serious volume of Claude traffic have never had anyone look closely at how that traffic is constructed. The integration was built once, it worked, and the invoice grew quietly underneath it. By the time someone notices the number, the assumption is that the spend reflects the value being created, so the only lever anyone reaches for is the contract. The contract matters, and we negotiate it hard. But the spend that flows into that contract is usually inflated by avoidable waste in the code, and the fastest way to find that waste is a structured engineering review.

The review is not an audit in the punitive sense. It is a guided read of how your application talks to the model, conducted by people who know exactly where the money leaks. Because the levers on Claude are well understood and consistent, an experienced reviewer knows what to look for and where it tends to hide. The result is a ranked list of changes, each with an estimated saving and an estimated effort, so your engineers know precisely what to do first and what it is worth. In our experience the review repays its own cost within the first one or two changes it surfaces, and everything after that is margin.

Why the spend is almost always higher than it needs to be

There is nothing careless about a team whose Claude spend has drifted high. The drift is structural. Engineers build for correctness and speed of delivery, which is the right instinct, and cost is rarely a first class concern when the feature is being shipped. The natural choices that get a feature working, calling the strongest model, sending the whole context every time, letting the output run long, treating every call as real time, are all individually reasonable and collectively expensive. None of them are bugs. They are defaults that were never revisited once the volume grew.

This is exactly why an outside review is so productive. The team is too close to the code, and too busy shipping the next thing, to question choices that are working. A reviewer whose only job is to find cost arrives without those blind spots and works through the call paths asking a different question at every step. Not does this work, but does this cost more than it has to. That single shift in question is where the saving comes from.

The review does not assume your team did anything wrong. It assumes the defaults that ship a working feature are rarely the defaults that minimize its cost, and those two things drift apart silently as volume grows.

What the review actually looks at

A Claude engineering review walks the path that every request takes, from the moment it is assembled to the moment its response is consumed, and inspects each stage for the known sources of waste. The areas below are where the saving almost always lives.

Model selection on every call path

The first and largest question is which model is doing the work. Many applications route everything to Opus because it was the strongest model when the integration was built, even though a large share of the traffic is simple enough for Sonnet or Haiku to handle at a fraction of the price. The review identifies each distinct call path, judges how hard the task on that path actually is, and flags the ones that are paying for capability they do not use. Because model choice commonly drives forty to seventy percent of aggregate spend, this is usually where the biggest single number on the list comes from.

Prompt caching opportunities

The next question is how much of each request is the same every time. Applications that send a large stable preamble, a long system prompt, a fixed set of instructions, a static knowledge base, on every call are paying full input price repeatedly for content that never changes. Prompt caching can cut the cost of that repeated context by up to ninety percent. The review finds the call paths with large stable prefixes and estimates the cache saving on each, which on context heavy workloads is frequently the second largest item.

Batch candidates hiding in real time code

The review then asks which work is actually waited on by a person. Jobs that run on a schedule, process backlogs, or feed a later review step are usually billed at the full real time rate when the batch API would run them at roughly half. These are easy to miss because they were built synchronously out of habit. Surfacing them and moving the genuine candidates is a low risk saving that the review reliably uncovers.

Output length and verbosity

Output tokens cost several times more than input, so verbose responses are quietly expensive. The review looks at where the application asks the model for more than it needs, long explanations that get truncated downstream, formatting that is never read, restated context in the answer, and identifies where tightening the prompt and capping the output cuts cost without touching quality.

Retries, loops, and waste

Finally the review looks for the silent multipliers. Retry logic that re sends the full prompt on failure, agent loops that call the model more times than the task needs, duplicate calls from overlapping code paths, and prompts that have grown by accretion until they carry instructions no longer in use. Each of these multiplies cost without adding value, and each is invisible until someone reads the path with cost in mind.

How we turn findings into a number you can act on

A list of problems is not useful on its own. What makes the review pay for itself is that every finding comes with two estimates, the saving it would produce at your current volume, and the engineering effort to implement it. With both numbers attached, the findings sort themselves into a clear order of work. The high saving, low effort changes go first, and they are usually enough to cover the cost of the review several times before anyone touches the harder items.

We size the saving against your real traffic rather than a generic assumption. That means looking at your actual call volumes, your current model mix, and the share of your requests that carry repeated context, then projecting what each change does to the monthly number. The output is a plan a procurement leader can take to finance and an engineering leader can take to a sprint, with the same figures underpinning both. That shared, grounded view is what turns a review into action rather than a report that sits unread.

Why this is the highest return day on the program

It is worth being plain about the economics, because they are unusually favorable. The review is a bounded piece of work with a known cost. What it finds is recurring saving that compounds every month for as long as the application runs at volume. A change that takes an engineer a day and removes a few thousand dollars from the monthly invoice has paid for the entire review within weeks and keeps paying indefinitely. There are very few activities in an enterprise where the return is that direct and that durable.

The review also strengthens everything downstream. When you go into a contract negotiation having already optimized the underlying spend, you are negotiating a commit you can defend rather than one inflated by waste. You are not asking Anthropic to subsidize an inefficient integration. You are buying exactly what you need at the best rate, which is a far stronger position and usually a far better outcome. The review and the negotiation reinforce each other, and doing the review first is what makes the negotiation honest.

What it looks like to run one

A review is deliberately light on your team's time. We need read access to how the integration is built and a view of your recent usage, and a short series of conversations with the engineers who own each call path. From there the analysis happens off your team's critical path, and the deliverable is the ranked list with savings and effort attached. Your engineers implement the changes on their own schedule, starting with the ones that pay back fastest, and we are available to confirm the saving as each one lands. Nothing about it requires pausing the roadmap or handing over control of the code.

The mistake of skipping it

The most common error we see is going straight to the contract while leaving the spend untouched. A renegotiated rate on an inflated baseline still leaves you paying for waste, just at a slightly better unit price. The far stronger sequence is to review the engineering first, remove the avoidable spend, and then negotiate the commit you actually need. The review is what makes the rest of the program land on a clean foundation, and because it funds itself, there is no good reason to defer it.

Where this fits the wider playbook

The engineering review is the diagnostic that opens a full Claude cost program. It tells you which of the levers, model routing, prompt caching, batch, output discipline, will move the most money in your specific application, so the work is sequenced by return rather than by guesswork. Our token optimization playbook sets out each of those levers in detail and how they combine, and the review is how you find out which ones matter most for you before you spend a single engineering hour.

The takeaway

A structured engineering review reads your Claude integration with one question that your own team is too close to ask, does this cost more than it has to, and answers it with a ranked, sized plan of changes. Because the levers are well understood and the saving recurs every month, the review repays its cost almost immediately and funds the rest of the optimization with room to spare. If you run meaningful volume on Claude and no one has looked at how you call it, the review is the first thing to do and very likely the most profitable day on the whole program. Book a strategy call and we will scope one against your traffic.

Find the saving before you negotiate the rate.

We read your Claude integration with cost in mind and hand back a ranked plan with savings and effort attached. Book a strategy call to scope a review against your traffic.

Book a Strategy Call

Start here

Get the spend in your favor.

The Counteroffer

Weekly intelligence on Anthropic pricing moves and the buyer side counters that work.

Get a Quote · Book a Strategy Call · The Counteroffer · Blog · How It Works · Pricing · LinkedIn · New York · London Not affiliated with Anthropic PBC. Independent buyer side advisory only.

The engineering review that pays for itself.

Why the spend is almost always higher than it needs to be

What the review actually looks at

Model selection on every call path

Prompt caching opportunities

Batch candidates hiding in real time code

Output length and verbosity

Retries, loops, and waste

How we turn findings into a number you can act on

Why this is the highest return day on the program

What it looks like to run one

The mistake of skipping it

Where this fits the wider playbook

The takeaway

Related reading

Find the saving before you negotiate the rate.

The Counteroffer