Caching for code review workloads.

Automated code review sends Claude the same codebase context over and over, paying full price each time. It is one of the cleanest prompt caching wins there is. Here is where the up to ninety percent saving lands in a review pipeline and how to structure the prompts to capture it.

By Morten Andersen · Published May 29, 2026 · Updated June 12, 2026

Code review is one of the most natural fits for prompt caching, and one of the most commonly missed. A review workload, whether it runs on pull requests, on commits, or on demand from a developer, repeatedly sends the model a large body of context that barely changes, the surrounding files, the project conventions, the review guidelines, the architectural notes the model needs to judge a change. That stable context is reprocessed and paid for in full on every single review, even though it is nearly identical each time. Caching is exactly the mechanism that removes this waste, and because review workloads carry so much repeated context, the saving is often dramatic. This piece shows where caching applies in a review pipeline and how to set it up.

Why code review is so cacheable

Caching pays when a large block of context is stable across many calls, and code review fits that description almost perfectly. The thing that changes from one review to the next is small, the specific diff or the file under review. The thing that stays the same is large, the review instructions, the coding standards, the relevant surrounding code, the project context that lets the model give a useful review rather than a generic one. So most of the tokens you send on each review are repeated, and the part that is genuinely new is a fraction of the total. That ratio, lots of stable context and a little dynamic content, is the ideal caching shape, which is why review workloads often see savings near the top of the up to ninety percent range on the cached portion.

In a review, the diff changes but the context does not. The guidelines, the conventions, and the surrounding code are the same every time, and that repeated context is what caching turns from a recurring charge into a deep discount.

What to cache in a review pipeline

The stable context in a review workload usually breaks into a few layers, all of them good caching candidates.

The review instructions. Your standard prompt that tells the model how to review, what to flag, what tone to use, and what to ignore is identical on every review and should be cached.
The coding standards and conventions. Style guides, architectural principles, security rules, and the project specific conventions the model checks against are stable and often large, making them high value to cache.
The surrounding code context. When the model needs the broader file, related modules, or interface definitions to judge a change, that context is the same across many reviews of the same area and can be cached for the window in which those reviews cluster.
Reference material. Any documentation, examples of good and bad patterns, or domain context you include to improve review quality is stable and belongs in the cached portion.

The diff itself, the comment thread, and anything specific to the individual change stay dynamic and sit after the cached block.

Structure the prompt so caching works

Capturing the saving depends on prompt structure. The cached context needs to sit at the stable front of the prompt, with the dynamic content, the specific diff or file, placed after it. A review prompt that interleaves the changing diff with the stable guidelines breaks the cacheable block and leaves savings unclaimed. The fix is to organize every review prompt the same way, with the instructions, standards, and surrounding context first as a consistent cached prefix, and the change under review last. This ordering is what lets the pipeline read the stable prefix from cache on review after review rather than reprocessing it each time.

Cluster reviews to stay in the window

Caching pays best when reuse happens within the cache window, so the timing of reviews matters. A pipeline that reviews many changes against the same area in a short period keeps the cached context warm and reads from it repeatedly. A pipeline where reviews are sparse and spread out may let the cache expire between uses, paying the write cost again each time. Where you have control, batching related reviews together or keeping the high frequency review paths warm improves the hit rate. The more reviews that read from a single cache write, the larger the net saving, so concentrating reuse is part of designing the workload well.

The double benefit for developer experience

For reviews that a developer triggers and waits on, caching does more than cut cost. Because the large stable context is not reprocessed on a cache read, the review returns faster, which directly improves the developer experience. A review tool that feels slow gets used less, so the latency improvement from caching also protects adoption. This makes caching especially worth doing on interactive review paths, where it lowers cost and speeds up the response at the same time, a combination that is rare among optimization levers.

Measure the hit rate, not the intention

As with any caching, the real saving comes down to the cache hit rate in production, the share of reviews that successfully read the stable context from cache rather than rewriting it. A pipeline that looks well structured can still achieve a low hit rate if reviews are too sparse, if the cached prefix drifts because the context is assembled inconsistently, or if the prompt ordering is not actually stable across reviews. Instrument the pipeline so the hit rate is a number you watch. A high hit rate confirms the saving is real, and a low one points directly at what to fix, usually prompt consistency or review timing.

Where this fits the wider optimization picture

Caching for code review sits alongside the other levers that compound on a Claude bill. It combines with model routing, since much routine review work can run on Sonnet or even Haiku rather than Opus, and with batch for review jobs that do not need to be real time. Our token optimization playbook sets out how caching, routing, and batch fit together into one method for cutting Claude spend without losing quality. For engineering teams running review at scale, caching the stable context is frequently the highest return single change, and it pairs naturally with routing the reviews themselves to the right model.

The takeaway

Code review is one of the cleanest prompt caching wins because the diff changes while the context, the instructions, the standards, and the surrounding code, stays the same across every review. Cache that stable context, structure each review prompt so the cached prefix comes first and the change comes last, and cluster reviews so reuse stays inside the cache window. The saving lands near the top of the up to ninety percent range on the cached portion, and on interactive review paths it speeds the response up as well, which protects adoption. Measure the hit rate to confirm it, and pair caching with routing the reviews to the right model. Download the token optimization playbook to set up caching across your review pipeline and size what it is worth.

Stop reprocessing the same codebase context.

We structure your review prompts so the stable context reads from cache on every review, cutting cost and speeding the response. Download the playbook to see how.

Download playbook

Start here

Get the spend in your favor.

The Counteroffer

Weekly intelligence on Anthropic pricing moves and the buyer side counters that work.

Get a Quote · Book a Strategy Call · The Counteroffer · Blog · How It Works · Pricing · LinkedIn · New York · London Not affiliated with Anthropic PBC. Independent buyer side advisory only.

Caching for code review workloads.

Why code review is so cacheable

What to cache in a review pipeline

Structure the prompt so caching works

Cluster reviews to stay in the window

The double benefit for developer experience

Measure the hit rate, not the intention

Where this fits the wider optimization picture

The takeaway

Related reading

Stop reprocessing the same codebase context.

The Counteroffer