Every production request your application sends to Claude is built from a template. Somewhere in the code there is a string with slots in it, filled in with the user input and sent to the model. That template is paid for on every single request, which means a wasteful template is not a one time cost. It is a tax that scales with your traffic. If your template carries two thousand tokens of overhead it does not need, and you run a million requests a month, you are paying for two billion tokens of pure waste a month for nothing. Template design is one of the cheapest optimizations to implement and one of the most expensive to ignore.
Prompt templates rarely start bloated. They get that way over time, the same way any shared piece of code does. Someone adds an instruction to fix an edge case and never removes it. Someone pastes in three examples to improve quality and leaves all three even though one would do. A formatting instruction gets duplicated in two places. A polite preamble that does nothing for the model gets written because it reads well to humans. None of these additions feels expensive in isolation, and none of them gets reviewed for cost, so the template grows and the per request overhead grows with it.
The reason this survives is that the waste is invisible at the request level. A few hundred extra tokens on one request is nothing. The same few hundred tokens multiplied across your monthly volume is a real number on the invoice, but that number never appears next to the template in any code review. The cost lives in aggregate while the decisions get made one line at a time, and that mismatch is exactly why template waste is so common and so persistent.
When we audit a template the waste almost always sits in the same handful of spots, and naming them makes the cleanup fast.
The same rule stated three different ways, or instructions that contradict each other so the model has to be told twice. Consolidate to one clear statement per rule and the model usually follows it better, not worse.
Examples are powerful, and they are also expensive, because every example is input tokens on every request. Many templates carry five examples where two would produce the same quality. Test how few you can use before quality drops, and use that number, not the number someone happened to paste in.
Polite preambles, motivational language, and elaborate role descriptions read well to a person and do little for the model. Tighten the framing to the instructions that change the output and cut the prose that does not.
Output tokens bill at roughly five times the input rate, so an instruction that produces a verbose answer is the most expensive kind of waste. Specify the format and length you actually need rather than letting the model expand, and you cut the costly side of the bill.
If your template carries a large stable block, a schema, a policy document, a long instruction set, and that block sits where caching cannot reach it, you are paying full price for it on every request. Move stable content to the front so prompt caching can hold it and cut its repeated cost by up to ninety percent.
A disciplined template has a clear structure. The stable content sits first, in a fixed order, so it caches cleanly: the instructions, the schema, the examples, anything that does not change between requests. The variable content, the actual user input, comes last, after the cache boundary. The instructions are stated once, the examples are the minimum that holds quality, and the output specification is tight. The result is a template that costs a fraction of the original on every request, caches most of its bulk, and often produces better answers because the model is not wading through contradictory or redundant guidance.
None of this requires a rewrite of your application. It is a focused revision of the strings your code already sends, validated against your real quality bar so you know the cuts did not cost you accuracy. That is what makes template work such a good first optimization: the change is small, the risk is contained, and the saving compounds across every request from the day you ship it.
Template waste is the clearest case where a focused engagement pays for itself quickly. The work is bounded, the saving is measurable, and the effect is permanent because it applies to every future request, not just the ones running today. We start by measuring your current per request overhead and your monthly volume, which together give the size of the prize. Then we revise the templates, validate quality, and quantify the verified saving. Under our gainshare model you pay only from that verified saving, so there is no scenario where the work costs you more than it returns.
Because the saving also lowers your real run rate, the timing matters. A leaner template means a lower consumption number to carry into any Anthropic commitment, which protects you from committing to padded spend you no longer need. The cleanup is worth doing on its own merits and it is worth doing before you negotiate. If you want a fast, bounded engagement that finds and removes template waste and proves the saving, get a quote and we will scope it to your workload.
Get a quote for a bounded engagement that finds template waste and proves the saving on your real traffic.
Get a QuoteWeekly intelligence on Anthropic pricing moves and the buyer side counters that work.