The whole art of prompt caching comes down to one distinction: which parts of your context are static and which are dynamic. Static context is the content that stays the same from request to request and can be cached and reused. Dynamic context is the content that changes every time and has to be processed fresh on each call. Get the line between them right and you cache the heavy stable content while paying full rate only on a small variable tail. Draw the line wrong, or fail to separate the two at all, and the dynamic content contaminates the static, the cache never holds, and the discount of up to 90 percent slips away. This guide is about that single skill: learning to see which of your context is static, which is dynamic, and how to keep them cleanly apart.
Static content is anything that is identical across many requests. The system instruction that defines how the model should behave, a fixed set of reference documents, a schema the model reads against, a library of examples, and policy or style guidance are all static, because they do not change from one call to the next. This content is usually large and expensive to process, and because it repeats, it is the ideal thing to cache. The model processes it once, stores it, and reuses it on every subsequent request that shares the same prefix.
Dynamic content is anything that changes per request. The user's current question, the specific record being processed, a timestamp, a session identifier, retrieved results that differ each time, and any per-call parameters are all dynamic. This content cannot be reused, because it is different on every request, so it has to be processed fresh each time. The good news is that dynamic content is usually small relative to the static content, so paying full rate on it costs little, as long as it is kept separate.
The reason this distinction matters so much is the way caching matches content. The cache reuses content from the start of the prompt up to the first point where the new request differs from the cached one. Everything before that point is reused. Everything from that point onward is reprocessed. So the moment a piece of dynamic content appears, it ends the reusable section, and all the static content after it is lost to the cache even though it never changed.
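The matching rule can be illustrated with a few lines of Python. This is a minimal sketch that compares prompts character by character; real caches work on tokens and may only match at certain boundaries, so treat the numbers as illustrative. The function name `shared_prefix_len` and the sample prompts are hypothetical.

```python
def shared_prefix_len(cached: str, new: str) -> int:
    """Count the leading characters two prompts have in common.

    Simplification: a real cache compares tokens, not characters.
    """
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break  # first difference ends the reusable section
        n += 1
    return n


# Hypothetical prompts: one static block followed by a dynamic question.
STATIC = "You are a support assistant. Follow the policy below.\n"
first = STATIC + "Question: How do I reset my password?"
second = STATIC + "Question: Where is my invoice?"

reusable = shared_prefix_len(first, second)
print(f"reusable prefix: {reusable} of {len(second)} characters")
```

Everything up to the first differing character counts as reusable here; everything after it would be reprocessed.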
This is why mixing static and dynamic content is so costly. If you insert a user name into the middle of your system instruction, or weave the current question into the reference material, you have placed a dynamic value inside what should be a static block. The cache match now breaks at that insertion point, and all the expensive static content sitting after it gets reprocessed on every call. The content was static in nature, but by mixing it with dynamic values you made it behave as if it were dynamic, and you pay full rate on all of it. Keeping the two cleanly separated, static first and complete, dynamic after and complete, is what lets the cache capture the full stable block.
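To make the cost of mixing concrete, the sketch below builds a prompt with a user name placed ahead of the reference material and measures how much of the prompt two requests still share. The prompt text, `build_mixed_prompt`, and the character-level comparison are all assumptions for illustration; a real cache matches on tokens.

```python
import os.path

# Hypothetical static reference material, large relative to everything else.
REFERENCE = "Refund policy: items may be returned within 30 days.\n" * 50


def build_mixed_prompt(user_name: str, question: str) -> str:
    # Anti-pattern: the dynamic user name sits ahead of the static reference text.
    return (
        f"You are a support assistant helping {user_name}.\n"
        + REFERENCE
        + f"Question: {question}"
    )


a = build_mixed_prompt("Ana", "Can I return shoes?")
b = build_mixed_prompt("Ben", "Can I return shoes?")

# os.path.commonprefix compares strings character by character.
shared = len(os.path.commonprefix([a, b]))
print(f"shared prefix: {shared} characters; reference block: {len(REFERENCE)} characters")
```

The shared prefix stops at the name, so the whole reference block falls after the break even though its text is identical in both requests.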
Separating static from dynamic content is partly a way of seeing your prompt and partly a set of construction habits. A few practices make the separation clean and durable.
Go through your prompt section by section and label each piece as static or dynamic by asking one question: does this change between requests? The system prompt almost always answers no. The user input almost always answers yes. Reference documents usually answer no within a session and yes across sessions. This audit is the foundation, because you cannot order content by stability until you know which bucket each piece falls into.
Once labeled, collect all the dynamic content and move it to the end of the prompt, after the entire static block. The discipline here is to be strict about it, because a single stray dynamic value left inside the static section defeats the whole arrangement. Even small things like a current date or a request identifier belong at the tail, not woven into the instructions for readability. The static block must be completely free of anything that changes.
The hardest cases are the ones where a dynamic value is embedded inside otherwise static text, such as a system prompt that greets the user by name or references the current account. Rewrite these so the dynamic value lives in the tail rather than the static body. Instead of personalizing the static instruction, keep the instruction generic and stable, and supply the personal detail in the dynamic section at the end. The model still has the information, but the static block stays cacheable.
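One way to apply this rewrite is sketched below: the instruction stays generic and refers to "the request details", while the name and account travel in the tail. The prompt wording, `STATIC_SYSTEM`, and `build_prompt` are hypothetical, chosen only to show the shape of the change.

```python
import os.path

# Generic, stable instruction: it mentions the name without containing it.
STATIC_SYSTEM = (
    "You are a support assistant. Greet the customer by the name given "
    "in the request details, then answer from the policy below.\n"
    "Policy: refunds are available within 30 days of purchase.\n"
)


def build_prompt(user_name: str, account_id: str, question: str) -> str:
    # All per-request values live in the tail, after the static block.
    tail = (
        f"Customer name: {user_name}\n"
        f"Account: {account_id}\n"
        f"Question: {question}\n"
    )
    return STATIC_SYSTEM + tail


a = build_prompt("Ana", "A-100", "Can I get a refund?")
b = build_prompt("Ben", "B-200", "Where is my order?")

# The two requests now share at least the full static block.
shared = len(os.path.commonprefix([a, b]))
print(shared >= len(STATIC_SYSTEM))
```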
Some content is static within a session but dynamic across sessions, such as the documents a user is working with in a single conversation. Treat this as its own layer, placed after the always-static content and before the per-request dynamic content. That way the truly permanent content is cached across all sessions, the session content is cached within the session, and only the per-request content pays full rate every time. Ordering by lifetime, from most stable to least, lets each layer be reused for as long as it lasts.
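The effect of the session layer can be sketched by comparing two requests in the same session against two requests in different sessions. The layer contents and helper names below are hypothetical, and the comparison is character-level rather than token-level.

```python
import os.path

PERMANENT = "System: You are a contract review assistant.\n"  # same for every session


def build_prompt(session_docs: str, request: str) -> str:
    # Order: permanent layer, then session layer, then per-request tail.
    return PERMANENT + session_docs + request


def shared(a: str, b: str) -> int:
    return len(os.path.commonprefix([a, b]))


docs_a = "Document: lease agreement, 12 pages.\n"
docs_b = "Document: supplier contract, 40 pages.\n"

same_session = shared(
    build_prompt(docs_a, "Q: Who is the tenant?"),
    build_prompt(docs_a, "Q: What is the rent?"),
)
cross_session = shared(
    build_prompt(docs_a, "Q: Who is the tenant?"),
    build_prompt(docs_b, "Q: Who is the supplier?"),
)
print(same_session, cross_session)
```

Within a session the shared prefix covers both the permanent and session layers; across sessions it still covers the permanent layer.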
Most teams come to caching with prompts that already exist, written without the static and dynamic distinction in mind, and the first task is to read an existing prompt and see the split that is hiding in it. The technique is simple: imagine the same prompt being built for two different requests, side by side, and mark every place where the two would differ. Everything identical between them is static. Everything that differs is dynamic. That mental diff exposes the structure faster than any amount of staring at a single example, because the difference between two requests is exactly what the cache cares about.
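The side-by-side comparison can also be done mechanically. The sketch below uses Python's standard `difflib.SequenceMatcher` on the words of two rendered prompts and returns every span where they differ. The sample prompts and the function name `dynamic_spans` are hypothetical.

```python
import difflib


def dynamic_spans(prompt_a: str, prompt_b: str) -> list[tuple[str, str]]:
    """Return (text_in_a, text_in_b) for every place the two prompts differ."""
    a_words, b_words = prompt_a.split(), prompt_b.split()
    matcher = difflib.SequenceMatcher(a=a_words, b=b_words, autojunk=False)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # anything not identical is dynamic
            spans.append((" ".join(a_words[i1:i2]), " ".join(b_words[j1:j2])))
    return spans


# Hypothetical prompt rendered for two different requests.
request_a = (
    "Hello Ana. Today is 2024-05-01. Answer from the policy below. "
    "Policy: refunds within 30 days."
)
request_b = (
    "Hello Ben. Today is 2024-05-02. Answer from the policy below. "
    "Policy: refunds within 30 days."
)

for left, right in dynamic_spans(request_a, request_b):
    print(f"dynamic: {left!r} vs {right!r}")
```

Each reported span is a candidate for moving to the tail; whatever the diff does not report is the static block.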
Doing this almost always turns up dynamic values scattered through what looked like a stable block: a name here, a date there, a parameter woven into an instruction. Each one is a place the cache would break, and seeing them is the prerequisite to moving them. Once the split is visible, the repair is mechanical: collect the dynamic pieces, move them to the tail, and leave a clean static block at the front. The hard part is the seeing, not the moving, which is why the side-by-side diff is the habit worth building.
The cleanest cases are easy: a system prompt is static, a user question is dynamic. The work is in the grey area, the content that is neither obviously one nor the other, because how you classify it determines how much you can cache. Getting the grey area right is what separates a caching setup that captures most of the saving from one that captures only the obvious part.
Reference documents are the classic grey area. Within a single user session they are usually fixed, so relative to that session they are static and should be cached. Across sessions they may change as the user moves to different material, so relative to the whole application they are dynamic. The answer is not to force them into one bucket but to recognize they have their own lifetime, longer than a single request but shorter than the permanent instructions, and to place them accordingly. Treating session-scoped content as its own layer lets you cache it for the duration it actually lasts rather than discarding the saving because it is not permanent.
Configuration and policy content is another grey area. A feature flag, a pricing table, or a policy document changes occasionally but not per request, so it is static for long stretches and then updates. The right treatment is to cache it and accept that each update invalidates the cache once, which is fine because the content is reused across many requests between updates. The mistake would be to treat it as dynamic and reprocess it every time just because it is not permanent. Almost anything that changes less often than once per request has a caching benefit, and the skill is matching the cache lifetime to the content lifetime rather than reserving caching only for the things that never change.
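A rough calculation shows why occasional invalidation is acceptable. The sketch below compares cached and uncached cost over the requests served between two updates. The multipliers are assumptions: `read_multiplier=0.10` mirrors the "up to 90 percent" discount mentioned earlier, and `write_multiplier=1.25` is an assumed surcharge for writing the cache. The model also ignores cache expiry, so check your provider's actual pricing and cache lifetime before relying on the numbers.

```python
def relative_cost(
    static_tokens: int,
    dynamic_tokens: int,
    requests_between_updates: int,
    read_multiplier: float = 0.10,   # assumed: cached reads cost 10% of base rate
    write_multiplier: float = 1.25,  # assumed: first write costs 125% of base rate
) -> float:
    """Cached cost divided by uncached cost, in base-rate token units."""
    n = requests_between_updates
    uncached = n * (static_tokens + dynamic_tokens)
    cached = (
        static_tokens * write_multiplier            # one cache write after each update
        + (n - 1) * static_tokens * read_multiplier  # remaining requests read the cache
        + n * dynamic_tokens                         # dynamic tail always pays full rate
    )
    return cached / uncached


# Hypothetical workload: 10,000 static tokens, 200 dynamic tokens.
for n in (1, 10, 100):
    print(n, round(relative_cost(10_000, 200, n), 3))
```

Under these assumptions a single request between updates costs slightly more with caching than without, while a long run of requests between updates costs a small fraction of the uncached total.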
Once you accept that content has a spectrum of lifetimes rather than two fixed types, the design rule sharpens. You do not just put static before dynamic. You order the entire prompt from longest-lived to shortest-lived, so that each layer is reused for exactly as long as it remains valid. The permanent instructions come first, then the content that lasts across many sessions, then the session-scoped content, then the per-request content at the very end.
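The ordering rule can be expressed as a sort key. In this sketch each piece of the prompt is tagged with a lifetime, and the prompt is assembled from longest-lived to shortest-lived. The `Lifetime` enum, the `Layer` class, and the sample text are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Lifetime(IntEnum):
    # Lower value means longer-lived, so it sorts toward the front.
    PERMANENT = 0
    CROSS_SESSION = 1
    SESSION = 2
    PER_REQUEST = 3


@dataclass(frozen=True)
class Layer:
    name: str
    lifetime: Lifetime
    text: str


def assemble(layers: list[Layer]) -> str:
    # sorted() is stable, so layers with equal lifetime keep their given order.
    ordered = sorted(layers, key=lambda layer: layer.lifetime)
    return "".join(layer.text for layer in ordered)


layers = [
    Layer("question", Lifetime.PER_REQUEST, "Question: What changed in clause 4?\n"),
    Layer("instructions", Lifetime.PERMANENT, "You are a contract review assistant.\n"),
    Layer("session_docs", Lifetime.SESSION, "Document: lease agreement.\n"),
    Layer("pricing_table", Lifetime.CROSS_SESSION, "Pricing table: standard tier.\n"),
]

prompt = assemble(layers)
print(prompt)
```

Tagging the lifetime at the point where each piece is created keeps the ordering decision out of the prompt template itself, which makes a stray dynamic value in the static block harder to introduce by accident.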