Contract review, extraction, claims, and summarization are high volume document jobs that almost never need an answer this second. Run them in batch and you pay roughly half the real time rate for the same output. Here is how to move document work to batch and stack the saving with caching and routing.
Document workflows are where batch processing pays off fastest, and they are also where most enterprises quietly overpay Anthropic the most. Contract review, invoice extraction, claims processing, policy summarization, due diligence reading, knowledge base enrichment: these are high volume jobs that run across thousands or millions of documents, and almost none of them need an answer in the same second the request is made. Yet the typical integration sends every document through the real time path because that is how the proof of concept was wired, and nobody went back to change it once volume scaled. Batch processing runs the same Claude model on the same documents at roughly half the real time rate. For a workload built on documents, that is a fifty percent line item sitting unclaimed.
The test for batch is one question: is a person waiting on this result the instant it is produced? For the large majority of document work, the honest answer is no. A nightly run that extracts fields from the day's contracts feeds a database that someone queries the next morning. A bulk summarization job over a research library populates a search index that is read later. A compliance pass over a year of records produces a report that a reviewer opens when it is ready. In every case the output is consumed by a downstream system or a human hours later, so paying the real time premium buys speed that nobody uses. Documents also tend to arrive in volume rather than one at a time, which is exactly the shape batch is designed for. You assemble the set, submit it as a job, and accept results within a longer window in exchange for the lower rate.
Batch trades latency, nothing else. The same model produces the same output to the same prompt, you simply receive it later. There is no quality penalty and no prompt rewrite required to capture the discount. That makes batch unusual among optimization levers, because most savings ask you to give something up. With batch on asynchronous document work, the only thing you give up is immediacy that the workload never needed. The discipline is drawing the line correctly. A document a user uploaded and is actively waiting to see summarized stays real time. A document sitting in a queue for overnight processing belongs in batch. Once teams separate the genuinely interactive from the merely habitual, the share of document volume that can move to batch is usually far larger than they assumed.
The reason document workflows are special is that they let you stack batch on top of prompt caching, and the two levers multiply. Most document jobs reuse a large shared prefix on every call: the same extraction schema, the same scoring rubric, the same instruction block, sometimes the same reference policy or template repeated across thousands of items. Prompt caching lets that repeated context be charged at up to ninety percent less after the first call, because the model does not reprocess what it has already seen. Run that same job in batch and the rate underneath the cached and uncached tokens alike drops by roughly half. A document pipeline that caches its shared instructions and runs in batch captures both discounts at once, which is why these workloads can fall so far below their naive real time cost.
The third lever is choosing the right model for the document task rather than defaulting everything to Opus. A great deal of document work, field extraction, classification, tagging, first pass summarization, runs perfectly well on Sonnet, and the simplest sorting and routing decisions often run fine on Haiku. Reserving Opus for the genuinely hard reasoning, the nuanced legal judgment or the ambiguous edge case, and routing the bulk of the volume to Sonnet and Haiku, is where the largest share of document spend is recovered. Layer routing under caching under batch and the three combine to cut aggregate document spend by forty to seventy percent against a uniform real time Opus baseline. No single lever does that alone. Together, on a document heavy workload, they routinely do.
Optimizing document workflows is not only about a lower monthly bill, it changes the number you should commit to Anthropic. Committed spend should reflect the real, optimized cost of your workloads, not the inflated cost of an unoptimized one. A document pipeline that runs in batch, caches its shared context, and routes intelligently costs a fraction of the same pipeline run uniformly on the real time Opus path. A buyer who commits before optimizing locks that premium into the agreement for the full term and pays it every month. A buyer who optimizes first commits to a leaner, truer number and negotiates from demonstrated efficiency. The order is everything: optimize, then commit, never the reverse, because every dollar of saving found after the commitment is set is stranded inside a number you already agreed to.
We sit on the buyer side and do this work for a living. Our token optimization playbook lays out the full method for document workflows, the waiting test, the caching patterns for shared schemas, the routing logic, and how to combine all three for the compounding saving before you ever sign a commit. Download it below and start by counting the documents you process on the real time path that no one is actually waiting on.
Download the token optimization playbook for the batch, caching, and routing levers that cut Claude document spend without touching quality.
Download the playbookWeekly intelligence on Anthropic pricing moves and the buyer side counters that work.