Batch Processing for Document Workflows

Document workflows are where batch processing pays off fastest, and they are also where most enterprises quietly overpay Anthropic the most. Contract review, invoice extraction, claims processing, policy summarization, due diligence reading, knowledge base enrichment: these are high volume jobs that run across thousands or millions of documents, and almost none of them need an answer in the same second the request is made. Yet the typical integration sends every document through the real time path because that is how the proof of concept was wired, and nobody went back to change it once volume scaled. Batch processing runs the same Claude model on the same documents at roughly half the real time rate. For a workload built on documents, that is a fifty percent line item sitting unclaimed.

Why documents are the ideal batch candidate

The test for batch is one question: is a person waiting on this result the instant it is produced? For the large majority of document work, the honest answer is no. A nightly run that extracts fields from the day's contracts feeds a database that someone queries the next morning. A bulk summarization job over a research library populates a search index that is read later. A compliance pass over a year of records produces a report that a reviewer opens when it is ready. In every case the output is consumed by a downstream system or a human hours later, so paying the real time premium buys speed that nobody uses. Documents also tend to arrive in volume rather than one at a time, which is exactly the shape batch is designed for. You assemble the set, submit it as a job, and accept results within a longer window in exchange for the lower rate.

What batch does and does not change

Batch trades latency, nothing else. The same model produces the same output to the same prompt, you simply receive it later. There is no quality penalty and no prompt rewrite required to capture the discount. That makes batch unusual among optimization levers, because most savings ask you to give something up. With batch on asynchronous document work, the only thing you give up is immediacy that the workload never needed. The discipline is drawing the line correctly. A document a user uploaded and is actively waiting to see summarized stays real time. A document sitting in a queue for overnight processing belongs in batch. Once teams separate the genuinely interactive from the merely habitual, the share of document volume that can move to batch is usually far larger than they assumed.

Stacking batch with caching on document jobs

The reason document workflows are special is that they let you stack batch on top of prompt caching, and the two levers multiply. Most document jobs reuse a large shared prefix on every call: the same extraction schema, the same scoring rubric, the same instruction block, sometimes the same reference policy or template repeated across thousands of items. Prompt caching lets that repeated context be charged at up to ninety percent less after the first call, because the model does not reprocess what it has already seen. Run that same job in batch and the rate underneath the cached and uncached tokens alike drops by roughly half. A document pipeline that caches its shared instructions and runs in batch captures both discounts at once, which is why these workloads can fall so far below their naive real time cost.

Add model routing and the saving compounds again

The third lever is choosing the right model for the document task rather than defaulting everything to Opus. A great deal of document work, field extraction, classification, tagging, first pass summarization, runs perfectly well on Sonnet, and the simplest sorting and routing decisions often run fine on Haiku. Reserving Opus for the genuinely hard reasoning, the nuanced legal judgment or the ambiguous edge case, and routing the bulk of the volume to Sonnet and Haiku, is where the largest share of document spend is recovered. Layer routing under caching under batch and the three combine to cut aggregate document spend by forty to seventy percent against a uniform real time Opus baseline. No single lever does that alone. Together, on a document heavy workload, they routinely do.

A practical sequence for document workflows

List every document job you run and apply the waiting test: if no person needs the result this instant, mark it a batch candidate.
Move scheduled extraction, bulk summarization, enrichment, backfills, and compliance passes to batch first, since they carry no quality risk.
Identify the shared prefix in each job, the schema or rubric or reference text repeated on every call, and cache it for the up to ninety percent input saving.
Route each document task to the cheapest model tier that clears the quality bar, reserving Opus for the work that truly needs it.
Keep interactive, user facing document work on the real time path where a person is waiting.
Measure the blended cost per document before and after, so the saving is provable at the contract table.

Why this matters before you commit

Optimizing document workflows is not only about a lower monthly bill, it changes the number you should commit to Anthropic. Committed spend should reflect the real, optimized cost of your workloads, not the inflated cost of an unoptimized one. A document pipeline that runs in batch, caches its shared context, and routes intelligently costs a fraction of the same pipeline run uniformly on the real time Opus path. A buyer who commits before optimizing locks that premium into the agreement for the full term and pays it every month. A buyer who optimizes first commits to a leaner, truer number and negotiates from demonstrated efficiency. The order is everything: optimize, then commit, never the reverse, because every dollar of saving found after the commitment is set is stranded inside a number you already agreed to.

We sit on the buyer side and do this work for a living. Our token optimization playbook lays out the full method for document workflows, the waiting test, the caching patterns for shared schemas, the routing logic, and how to combine all three for the compounding saving before you ever sign a commit. Download it below and start by counting the documents you process on the real time path that no one is actually waiting on.

Read the pillar guide

The token optimization playbook for Claude buyers →

Batch processing for document workflows.