Any Claude work that does not need an answer this second can run in batch at roughly half the real time rate. For the large share of enterprise workloads that are genuinely asynchronous, that is a fifty percent discount most teams simply never claim. Here is which work belongs in batch and how the lever compounds with caching and routing.
Most teams default every Claude call to the real time path because that is how the first integration was built, and real time stayed the habit long after the urgency that justified it had gone. The result is that a large amount of work that nobody is waiting on, overnight reports, bulk classification, content enrichment, evaluation runs, gets processed at the premium real time rate when it could have run in batch at roughly half the cost. Batch processing is one of the simplest levers in the entire token budget because it asks for no quality tradeoff and no prompt rewrite. The output is the same. The only thing that changes is when you get it, and for asynchronous work that timing does not matter to anyone. If you are paying full rate for work no user is waiting on, you are leaving the fifty percent lever unpulled.
Batch processing works by submitting a set of requests as a job and accepting that the results come back within a longer window rather than immediately. In exchange for that flexibility, the rate is roughly half the real time price. The trade is purely about latency, not quality. The same model produces the same output to the same prompt, you simply receive it later. That makes batch the rare optimization with no downside on workloads that are genuinely asynchronous, because there is nothing to lose. The mistake teams make is assuming everything needs to be real time. In practice a large fraction of enterprise AI spend sits on work where a wait of minutes or hours is completely acceptable, and every one of those requests is overpaying by running on the live path.
The test for batch is a single question: is anyone actually waiting on this result the instant it is produced? If the answer is no, it is a batch candidate. The clearest cases are scheduled and bulk jobs. Overnight processing of the day's data. Periodic reports and digests. Bulk classification, tagging, or extraction across large datasets. Content enrichment pipelines that prepare material for later use. Backfills and reprocessing when a prompt or model changes. Evaluation and test runs that score model behavior offline. Any data pipeline stage where the output feeds a later step rather than a waiting user. In all of these, the result is consumed by a process or read by someone hours later, so collapsing the request onto the real time path buys speed nobody uses at a price everybody pays.
Batch is not a universal answer, and the discipline is knowing the boundary. Anything a user is actively waiting on stays real time: interactive chat, live assistance, anything in a request and response loop where a person is watching for the reply. The skill is separating the genuinely interactive from the merely habitual. A surprising amount of work that runs real time is not actually interactive, it was just built that way, and that is exactly the work to move. The goal is not to batch everything, it is to stop paying the real time premium on work that gained nothing from being real time in the first place. Map your workloads against the waiting test honestly and the split usually favors batch more than the team expected.
Batch is powerful on its own, but its real strength is that it stacks with the other token levers rather than competing with them. The fifty percent batch discount applies on top of the model you choose, so a batch job running on Sonnet instead of Opus captures both the routing saving and the batch saving at once. It also combines with caching: a bulk job that re uses a large shared prefix, a common instruction block or reference document across thousands of items, gets the caching discount on the input and the batch discount on the rate together. The levers multiply rather than add, which is why the teams that deploy routing, caching, and batch in combination see aggregate reductions far larger than any one lever alone. Batch is often the easiest of the three to apply, which makes it a good place to start capturing the compounding.
Shifting asynchronous work to batch does not only cut the monthly bill, it changes the number you should commit to Anthropic. Your committed spend should reflect the real, optimized cost of your workloads, and a workload that runs half its volume in batch costs materially less than the same volume run entirely real time. A buyer who commits before applying the batch lever locks the real time premium into the commitment for the length of the term, paying twice over for a habit. A buyer who batches first commits to a leaner, truer number and negotiates from demonstrated efficiency. The sequencing matters: optimize, then commit, never the reverse, because savings found after the commitment is set are stranded inside a number you already agreed to.
Batch is usually the fastest win in a token optimization program because it needs no prompt changes and carries no quality risk, only a shift in timing. Our token optimization playbook lays out the full method, the waiting test in detail, the workloads that move most easily, and how to combine batch with caching and routing for the compounding saving. Download it below and start by listing every job that runs real time today and asking, honestly, which of them anyone is actually waiting on.
Download the token optimization playbook for the batch, caching, and routing levers that cut Claude spend without touching quality.
Download the playbookWeekly intelligence on Anthropic pricing moves and the buyer side counters that work.
Get a quote for a bounded engagement. Fixed fee or gainshare, no risk to you.
Get a QuoteWeekly intelligence on Anthropic pricing moves and the buyer side counters that work.