Batch Job Sizing on Claude | Anthropic Negotiations

Once a team accepts that a large share of its Claude work is asynchronous and belongs in batch, the next question is rarely asked but quietly expensive: how big should each batch job be? Most teams answer it by accident. They submit whatever happens to accumulate, in whatever grouping the pipeline produced, and accept whatever throughput and cost fall out. That works, in the sense that the jobs complete, but it leaves real money and real reliability on the table. Job sizing is the difference between batch as a blunt discount and batch as a tuned system where the rate saving, the caching saving, and the operational resilience all reinforce one another.

The reason sizing matters is that a batch job is not just a bag of requests. It is the unit across which caching efficiency is realized, across which failures are recovered, and against which your throughput is measured. Choose the unit badly and you fragment your cache hits, complicate your retries, and produce a cost profile that is harder to forecast and therefore harder to commit to. Choose it well and the same volume of work costs less, finishes more predictably, and gives you a clean number to negotiate against. This guide lays out the levers that job size touches and how a buyer-side desk thinks about each one.

Group by shared context, not by convenience

The first principle of sizing is to group requests by what they have in common, not by when they happened to arrive. Prompt caching rewards repetition of a shared prefix: the instruction block, the reference document, the schema, and the examples that ride along with every item in a class of work. When you assemble a batch job from items that share that prefix, the cached portion is read at up to a 90 percent discount across the whole job. When you assemble it from a random mix of work types, each with a different prefix, you scatter the cache hits and pay closer to full input price on context you could have shared. So the unit of a batch job should be a class of similar work, all carrying the same heavy prefix, rather than simply the last hour of whatever the queue collected.

This reframes sizing as a design decision rather than a logistics one. Before asking how many items go in a job, ask which items belong together because they share context. A classification job over one taxonomy, an enrichment job against one reference set, an evaluation job using one rubric: each is a natural unit because the prefix is constant and the caching compounds. Mixing taxonomies or rubrics into one job to hit a round number is a false economy that trades a caching saving for tidy job counts.
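As a rough illustration of grouping by shared context, the sketch below buckets queued items by a hash of their prefix before any job is assembled. The item shape (`id`, `prefix`, `body`) and the function name `group_by_prefix` are hypothetical, chosen only for this example. Nothing here calls the Claude API.

```python
from collections import defaultdict
from hashlib import sha256


def group_by_prefix(items):
    """Bucket work items so that every bucket shares one identical prefix.

    `items` is an iterable of dicts with the hypothetical keys
    "id", "prefix" (shared instructions or reference text) and "body".
    """
    jobs = defaultdict(list)
    for item in items:
        # Hash the prefix so the grouping key stays small for long documents.
        key = sha256(item["prefix"].encode("utf-8")).hexdigest()
        jobs[key].append(item)
    return dict(jobs)


# Illustrative queue: two classification items and one evaluation item.
items = [
    {"id": "a1", "prefix": "TAXONOMY v1", "body": "ticket text 1"},
    {"id": "b1", "prefix": "RUBRIC v3", "body": "answer text 1"},
    {"id": "a2", "prefix": "TAXONOMY v1", "body": "ticket text 2"},
]
jobs = group_by_prefix(items)
for key, members in jobs.items():
    print(key[:8], [m["id"] for m in members])
```

Each bucket is then a candidate batch job. Items that arrived together but carry different prefixes end up in different jobs, which is the point.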

Size for recovery, not just throughput

The second principle is that job size governs how gracefully you recover when something goes wrong, and something always eventually goes wrong. A single enormous job that fails late is expensive to diagnose and painful to rerun, because you either reprocess everything or build bookkeeping to find the failed slice. A swarm of tiny jobs is easy to recover but adds overhead and fragments your view of the work. The sensible middle is a job large enough to amortize the shared prefix and give you clean throughput, but bounded enough that a failure costs you a manageable rerun rather than a full reprocess. Think of the job as the blast radius of a failure and size it so that radius stays affordable.
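One simple way to bound the blast radius is to cap the number of items per job and split a class of work accordingly. The sketch below assumes the items already share a prefix. The cap `max_items_per_job` is a value you would choose from your own rerun budget, not a limit taken from any documentation.

```python
def split_into_jobs(items, max_items_per_job):
    """Split one class of work into jobs of at most `max_items_per_job` items."""
    if max_items_per_job < 1:
        raise ValueError("max_items_per_job must be at least 1")
    return [
        items[start:start + max_items_per_job]
        for start in range(0, len(items), max_items_per_job)
    ]


# Ten items with a cap of four give three jobs; a failure reruns at most four items.
example_jobs = split_into_jobs(list(range(10)), 4)
print([len(job) for job in example_jobs])
```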

Idempotency belongs in this conversation. If each item carries a stable identifier and your downstream write is idempotent, then a partial failure is cheap to repair because you can safely reprocess only what did not complete. Teams that build this in can run larger jobs with confidence, capturing more caching and cleaner throughput, because the cost of a failure is bounded by design rather than by luck. Teams that do not build it in are pushed toward smaller jobs to limit risk, and pay for that caution in lost caching efficiency.
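A minimal sketch of that repair path, assuming each item has a stable identifier and results are written to a store keyed by that identifier. The names `items_to_rerun` and `apply_results` and the in-memory dict are hypothetical stand-ins for your own bookkeeping and database.

```python
def items_to_rerun(submitted_ids, succeeded_ids):
    """Return the submitted ids that did not complete, in submission order."""
    done = set(succeeded_ids)
    return [item_id for item_id in submitted_ids if item_id not in done]


def apply_results(store, results):
    """Idempotent write: keyed by item id, so a replay overwrites, never duplicates."""
    for item_id, output in results:
        store[item_id] = output


store = {}
apply_results(store, [("a", "label-1"), ("c", "label-3")])  # first pass, "b" failed
retry = items_to_rerun(["a", "b", "c"], store.keys())
apply_results(store, [("b", "label-2")])                    # rerun only the gap
apply_results(store, [("b", "label-2")])                    # accidental replay is harmless
print(retry, sorted(store))
```

Because the write is keyed by identifier, reprocessing a slice twice leaves the store unchanged, which is what makes larger jobs safe to run.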

Match model to job, then size within the tier

Sizing and model routing are partners. The point of batch is to run asynchronous work cheaply, and the point of routing is to run each job on the lightest model that clears its quality bar. So before sizing, decide the tier. Bulk classification, extraction, and routine enrichment usually clear on Haiku or Sonnet, and running them on Opus by default is the most common waste we find. Reserve Opus for the genuinely hard reasoning that fails on the lighter tiers. Once the tier is set, size the job to maximize caching within that tier. A large Haiku classification job with a shared prefix is often the single cheapest way to process volume on Claude, because it stacks the routing saving, the batch saving, and the caching saving in one unit of work.
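The "lightest model that clears the quality bar" rule can be sketched as a lookup over evaluation scores you have measured yourself on a sample of the job. The tier labels below are shorthand for this example rather than API model identifiers, and the scores are made-up placeholders.

```python
# Tiers ordered from lightest to heaviest; labels are illustrative only.
TIERS = ["haiku", "sonnet", "opus"]


def lightest_tier(scores, quality_bar):
    """Return the lightest tier whose measured score meets the bar, else None.

    `scores` maps a tier label to a quality score from your own evaluation set.
    """
    for tier in TIERS:
        if scores.get(tier, 0.0) >= quality_bar:
            return tier
    return None


# Placeholder scores: the lightest tier misses a 0.93 bar, the middle tier clears it.
sample_scores = {"haiku": 0.91, "sonnet": 0.95, "opus": 0.97}
print(lightest_tier(sample_scores, 0.93))
```

A result of `None` means no tier clears the bar on your sample, which is a signal to revisit the prompt or the bar before sizing anything.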

Watch the windows and the queue

Batch trades immediacy for price, and the size of a job interacts with the service window you are working against. Very large jobs take longer to clear, which is fine for overnight work but worth planning around when a downstream step has its own deadline. The practical move is to size jobs so they comfortably clear within the window the downstream process needs, with margin, rather than discovering at the deadline that a single oversized job is still running. For recurring work this becomes a steady rhythm: a job size and a submission cadence that reliably deliver results before they are needed, every cycle, without drama.
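The window arithmetic can be written down directly. The sketch below assumes you have an observed throughput for this class of work (`items_per_hour`, measured from your own past jobs) and reserves a fraction of the window as margin. Both figures in the example are placeholders.

```python
import math


def max_items_for_window(window_hours, items_per_hour, margin=0.25):
    """Largest job that should clear the window with `margin` held in reserve.

    `items_per_hour` is an observed figure from your own history, not a guarantee.
    """
    if not 0 <= margin < 1:
        raise ValueError("margin must be in [0, 1)")
    usable_hours = window_hours * (1 - margin)
    return math.floor(usable_hours * items_per_hour)


# An 8 hour window at 1,000 items per hour with 25 percent margin.
print(max_items_for_window(8, 1000, margin=0.25))
```

If the class of work is larger than this figure, split it into several jobs or submit earlier rather than shrinking the margin.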

Group each job by shared prefix so caching compounds across the whole unit.
Size the job so a failure costs a manageable rerun, not a full reprocess.
Build idempotent writes and stable item identifiers so partial failures are cheap to repair.
Set the model tier first, then size to maximize caching within that tier.
Reserve Opus for genuinely hard reasoning and push bulk work to Haiku or Sonnet.
Size and schedule jobs to clear the downstream window with margin to spare.

From job sizing to the commitment number

Sizing is not only an engineering concern. It feeds directly into the number you commit to Anthropic, which is why a buyer-side desk cares about it. When your batch work is well sized, well cached, and routed to the right tier, your true cost per unit of output drops and stabilizes. A stable, optimized cost per unit is exactly what lets you forecast confidently and commit to a lean number with protected overage rather than padding the commitment to cover a noisy, unoptimized baseline. The teams that get burned at renewal are usually the ones that committed against an unsized, unoptimized cost and then discovered the efficiency later, stranded inside a figure they had already signed.

The sequence is the same as everywhere else in this practice: optimize the work, including how the batch jobs are sized, then commit against the optimized reality. Get the sizing right and you also get a cleaner story at the table, because you can show Anthropic a disciplined, efficient workload rather than a bloated one, and disciplined buyers negotiate better terms. If you want a second set of eyes on how your batch jobs are sized and what that should mean for your commitment, book a strategy call below. We will look at your job structure, your caching, your tier choices, and the commitment math underneath, and tell you where the next saving is.

