White Paper · Token Optimization

The Token Optimization Field Guide.

A buyer side guide to cutting Claude spend without cutting quality. Model routing, prompt caching, and batch, explained with the real numbers and a checklist your engineers can run this quarter. Enter your work email and we send the full guide straight over.

What the guide covers

Most enterprise Claude bills are larger than they need to be, and the reason is rarely the contract. It is the workload. Teams default to running everything on the most capable model in real time, and the invoice reflects that choice rather than the actual difficulty of the work. This guide walks through the three levers that change the bill the most, in the order we apply them with clients.

1. Model routing across Opus, Sonnet, and Haiku

The single biggest driver of aggregate spend is which model handles each request. Opus is built for the hardest reasoning. A great deal of production traffic is classification, extraction, routing, and short answers that Sonnet or Haiku handle just as well at a fraction of the cost. The guide shows how to classify your traffic and route it, and why this alone commonly moves aggregate spend 40 to 70 percent versus uniform Opus use.

2. Prompt caching at up to 90 percent

If your prompts repeat a large stable block, a long system prompt, a policy document, a code base, you are paying full price to send it again and again. Prompt caching lets Claude reuse that block and charges a fraction for the cached portion, up to 90 percent off the repeated input. The guide explains how to structure context so the cache actually hits.

3. Batch at 50 percent

Work that does not need an answer in the next second belongs on the batch path, which runs at half the price. Overnight enrichment, evaluation runs, back catalog processing, and most analytics jobs qualify. The guide covers how to size batch jobs and which workloads to move first.

Applied together and in the right order, these levers let you size your Anthropic commitment to an efficient workload. That is worth far more than the discount, because it compounds every month of the term.

What you also get

A worked example showing a sample workload before and after optimization.
A one page checklist your engineering team can run against any Claude application.
The questions to ask before you sign a commit, so the number is right the first time.

This guide is the pillar for our token optimization writing. Once you have read it, the blog goes deeper on individual levers, and our services page explains how we apply all of this inside a live negotiation.

Get the guide

Enter your work email and we send the full Token Optimization Field Guide as a PDF. No charge, no sales call required.

Cut the bill before you size the commit.

We apply every lever in this guide inside live Anthropic negotiations. Fixed fee or gainshare, no risk to you.

Get a Quote

Start here

Get the numbers in your favor.

The Counteroffer

Weekly intelligence on Anthropic pricing moves and the buyer side counters that work.

Get a Quote · Book a Strategy Call · The Counteroffer · LinkedIn · New York · London Not affiliated with Anthropic PBC. Independent buyer side advisory only.