Procurement KPIs for AI Contracts

Procurement teams are good at measuring deals. The trouble is that the standard scorecard was built for a world of seat licenses and fixed annual fees, where the number you negotiate is the number you pay. An Anthropic contract does not behave that way. The seats are part of it, but the larger and more volatile part is consumption, and consumption is driven by engineering decisions that procurement never sees. A KPI set that only watches the headline discount will declare victory on a deal that quietly bleeds money for three years.

This article lays out the KPIs that actually matter on an AI contract, why each one matters, and how a buyer-side desk uses them to keep a Claude deal honest across its full term. The goal is a scorecard that measures the real cost of the relationship, not just the price on the order form.

Why software KPIs fail on AI contracts

The classic procurement KPI is discount off list. You take the vendor's published price, you negotiate it down, and you report the percentage you saved. On a seat license that is a reasonable measure, because the price and the usage are the same thing. One seat costs one price, and you pay it whether the user logs in daily or never.

Consumption pricing breaks that logic. On the API side of an Anthropic deal you are buying tokens, and the number of tokens you burn depends on which model you route to, how long your prompts are, whether you cache, and whether you batch. A buyer who negotiates a strong discount off the token rate and then runs every request through the most expensive model has not saved money. They have negotiated a good price on a workload they are operating badly. Discount off list, by itself, cannot see that. You need KPIs that watch both the rate you pay and the way you consume.

Discount off list measures the deal you signed. Effective rate measures the deal you are actually living. Only one of them shows up on the invoice.

The rate KPIs

The first family of KPIs measures the commercial terms themselves, the things you negotiated and locked into the order form.

Effective rate per million tokens

The single most important rate KPI is the effective rate you pay per million tokens, blended across input, output, and cache, and measured separately for each model you use. List price is a starting point. The effective rate is what you actually pay after your committed spend discount, your cache savings, and your batch savings are applied. Tracking it over time tells you whether your deal is keeping pace with your scale. If your usage doubles and your effective rate does not improve, your committed spend tier is not working hard enough, and that is a flag to raise at renewal.
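As a rough illustration of the calculation, the sketch below derives an effective rate per million tokens for each model from invoiced amounts. The `ModelUsage` record, the model names, and every figure are hypothetical placeholders, not Anthropic prices. It assumes you can pull billed dollars and total billed tokens per model from your invoices.

```python
from dataclasses import dataclass


@dataclass
class ModelUsage:
    """One model's usage for a billing period (hypothetical record shape)."""
    model: str
    tokens: int        # all billed tokens: input, output, and cache
    billed_usd: float  # amount invoiced after commit, cache, and batch discounts


def effective_rate_per_million(usage: ModelUsage) -> float:
    """Blended dollars actually paid per million tokens for one model."""
    if usage.tokens <= 0:
        raise ValueError("tokens must be positive")
    return usage.billed_usd * 1_000_000 / usage.tokens


# Illustrative numbers only, not real pricing.
period = [
    ModelUsage("model-a", tokens=2_000_000_000, billed_usd=9_000.0),
    ModelUsage("model-b", tokens=500_000_000, billed_usd=6_000.0),
]
rates = {u.model: effective_rate_per_million(u) for u in period}
for model, rate in rates.items():
    print(f"{model}: ${rate:.2f} per million tokens")
```

Running the same calculation each period and plotting the result per model is one simple way to see whether the rate improves as volume grows.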

Discount depth versus benchmark

Discount off list is not useless. It is just incomplete on its own. The useful version is discount depth measured against a benchmark of what comparable enterprises actually pay Anthropic at your commit size, not against the published list price, which almost nobody at scale pays. A thirty percent discount sounds strong until you learn that companies at your commit band routinely get more. Benchmarking turns a vanity number into a real one, and it is the metric that tells you whether to push harder before you sign.
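The comparison can be written down in a few lines. This is a minimal sketch, assuming you have a benchmark discount for your commit band from some trusted source. The rates and the 38 percent benchmark are made-up values for illustration.

```python
def discount_off_list(list_rate: float, negotiated_rate: float) -> float:
    """Classic discount-off-list, as a fraction of the published price."""
    if list_rate <= 0:
        raise ValueError("list_rate must be positive")
    return 1 - negotiated_rate / list_rate


def gap_to_benchmark(own_discount: float, benchmark_discount: float) -> float:
    """Positive result: peers at your commit band get a deeper discount."""
    return benchmark_discount - own_discount


# Hypothetical rates in USD per million tokens.
own = discount_off_list(list_rate=10.0, negotiated_rate=7.0)
# Hypothetical benchmark for comparable buyers.
gap = gap_to_benchmark(own, benchmark_discount=0.38)
print(f"own discount {own:.0%}, gap to benchmark {gap:.0%}")
```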

Overage rate as a ratio to committed rate

A KPI procurement often forgets is the overage rate, the price you pay for tokens beyond your commitment. If your overage rate reverts to undiscounted list the moment you exceed your commit, every token of growth costs you far more than it should. The KPI to watch is the ratio of your overage rate to your committed rate. A well-negotiated deal keeps that ratio at or near one, so growth does not get punished. A weak deal lets it balloon, and the buyer only notices when a growth quarter produces a shocking invoice.
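To make the ratio concrete, the sketch below compares a deal where overage reverts to list with one where overage is billed at the committed rate. All rates and volumes are hypothetical, and the function names are ours, not terms from any contract.

```python
def overage_ratio(overage_rate: float, committed_rate: float) -> float:
    """Overage rate divided by committed rate; 1.0 means growth is not penalized."""
    if committed_rate <= 0:
        raise ValueError("committed_rate must be positive")
    return overage_rate / committed_rate


def growth_cost(extra_million_tokens: float, committed_rate: float, ratio: float) -> float:
    """Cost of tokens consumed beyond the commitment at a given overage ratio."""
    return extra_million_tokens * committed_rate * ratio


committed = 7.0   # hypothetical committed rate, USD per million tokens
list_rate = 10.0  # hypothetical undiscounted list rate

ratio_weak = overage_ratio(list_rate, committed)    # overage reverts to list
ratio_strong = overage_ratio(committed, committed)  # overage at committed rate

extra = 1_000  # million tokens of growth beyond the commit (illustrative)
penalty = growth_cost(extra, committed, ratio_weak) - growth_cost(extra, committed, ratio_strong)
print(f"weak ratio {ratio_weak:.2f}, extra cost of growth ${penalty:,.0f}")
```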

The consumption KPIs

The second family measures how efficiently you use what you bought. These are the KPIs that software procurement never had to track, and they are where most of the real money lives.

Commitment utilization

If you signed a committed spend deal, your most important consumption KPI is utilization, the share of your commitment you actually consume. Anthropic commitments are use-it-or-lose-it, so unused commitment is money you paid for and threw away. Utilization below your target is a sign you overcommitted, and it is the number to bring to a mid-term reforecast conversation. Utilization running hot, consistently near or over your commit, is a different signal: you are about to hit overage and should be planning the next commit tier. Either way, utilization is the KPI that connects the deal you signed to the usage you actually have.
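One way to track this mid-term is to pro-rate the commitment by the share of the term that has elapsed. The sketch below does that under two loudly stated assumptions: consumption is expected to accrue linearly over the term, and the 85 and 98 percent thresholds are placeholders you would replace with your own target band.

```python
def commitment_utilization(consumed_usd: float, committed_usd: float,
                           elapsed_fraction: float = 1.0) -> float:
    """Consumption as a share of the commitment expected by this point in the term.

    Assumes linear accrual, which will understate a workload that is still ramping.
    """
    if committed_usd <= 0 or not 0 < elapsed_fraction <= 1:
        raise ValueError("need a positive commitment and 0 < elapsed_fraction <= 1")
    return consumed_usd / (committed_usd * elapsed_fraction)


def utilization_signal(utilization: float, low: float = 0.85, high: float = 0.98) -> str:
    """Map utilization to a signal; both thresholds are hypothetical defaults."""
    if utilization < low:
        return "overcommitted: raise at reforecast"
    if utilization >= high:
        return "running hot: plan next commit tier"
    return "healthy"


# Illustrative figures: halfway through the term, a third of the commit consumed.
pace = commitment_utilization(400_000, 1_200_000, elapsed_fraction=0.5)
signal = utilization_signal(pace)
print(f"pace-adjusted utilization {pace:.0%}: {signal}")
```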

Model mix

The model mix KPI tracks what share of your spend runs through each model across Opus, Sonnet, and Haiku. This single metric explains more cost variance than almost any other, because the price gap between the models is large. A workload running uniformly on the most expensive model when much of it could run on a cheaper one is overspending by a wide margin, and disciplined routing across the three typically cuts aggregate spend by forty to seventy percent versus uniform top tier use. Procurement cannot set the model mix, because that is an engineering decision, but procurement can and should measure it. It is the difference between a deal that looks good and a workload that is actually efficient.
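Measuring the mix needs only spend per model from the invoice. The sketch below is a minimal version of that calculation, and the dollar amounts are invented for illustration.

```python
def model_mix(spend_by_model: dict) -> dict:
    """Return each model's share of total spend as a fraction between 0 and 1."""
    total = sum(spend_by_model.values())
    if total <= 0:
        raise ValueError("total spend must be positive")
    return {model: spend / total for model, spend in spend_by_model.items()}


# Hypothetical monthly spend in USD, not real figures.
spend = {"Opus": 60_000.0, "Sonnet": 30_000.0, "Haiku": 10_000.0}
mix = model_mix(spend)
for model, share in mix.items():
    print(f"{model}: {share:.0%} of spend")
```

A mix computed on spend and one computed on tokens can tell different stories, so it is worth being explicit about which one the scorecard reports.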

Cache hit rate and batch share

Two more consumption KPIs round out the picture. Cache hit rate measures how much of your repeated context is being served from cache, where prompt caching takes up to ninety percent off the cost of those tokens. Batch share measures how much of your asynchronous work runs through the batch lane, where the discount is fifty percent. Both are levers that engineering controls and procurement measures. A low cache hit rate on a workload with heavy repeated context, or a low batch share on work that does not need a real-time answer, points directly at savings nobody has captured yet.
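Both metrics reduce to a simple share. The sketch below assumes your team has already classified which input tokens are repeated context that could be cached and which work is asynchronous. That classification, the function names, and the token counts are all assumptions for illustration.

```python
def share(part: float, whole: float) -> float:
    """Fraction of `whole` represented by `part`; 0.0 when there is nothing to measure."""
    if whole < 0 or part < 0 or part > whole:
        raise ValueError("need 0 <= part <= whole")
    return part / whole if whole else 0.0


def cache_hit_rate(cache_read_tokens: int, cacheable_tokens: int) -> float:
    """Share of repeated (cacheable) context tokens actually served from cache."""
    return share(cache_read_tokens, cacheable_tokens)


def batch_share(batch_tokens: int, async_tokens: int) -> float:
    """Share of asynchronous work that ran through the batch lane."""
    return share(batch_tokens, async_tokens)


# Hypothetical monthly token counts.
hit_rate = cache_hit_rate(cache_read_tokens=6_000_000, cacheable_tokens=8_000_000)
batched = batch_share(batch_tokens=1_000_000, async_tokens=5_000_000)
print(f"cache hit rate {hit_rate:.0%}, batch share {batched:.0%}")
```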

The relationship KPIs

The third family measures the health of the contract over time, the things that decide whether your next renewal is a fight or a formality.

Price protection coverage

This KPI looks binary at first glance, but it matters enormously. Does your contract protect your rate across the term and through renewal, or does it leave you exposed to list price increases? A deal with no price protection means your effective rate can climb without you negotiating anything, simply because Anthropic moved its list. The KPI to track is the share of your spend covered by a locked rate, and the time remaining on that lock. A protection window expiring before your renewal is a flag to act on early.

Renewal runway

The last relationship KPI is the simplest and the most neglected. How many months of runway do you have before your renewal date, and have you started preparing? The strongest renewals begin a full twelve months out, while you still have leverage and alternatives. A renewal that arrives with the buyer unprepared is a renewal the vendor controls. Tracking renewal runway as a KPI forces the preparation to start on time, which is half the battle. We cover the full method in our Anthropic renewal guide.
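The two relationship KPIs above can be computed from three dates and two spend figures. The sketch below is one possible version. The dates and amounts are invented, and a fixed `today` is used so the example is reproducible. In practice you would pass the current date.

```python
from datetime import date


def whole_months_between(start: date, end: date) -> int:
    """Calendar months from start to end, rounded down (negative if end is earlier)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1
    return months


def locked_spend_share(locked_usd: float, total_usd: float) -> float:
    """Share of spend billed at a rate the contract locks."""
    if total_usd <= 0:
        raise ValueError("total_usd must be positive")
    return locked_usd / total_usd


# Hypothetical contract dates and spend.
today = date(2025, 1, 15)
renewal = date(2025, 11, 30)
lock_end = date(2025, 8, 31)

coverage = locked_spend_share(locked_usd=80_000, total_usd=100_000)
lock_months_left = whole_months_between(today, lock_end)
runway_months = whole_months_between(today, renewal)

lock_gap = lock_end < renewal     # protection expires before renewal
late_start = runway_months < 12   # inside the twelve-month preparation window
print(coverage, lock_months_left, runway_months, lock_gap, late_start)
```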

Building the scorecard

A useful AI contract scorecard pulls these KPIs onto a single view that finance, procurement, and engineering can all read. The rate KPIs tell finance whether the commercial terms are competitive. The consumption KPIs tell engineering where efficiency is being left on the table. The relationship KPIs tell procurement when to act before the renewal does. No single team owns all of them, which is exactly why they belong on one shared scorecard rather than scattered across three systems that never reconcile.

The discipline that makes the scorecard work is reviewing it on a regular cadence, not just at renewal. A monthly look at utilization, model mix, and cache hit rate catches drift while it is still cheap to fix. A quarterly look at effective rate against benchmark tells you whether your deal is keeping pace with your scale. By the time the renewal arrives, a buyer who has watched these KPIs all year walks in knowing exactly where the deal is strong and where it needs work, which is a far better position than reconstructing the story from invoices in the final month.

What good looks like

A well-run AI contract shows a recognizable pattern across these KPIs. The effective rate improves as scale grows. Commitment utilization sits in a healthy band, neither wasting commitment nor constantly spilling into overage. The model mix is weighted toward the cheapest model that meets the quality bar for each workload. Cache hit rate and batch share are high on the workloads where they apply. Price protection covers the term with runway to spare before renewal. When all of these are green, the deal is not just well priced. It is well operated, and the two together are what actually keep an Anthropic invoice under control.

When several of them are red, the scorecard tells you where to look. A poor effective rate points at the commercial terms and a renegotiation. A low utilization points at an oversized commit and a reforecast. A skewed model mix or a low cache hit rate points at an engineering optimization that procurement can fund and measure but not perform alone. The value of the KPI set is that it turns a vague sense that the AI bill is too high into a specific list of fixable causes, each with an owner and a lever. That is the difference between worrying about AI spend and managing it.
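The triage logic in the paragraph above can be captured as a small lookup. This is only a sketch: the KPI keys and the red or green status labels are hypothetical, and the mapped causes and levers are taken directly from the text rather than from any standard framework.

```python
# KPI key -> where a red reading points, per the paragraph above.
TRIAGE = {
    "effective_rate": "commercial terms, renegotiate",
    "utilization": "oversized commit, reforecast",
    "model_mix": "engineering optimization, procurement funds and measures",
    "cache_hit_rate": "engineering optimization, procurement funds and measures",
}


def red_flags(statuses: dict) -> list:
    """List each red KPI with the cause and lever it points at."""
    return [
        f"{kpi} -> {TRIAGE.get(kpi, 'no mapped lever')}"
        for kpi, status in statuses.items()
        if status == "red"
    ]


# Hypothetical monthly scorecard reading.
statuses = {
    "effective_rate": "green",
    "utilization": "red",
    "model_mix": "red",
    "cache_hit_rate": "green",
}
actions = red_flags(statuses)
for line in actions:
    print(line)
```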

Setting up this scorecard and reading it correctly is exactly the kind of work we do for buyers. We sit between you and Anthropic, we know which KPIs the account team watches and which ones they hope you ignore, and we help procurement build a measurement framework that keeps the deal honest across its full term. If your current AI contract scorecard still looks like a software one, it is worth a conversation.

Go deeper

This article is part of our Token Optimization Playbook. Read it for the full buyer-side method behind everything above.