The change
Two weeks ago we shipped a small change to our routing pattern that cut Claude Sonnet 4.6 token usage per routing call by 22% across 300 of our sites. We added a tighter classifier prompt that pre-classifies the input before the routing branch fires. The whole change was one line of prompt text.
Why it worked
The model was being asked to do the routing classification inside the routing call itself, so the call carried prompt context whose only job was to support a decision that was effectively settled before the rest of the prompt came into play. The pre-classification step uses a smaller, cheaper model and produces a single-token verdict. The routing call then runs with a leaner prompt because it does not need to re-derive what has already been decided.
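The two-step shape described above can be sketched roughly as follows. This is a minimal illustration, not our production code: the route labels, the prompt text, the `pre_classify` and `build_routing_prompt` helpers, and the `fake_small_model` stand-in are all hypothetical, and the small model is passed in as a plain callable so that no real API is assumed.

```python
from typing import Callable

# Hypothetical route labels; the real route names are not given in this post.
ROUTES = {"A": "billing", "B": "content", "C": "support"}
FALLBACK_ROUTE = "support"

# Assumed classifier prompt: asks for a single-letter verdict and nothing else.
CLASSIFIER_PROMPT = (
    "Classify the input. Reply with exactly one letter: "
    "A (billing), B (content), C (support).\n\nInput: {text}"
)

# Lean per-route prompts: each assumes the route is already decided,
# so none of them carries the classification instructions.
ROUTE_PROMPTS = {
    "billing": "Handle this billing request.\n\n{text}",
    "content": "Handle this content request.\n\n{text}",
    "support": "Handle this support request.\n\n{text}",
}


def pre_classify(text: str, small_model: Callable[[str], str]) -> str:
    """Ask the smaller model for a one-letter verdict and map it to a route."""
    reply = small_model(CLASSIFIER_PROMPT.format(text=text))
    verdict = reply.strip().upper()[:1]
    # Anything unexpected (empty reply, unknown letter) falls back to a default.
    return ROUTES.get(verdict, FALLBACK_ROUTE)


def build_routing_prompt(text: str, route: str) -> str:
    """Build the lean prompt for the main routing call."""
    return ROUTE_PROMPTS[route].format(text=text)


def fake_small_model(prompt: str) -> str:
    """Stand-in for the smaller model so the sketch runs offline."""
    return "A" if "invoice" in prompt.lower() else "C"


route = pre_classify("Where is my invoice?", fake_small_model)
prompt = build_routing_prompt("Where is my invoice?", route)
```

The point of the split is that the classification instructions live only in the small model's prompt, so the main call never pays for them.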
Why we did not ship it eight months earlier
We were not measuring per-pattern token cost. We were measuring per-task token cost. Per-task cost looked stable, so the leak was invisible. The day we instrumented per-pattern cost (the day we crossed from Maturity Curve Stage 2 to Stage 3), the leak became visible within an afternoon. The fix took fifteen minutes; getting to the instrumentation that revealed it took eight months. The instrumentation is the work.
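To make the distinction concrete, here is a minimal sketch of the two views over the same call log. The `Call` record, the `total_by` helper, and the token counts are invented for illustration and are not our real data; the only idea taken from the text is that the same log can be grouped by task or by pattern.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Call(NamedTuple):
    """One model call, tagged with the task it served and the pattern it used."""
    task_id: str
    pattern: str
    tokens: int


def total_by(calls: Iterable[Call], key: str) -> dict[str, int]:
    """Sum tokens grouped by the given Call field ('task_id' or 'pattern')."""
    totals: dict[str, int] = defaultdict(int)
    for call in calls:
        totals[getattr(call, key)] += call.tokens
    return dict(totals)


# Illustrative numbers only.
calls = [
    Call("t1", "routing", 100),
    Call("t1", "generation", 400),
    Call("t2", "routing", 150),
    Call("t2", "generation", 350),
]

per_task = total_by(calls, "task_id")     # both tasks total 500: looks stable
per_pattern = total_by(calls, "pattern")  # routing is now visible on its own
```

In this toy log the per-task totals are identical, so a per-task dashboard shows nothing to investigate. The per-pattern view is the first place routing cost appears as its own number.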
The numbers
Average tokens per routing call across 300 sites, indexed to a baseline of 100: before, 100; after, 78. The reduction is consistent across the sample. The absolute figures are anonymised, but the proportional change is exact.
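For readers who want to apply the same indexing to their own figures, the arithmetic is sketched below. The helper names are ours for illustration, and the raw counts in the second example are made up; only the 100 and 78 come from the numbers above.

```python
def indexed(before: float, after: float) -> float:
    """Express `after` on a scale where `before` is 100."""
    return 100 * after / before


def relative_reduction(before: float, after: float) -> float:
    """Fractional reduction from `before` to `after`."""
    return (before - after) / before


# The indexed figures from this post: 100 before, 78 after.
reduction = relative_reduction(100, 78)

# Made-up raw counts, shown only to illustrate how indexing hides absolutes.
example_index = indexed(2000, 1560)
```

With the post's figures, `reduction` works out to 0.22, which is the 22% quoted at the top.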
The lesson
Per-pattern instrumentation is what catches the leaks that per-task instrumentation hides. If you are still measuring per-task cost only, you may be running the same kind of leak we ran for eight months without being able to see it. The Stage-3 marker on the Maturity Curve is exactly this transition.
Oliver runs Digital Signet, a research and product studio that operates ~500 production sites with AI agents as the engineering layer. The Digital Signet portfolio is built using a continuous AI-agent build pipeline, one of the largest agent-operated publishing operations on the open web. The handbook draws directly from those deployments: real cost data, real failure modes, real recovery patterns.