Routing Pattern: Production Cost, Mis-Routes, and the Confidence Gate

When to use it

When inputs are of mixed type and benefit from specialised handling: customer support tickets routed to a billing sub-agent, a product sub-agent, or an escalation sub-agent; content classification dispatching to category-specific extractors; query dispatch in a multi-tool agent.

Routing is also the right pattern when you want a cheap fast model on the easy cases and an expensive slow model on the hard ones. The router triages; the sub-agents do the work. Done well, this is a meaningful cost reduction.
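The cost argument can be checked with simple arithmetic before building anything. The sketch below is illustrative only: the function names are hypothetical, the costs are in arbitrary units per request, and `easy_fraction` is an assumption you would estimate from your own traffic.

```python
def blended_cost(router_cost, cheap_cost, expensive_cost, easy_fraction):
    """Expected cost per request when a router triages between two models.

    All costs are in the same arbitrary unit (for example, tokens or cents).
    easy_fraction is the assumed share of requests sent to the cheap model.
    """
    if not 0.0 <= easy_fraction <= 1.0:
        raise ValueError("easy_fraction must be between 0 and 1")
    # Every request pays for the router, then for exactly one model.
    return (
        router_cost
        + easy_fraction * cheap_cost
        + (1.0 - easy_fraction) * expensive_cost
    )


def routing_saves(router_cost, cheap_cost, expensive_cost, easy_fraction):
    """True if routing is cheaper than sending everything to the expensive model."""
    routed = blended_cost(router_cost, cheap_cost, expensive_cost, easy_fraction)
    return routed < expensive_cost
```

If `routing_saves` returns False for realistic values of `easy_fraction`, the router's own cost outweighs the triage benefit and a single sub-agent is likely the better choice.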

When not to use it

Without a confidence gate. Confidently routing a boundary case to the wrong sub-agent is a regular Failure Pyramid level-3 incident: the downstream sub-agent does the wrong work and is satisfied with itself. Without a gate, the router will be wrong on the long tail of inputs you did not anticipate.

When the input space is too narrow. If 95% of your inputs are the same type, do not route. A single sub-agent with a permissive prompt is cheaper than a router plus N sub-agents.

Production cost data

In our pipeline a routing classifier hits roughly 95% accuracy on the head of the input distribution. The 5% miss rate is concentrated in boundary cases: inputs where two sub-agents are both plausibly the right destination. That is where the Confidence Gate earns its keep.

Operator Note: we shipped a one-line change to our routing pattern that cut Sonnet 4.6 token usage by 22% across 300 sites. The change was a tighter classifier prompt that pre-classified the input before the routing branch fired. The reason it worked: the model was being asked to do the routing classification inside the routing call itself, so the rest of the prompt context was wasted on a decision that had already been made. The full Note is at /operator-notes/the-22-percent-routing-fix/.

Anti-patterns

Implicit fall-through. If the router is unsure, it should not pick the "default" sub-agent silently. The fall-through path must be explicit and instrumented. Otherwise you are routing all your boundary cases to one sub-agent, and that sub-agent is the slowest one to debug.
Confidence-as-vibes. The router's confidence score must be calibrated against real outcomes, not vibes. Calibration drifts. Re-calibrate quarterly.
Routing inside the sub-agent. If the sub-agent re-checks the routing decision and overrides it, you have two routers. Pick one.
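
One way to check calibration against real outcomes is to bucket reviewed routes by reported confidence and compare each bucket's mean confidence with its observed accuracy. This is a minimal sketch, not our production tooling: the function names, the `(confidence, was_correct)` record shape, and the default of five bins are all assumptions made for illustration.

```python
from collections import defaultdict


def calibration_table(records, n_bins=5):
    """Bucket (confidence, was_correct) pairs into equal-width confidence bins.

    Returns one row per non-empty bin:
    (bin_low, bin_high, mean_confidence, accuracy, count).
    """
    bins = defaultdict(list)
    for confidence, was_correct in records:
        # Clamp so that confidence == 1.0 lands in the top bin.
        idx = min(int(confidence * n_bins), n_bins - 1)
        bins[idx].append((confidence, was_correct))

    table = []
    for idx in sorted(bins):
        rows = bins[idx]
        mean_conf = sum(c for c, _ in rows) / len(rows)
        accuracy = sum(1 for _, ok in rows if ok) / len(rows)
        table.append((idx / n_bins, (idx + 1) / n_bins, mean_conf, accuracy, len(rows)))
    return table


def expected_calibration_error(records, n_bins=5):
    """Count-weighted average gap between mean confidence and accuracy per bin."""
    records = list(records)
    total = len(records)
    return sum(
        count / total * abs(mean_conf - accuracy)
        for _, _, mean_conf, accuracy, count in calibration_table(records, n_bins)
    )
```

A well-calibrated router shows accuracy close to mean confidence in every bin. A bin where confidence sits well above accuracy is a sign the gate threshold is admitting routes it should not.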

Sample code

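# Routing with a confidence gate.
# Assumes `router`, `SUB_AGENTS`, and `fallthrough_handler` are defined elsewhere.
def route(input_text, threshold=0.75):
    decision = router.classify(input_text)
    # Low-confidence or unknown targets take the explicit fall-through path.
    if decision.confidence < threshold or decision.target not in SUB_AGENTS:
        return fallthrough_handler(input_text)
    return SUB_AGENTS[decision.target].handle(input_text)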
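
The anti-patterns above ask for a fall-through path that is explicit and instrumented. The following is a self-contained sketch of what that could look like. `Decision`, `gated_route`, and the in-process `route_counts` counter are hypothetical names; in practice the counter would likely be replaced by whatever metrics client you already use.

```python
import logging
from collections import Counter
from dataclasses import dataclass

logger = logging.getLogger("router")

# Hypothetical in-process metric store, used here only for illustration.
route_counts = Counter()


@dataclass
class Decision:
    target: str
    confidence: float


def gated_route(decision, sub_agents, threshold=0.75):
    """Return the name of the handler to use and record why it was chosen."""
    if decision.target not in sub_agents:
        reason = "unknown_target"
    elif decision.confidence < threshold:
        reason = "low_confidence"
    else:
        route_counts[("routed", decision.target)] += 1
        return decision.target

    # Every fall-through is counted and logged with its reason,
    # so boundary cases never disappear into a silent default.
    route_counts[("fallthrough", reason)] += 1
    logger.warning(
        "fall-through: reason=%s target=%s confidence=%.2f",
        reason, decision.target, decision.confidence,
    )
    return "fallthrough"
```

Counting fall-throughs by reason makes it possible to tell a miscalibrated threshold (many `low_confidence` entries) apart from a router that emits targets nobody handles (`unknown_target`).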

Cross-pattern interactions

Routing is often the entry point of an orchestrator-worker pattern: the orchestrator routes sub-tasks to specialised worker pools. It also pairs with evaluator-optimiser when the route itself is the artefact under evaluation; that is the Confidence Gate, expanded.

Engineering FAQ

What is a Confidence Gate?

A pre-check that compares the routing classifier's confidence in its sub-agent selection against a threshold. If confidence is below threshold, escalate to a fall-through path or to a human. We coined the name. The full essay is the inaugural Pattern Deep Dive.

How do I measure routing accuracy in production?

Use two measures together. Score a sample of real routes against a small held-out gold-set, and run a periodic shadow run in which two routers see the same input and their disagreements are logged. The shadow run finds drift; the gold-set anchors absolute accuracy.
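
Both measurements reduce to a few lines of code. This is a sketch under stated assumptions: each router is modelled as a plain function from input text to a target name, the gold-set is a list of `(input_text, expected_target)` pairs, and the function names are hypothetical.

```python
def gold_set_accuracy(classify, gold_set):
    """Fraction of gold-set inputs that the router sends to the expected target."""
    if not gold_set:
        raise ValueError("gold set is empty")
    hits = sum(1 for text, expected in gold_set if classify(text) == expected)
    return hits / len(gold_set)


def shadow_disagreements(primary, shadow, inputs):
    """Return (input, primary_target, shadow_target) for every disagreement."""
    disagreements = []
    for text in inputs:
        primary_target = primary(text)
        shadow_target = shadow(text)
        if primary_target != shadow_target:
            disagreements.append((text, primary_target, shadow_target))
    return disagreements
```

The disagreement list does not say which router is right. It is a queue of inputs worth labelling by hand, and those labels are natural candidates for growing the gold-set.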

Should the router be a small model or a large one?

Start small. Routing is a classification task and small models do classification well. The reflex to reach for a frontier model is wrong here, both on cost and on latency. Upgrade only when you can show a measurable accuracy gap on your gold-set.
