When to use it
Inputs of mixed type that benefit from specialised handling. Customer support tickets routing to a billing sub-agent, a product sub-agent, or an escalation sub-agent. Content classification dispatching to category-specific extractors. Query dispatch in a multi-tool agent.
Routing is also the right pattern when you want a cheap fast model on the easy cases and an expensive slow model on the hard ones. The router triages; the sub-agents do the work. Done well, this is a meaningful cost reduction.
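The cost argument can be made concrete with a back-of-envelope sketch. Everything below is hypothetical: the `triage` heuristic is a stand-in for a real router call, `cheap_model` and `expensive_model` are placeholder functions, and the cost figures in the usage note are invented units for illustration, not measurements from our pipeline.

```python
def triage(text: str) -> str:
    """Placeholder triage: treats long inputs as 'hard'.

    A real router would call a classifier here; the 30-word cut-off
    is an arbitrary assumption for this sketch.
    """
    return "hard" if len(text.split()) > 30 else "easy"


def cheap_model(text: str) -> str:
    # Stand-in for a small, fast model.
    return f"cheap: {text[:20]}"


def expensive_model(text: str) -> str:
    # Stand-in for a large, slow model.
    return f"expensive: {text[:20]}"


MODELS = {"easy": cheap_model, "hard": expensive_model}


def handle(text: str) -> str:
    """The router triages; the selected model does the work."""
    return MODELS[triage(text)](text)


def blended_cost(easy_share, cheap_cost, expensive_cost, router_cost):
    """Expected cost per input when every input pays for the router
    and is then sent to either the cheap or the expensive model."""
    if not 0.0 <= easy_share <= 1.0:
        raise ValueError("easy_share must be between 0 and 1")
    return (router_cost
            + easy_share * cheap_cost
            + (1.0 - easy_share) * expensive_cost)
```

With illustrative inputs such as `blended_cost(0.8, 1.0, 10.0, 0.2)`, the blended figure comes out at roughly 3 units per input against 10 for sending everything to the expensive model. The saving shrinks as the easy share falls or the router's own cost rises, so it is worth plugging in your own numbers before committing to the pattern.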
When not to use it
Without a confidence gate. Confidently routing a boundary case to the wrong sub-agent is a regular Failure Pyramid level-3 incident. The downstream sub-agent does the wrong work and is satisfied with itself. Without a gate, the router will be wrong on the long tail of inputs you did not anticipate.
When the input space is too narrow. If 95% of your inputs are the same type, do not route. A single sub-agent with a permissive prompt is cheaper than a router plus N sub-agents.
Production cost data
In our pipeline a routing classifier hits roughly 95% accuracy on the head of the input distribution. The 5% miss-rate is concentrated in boundary cases: inputs where two sub-agents are both plausibly the right destination. That is where the Confidence Gate earns its keep.
Operator Note: we shipped a one-line change to our routing pattern that cut Sonnet 4.6 token usage by 22% across 300 sites. The change was a tighter classifier prompt that pre-classified the input before the routing branch fired. The reason it worked: the model was being asked to do the routing classification inside the routing call itself, so the rest of the prompt context was wasted on a decision that had already been made. The full Note is at /operator-notes/the-22-percent-routing-fix/.
Anti-patterns
- Implicit fall-through. If the router is unsure, it should not pick the "default" sub-agent silently. The fall-through path must be explicit and instrumented. Otherwise you are routing all your boundary cases to one sub-agent, and that sub-agent is the slowest one to debug.
- Confidence-as-vibes. The router's confidence score must be calibrated against real outcomes, not vibes. Calibration drifts. Re-calibrate quarterly.
- Routing inside the sub-agent. If the sub-agent re-checks the routing decision and overrides it, you have two routers. Pick one.
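One way to check the calibration point in the list above is a simple reliability table: bucket logged decisions by confidence and compare the mean stated confidence with the observed accuracy in each bucket. This is a minimal sketch, assuming you log `(confidence, was_correct)` pairs for routes whose outcome you later verified. The function names and the five-bucket default are illustrative choices, not part of our pipeline.

```python
from collections import defaultdict


def calibration_table(records, n_buckets=5):
    """Group (confidence, was_correct) pairs into equal-width buckets.

    Returns rows of (bucket_index, count, mean_confidence, accuracy),
    sorted by bucket index. Empty buckets are omitted.
    """
    buckets = defaultdict(list)
    for confidence, was_correct in records:
        # A confidence of exactly 1.0 is clamped into the top bucket.
        idx = min(int(confidence * n_buckets), n_buckets - 1)
        buckets[idx].append((confidence, was_correct))

    rows = []
    for idx in sorted(buckets):
        items = buckets[idx]
        mean_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        rows.append((idx, len(items), mean_conf, accuracy))
    return rows


def max_calibration_gap(rows):
    """Largest |mean confidence - accuracy| across the buckets."""
    return max(abs(mean_conf - acc) for _, _, mean_conf, acc in rows)
```

A well-calibrated router shows mean confidence close to accuracy in every bucket. A large gap in the bucket that contains your threshold suggests the threshold no longer means what it did when you set it.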
Sample code
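# Routing with a confidence gate.
# Assumes router, SUB_AGENTS, and fallthrough_handler are defined elsewhere.
def route(input_text, threshold=0.75):
    decision = router.classify(input_text)
    # Below the threshold, take the explicit fall-through path instead of guessing.
    if decision.confidence < threshold:
        return fallthrough_handler(input_text)
    return SUB_AGENTS[decision.target].handle(input_text)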
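The sample above leaves `router`, `SUB_AGENTS`, and `fallthrough_handler` undefined. The sketch below fills them in with toy stand-ins so the gate can be run end to end, and adds the instrumentation that the fall-through anti-pattern calls for. All of it is hypothetical: `KeywordRouter`, `EchoAgent`, the hard-coded confidence values, and the `route_counts` counter are illustrative names, and the fall-through handler here takes the decision as a second argument so it can log it, which differs from the one-argument call in the sample.

```python
import logging
from collections import Counter
from dataclasses import dataclass

logger = logging.getLogger("routing")
route_counts = Counter()  # per-destination counts, including fall-through


@dataclass
class Decision:
    target: str
    confidence: float


class KeywordRouter:
    """Toy classifier; a real router would call a model here."""

    def classify(self, text):
        lowered = text.lower()
        if "invoice" in lowered:
            return Decision("billing", 0.90)
        if "crash" in lowered:
            return Decision("product", 0.85)
        # Nothing matched: return a low-confidence guess.
        return Decision("billing", 0.40)


class EchoAgent:
    """Toy sub-agent that just labels the input it was given."""

    def __init__(self, name):
        self.name = name

    def handle(self, text):
        return f"{self.name}: {text}"


router = KeywordRouter()
SUB_AGENTS = {"billing": EchoAgent("billing"), "product": EchoAgent("product")}


def fallthrough_handler(text, decision):
    # Explicit and instrumented: count and log every fall-through.
    route_counts["fallthrough"] += 1
    logger.warning("fall-through: guessed=%s confidence=%.2f",
                   decision.target, decision.confidence)
    return f"escalated: {text}"


def route(input_text, threshold=0.75):
    decision = router.classify(input_text)
    if decision.confidence < threshold:
        return fallthrough_handler(input_text, decision)
    route_counts[decision.target] += 1
    return SUB_AGENTS[decision.target].handle(input_text)
```

Calling `route("Where is my invoice?")` goes to the billing agent, while an input with no matching keyword falls below the threshold and is escalated. Watching the share of `route_counts["fallthrough"]` over time is one cheap signal that the input distribution has shifted.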
Cross-pattern interactions
Routing is often the entry point of an orchestrator-worker pattern: the orchestrator routes sub-tasks to specialised worker pools. It also pairs with evaluator-optimiser when the route itself is the artefact under evaluation; that is the Confidence Gate, expanded.
Engineering FAQ
What is a Confidence Gate?
A pre-check that compares the routing classifier's confidence in its sub-agent selection against a threshold. If confidence is below threshold, escalate to a fall-through path or to a human. We coined the name. The full essay is the inaugural Pattern Deep Dive.
How do I measure routing accuracy in production?
Use two checks. Score sampled real routes against a small held-out gold set, and run a periodic shadow comparison in which two routers see the same input and every disagreement is logged. The shadow run finds drift; the gold set anchors absolute accuracy.
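Both measurements reduce to a few lines. This is a minimal sketch, assuming each router can be wrapped as a function from input text to a target name and that the gold set is a list of `(input_text, expected_target)` pairs. The function names are illustrative.

```python
def gold_set_accuracy(router_fn, gold_set):
    """Fraction of gold-set inputs routed to the expected target."""
    if not gold_set:
        raise ValueError("gold_set must not be empty")
    hits = sum(1 for text, expected in gold_set if router_fn(text) == expected)
    return hits / len(gold_set)


def shadow_disagreements(primary_fn, shadow_fn, inputs):
    """Run two routers on the same inputs and collect every disagreement.

    Returns a list of (input_text, primary_target, shadow_target).
    """
    disagreements = []
    for text in inputs:
        primary_target = primary_fn(text)
        shadow_target = shadow_fn(text)
        if primary_target != shadow_target:
            disagreements.append((text, primary_target, shadow_target))
    return disagreements
```

A disagreement does not say which router is right, only that the input deserves a look. Disagreements that recur are good candidates for adding to the gold set.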
Should the router be a small model or a large one?
Start small. Routing is a classification task and small models do classification well. The reflex to reach for a frontier model is wrong here, both on cost and on latency. Upgrade only when you can show a measurable accuracy gap on your gold-set.
Oliver runs Digital Signet, a research and product studio that operates ~500 production sites with AI agents as the engineering layer. The Digital Signet portfolio is built using a continuous AI-agent build pipeline, one of the largest agent-operated publishing operations on the open web. The handbook draws directly from those deployments: real cost data, real failure modes, real recovery patterns.