A production engineer's read of Anthropic's paper
The Anthropic paper names five patterns. We deploy four of them across our pipeline. We have a strong opinion on which one to start with and which one to avoid until you absolutely need it.
Anthropic's "Building Effective Agents" paper supplies the field's shared vocabulary. The paper does not include cost data, because Anthropic does not run other people's agents. We do. The five sections below extend the paper with operator data: when we use each pattern, when we do not, and what breaks first.
Orchestrator-Worker
COST: SPIKE-PRONE

When we use it: Multi-step tasks where a planner can decompose work into independent sub-tasks. This is the pattern we deploy most: build pipelines, content updates, multi-source research.
When we do not: Tasks small enough to fit in a single chained prompt. The orchestrator overhead exceeds the work cost below ~3 worker calls.
Named failure mode: The Cost Cliff. The orchestrator decides to spawn N workers for a job that should have used three.
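A guard against the Cost Cliff can live in the orchestrator itself. The sketch below is a minimal illustration, not our production code: `plan` and `run_worker` are hypothetical stand-ins for model calls, and the cap of 3 mirrors the "should have used three" figure above.

```python
MAX_WORKERS = 3  # assumed budget, matching the example in the text


def plan(task: str) -> list[str]:
    # Hypothetical planner stand-in: deliberately over-eager,
    # splitting one task into five sub-tasks.
    return [f"{task}: part {i}" for i in range(1, 6)]


def run_worker(subtask: str) -> str:
    # Hypothetical worker stand-in for a model call.
    return f"done({subtask})"


def orchestrate(task: str) -> list[str]:
    subtasks = plan(task)
    if len(subtasks) > MAX_WORKERS:
        # Cost Cliff guard: refuse silent fan-out. Here we truncate;
        # in production you would escalate for review instead.
        subtasks = subtasks[:MAX_WORKERS]
    return [run_worker(s) for s in subtasks]


results = orchestrate("update pricing pages")
```

The point of the guard is that the spend ceiling is enforced outside the planner's judgment, so an over-eager plan cannot translate directly into an over-eager bill.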
Read the essay →

Prompt Chaining
COST: CHEAPEST

When we use it: Linear pipelines: extract, then transform, then summarise. Cheapest pattern at depth 3 or below.
When we do not: Past three steps, drift compounds super-linearly. Past five, you should be in a routing or evaluator-optimiser pattern instead.
Named failure mode: The Drift. Each step amplifies the error of the previous; by step four the output is no longer about the input.
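The depth rule above can be enforced mechanically. A minimal sketch, assuming each step is a callable that maps text to text (the three lambdas are toy stand-ins for model calls, not real prompts):

```python
MAX_DEPTH = 3  # past this, drift compounds: switch patterns instead


def chain(text: str, steps) -> str:
    # Hard stop rather than a warning: a too-deep chain is a design
    # smell, so we fail loudly at build time.
    if len(steps) > MAX_DEPTH:
        raise ValueError("chain too deep: use routing or evaluator-optimiser")
    for step in steps:
        text = step(text)
    return text


# Toy stand-ins for extract / transform / summarise model calls.
extract = lambda t: t.upper()
transform = lambda t: t[::-1]
summarise = lambda t: t[:10]

out = chain("quarterly revenue report", [extract, transform, summarise])
```

Making the depth cap a raised error rather than a logged warning is deliberate: The Drift is invisible per-step, so the only reliable place to catch it is before the chain runs at all.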
Read the essay →

Evaluator-Optimiser
COST: STEADY

When we use it: Tasks with clear quality criteria where a critique loop genuinely improves output: code review, content QA, structured-output validation.
When we do not: Tasks where the evaluator and the optimiser are running on the same model with the same prompt context; you are paying twice for the same opinion.
Named failure mode: The Loop. Evaluator and optimiser disagree forever. We cap iterations at 5.
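The iteration cap is the whole defence against The Loop, so it belongs in the loop skeleton itself. A minimal sketch with the cap of 5 from above; `evaluate` and `optimise` are hypothetical stand-ins (a trivial whitespace check), not real critique prompts:

```python
MAX_ITERATIONS = 5  # the cap named in the text: break The Loop here


def evaluate(draft: str) -> bool:
    # Hypothetical evaluator stand-in: accept a non-empty, clean draft.
    return len(draft) > 0 and draft == draft.strip()


def optimise(draft: str) -> str:
    # Hypothetical optimiser stand-in: fix the flaw the evaluator flags.
    return draft.strip()


def evaluator_optimiser(draft: str) -> tuple[str, int]:
    for i in range(1, MAX_ITERATIONS + 1):
        if evaluate(draft):
            return draft, i
        draft = optimise(draft)
    # Cap hit: ship best-so-far rather than loop forever.
    return draft, MAX_ITERATIONS


result, iterations = evaluator_optimiser("  final copy ")
```

Note the cap returns the best draft so far instead of raising: by iteration 5 you have already paid for five critiques, and a flagged-but-shipped draft is cheaper than a sixth round of the same disagreement.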
Read the essay →

Parallelisation
COST: MODERATE

When we use it: Vote-based reliability or speed for a fan-out task: route the same input to N models, take majority vote or fastest acceptable answer.
When we do not: Tasks where the bottleneck is a single tool downstream. Parallel models still serialise on a single tool.
Named failure mode: The Throughput Wall. The concurrency ceiling we observed in our pipeline lands at 8 parallel calls.
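Majority voting with a concurrency ceiling can be sketched in a few lines. This is an illustration under assumed conditions: the three `model_*` functions are deterministic stand-ins for model endpoints, and the ceiling of 8 mirrors the number observed above.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

CONCURRENCY_CEILING = 8  # the Throughput Wall observed in the text


# Hypothetical model stand-ins: two agree, one dissents.
def model_a(q: str) -> str: return "yes"
def model_b(q: str) -> str: return "yes"
def model_c(q: str) -> str: return "no"


def majority_vote(question: str, models) -> str:
    # Fan the same input out to every model, bounded by the ceiling,
    # then take the most common answer.
    with ThreadPoolExecutor(max_workers=CONCURRENCY_CEILING) as pool:
        answers = list(pool.map(lambda m: m(question), models))
    return Counter(answers).most_common(1)[0][0]


answer = majority_vote("is the deploy safe?", [model_a, model_b, model_c])
```

The ceiling lives on the executor, not on the caller: however many models the vote wants, only 8 calls are ever in flight, which is exactly where the serialising-on-one-tool caveat above bites.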
Read the essay →

Routing
COST: MODERATE

When we use it: Inputs of mixed type that need specialised handling: support tickets routing to the right sub-agent, content classification, query dispatch.
When we do not: Never without a Confidence Gate. Confidently routing a boundary case to the wrong sub-agent is a recurring Failure Pyramid level-3 incident.
Named failure mode: The Confidence-Gate Breach. Routing is the pattern most worth gating.
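A Confidence Gate is a single comparison placed between the classifier and the dispatch. A minimal sketch, assuming the classifier returns a (route, confidence) pair; the keyword-based `classify` and the 0.8 threshold are illustrative stand-ins, not our production values:

```python
CONFIDENCE_GATE = 0.8  # assumed threshold: below this, escalate


def classify(ticket: str) -> tuple[str, float]:
    # Hypothetical classifier stand-in for a model call.
    if "refund" in ticket:
        return "billing", 0.95
    # Boundary case: plausible label, low confidence.
    return "general", 0.55


def route(ticket: str) -> str:
    label, confidence = classify(ticket)
    if confidence < CONFIDENCE_GATE:
        # Gate breach avoided: a wrong-but-confident dispatch is the
        # level-3 incident, so low-confidence cases go to a human.
        return "human-review"
    return label


assert callable(route)
```

The gate turns the failure mode from silent (wrong sub-agent handles the ticket) into visible (a human sees the boundary case), which is the trade the pattern exists to make.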
Read the essay →

All five patterns reference the Failure Pyramid for the named failure modes and the Maturity Curve for the deployment stage at which the pattern shows up explicitly.

Oliver runs Digital Signet, a research and product studio that operates ~500 production sites with AI agents as the engineering layer. The Digital Signet portfolio is built by a continuous AI-agent build pipeline and is one of the largest agent-operated publishing operations on the open web. The handbook draws directly from those deployments: real cost data, real failure modes, real recovery patterns.