Building Effective Agents
PATTERN · COST SPIKE-PRONE

Orchestrator-Worker

A planner decomposes the task and dispatches sub-tasks to worker models. The pattern we deploy most. The one that costs the most when it goes wrong.

By Oliver Wakefield-Smith, Digital Signet
Last verified April 2026

When to use it

Of the five patterns in the Anthropic paper, orchestrator-worker is the one we deploy most. We use it for build pipelines (deploy plus screenshot plus QA), for content updates across multiple sources, and for multi-source research where the sub-questions are independent. There are two preconditions: the task is decomposable, and the sub-tasks can run in parallel.

The structural advantage is that the orchestrator can choose the right model per worker. Cheap small model for boilerplate sub-tasks, expensive frontier model only for the work that needs it. Done right, this is materially cheaper than running everything on the frontier model.
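
To make the per-worker model choice concrete, here is a minimal sketch. The tier names, the `Subtask` type, and the `needs_reasoning` flag are all assumptions for illustration; in practice the planner or a routing rule would have to supply that signal.

```python
from dataclasses import dataclass

# Hypothetical tier names; substitute the model identifiers you actually use.
CHEAP_MODEL = "small-model"
FRONTIER_MODEL = "frontier-model"


@dataclass
class Subtask:
    description: str
    needs_reasoning: bool  # assumed flag, set by the planner or a routing rule


def pick_model(subtask: Subtask) -> str:
    """Send only reasoning-heavy sub-tasks to the expensive tier."""
    return FRONTIER_MODEL if subtask.needs_reasoning else CHEAP_MODEL


def assign_models(subtasks):
    """Pair each sub-task description with the model tier chosen for it."""
    return [(s.description, pick_model(s)) for s in subtasks]


plan = [
    Subtask("reformat product table", needs_reasoning=False),
    Subtask("reconcile conflicting sources", needs_reasoning=True),
]
assignments = assign_models(plan)
```

The saving depends entirely on how many sub-tasks land on the cheap tier, so it is worth logging the assignment alongside the cost of each run.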

When not to use it

Tasks small enough to fit in a single chained prompt. Below roughly three worker calls, the orchestrator overhead (the planning model call plus the dispatch logic) costs more than the work itself. If your task is "extract a price, then format it," do not orchestrate. Chain the prompts.

Tasks where workers must communicate with each other mid-flight. Orchestrator-worker assumes workers are independent. If they are not, you want a different pattern, often a smaller routing pattern with a shared state object.

Production cost data

We have telemetry on roughly 60 of our ~500 sites running orchestrator-worker for content updates. P50 cost per task is in the low-cents range. P95 is around 4x P50. P99 is where the Cost Cliff lives: in our pipeline P99 is around 10x P50.
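
If you want to track the same tail ratios on your own telemetry, a sketch along these lines may help. It assumes you already have a flat list of per-task costs in dollars; the function names are illustrative, and the choice of the "inclusive" interpolation method is ours, not something the figures above depend on.

```python
import statistics


def cost_percentiles(costs):
    """Return (p50, p95, p99) for a list of per-task costs."""
    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(costs, n=100, method="inclusive")
    return cuts[49], cuts[94], cuts[98]


def tail_ratios(costs):
    """Return (p95 / p50, p99 / p50), the multiples quoted when describing the tail."""
    p50, p95, p99 = cost_percentiles(costs)
    return p95 / p50, p99 / p50
```

A P99-to-P50 ratio that drifts upward between reporting periods is an early sign that plans are getting larger, even when the median cost looks stable.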

Last month our orchestrator decided to spawn 47 workers for a task that should have used three. The bill for that single run was $4.20. Across our pipeline that is a five-figure annual leak if we do not catch it. The fix was a max-workers cap on the orchestrator's plan, enforced at dispatch time. Before the cap: 47 workers. After: 4 workers, no quality loss, 91% cost reduction on that task class.

The full Note is at /operator-notes/orchestrator-cost-spike-47-workers/.

Anti-patterns

  • Implicit max-workers. Leaving the cap to the orchestrator's judgement. The orchestrator will, given any input that surprises it, propose more workers. Make the cap explicit.
  • Synthesis on the same model as planning. The synthesis step does not need the planner's context budget. Run it on a smaller cheaper model, especially for fan-out tasks.
  • No per-task cost ceiling. Without a ceiling at dispatch, a single bad plan can run up a five-figure overage in an afternoon. We have seen this happen to others. Cap at dispatch.
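
A per-task cost ceiling can be sketched as a check that runs before any worker is dispatched. Everything here is an assumption for illustration: `BudgetExceeded`, `check_plan_cost`, and the `estimate_cost` callable (which would need to turn a sub-task into a dollar estimate, for example from expected token counts).

```python
class BudgetExceeded(RuntimeError):
    """Hypothetical error: the plan's estimated cost is above the ceiling."""


def check_plan_cost(subtasks, estimate_cost, ceiling_usd):
    """Reject the whole plan before dispatch if its estimate exceeds the ceiling.

    estimate_cost is a caller-supplied function: subtask -> estimated dollars.
    Returns the estimated total so the caller can log it.
    """
    total = sum(estimate_cost(s) for s in subtasks)
    if total > ceiling_usd:
        raise BudgetExceeded(
            f"Estimated ${total:.2f} exceeds ceiling ${ceiling_usd:.2f}"
        )
    return total
```

Checking the whole plan up front, rather than charging per worker as calls complete, means a rejected plan costs only the planning call.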

Sample code

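# Minimal orchestrator-worker, model-agnostic.
# `model` and `parallel_dispatch` are placeholders for your own client and dispatcher.
class CostCliffError(RuntimeError):
    """Raised when a plan asks for more workers than the cap allows."""

def orchestrate(task, max_workers=8):
    # Layer one: tell the planner its cap.
    plan = model.plan(task, max_workers=max_workers)
    # Layer two: reject at dispatch time, whatever the planner proposed.
    if len(plan.subtasks) > max_workers:
        raise CostCliffError(f"Plan exceeded cap: {len(plan.subtasks)} > {max_workers}")
    results = parallel_dispatch(plan.subtasks)
    return model.synthesize(task, results)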
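
The sample leaves `parallel_dispatch` undefined. One possible shape for it, assuming worker calls are I/O-bound model requests, uses a thread pool from the standard library. The `run_worker` parameter is an addition for this sketch (a callable that executes one sub-task), so the signature differs from the one-argument call in the sample.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_dispatch(subtasks, run_worker, max_workers=8):
    """Run each sub-task on a worker thread and return results in plan order.

    run_worker is a caller-supplied function: subtask -> result.
    """
    if not subtasks:
        return []
    # Never start more threads than there are sub-tasks or than the cap allows.
    pool_size = min(max_workers, len(subtasks))
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # map() preserves input order, so results line up with the plan.
        return list(pool.map(run_worker, subtasks))
```

If any worker raises, `pool.map` re-raises that exception when its result is read, so a failed sub-task fails the whole dispatch unless `run_worker` catches errors itself.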

Cross-pattern interactions

Orchestrator-worker pairs naturally with evaluator-optimiser on the synthesis step (a critic model checks the synthesised output before returning). It also pairs with routing as the orchestrator's first decision: which worker pool gets which sub-task. Both pairings are common in our pipeline.
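
The evaluator-optimiser pairing on the synthesis step might look like the sketch below. The `synthesize` and `critique` callables and the `max_rounds` bound are assumptions for illustration; the bound is there so the critic loop cannot become its own cost spike.

```python
def synthesize_with_critic(task, results, synthesize, critique, max_rounds=2):
    """Synthesise, have a critic check the draft, and retry a bounded number of times.

    synthesize: (task, results, feedback) -> draft   (feedback is None on round one)
    critique:   (task, draft) -> (ok, feedback)
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback = None
    draft = None
    for _ in range(max_rounds):
        draft = synthesize(task, results, feedback)
        ok, feedback = critique(task, draft)
        if ok:
            return draft
    # Best effort after max_rounds; a stricter caller may prefer to raise here.
    return draft
```

Returning the last draft when the critic never approves is a design choice, not a requirement; for high-stakes output, raising and escalating to a human may be the safer default.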

Engineering FAQ

When should I use orchestrator-worker vs prompt chaining?

Use orchestrator-worker when sub-tasks are independent and could run in parallel. Use prompt chaining when each step strictly depends on the previous one. Past three chained steps, the orchestrator-worker pattern is almost always cheaper because it does not re-pass the full context through every model call.

What is a safe maximum for workers per task?

We cap at 8 in our pipeline. Past 8, synchronisation overhead exceeds LLM token cost in our observation. Your ceiling will depend on your tools, but pick a number, enforce it at dispatch time, and treat overruns as a Cost Cliff incident.

How do I prevent the Cost Cliff?

Two layers. First, the orchestrator's planning prompt should include an explicit max-workers number; do not leave the cap implicit. Second, the dispatch layer should reject plans that exceed the cap, regardless of what the orchestrator proposed. The orchestrator can be wrong; the dispatch layer cannot.
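
The two layers can be sketched separately. The prompt wording and the names `build_planning_prompt` and `enforce_cap` are hypothetical; the point is only that the cap appears once in the prompt and is checked again in code that does not trust the planner.

```python
class CostCliffError(RuntimeError):
    """Raised when a plan asks for more workers than the cap allows."""


def build_planning_prompt(task: str, max_workers: int) -> str:
    """Layer one: state the cap explicitly in the planning prompt."""
    return (
        f"Decompose the task into at most {max_workers} independent sub-tasks.\n"
        f"Task: {task}"
    )


def enforce_cap(subtasks, max_workers):
    """Layer two: reject any plan over the cap, whatever the planner proposed."""
    if len(subtasks) > max_workers:
        raise CostCliffError(
            f"Plan has {len(subtasks)} sub-tasks; cap is {max_workers}"
        )
    return subtasks
```

Rejecting the plan outright, rather than silently truncating it to the first `max_workers` sub-tasks, keeps an oversized plan visible as an incident instead of hiding it as partial work.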

Read next

The Failure Pyramid

Cost Cliff is level 2 of the Pyramid.

47 workers Note

The $4.20 single-run cost spike, the cap that fixed it.

LangGraph review

The framework we use to deploy this pattern in production.

ABOUT THE AUTHOR
Oliver Wakefield-Smith
Founder, Digital Signet

Oliver runs Digital Signet, a research and product studio that operates ~500 production sites with AI agents as the engineering layer. The Digital Signet portfolio, one of the largest agent-operated publishing operations on the open web, is built using a continuous AI-agent build pipeline. The handbook draws directly from those deployments: real cost data, real failure modes, real recovery patterns.