$building.effective.agents
Last verified: April 2026 · Pattern P03

Parallelization.

Fan out to N independent calls, fan in via aggregator. Anthropic distinguishes two flavours: sectioning (sub-tasks of a single problem) and voting (the same prompt run multiple times).

Definition

“LLMs can sometimes work simultaneously on a task and have their outputs aggregated programmatically. This workflow, parallelization, manifests in two key variations: Sectioning: Breaking a task into independent subtasks run in parallel. Voting: Running the same task multiple times to get diverse outputs.”

From Anthropic, “Building Effective Agents”, December 2024.

What it does

The pattern fans the input out to N independent calls and then merges the results. The two flavours differ in what is run in parallel.

[Figure: an input fans out to calls 1, 2, 3, ..., N, whose outputs feed an aggregator (vote / merge / pick). Sectioning runs sub-tasks; voting runs the same task N times.]

Sectioning partitions a single task into independent sub-tasks. Each sub-task is handled by its own call, and the aggregator stitches the parts back together. Sectioning works when the parts are genuinely independent; when they are not, the aggregation step becomes a hard problem in its own right.
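A minimal sketch of sectioning, assuming an async model client. The function `call_model` is a hypothetical stand-in (it only echoes its prompt), and the aspect names and the concatenating aggregator are illustrative choices rather than anything prescribed by the source.

```python
import asyncio


# Hypothetical stand-in for a real model call; swap in your own client.
async def call_model(prompt: str) -> str:
    await asyncio.sleep(0)  # yield control, as a network call would
    return f"[findings for: {prompt}]"


async def review_by_section(document: str, aspects: list[str]) -> dict[str, str]:
    """Fan out one call per aspect, then fan in to a dict keyed by aspect."""
    prompts = [
        f"Review the following for {aspect} issues:\n{document}"
        for aspect in aspects
    ]
    # asyncio.gather preserves input order, so results line up with aspects.
    results = await asyncio.gather(*(call_model(p) for p in prompts))
    return dict(zip(aspects, results))


def merge_sections(sections: dict[str, str]) -> str:
    """Trivial aggregator: list the per-aspect findings under their labels."""
    return "\n".join(f"{aspect}: {text}" for aspect, text in sections.items())


if __name__ == "__main__":
    aspects = ["security", "performance", "accessibility"]
    sections = asyncio.run(review_by_section("def f(): ...", aspects))
    print(merge_sections(sections))
```

The aggregator here can stay this simple only because the sub-tasks do not depend on each other; if the findings could contradict one another, the merge step would need real reconciliation logic.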

Voting runs the same call multiple times with different sampling parameters or with reframed prompts, then picks the best answer (best-of-N) or the modal answer (self-consistency). The technique is documented in Wang et al. (2022) for chain-of-thought reasoning, where majority voting over sampled reasoning chains substantially improved accuracy on arithmetic and commonsense reasoning benchmarks.
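The modal-answer variant can be sketched as follows. This is an illustration under assumptions: `sample` is a hypothetical zero-argument callable that performs one model call and returns a normalized final answer, and threads stand in for whatever concurrency mechanism the real client offers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def majority_vote(answers: list[str]) -> tuple[str, float]:
    """Return the modal answer and the fraction of samples that agree with it."""
    if not answers:
        raise ValueError("need at least one answer")
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / len(answers)


def self_consistency(sample: Callable[[], str], n: int) -> tuple[str, float]:
    """Run `sample` n times concurrently and vote on the results.

    `sample` is a hypothetical callable wrapping one model call.
    """
    with ThreadPoolExecutor(max_workers=n) as pool:
        answers = list(pool.map(lambda _: sample(), range(n)))
    return majority_vote(answers)


if __name__ == "__main__":
    print(majority_vote(["42", "41", "42", "42", "7"]))
```

The agreement fraction is returned alongside the winner because it is a cheap signal for downstream logic, although a high fraction does not rule out a shared systematic error.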

When it is appropriate

  • Sectioning is appropriate when independent sub-questions can be answered in parallel without coordination, for example: review a document for security issues, performance issues, and accessibility issues simultaneously, then merge.
  • Voting is appropriate when the modal answer across N samples is more reliable than a single sample, which tends to be true for tasks with discrete answers (multiple choice, classification, code generation against tests).
  • Both flavours are appropriate when the latency budget is tight: the wall-clock time of N parallel calls is roughly that of the slowest call, not the sum of all calls.
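The latency point in the last bullet can be written out as a small worked comparison. The symbols are assumptions introduced here: t_i is the latency of call i and t_agg is the aggregator's latency. The approximation also assumes the provider does not queue or rate-limit the concurrent calls.

```latex
T_{\text{parallel}} \approx \max_{1 \le i \le N} t_i + t_{\text{agg}},
\qquad
T_{\text{sequential}} = \sum_{i=1}^{N} t_i + t_{\text{agg}}

% Illustrative numbers: t_1 = 2\,\text{s},\; t_2 = 3\,\text{s},\; t_3 = 5\,\text{s},\; t_{\text{agg}} = 1\,\text{s}
T_{\text{parallel}} \approx 5 + 1 = 6\,\text{s},
\qquad
T_{\text{sequential}} = (2 + 3 + 5) + 1 = 11\,\text{s}
```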

Cost considerations

The headline cost is roughly N times the cost of a single call, because every parallel call is billed for its own input and output tokens. The aggregator adds a small overhead. Vendor pricing pages (Anthropic, OpenAI) make this trivial to forecast for a fixed N.
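A forecast for fixed N might look like the sketch below. The `Pricing` fields and every number in the example are placeholders, not real vendor rates; the model assumes per-token billing with separate input and output prices and treats the aggregator as one extra call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Placeholder rates in USD per million tokens; not real vendor prices."""
    input_per_mtok: float
    output_per_mtok: float


def call_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single call."""
    return (
        input_tokens * pricing.input_per_mtok
        + output_tokens * pricing.output_per_mtok
    ) / 1_000_000


def parallel_cost(
    pricing: Pricing,
    n: int,
    input_tokens: int,
    output_tokens: int,
    agg_input_tokens: int = 0,
    agg_output_tokens: int = 0,
) -> float:
    """N identical fan-out calls plus one optional aggregator call."""
    fan_out = n * call_cost(pricing, input_tokens, output_tokens)
    aggregator = call_cost(pricing, agg_input_tokens, agg_output_tokens)
    return fan_out + aggregator


if __name__ == "__main__":
    rates = Pricing(input_per_mtok=1.0, output_per_mtok=2.0)  # made-up rates
    print(parallel_cost(rates, n=5, input_tokens=2_000, output_tokens=500))
```

If the aggregator is itself a model call that reads all N outputs, its input token count grows with N, so the "small overhead" is worth estimating rather than ignoring.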

The non-obvious lever is N as a function of input difficulty. A router (P02) can pick N based on initial confidence: easy inputs use N=1, hard inputs use N=5 with voting. This adapts cost to demand and is a common composition of the two patterns.
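One way to express the difficulty-dependent N is a small policy function. This is a sketch under assumptions: the confidence score is a hypothetical output of the router in [0, 1], and the 0.8 threshold is an arbitrary illustration. Only the N=1 and N=5 values come from the text above.

```python
def pick_n(confidence: float, threshold: float = 0.8, hard_n: int = 5) -> int:
    """Choose how many parallel samples to run for one input.

    `confidence` is a hypothetical router score in [0, 1]; the default
    threshold is illustrative and should be tuned on real traffic.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    return 1 if confidence >= threshold else hard_n


def expected_calls(confidences: list[float], threshold: float = 0.8, hard_n: int = 5) -> float:
    """Average number of calls per input under the policy above."""
    if not confidences:
        raise ValueError("need at least one confidence score")
    total = sum(pick_n(c, threshold, hard_n) for c in confidences)
    return total / len(confidences)


if __name__ == "__main__":
    # Three easy inputs and one hard input: (1 + 1 + 1 + 5) / 4 calls on average.
    print(expected_calls([0.95, 0.9, 0.85, 0.4]))
```

Comparing the expected call count against the fixed-N alternative shows how much the router saves for a given mix of easy and hard inputs.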

Failure mode

Sectioning fails when the parts are not actually independent and the aggregator must reconcile contradictions it cannot resolve. Voting fails when all N samples share the same systematic error (the modal answer is wrong because the model is consistently wrong). See aggregation failures.

Glossary

See parallelization, self-consistency, fan-out.

Foundational definitions on the sibling reference site: whatisanaiagent.com glossary.
