Definition
“LLMs can sometimes work simultaneously on a task and have their outputs aggregated programmatically. This workflow, parallelization, manifests in two key variations: Sectioning: Breaking a task into independent subtasks run in parallel. Voting: Running the same task multiple times to get diverse outputs.”
From Anthropic, “Building Effective Agents”, December 2024.
What it does
The pattern fans the input out to N independent calls and then merges the results. The two flavours differ in what is run in parallel.
Sectioning partitions a single task into independent sub-tasks. Each sub-task is handled by its own call, and the aggregator stitches the parts back together. Sectioning works when the parts are genuinely independent; otherwise, the aggregation step becomes a hard problem in its own right.
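The fan-out and fan-in steps of sectioning can be sketched as follows. This is a minimal illustration, not a reference implementation: `call_llm` is a hypothetical stub standing in for a real model client, and `review_by_section` and the prompt wording are assumptions made for the example.

```python
import asyncio

# Hypothetical stand-in for a real model call; swap in your own client.
async def call_llm(prompt: str) -> str:
    await asyncio.sleep(0.01)  # simulate network latency
    return f"[findings for: {prompt}]"

async def review_by_section(document: str, aspects: list[str]) -> str:
    # Fan out: one independent call per aspect, all started concurrently.
    prompts = [f"Review for {aspect} issues:\n{document}" for aspect in aspects]
    results = await asyncio.gather(*(call_llm(p) for p in prompts))
    # Fan in: a purely programmatic merge, which is only safe because
    # no section depends on another section's output.
    return "\n\n".join(f"## {a}\n{r}" for a, r in zip(aspects, results))

report = asyncio.run(
    review_by_section("def f(): ...", ["security", "performance", "accessibility"])
)
print(report)
```

`asyncio.gather` returns results in the order the calls were submitted, so the merged report keeps the aspect order regardless of which call finishes first.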
Voting runs the same call multiple times, with different sampling parameters or with reframed prompts, then picks either the best answer according to some scoring criterion (best-of-N) or the modal answer (self-consistency). Self-consistency is documented in Wang et al. (2022) for chain-of-thought reasoning, where majority voting over the final answers of sampled reasoning chains substantially improved accuracy on arithmetic and commonsense reasoning benchmarks.
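The two aggregation rules for voting can be sketched as plain functions over already-collected outputs. The function names, the sample answers, and the scoring function are all hypothetical; in practice the inputs would be final answers extracted from N sampled model responses.

```python
from collections import Counter
from typing import Callable

def majority_vote(answers: list[str]) -> tuple[str, float]:
    """Self-consistency: return the modal answer and its share of the votes."""
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / len(answers)

def best_of_n(candidates: list[str], score: Callable[[str], float]) -> str:
    """Best-of-N: return the candidate that a scoring function rates highest."""
    return max(candidates, key=score)

# Final answers extracted from five hypothetical sampled reasoning chains.
samples = ["42", "41", "42", "43", "42"]
winner, share = majority_vote(samples)
print(winner, share)
```

Majority voting needs answers that can be compared for equality, which is why it suits discrete outputs. Best-of-N instead needs a trustworthy scorer, such as a test suite or a separate grading call.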
When it is appropriate
- Sectioning is appropriate when independent sub-questions can be answered in parallel without coordination, for example: review a document for security issues, performance issues, and accessibility issues simultaneously, then merge.
- Voting is appropriate when the modal answer across N samples is more reliable than a single sample, which tends to be true for tasks with discrete answers (multiple choice, classification, code generation against tests).
- Both flavours are appropriate when the latency budget is tight: the wall-clock latency of N parallel calls is roughly that of the slowest call plus the aggregation step, not the sum of all N.
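The latency claim in the last bullet can be checked with a small timing sketch. `fake_call` is a hypothetical stub that just sleeps, and the three latencies are made-up values; real calls would also be subject to rate limits and network variance.

```python
import asyncio
import time

# Hypothetical stand-in for a model call that takes `seconds` to return.
async def fake_call(seconds: float) -> float:
    await asyncio.sleep(seconds)
    return seconds

async def timed_fan_out(latencies: list[float]) -> float:
    # Start every call at once and measure how long the whole batch takes.
    start = time.perf_counter()
    await asyncio.gather(*(fake_call(s) for s in latencies))
    return time.perf_counter() - start

latencies = [0.1, 0.2, 0.3]
elapsed = asyncio.run(timed_fan_out(latencies))
print(f"parallel: {elapsed:.2f}s, sequential would be about {sum(latencies):.2f}s")
```

The measured time should land close to the largest entry in `latencies` rather than their total.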
Public examples
- The Anthropic cookbook contains a parallelization reference implementation alongside the other four patterns.
- LangGraph's branching support models fan-out as parallel graph branches that converge on a fan-in aggregator node.
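- OpenAI's experimental Swarm framework makes agent handoffs a first-class primitive. Handoffs transfer control from one agent to the next, so fan-out and fan-in have to be arranged by the calling code.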
- Self-Consistency Improves Chain of Thought Reasoning in Language Models (Wang et al., 2022) is the cited reference for the voting variant on reasoning benchmarks.
Cost considerations
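The headline cost is roughly N times the cost of a single call: exactly so for voting, where the same prompt runs N times, and approximately so for sectioning, where each sub-task prompt may differ in length. The aggregator adds a small overhead. Vendor pricing pages (Anthropic, OpenAI) make this trivial to forecast for a fixed N.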
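A forecast for a fixed N is simple arithmetic, sketched below. The prices are placeholders chosen to make the numbers easy to follow, not real vendor rates, and the token counts and the `Pricing`, `call_cost`, and `parallel_cost` names are assumptions for the example.

```python
from dataclasses import dataclass

@dataclass
class Pricing:
    # Placeholder prices in USD per million tokens; NOT real vendor rates.
    input_per_mtok: float
    output_per_mtok: float

def call_cost(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * p.input_per_mtok
            + output_tokens * p.output_per_mtok) / 1_000_000

def parallel_cost(p: Pricing, n: int, input_tokens: int, output_tokens: int,
                  agg_input_tokens: int = 0, agg_output_tokens: int = 0) -> float:
    # N worker calls plus one optional model-based aggregator call.
    # Leave the aggregator tokens at 0 for a programmatic merge or vote.
    workers = n * call_cost(p, input_tokens, output_tokens)
    aggregator = call_cost(p, agg_input_tokens, agg_output_tokens)
    return workers + aggregator

pricing = Pricing(input_per_mtok=1.0, output_per_mtok=2.0)
single = call_cost(pricing, 2_000, 500)
voting = parallel_cost(pricing, 5, 2_000, 500)
with_llm_aggregator = parallel_cost(pricing, 5, 2_000, 500,
                                    agg_input_tokens=2_500, agg_output_tokens=500)
print(single, voting, with_llm_aggregator)
```

With these placeholder numbers a single call costs 0.003, five voted calls cost 0.015, and a model-based aggregator that reads all five outputs brings the total to 0.0185.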
The non-obvious lever is making N a function of input difficulty. A router (P02) can pick N based on initial confidence: easy inputs use N=1, hard inputs use N=5 with voting. This adapts cost to demand and is a common composition of the two patterns.
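One way to sketch this composition is shown below. The 0.8 threshold, the `choose_n` and `answer_adaptively` names, and the confidence values are all hypothetical; where the initial confidence comes from (for example a cheap classifier or a first-pass model call) is left as an assumption.

```python
from collections import Counter
from typing import Callable

def choose_n(confidence: float, threshold: float = 0.8, hard_n: int = 5) -> int:
    """Hypothetical routing rule: one call when confident, hard_n votes otherwise."""
    return 1 if confidence >= threshold else hard_n

def answer_adaptively(question: str, confidence: float,
                      sample: Callable[[str], str]) -> tuple[str, int]:
    n = choose_n(confidence)
    # Sequential here for brevity; in practice these calls would run concurrently.
    votes = [sample(question) for _ in range(n)]
    return Counter(votes).most_common(1)[0][0], n

# Average calls per input for a hypothetical batch of router confidences.
confidences = [0.95, 0.9, 0.85, 0.6, 0.4]
average_n = sum(choose_n(c) for c in confidences) / len(confidences)
print(average_n)
```

For this made-up batch, three inputs use one call and two use five, so the average is 2.6 calls per input instead of a flat 5.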
Failure mode
Sectioning fails when the parts are not actually independent and the aggregator must reconcile contradictions it cannot resolve. Voting fails when all N samples share the same systematic error (the modal answer is wrong because the model is consistently wrong). See aggregation failures.
Glossary
See parallelization, self-consistency, fan-out.
Foundational definitions on the sibling reference site: whatisanaiagent.com glossary.