Multi-agent systems
A multi-agent system is a system in which two or more LLM-based agents collaborate, usually under a coordinator. The orchestrator-worker pattern is the simplest case; more elaborate architectures introduce role specialisation, message buses, and shared memory.
Why multi-agent
Anthropic's “Building Effective Agents” treats multi-agent as a special case of the orchestrator-worker pattern and recommends using a multi-agent system only when the task structure justifies it. Most production cases that look like “multi-agent” can be handled by a single agent with a larger toolset and a clearer plan. The cases that genuinely benefit from multi-agent are those in which:
- Distinct competences (researcher, planner, executor, critic) warrant distinct prompts and possibly distinct models.
- The agents must operate concurrently with shared state, not just be invoked in sequence.
- The system models a problem that is itself multi-actor (a negotiation, a debate, a multi-party simulation).
Coordination patterns
Hierarchical (orchestrator-worker)
One coordinator decomposes work and delegates to specialised workers. This is the orchestrator-worker pattern named in Anthropic's “Building Effective Agents”. The coordinator owns scheduling; the workers own execution. Most production multi-agent systems are this shape.
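A minimal sketch of the hierarchical shape, with a hypothetical `call_llm` stub standing in for a real model call (the worker roles and prompts are illustrative, not from any framework):

```python
# Hypothetical orchestrator-worker sketch. call_llm is a stand-in
# for a real model call; here it just echoes role and prompt.
def call_llm(role: str, prompt: str) -> str:
    return f"[{role}] {prompt}"

# Each specialised worker gets its own prompt prefix (and, in a
# real system, possibly its own model).
WORKERS = {
    "researcher": "Find sources for: ",
    "executor": "Carry out: ",
}

def orchestrate(task: str) -> list[str]:
    # The coordinator decomposes the task (trivially here) and
    # delegates each subtask to a specialised worker in turn.
    subtasks = [("researcher", task), ("executor", task)]
    results = []
    for role, sub in subtasks:
        results.append(call_llm(role, WORKERS[role] + sub))
    return results
```

The coordinator owns the subtask list and the delegation order; the workers only see their own prompts.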
Conversational (peer-to-peer)
Two or more agents exchange messages. The seminal academic reference is CAMEL (Li et al., 2023), which introduced communicative agents that role-play to solve a task collaboratively. The framework analogue is AutoGen (Wu et al., 2023), which models multi-agent conversation as the primitive.
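The conversational shape reduces to a turn-taking loop over a shared message history. A sketch in the CAMEL/AutoGen spirit, with `agent_reply` as a hypothetical stand-in for each agent's model call:

```python
# Hypothetical two-agent conversation loop: each turn, one agent
# replies to the other's most recent message, and the reply is
# appended to a shared history.
def agent_reply(name: str, incoming: str) -> str:
    return f"{name} responds to: {incoming}"

def converse(task: str, turns: int = 4) -> list[str]:
    history = [task]
    speakers = ["assistant", "user_proxy"]  # alternating roles
    for t in range(turns):
        msg = agent_reply(speakers[t % 2], history[-1])
        history.append(msg)
    return history
```

The conversation history itself is the coordination mechanism; there is no separate coordinator.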
Simulation (many-agent)
Larger populations of agents acting in a shared environment. Park et al., “Generative Agents” (2023) simulates a small town of LLM-driven agents with memories, plans, and reflection. The simulation use case is well-documented; the production-system use case rarely scales to many agents because coordination overhead grows quickly.
Communication models
Multi-agent systems differ in how agents exchange information. Three patterns appear in framework documentation:
- Shared memory. Agents read from and write to a common state object. Simple to reason about; coordination overhead is low. LangGraph models this as a typed state on the graph.
- Direct messaging. Agents send messages to each other. AutoGen models this as a conversation history.
- Publish-subscribe bus. Agents publish events to a bus; other agents subscribe to the events they care about. CrewAI's flow events use this shape.
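The publish-subscribe shape can be sketched in a few lines. This is a generic in-process bus, not any framework's API; the topic name is illustrative:

```python
from collections import defaultdict

# Minimal publish-subscribe bus. Agents register handlers on
# topics; publishing an event fans it out to every handler
# subscribed to that topic.
class Bus:
    def __init__(self):
        self._subs = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subs[topic].append(handler)

    def publish(self, topic, event):
        for handler in self._subs[topic]:
            handler(event)

bus = Bus()
received = []
# A downstream agent subscribes to the events it cares about.
bus.subscribe("research.done", received.append)
# An upstream agent publishes without knowing who listens.
bus.publish("research.done", {"summary": "three sources found"})
```

The decoupling is the point: the publisher does not know its subscribers, which makes adding or removing agents cheap.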
Public examples
- CAMEL and the underlying paper: reference for role-playing dual-agent systems.
- AutoGen: multi-agent conversation framework from Microsoft Research.
- CrewAI: role-and-task multi-agent framework.
- LangGraph multi-agent docs: graph-based multi-agent shapes (supervisor, network, hierarchy).
- OpenAI Swarm: experimental routines and handoffs for multi-agent.
Cost considerations
Multi-agent cost is the sum of per-agent costs plus the coordinator's own LLM call. Compared to a single-agent equivalent, multi-agent typically costs more for two reasons: duplicated context (each agent receives some shared context anew) and coordination overhead (the coordinator's call is pure overhead). Prompt caching, as documented by the major vendors, reduces the duplication cost; nothing reduces the coordination cost without reducing the agent count.
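The cost structure above can be made concrete with a back-of-envelope model. The token counts and per-token price below are illustrative placeholders, not vendor pricing:

```python
# Back-of-envelope multi-agent cost model: each agent re-reads the
# shared context (the duplication cost), and the coordinator's own
# call is added on top (the coordination cost).
def multi_agent_cost(agent_tokens, shared_ctx, coord_tokens,
                     price_per_1k=0.01):
    total = sum(t + shared_ctx for t in agent_tokens) + coord_tokens
    return total * price_per_1k / 1000

# Three workers at 4k/3k/5k tokens, 2k shared context each,
# plus a 1k-token coordinator call: 19k tokens total.
cost = multi_agent_cost([4000, 3000, 5000],
                        shared_ctx=2000, coord_tokens=1000)
```

Note that `shared_ctx` is multiplied by the number of agents; that product is exactly the term prompt caching attacks, while `coord_tokens` is untouched by caching.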
Failure modes specific to multi-agent
Multi-agent systems inherit single-agent failures and add their own:
- Coordination overhead exceeds task value. The coordinator spends more compute managing the agents than the agents save by parallelising.
- Disagreement deadlocks. Two agents reach contradictory conclusions and the coordinator cannot pick. A tie-break rule is required up-front.
- Context fragmentation. Each agent has only part of the picture; decisions are locally optimal but globally wrong.
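The up-front tie-break rule for disagreement deadlocks can be as simple as a fixed priority order over roles. A sketch, with hypothetical role names:

```python
# Sketch of an up-front tie-break rule: when agents' verdicts
# contradict, fall back to a fixed role priority instead of
# asking the coordinator to adjudicate at runtime.
PRIORITY = ["critic", "executor"]  # critic outranks executor

def resolve(verdicts: dict) -> str:
    values = set(verdicts.values())
    if len(values) == 1:            # agents agree; no tie to break
        return values.pop()
    for role in PRIORITY:           # deadlock: apply the fixed rule
        if role in verdicts:
            return verdicts[role]
    raise ValueError("no verdict from any prioritised role")
```

Because the rule is declared before the run, the coordinator never has to "pick" between contradictory conclusions it has no basis to rank.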
See failure modes for the broader taxonomy and orchestrator-worker for the canonical multi-agent pattern.