A software system that uses an LLM to pursue a goal by perceiving its environment, deciding what to do next, taking actions through tools, observing the result, and iterating until the goal is reached or it gives up. Russell & Norvig's broader definition (anything with sensors and actuators) is the classical reference.
The four-step execution cycle: sense, think, act, observe. Iterates until a terminal condition or budget cap.
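The cycle above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `run_agent_loop`, `sense`, `think`, and the `increment` tool are hypothetical names, and a plain counter stands in for both the environment and the LLM's decision step.

```python
from typing import Callable, Optional

def run_agent_loop(
    sense: Callable[[], dict],
    think: Callable[[dict, list], Optional[tuple]],
    tools: dict,
    max_steps: int = 5,
) -> list:
    """Run sense -> think -> act -> observe until think() returns None or the cap is hit."""
    history = []
    for _ in range(max_steps):                # budget cap on iterations
        state = sense()                       # sense
        decision = think(state, history)      # think
        if decision is None:                  # terminal condition reached
            break
        name, args = decision
        result = tools[name](**args)          # act
        history.append((name, args, result))  # observe
    return history

# Toy environment (assumption): the "goal" is to raise a counter to 3.
counter = {"value": 0}

def sense() -> dict:
    return dict(counter)

def think(state: dict, history: list) -> Optional[tuple]:
    # Stand-in for an LLM call that picks the next tool and its arguments.
    if state["value"] >= 3:
        return None
    return ("increment", {"by": 1})

def increment(by: int) -> int:
    counter["value"] += by
    return counter["value"]

history = run_agent_loop(sense, think, {"increment": increment})
```

Lowering `max_steps` below 3 makes the loop stop on the cap before the goal is reached, which is the budget-cap exit rather than the terminal-condition exit.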
An informal adjective that means “has agent-like properties.” Useful as shorthand; not a formal taxonomy term.
A standardised dataset or environment with scoring rules. Public agent benchmarks include AgentBench, SWE-bench, GAIA, and ToolBench; HELM is a broader LLM evaluation framework rather than an agent-specific benchmark.
A hard limit on iterations or token usage that aborts an agent that gets stuck. Without a cap, a stuck agent can loop indefinitely and keep consuming tokens.
A function that maps inputs to discrete labels. In the routing pattern, a classifier picks which handler the input goes to.
A threshold check on a model's confidence in its own output. Below the threshold, the agent escalates or falls through to a different handler. Used as a mitigation for routing mis-classification.
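As a rough sketch, the check is a single comparison. The function name, the 0.8 threshold, and the `human_review` fallback are illustrative assumptions; where the confidence score comes from (log-probabilities, a classifier head, a self-reported score) is outside this example.

```python
def confidence_gate(
    label: str,
    confidence: float,
    threshold: float = 0.8,          # assumed value; tune per task
    fallback: str = "human_review",  # hypothetical escalation handler
) -> str:
    """Return the predicted label if confidence clears the threshold, else the fallback."""
    return label if confidence >= threshold else fallback

routed = confidence_gate("billing", 0.93)
escalated = confidence_gate("billing", 0.41)
```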
The maximum number of tokens an LLM can process in one call. Modern flagship models support 100K-2M tokens depending on vendor and tier.
In a multi-agent system, the agent that owns scheduling and dispatch. Synonymous with orchestrator and supervisor in the relevant framework docs.
In a prompt chain, the cumulative effect of small per-step errors that compound until the final output bears little resemblance to the desired result. Mitigated by gates between steps.
A pattern in which a generator proposes a candidate, an evaluator critiques it, and the loop repeats until acceptance or a budget cap. Subsumes “LLM as judge.”
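One way to picture the loop is shown below, with plain functions standing in for the two LLM roles. `refine`, `generate`, and `evaluate` are hypothetical names, and the string rules are only there to make the example deterministic.

```python
def refine(generate, evaluate, max_rounds: int = 3):
    """Alternate generator and evaluator until acceptance or the round cap.

    Returns (candidate, rounds_used, accepted).
    """
    candidate, feedback = None, None
    for round_no in range(1, max_rounds + 1):
        candidate = generate(candidate, feedback)  # propose or revise
        accepted, feedback = evaluate(candidate)   # critique
        if accepted:
            return candidate, round_no, True
    return candidate, max_rounds, False            # budget cap reached

def generate(previous, feedback):
    # Stand-in generator: applies the evaluator's feedback literally.
    if previous is None:
        return "the loop ends on acceptance"
    if feedback == "capitalise":
        return previous.capitalize()
    if feedback == "add full stop":
        return previous + "."
    return previous

def evaluate(candidate):
    # Stand-in evaluator: a two-item rubric.
    if not candidate[0].isupper():
        return False, "capitalise"
    if not candidate.endswith("."):
        return False, "add full stop"
    return True, ""

result = refine(generate, evaluate, max_rounds=3)
```

With `max_rounds=2` the same call returns an unaccepted candidate, which shows why the caller needs the `accepted` flag as well as the text.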
Fan-out: dispatching the same input to N parallel calls. Fan-in: aggregating the N results back into a single output.
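A minimal sketch using the standard library's thread pool follows. `fan_out_fan_in` and the `call(text, i)` signature are assumptions made for illustration; in practice `call` would wrap an LLM request and `aggregate` would be a merge, vote, or summarisation step.

```python
from concurrent.futures import ThreadPoolExecutor

def fan_out_fan_in(call, text: str, n: int, aggregate):
    """Send the same input to n parallel calls, then reduce the results to one output."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        # Fan-out: pool.map preserves input order in its results.
        results = list(pool.map(lambda i: call(text, i), range(n)))
    # Fan-in: collapse the n results into a single value.
    return aggregate(results)

# Stand-in for an LLM call: returns a number derived from the input and the call index.
total = fan_out_fan_in(lambda text, i: len(text) + i, "abc", 3, sum)
```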
The vendor-specific term for tool use, used by OpenAI, Google, and others. The model emits a structured JSON request to invoke a named function. Equivalent to Anthropic's tool use.
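The sketch below shows the general shape of handling such a request. The JSON layout is a simplified assumption and not any vendor's exact wire format, and `get_weather`, `FUNCTIONS`, and `dispatch` are hypothetical names.

```python
import json

def get_weather(city: str) -> str:
    # Stand-in implementation; a real function would call a weather service.
    return f"Sunny in {city}"

# Registry of functions the model is allowed to request.
FUNCTIONS = {"get_weather": get_weather}

# Assumed shape of the structured request emitted by the model.
model_output = '{"name": "get_weather", "arguments": {"city": "Oslo"}}'

def dispatch(raw: str):
    """Parse the model's JSON request and invoke the named function with its arguments."""
    request = json.loads(raw)
    func = FUNCTIONS[request["name"]]
    return func(**request["arguments"])

reply = dispatch(model_output)
```

The return value would normally be sent back to the model as the function result so it can continue the conversation.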
A deterministic check between LLM calls in a chain that either passes the result through or short-circuits the chain. Used to fail fast when a step produces malformed output.
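A small sketch of a gate between two chain steps, assuming the first step is supposed to return a JSON list of section titles. `outline_gate`, `run_chain`, and the two step functions are hypothetical stand-ins for LLM calls.

```python
import json

def outline_gate(raw: str) -> bool:
    """Deterministic check: the step output must be a non-empty JSON list of non-empty strings."""
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(items, list)
        and len(items) > 0
        and all(isinstance(item, str) and item.strip() != "" for item in items)
    )

def run_chain(topic: str, write_outline, expand_outline):
    """Two-step chain with a gate in between; returns None if the gate fails."""
    raw = write_outline(topic)              # step 1 (stand-in for an LLM call)
    if not outline_gate(raw):
        return None                         # short-circuit: step 2 never runs
    return expand_outline(json.loads(raw))  # step 2 (stand-in for an LLM call)

passed = run_chain("agents", lambda t: '["intro", "body"]', lambda items: " / ".join(items))
blocked = run_chain("agents", lambda t: "Sure! Here is an outline:", lambda items: " / ".join(items))
```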
Output that is fluent and plausible but not grounded in fact. In agents, the high-stakes form is hallucinated tool calls: the model invents a tool name that does not exist, or fabricates an argument that the tool cannot accept.
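One common defence is to validate every proposed call before executing it. The sketch below checks the tool name against a registry and the arguments against the function's signature; `validate_tool_call`, `REGISTRY`, and `search` are hypothetical names, and the check covers argument names only, not types or values.

```python
import inspect

def search(query: str, limit: int = 5) -> list:
    # Stand-in tool used only to provide a signature to validate against.
    return [query] * limit

REGISTRY = {"search": search}

def validate_tool_call(name: str, args: dict, registry: dict):
    """Return None if the call is well-formed, else a short error message."""
    if name not in registry:
        return f"unknown tool: {name}"
    try:
        # bind() raises TypeError on missing or unexpected arguments.
        inspect.signature(registry[name]).bind(**args)
    except TypeError as exc:
        return f"bad arguments for {name}: {exc}"
    return None

ok = validate_tool_call("search", {"query": "agents"}, REGISTRY)
invented_tool = validate_tool_call("delete_all", {}, REGISTRY)
invented_arg = validate_tool_call("search", {"q": "agents"}, REGISTRY)
```

Returning the error message to the model, rather than raising, gives it a chance to correct the call on the next loop iteration.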
End-to-end wall-clock time. For an agent, the latency budget is usually much larger than the per-call latency because the loop runs multiple times, and each iteration adds at least one model call plus any tool execution time.
Using an LLM to evaluate another LLM's output against a rubric. The evaluator role in the evaluator-optimizer pattern.
A vendor-agnostic protocol for exposing tools, resources, and prompts to LLMs. Introduced by Anthropic and adopted by other vendors.
A publish-subscribe communication channel between agents. Agents publish events; other agents subscribe to the events they care about.
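An in-process sketch of the idea follows. `MessageBus` and the topic names are hypothetical; production systems typically use a broker rather than direct function calls, and delivery here is synchronous.

```python
from collections import defaultdict

class MessageBus:
    """Minimal in-process publish-subscribe channel keyed by topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload) -> int:
        """Deliver payload to every subscriber of topic; return how many received it."""
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(payload)
        return len(handlers)

bus = MessageBus()
reviewer_inbox = []
bus.subscribe("draft.ready", reviewer_inbox.append)  # the reviewer agent listens for drafts

delivered = bus.publish("draft.ready", {"doc_id": 7})
ignored = bus.publish("tests.failed", {"doc_id": 7})  # nobody subscribed to this topic
```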
A system in which two or more LLM-based agents collaborate, usually under a coordinator. Most production multi-agent systems are an instance of the orchestrator-worker pattern.
A pattern in which a central LLM plans, dispatches subtasks to worker LLMs, and synthesises their results. The most expensive of the five patterns; benefits from worker caps.
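The plan, dispatch, and synthesise stages can be sketched as below, with plain functions standing in for the orchestrator and worker LLMs. All names are hypothetical, and truncating the plan at `max_workers` is only one possible cap policy; a real system might batch or re-plan instead.

```python
def orchestrate(task: str, plan, worker, synthesise, max_workers: int = 3):
    """Plan subtasks, dispatch each to a worker, and merge the results."""
    subtasks = plan(task)[:max_workers]         # worker cap: drop subtasks beyond the limit
    results = [worker(s) for s in subtasks]     # dispatch (sequential here for simplicity)
    return synthesise(results)                  # combine worker outputs

# Stand-ins for the three LLM roles (assumptions for illustration).
def plan(task: str) -> list:
    return [part.strip() for part in task.split(" and ")]

def worker(subtask: str) -> str:
    return subtask.upper()

def synthesise(results: list) -> str:
    return "; ".join(results)

report = orchestrate(
    "collect logs and parse errors and draft summary and email team",
    plan, worker, synthesise, max_workers=3,
)
```

The plan here has four subtasks, so the cap of three drops the last one; that keeps cost bounded at the price of an incomplete result.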
A pattern in which an input is fanned out to N independent calls and the results are aggregated. Two flavours: sectioning (sub-tasks) and voting (the same task multiple times).
The role within an agent or multi-agent system that decides the sequence of actions. May be an explicit prompt, a separate LLM call, or implicit in the model's decision step.
A vendor-side optimisation that caches the result of computation on a shared prompt prefix. Reduces cost on repeated calls that share a common prefix.
A pattern in which LLM calls are arranged as a linear sequence. Each step's output is the next step's input. The simplest of the five patterns.
An attack in which adversarial instructions are placed in data the agent reads, so the agent treats them as instructions. Direct: in user input. Indirect: in tool output.
Reasoning + Acting: a prompting technique in which the model alternates between thoughts (chain-of-thought reasoning) and actions (tool calls). Yao et al., 2023.
An explicit critique step in the agent loop where the model reviews its own prior actions and revises the plan. The evaluator-optimizer pattern formalises reflection as a two-role loop.
The proportion of runs of the same task that succeed. For agents, reliability is often more decision-relevant than peak capability, because each failed run usually has to be detected and retried, which adds cost and latency.
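A short sketch of the arithmetic: reliability is a success fraction, and if retries are assumed to be independent with the same success probability, the expected number of attempts until the first success is the reciprocal of that fraction. The independence assumption and both function names are illustrative, not part of the definition.

```python
def reliability(outcomes: list) -> float:
    """Fraction of runs that succeeded; outcomes is a list of booleans."""
    if not outcomes:
        raise ValueError("need at least one run")
    return sum(outcomes) / len(outcomes)

def expected_attempts(success_rate: float) -> float:
    """Mean attempts until first success, assuming independent retries (geometric distribution)."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return 1 / success_rate

rate = reliability([True, True, False, True])
attempts = expected_attempts(0.5)
```

Under that assumption, an agent that succeeds on half its runs costs about twice as much per completed task as one that always succeeds.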
A pattern in which a classifier picks one of N specialised handlers based on the input. Adds a small classification cost per input and saves cost when most inputs route to a cheaper handler.
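A minimal sketch of the pattern, with a keyword rule standing in for the classifier (in practice often a small model or an LLM call). The labels, `classify`, `HANDLERS`, and `route` are all hypothetical.

```python
def classify(text: str) -> str:
    """Stand-in classifier: map the input to one discrete label."""
    lowered = text.lower()
    if "refund" in lowered or "invoice" in lowered:
        return "billing"
    if "error" in lowered or "crash" in lowered:
        return "technical"
    return "general"

# One specialised handler per label; each would wrap its own prompt or model.
HANDLERS = {
    "billing": lambda text: "billing handler reply",
    "technical": lambda text: "technical handler reply",
    "general": lambda text: "general handler reply",
}

def route(text: str) -> tuple:
    """Classify the input, then pass it to the matching handler."""
    label = classify(text)
    return label, HANDLERS[label](text)

routed_billing = route("Where is my refund?")
routed_general = route("What are your opening hours?")
```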
A voting variant of parallelization: the same prompt is sampled N times, and the modal answer is selected. Wang et al., 2022.
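The voting step can be sketched with a counter over sampled answers. `self_consistent_answer` is a hypothetical name, and a fixed list of answers stands in for N stochastic samples from a model so the example stays deterministic.

```python
from collections import Counter

def self_consistent_answer(sample, prompt: str, n: int = 5) -> tuple:
    """Sample the same prompt n times and return (modal_answer, vote_count)."""
    answers = [sample(prompt) for _ in range(n)]
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes

# Stand-in sampler: replays canned answers instead of calling a model.
canned = iter(["42", "41", "42", "42", "40"])
winner = self_consistent_answer(lambda prompt: next(canned), "What is 6 * 7?", n=5)
```

The vote count is worth keeping alongside the answer: a narrow majority is a signal that the result may need escalation.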
An evaluator-optimizer variant in which the same model plays both generator and evaluator. Madaan et al., 2023.
A single invocation of a tool by an agent. Includes the tool name and arguments emitted by the model. Equivalent to a function call in OpenAI/Google terminology.
The architectural difference between an agent and a standalone LLM. The agent calls external functions, reads their output, and decides what to do next.
In an orchestrator-worker or multi-agent system, an agent that executes a subtask dispatched by the coordinator.
Anthropic's distinction: a workflow follows a predefined path through code; an agent decides the path at runtime. Workflows fail predictably; agents fail in unanticipated ways.