Failure modes: a cited taxonomy.
Agent failure modes are not new categories of bug; they are old categories that compound through tool use, multi-step reasoning, and untrusted inputs. The taxonomy below is drawn from public research and vendor documentation.
Five classes of failure recur across the public literature on LLM agents. They are presented here from most frequent (where practical mitigation matters most) to rarest. The frequency ordering reflects the public OWASP guidance and the relative volume of academic work on each class; no operator-specific ordering is implied.
1. Prompt injection
The single most-discussed agent risk. An attacker places instructions in data the agent reads (a webpage, an email, a document) and the agent, unable to distinguish trusted instructions from untrusted data, follows them. Direct prompt injection puts the instructions in the user input; indirect prompt injection (Greshake et al., 2023) hides them in tool outputs.
Cited as LLM01 in the OWASP Top 10 for Large Language Model Applications. Mitigation patterns documented in vendor guidance include input sanitisation, capability isolation (the agent cannot call dangerous tools when the input came from an untrusted source), and human approval for high-stakes actions.
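Capability isolation can be enforced as a deterministic check at dispatch time rather than as a prompt instruction. A minimal sketch, assuming the harness sets a per-turn taint flag whenever untrusted content (a webpage, an email) enters the context; the tool names are illustrative:

```python
# Illustrative tool names; the taint flag is assumed to be set by the
# harness whenever untrusted content enters the context window.
DANGEROUS_TOOLS = {"send_email", "delete_file", "execute_shell"}

def allowed_tools(all_tools: set[str], context_is_tainted: bool) -> set[str]:
    """Return the subset of tools the agent may call this turn."""
    if context_is_tainted:
        return all_tools - DANGEROUS_TOOLS
    return all_tools
```

A tainted turn never sees the dangerous tools in its tool list, so even a successful injection cannot invoke them; human approval for the remaining high-stakes actions sits on top of this gate.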
2. Routing and planning loops
The agent picks the wrong sub-agent (in a routing pattern), or the orchestrator produces a plan with too many or too few workers, or the evaluator-optimizer never accepts a candidate (the refinement loop). Anthropic's “Building Effective Agents” describes caps on iteration count and worker count as the standard mitigations; both are deterministic guards around a non-deterministic decision.
2a. The refinement loop
In an evaluator-optimizer pattern, the evaluator never accepts the generator's candidate. Each iteration produces a result no better than the last. Without a hard iteration cap and a marginal-improvement detector, the loop runs to the budget cap on every input. Mitigations are documented on the evaluator-optimizer pattern page.
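Both guards are a few lines of deterministic code around the loop. A sketch, assuming the evaluator returns a numeric score alongside its accept/reject decision; the cap and threshold values are illustrative:

```python
def refine(generate, evaluate, max_iters=5, min_gain=0.01):
    """Evaluator-optimizer loop with a hard iteration cap and a
    marginal-improvement detector. `generate` and `evaluate` stand in
    for model calls; thresholds are illustrative."""
    best, best_score = None, float("-inf")
    feedback = None
    for _ in range(max_iters):                  # hard iteration cap
        candidate = generate(feedback)
        score, accepted, feedback = evaluate(candidate)
        if accepted:
            return candidate
        if score - best_score < min_gain:       # no marginal improvement: stop
            break
        best, best_score = candidate, score
    return best                                 # best effort, never accepted
```

Returning the best unaccepted candidate (rather than raising) is a design choice; some harnesses escalate to a human instead.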
2b. Orchestrator cost spike
In an orchestrator-worker pattern, the orchestrator's plan dispatches many more workers than the task requires. The plan is plausible, the workers run, the cost is many times the expected per-task cost. Worker caps mitigate this. See the orchestrator-worker pattern page.
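The cap itself is trivially deterministic; the judgment call is what to do on violation. One option, sketched here with an illustrative cap value, is to reject the over-sized plan and force a replan rather than silently truncating it:

```python
MAX_WORKERS = 4  # illustrative; tune to the expected per-task budget

def dispatch_plan(plan: list[dict]) -> list[dict]:
    """Deterministic guard: reject over-sized plans before any worker runs."""
    if len(plan) > MAX_WORKERS:
        raise ValueError(
            f"plan requests {len(plan)} workers, cap is {MAX_WORKERS}; replan"
        )
    return plan
```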
3. Tool-call schema errors and hallucinated tool calls
The agent invents tool names that do not exist, supplies arguments in the wrong shape, or misinterprets tool output. Gorilla (Patil et al., 2023) documents the rate at which models hallucinate API calls and proposes retrieval-augmented tool selection as the mitigation. Schema validation at dispatch time (a deterministic gate) prevents malformed calls from executing.
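A dispatch-time gate needs only the registry of real tools and their argument shapes. A minimal sketch; the registry contents are illustrative, and production systems typically validate against full JSON Schema rather than bare Python types:

```python
# Illustrative registry of real tools and their expected argument types.
TOOL_SCHEMAS = {
    "get_weather": {"city": str},
    "search": {"query": str, "limit": int},
}

def validate_call(name: str, args: dict) -> list[str]:
    """Return a list of problems; an empty list means the call may execute."""
    if name not in TOOL_SCHEMAS:
        return [f"unknown tool: {name}"]         # hallucinated tool name
    schema = TOOL_SCHEMAS[name]
    problems = [f"missing argument: {k}" for k in schema if k not in args]
    problems += [f"unexpected argument: {k}" for k in args if k not in schema]
    problems += [
        f"wrong type for {k}: expected {schema[k].__name__}"
        for k, v in args.items() if k in schema and not isinstance(v, schema[k])
    ]
    return problems
```

Failed validations can be fed back to the model as an error message, which usually corrects the call on the next attempt.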
3a. Aggregation failures
When N parallel calls disagree, the aggregator must reconcile contradictions it sometimes cannot resolve. Voting variants pick the modal answer, which fails when all N samples share the same systematic error. Sectioning variants stitch parts together, which fails when the parts are not actually independent. See parallelization.
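The voting variant and its blind spot are both visible in a few lines. A sketch; the agreement threshold and the escalation flag are illustrative choices, not part of any cited pattern:

```python
from collections import Counter

def majority_vote(samples: list[str], min_agreement: float = 0.6):
    """Pick the modal answer and report agreement so low-confidence
    results can be escalated instead of silently returned. Note the
    documented failure: unanimous agreement on a shared systematic
    error still passes this check."""
    answer, n = Counter(samples).most_common(1)[0]
    agreement = n / len(samples)
    return answer, agreement, agreement >= min_agreement
```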
4. Context blow-out and drift
Long-running agents accumulate transcript content until the context window is exhausted. When the agent truncates, it sometimes produces output that is plausible but disconnected from the truncated content, an effect described in vendor docs on long-context behaviour. Drift, the related failure in prompt chains, occurs when each step compounds small errors from the previous step until the final output is unrecognisable. Gates between steps catch drift; checkpointing and summarisation mitigate context blow-out.
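Checkpointing and summarisation can be combined into a compaction step that runs whenever the transcript nears the window. A sketch, assuming a `summarise` callable that stands in for a model call, with a character budget standing in for a token budget:

```python
def compact(transcript: list[str], budget: int, summarise) -> list[str]:
    """Keep the most recent turns verbatim; fold the oldest turns into a
    summary once the budget is exceeded. `summarise` stands in for a
    model call; the budget is in characters for illustration."""
    while sum(len(t) for t in transcript) > budget and len(transcript) > 2:
        oldest, transcript = transcript[:2], transcript[2:]
        transcript = [summarise(oldest)] + transcript
    return transcript
```

Because the summary is prepended, later compaction passes fold it together with the next-oldest turn, so the transcript degrades gracefully rather than being cut mid-thought.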
5. Capability and data exfiltration
Agents with broad tool access expose unexpected attack surfaces. An agent that can read email, browse the web, and send messages is a phishing engine if compromised. The OWASP Top 10 names these classes LLM02 (Sensitive Information Disclosure) and LLM06 (Excessive Agency). Mitigations are architectural rather than prompt-level: least-privilege tool exposure, human-in-the-loop approval for high-stakes actions, and audit logging of all tool calls.
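Audit logging belongs in the harness, not the prompt: wrap every tool before the agent ever sees it. A minimal sketch; the in-memory list stands in for a real log sink:

```python
import json
import time

def audited(tool_fn, log: list):
    """Wrap a tool so every call (name, args, outcome, timestamp) is
    appended to the audit log, whether the call succeeds or raises."""
    def wrapper(**kwargs):
        entry = {"tool": tool_fn.__name__, "args": kwargs, "ts": time.time()}
        try:
            result = tool_fn(**kwargs)
            entry["ok"] = True
            return result
        except Exception as exc:
            entry["ok"] = False
            entry["error"] = repr(exc)
            raise
        finally:
            log.append(json.dumps(entry))
    return wrapper
```

Logging in `finally` guarantees that failed and aborted calls are recorded too, which is exactly what an incident investigation needs.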
Glossary
See prompt injection, hallucination, tool call, context window, confidence gate.