Agent architecture: the loop, the tools, the memory.
Modern AI agent architectures share a common shape: a four-step loop with tool use, planning, memory, and reflection layered on top. The classical formulation is in Russell and Norvig; the LLM-era formulation reaches the same shape via tool calls.
The four-step loop
Russell and Norvig, in Artificial Intelligence: A Modern Approach (4th ed., 2021), define an agent as “anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators.” The definition predates LLMs and remains the citation of record for classical agent theory. Modern LLM-based agents inherit the same structure: perception (read input), reasoning (decide), action (call tool or emit output), observation (read result), repeat.
The loop runs until a terminal condition is met or a budget cap fires. The number of iterations varies by task. Vendor docs and published examples report ranges from one (a single tool call to answer a question) to dozens (a multi-step research or coding task). See Anthropic, “Building Effective Agents” for a published taxonomy of when to extend the loop versus stay in a simple workflow.
Tool use
Tool use is the architectural difference between an agent and a standalone LLM. A tool is a function the model can call: a search API, a database query, a code execution sandbox, a file system read. The model decides which tool to call, with which arguments, and reads the result back into context. Tool use is documented as a first-class capability in:
- Anthropic's tool-use documentation
- OpenAI's function-calling guide
- Google Gemini's function-calling guide
- The Model Context Protocol, a vendor-agnostic specification for exposing tools to models
The Model Context Protocol (MCP), introduced by Anthropic in late 2024 and adopted by other vendors through 2025, standardises how tools are described, exposed, and invoked. See the glossary entry on MCP.
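The round trip shared by all of the function-calling APIs above looks the same: tools are described to the model as a name plus a parameter schema, the model replies with a structured call, and the runtime executes it and feeds the result back as an observation. A generic sketch, using no vendor SDK (the tool registry and dispatcher names here are illustrative):

```python
# A sketch of the tool-use round trip: the runtime holds a registry of
# tools, parses the model's structured call, executes it, and returns
# the observation as JSON to be appended to context.
import json

TOOLS = {
    "get_weather": {
        "description": "Current weather for a city",
        "parameters": {"city": "string"},
        "fn": lambda city: {"city": city, "temp_c": 18},  # stub implementation
    },
}

def dispatch(model_reply: str) -> str:
    """Parse a structured tool call and return the observation as JSON."""
    call = json.loads(model_reply)        # e.g. {"name": ..., "arguments": {...}}
    tool = TOOLS[call["name"]]
    result = tool["fn"](**call["arguments"])
    return json.dumps(result)             # fed back into the model's context
```

The real APIs differ in schema syntax and message framing, but the registry-parse-execute-return shape is the common core that MCP standardises.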
Memory
Memory in agent architectures is split across three roles:
- Working memory: the current context window of the LLM. Bounded by the model's context limit. Cleared between independent tasks.
- Episodic memory: a record of past tool calls, observations, and decisions within the current task. Often represented as a transcript appended to working memory at each step.
- Long-term memory: persistent state across tasks, typically stored externally (vector store, database, file system) and retrieved on demand.
Wang et al. (2024) survey memory architectures across LLM-based agents and group them along similar lines.
Planning and reflection
The decision step is where planning lives. The simplest case is one-shot: the model produces a tool call, observes the result, decides what to do next based on the latest observation. More elaborate architectures separate planning from execution, with the planner producing a multi-step plan that the executor follows.
ReAct (Yao et al., 2023) is the standard citation for interleaving reasoning and acting, where the model produces a thought and an action at each step. Reflection extends the loop with an explicit critique step before the next action: the model reads its own prior actions, identifies issues, and revises the plan. The evaluator-optimizer pattern formalises reflection as a two-role loop.
Composition with patterns
The architecture and the patterns are orthogonal. The architecture describes how a single agent is built; the five patterns describe how multiple calls are arranged. A complete production agent typically combines:
- The four-step loop as the per-agent execution model
- Tool use as the action vocabulary (often via MCP for portability)
- At least one of the five patterns for cross-call orchestration
- An evaluation harness (see evaluating an agent)
Glossary
See agent, tool use, MCP, ReAct, reflection.