How to build an AI agent
A step-by-step tutorial built from public vendor docs. The goal is the simplest agent that demonstrates the loop, tool use, and a budget cap. Frameworks come later, only when the simple case proves insufficient.
Step 1: Pick the simplest pattern that works
Anthropic's “Building Effective Agents” is explicit on this: start with a single LLM call augmented with retrieval and tools. Add complexity (chains, routing, orchestrators) only when measurement shows the simple case is insufficient. The same advice appears in OpenAI's and Google's respective guides.
For a first agent, the choice is between:
- A simple tool-using model call. One round-trip: the model is invoked with the user's task, optionally calls one tool, and returns an answer.
- A short prompt chain. Two or three sequential calls with deterministic gates between them. See prompt chaining.
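A minimal sketch of the second pattern, assuming a hypothetical `call_model` helper that wraps a single model call:

```python
# two-step prompt chain with a deterministic gate between the calls;
# call_model is a hypothetical one-shot helper, not a vendor API
def summarize_then_translate(text: str) -> str:
    # step 1: summarize
    summary = call_model(f"Summarize in exactly three bullet points:\n{text}")
    # deterministic gate: check the output shape before spending the next call
    bullets = [ln for ln in summary.splitlines() if ln.strip().startswith("-")]
    if len(bullets) != 3:
        raise ValueError("gate failed: expected exactly three bullets")
    # step 2: translate the validated summary
    return call_model(f"Translate into French:\n{summary}")
```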
Step 2: Define tools
A tool is a function the model can call. Each tool needs a name, a description, and a JSON schema for its arguments. Vendor docs cover registration:
- Anthropic tool-use docs
- OpenAI function-calling guide
- The Model Context Protocol introduction (vendor-agnostic)
A tool definition (Anthropic SDK form):
```python
# tool definition (Anthropic SDK form, simplified)
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
            },
            "required": ["city"],
        },
    },
]
```
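The definition above only declares the tool to the model; the loop in Step 3 still needs something that executes it. A minimal sketch, assuming a tool-call object with `name` and `input` attributes (the shape the Anthropic SDK returns) and a hypothetical `get_weather` stub:

```python
def get_weather(city: str) -> str:
    # stub for illustration; a real version would call a weather service
    return f"Sunny, 22°C in {city}"

TOOL_FUNCTIONS = {"get_weather": get_weather}

def dispatch(tool_call):
    # look up the registered function and apply the model-supplied arguments
    fn = TOOL_FUNCTIONS[tool_call.name]
    return fn(**tool_call.input)
```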
Step 3: Implement the loop
On each iteration, call the model with the current transcript, parse the response, dispatch any tool calls the model requested, append the tool results to the transcript, repeat. Anthropic and OpenAI both publish reference loops in their cookbooks (Anthropic cookbook, OpenAI cookbook).
```python
# minimal agent loop (pseudocode)
def run_agent(task, tools, max_iter=8):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_iter):
        response = model.call(messages, tools=tools)
        if response.stop_reason == "end_turn":
            return response.text
        if response.stop_reason == "tool_use":
            result = dispatch(response.tool_call)
            messages.append(response.assistant_message)
            messages.append({"role": "tool", "content": result})
    raise BudgetExceeded("hit max_iter")
```
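For concreteness, the same loop against the Anthropic Messages API might look roughly like this; the model name is a placeholder and `dispatch` is the sketch from Step 2, so treat it as an illustration of the SDK's shape rather than reference code:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def run_agent(task, tools, max_iter=8):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_iter):
        response = client.messages.create(
            model="claude-sonnet-4-20250514",  # placeholder model name
            max_tokens=1024,
            messages=messages,
            tools=tools,
        )
        if response.stop_reason == "end_turn":
            # final answer: join the text blocks
            return "".join(b.text for b in response.content if b.type == "text")
        if response.stop_reason == "tool_use":
            # echo the assistant turn, then attach one result per requested call
            messages.append({"role": "assistant", "content": response.content})
            results = [
                {"type": "tool_result", "tool_use_id": b.id, "content": dispatch(b)}
                for b in response.content
                if b.type == "tool_use"
            ]
            messages.append({"role": "user", "content": results})
    raise RuntimeError("hit max_iter")
```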
Step 4: Add a budget cap
Without a cap, an agent that gets stuck will run until something else stops it. Cap two things: the iteration count and the total token budget. Both are visible in the responses returned by the model APIs (vendor docs describe the per-response token accounting).
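A minimal sketch of the token side of the cap, assuming Anthropic-style per-response accounting where each response carries `usage.input_tokens` and `usage.output_tokens`; the iteration cap is the `max_iter` bound the loop already has:

```python
class BudgetExceeded(Exception):
    pass

MAX_TOTAL_TOKENS = 50_000

def charge(tokens_used: int, response) -> int:
    # add this round-trip's tokens to the running total and enforce the cap
    tokens_used += response.usage.input_tokens + response.usage.output_tokens
    if tokens_used > MAX_TOTAL_TOKENS:
        raise BudgetExceeded(f"{tokens_used} tokens spent, cap is {MAX_TOTAL_TOKENS}")
    return tokens_used
```

Inside the loop, call `tokens_used = charge(tokens_used, response)` after each model call.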
Step 5: Add gates
Validate tool inputs and outputs against their schemas before dispatching. Fail fast on malformed inputs. Reject responses whose shape disagrees with the schema. Gates are deterministic, cheap, and substantially reduce the surface area for failure modes (see failure modes).
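A sketch of an input gate built on the third-party `jsonschema` package, wrapped around the `dispatch` from Step 2; the schemas are the ones already declared in the tool definitions:

```python
import jsonschema  # pip install jsonschema

def gated_dispatch(tool_call, tool_defs):
    # validate the model's arguments against the declared schema before executing
    schema = next(t["input_schema"] for t in tool_defs if t["name"] == tool_call.name)
    try:
        jsonschema.validate(instance=tool_call.input, schema=schema)
    except jsonschema.ValidationError as e:
        # fail fast: hand the error back as the tool result instead of executing
        return f"invalid arguments for {tool_call.name}: {e.message}"
    return dispatch(tool_call)
```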
Step 6: Evaluate
Run the agent against a representative test set with known-correct outputs. The metrics that matter are reliability (does it succeed consistently), cost (per-task tokens including retries), and latency. Public benchmarks for full-agent evaluation include AgentBench, SWE-Bench, and GAIA. See evaluating an agent for how to read these.
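A minimal harness for two of the three metrics, assuming a `run_agent` like the one in Step 3 and a test set of (task, expected-substring) pairs; a real eval replaces the substring check with a task-appropriate grader, and cost tracking needs the per-response token accounting from Step 4:

```python
import time

def evaluate(agent, test_set):
    successes, total_latency = 0, 0.0
    for task, expected in test_set:
        start = time.monotonic()
        try:
            answer = agent(task)
            if expected in answer:  # crude substring grader; adapt per task
                successes += 1
        except Exception:
            pass  # a crash or a blown budget counts as a failure
        total_latency += time.monotonic() - start
    n = len(test_set)
    return {"reliability": successes / n, "avg_latency_s": total_latency / n}
```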
Step 7: Decide whether to add a framework
A handwritten loop with two or three tools and one or two patterns does not need a framework. Frameworks become useful when the agent's structure requires features the framework provides for free: state checkpoints, durable execution, multi-agent orchestration, observability hooks. See frameworks for the category overview.
Glossary
See tool use, agent loop, budget cap, gate.