Agent frameworks: a category overview.
The landscape of agent frameworks groups into five categories with overlapping but distinct trade-offs. Each category has a few credible occupants. This page names them with links to their own documentation. It offers no ranked verdicts; vendor-published trade-offs and public benchmarks are the references.
The choice of framework is downstream of the choice of pattern. Once the agent's shape is decided (one of the five patterns, or a composition of them), the relevant question is which framework offers the primitives that pattern needs without paying for primitives it does not need. The categories below sort frameworks by the primitives they emphasise, not by quality.
The general advice in vendor docs is consistent: start without a framework, write the loop by hand, add a framework when the application's structure makes the framework's primitives cheaper than rewriting them. See Anthropic's “Building Effective Agents” and the OpenAI cookbook for the equivalent advice from each vendor.
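The loop the vendor docs suggest starting with is small enough to show in full. The sketch below is illustrative, not any vendor's API: call_model is a stand-in for an LLM call (stubbed here so the example runs), and the tool registry is a plain dict.

```python
# A minimal hand-written agent loop: call the model, execute any tool it
# requests, feed the result back, stop when the model answers.
# call_model is a stub standing in for a real LLM API call.

def call_model(messages):
    # Stub model: requests one tool call, then answers.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "add", "args": {"a": 2, "b": 3}}
    return {"answer": "2 + 3 = 5"}

TOOLS = {"add": lambda a, b: a + b}

def run_agent(user_input, max_turns=5):
    messages = [{"role": "user", "content": user_input}]
    for _ in range(max_turns):
        reply = call_model(messages)
        if "answer" in reply:                           # model is done
            return reply["answer"]
        result = TOOLS[reply["tool"]](**reply["args"])  # execute the tool
        messages.append({"role": "tool", "content": str(result)})
    raise RuntimeError("agent did not finish within max_turns")

print(run_agent("What is 2 + 3?"))
```

Everything a framework adds (retries, streaming, structured outputs, observability) layers onto this loop; the question this page sorts categories by is which of those layers a given pattern actually needs.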
Orchestration graphs
Frameworks that model an agent as a directed graph of nodes (LLM calls, tools, conditionals) with explicit state and edges. Strong for production: durable execution, checkpointing, and observability are first-class.
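The graph-of-nodes shape can be sketched without any particular framework: nodes are functions over a shared state dict, and each node returns the name of the next node, which is how conditional edges work. The node names and state fields below are invented for illustration.

```python
# Orchestration-graph sketch: nodes transform explicit shared state;
# the returned name is the outgoing edge. No real framework implied.

def draft(state):
    state["text"] = state["topic"].title()
    return "review"                       # unconditional edge to review

def review(state):
    # Conditional edge: accept the draft or send it back.
    return "done" if len(state["text"]) >= 5 else "draft"

NODES = {"draft": draft, "review": review}

def run_graph(state, start="draft"):
    node = start
    while node != "done":
        node = NODES[node](state)         # follow the edge the node chose
    return state

print(run_graph({"topic": "agent graphs"})["text"])
```

Because the state is an explicit value rather than implicit in a call stack, checkpointing falls out naturally: persist the state dict plus the current node name, and execution can resume from there.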
Multi-agent / role-based
Frameworks that compose multiple specialised agents, often with explicit role prompts (planner, researcher, executor) and a coordinator. Closer to the orchestrator-worker pattern by default.
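The orchestrator-worker default these frameworks lean toward can be sketched as a coordinator fanning a task out to role-prompted workers and merging their outputs. The roles, prompts, and run_worker stub below are illustrative stand-ins, not any framework's API.

```python
# Multi-agent / role-based sketch: a coordinator dispatches one task to
# several specialised workers, each defined by a role prompt.

ROLE_PROMPTS = {
    "researcher": "Gather facts about: {task}",
    "writer": "Write a summary of: {task}",
}

def run_worker(role, task):
    # Stand-in for an LLM call made with the role's system prompt.
    return f"[{role}] {ROLE_PROMPTS[role].format(task=task)}"

def orchestrate(task):
    # Coordinator: fan out to each role, then merge the results.
    outputs = [run_worker(role, task) for role in ROLE_PROMPTS]
    return "\n".join(outputs)

print(orchestrate("agent frameworks"))
```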
Single-agent SDKs
Library-level wrappers around vendor APIs that handle the loop, tool calls, and structured outputs without a graph or orchestration layer. Suited to a one-agent-with-tools shape.
Programming-style abstractions
Frameworks that compile prompts and pipelines from declarative descriptions, treating agents as programs to optimise rather than scripts to run.
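"Agents as programs to optimise" means the pipeline exists as data before it exists as code, so a compiler (or optimiser) can rewrite it. A toy version of that idea, with a spec format invented purely for illustration:

```python
# Programming-style abstraction sketch: a pipeline described declaratively,
# then "compiled" into a callable. The SPEC format is invented here.

SPEC = [
    {"step": "normalize", "fn": str.strip},
    {"step": "classify",
     "fn": lambda s: "question" if s.endswith("?") else "statement"},
]

def compile_pipeline(spec):
    # Turn the declarative description into an executable pipeline.
    # Because spec is plain data, an optimiser could inspect or rewrite
    # stages (e.g. swap prompts) before this compilation step.
    def pipeline(x):
        for stage in spec:
            x = stage["fn"](x)
        return x
    return pipeline

run = compile_pipeline(SPEC)
print(run("  Is this a question?  "))   # → question
```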
Minimal / from-scratch
Frameworks small enough to fit in a single file. Suitable for educational use, prototypes, or when the production path is to write the loop by hand and add only what the application needs.
Framework rankings tend to age poorly because the underlying models change faster than the frameworks. Public agent benchmarks (SWE-Bench, AgentBench, GAIA) measure full-agent performance, not framework performance. See evaluating an agent for how to read these.