Definition
“Prompt chaining decomposes a task into a sequence of steps, where each LLM call processes the output of the previous one. You can add programmatic checks (see ‘gate’ in the diagram below) on any intermediate steps to ensure that the process is still on track.”
From Anthropic, “Building Effective Agents”, December 2024.
What it does
The pattern is the simplest of the five. It accepts an input, passes it to a first prompt, then passes that prompt's output to a second prompt, and so on, until a terminal step produces output. Optionally, deterministic checks (a gate) sit between steps and either pass the result through or short-circuit the chain with an error.
The strength of the pattern is that each step has fewer degrees of freedom than a single large prompt would. A small step tends to be more accurate, easier to debug, and easier to evaluate in isolation. The trade-off is latency: the steps run sequentially, so end-to-end latency is the sum of the per-step latencies.
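As a rough illustration of the flow just described, the following minimal sketch runs a list of steps in order with optional gates between them. The names run_chain, GateError, outline, draft, and has_three_points are hypothetical, and the two step functions are stand-ins for real model calls rather than any vendor's API.

```python
from typing import Callable, Dict, Optional, Sequence

Step = Callable[[str], str]   # one LLM call: text in, text out
Gate = Callable[[str], bool]  # deterministic check on an intermediate result


class GateError(Exception):
    """Raised when a gate rejects an intermediate result."""


def run_chain(text: str, steps: Sequence[Step],
              gates: Optional[Dict[int, Gate]] = None) -> str:
    """Run steps in order; gates[i], if present, checks the output of steps[i]."""
    gates = gates or {}
    for i, step in enumerate(steps):
        text = step(text)
        gate = gates.get(i)
        if gate is not None and not gate(text):
            # Short-circuit: later steps never run on a rejected result.
            raise GateError(f"gate after step {i} rejected: {text!r}")
    return text


# Stand-ins for real model calls (hypothetical; swap in an actual client).
def outline(topic: str) -> str:
    return f"1. Intro to {topic}\n2. Details\n3. Summary"


def draft(outline_text: str) -> str:
    return "DRAFT based on:\n" + outline_text


def has_three_points(outline_text: str) -> bool:
    return len(outline_text.splitlines()) == 3


result = run_chain("prompt chaining", [outline, draft],
                   gates={0: has_three_points})
```

Keeping each step a plain function from text to text is one possible design choice; it makes every step testable in isolation, which is the property the pattern relies on.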
When it is appropriate
Prompt chaining is appropriate when the task has a clean decomposition into stages whose interfaces can be specified. Common applications include:
- Outline then draft then revise. Each stage has a clear acceptance contract.
- Parse then validate then transform. A deterministic gate after parsing rejects malformed inputs before the more expensive transform call fires.
- Translate then localise then proofread. Each step is a separate competence with a separable evaluation.
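The parse, validate, transform case above can be sketched as follows. This is an illustrative example under stated assumptions: parse_step and transform_step are hypothetical stand-ins for model calls, the "name" field is an invented acceptance contract, and the calls counter exists only to show that the expensive step does not fire on rejected input.

```python
import json
from typing import Optional

calls = {"transform": 0}  # counts how often the expensive step fires


def parse_step(raw: str) -> str:
    # Stand-in for an LLM call that extracts a JSON object from free text.
    start, end = raw.find("{"), raw.rfind("}")
    return raw[start:end + 1] if start != -1 and end > start else ""


def validate_gate(candidate: str) -> bool:
    # Deterministic gate: a JSON object with a non-empty string "name".
    try:
        data = json.loads(candidate)
    except json.JSONDecodeError:
        return False
    return (isinstance(data, dict)
            and isinstance(data.get("name"), str)
            and bool(data["name"]))


def transform_step(candidate: str) -> str:
    # Stand-in for the more expensive model call.
    calls["transform"] += 1
    data = json.loads(candidate)
    return f"Hello, {data['name']}!"


def pipeline(raw: str) -> Optional[str]:
    parsed = parse_step(raw)
    if not validate_gate(parsed):
        return None  # rejected before the expensive call
    return transform_step(parsed)


ok = pipeline('Sure: {"name": "Ada"} done')
bad = pipeline("no json here")
```

Here ok holds the transformed greeting, bad is None, and the transform counter stays at one because the gate stopped the second input.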
The pattern is not appropriate when steps are interdependent in a way that requires the model to plan dynamically. In that case, see orchestrator-worker.
Public examples
The simplest reference implementation is in the Anthropic Cookbook. The OpenAI Cookbook hosts equivalent examples. In framework code:
- LangGraph models a chain as a directed graph of nodes; the linear case is a degenerate graph.
- CrewAI exposes sequential task lists as a first-class process type.
- DSPy composes Modules; a chain is the function composition of two modules.
Cost considerations
The headline cost of a chain is the sum of per-step token usage. Vendor pricing pages publish per-million-token rates: see Anthropic's pricing, OpenAI's pricing, and Google's pricing for current rates.
Two non-obvious cost dynamics matter. First, context grows with chain depth: if each step receives the accumulated output of every earlier step as context, total input tokens grow super-linearly (roughly quadratically) with depth. Vendor docs on prompt caching (Anthropic, OpenAI) describe how shared prefixes can reduce that overhead. Second, gates can fail fast and stop later, more expensive calls from firing on doomed inputs.
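To make the context-growth effect concrete, here is a small arithmetic sketch. The function chain_input_tokens and all token counts are hypothetical; it assumes every step emits the same number of output tokens and ignores caching and output-token charges.

```python
def chain_input_tokens(prompt_tokens: int, output_tokens: int,
                       depth: int, accumulate: bool) -> int:
    """Total input tokens across a chain of `depth` steps.

    accumulate=True:  step k reads the prompt plus all k-1 earlier outputs.
    accumulate=False: step k reads the prompt plus only the previous output.
    """
    total = 0
    for k in range(1, depth + 1):
        carried = (k - 1) if accumulate else min(k - 1, 1)
        total += prompt_tokens + carried * output_tokens
    return total


# Made-up sizes: 500-token prompt, 1000-token output per step.
growing_4 = chain_input_tokens(500, 1000, depth=4, accumulate=True)   # 8000
growing_8 = chain_input_tokens(500, 1000, depth=8, accumulate=True)   # 32000
flat_4 = chain_input_tokens(500, 1000, depth=4, accumulate=False)     # 5000
flat_8 = chain_input_tokens(500, 1000, depth=8, accumulate=False)     # 11000
```

Under these assumptions, doubling the depth quadruples the input tokens when context accumulates but only roughly doubles them when each step sees just the previous output. Multiplying the totals by a vendor's per-million-token input rate gives an estimate of the input cost.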
Failure mode
The dominant failure mode in chains is drift: each step compounds small errors from the previous step until the final output bears little resemblance to the desired result. Anthropic's definition above suggests gates between steps for exactly this reason. See failure modes for the broader taxonomy.
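A deliberately crude way to see why errors compound is to treat each step as succeeding independently with some probability and to assume no later step repairs an earlier mistake. Both assumptions, the function name chain_success_probability, and the 95% figure are illustrative and are not taken from the source.

```python
import math
from typing import Sequence


def chain_success_probability(step_accuracies: Sequence[float]) -> float:
    """Probability that every step succeeds, assuming independent steps
    and no recovery from an earlier error (a simplifying assumption)."""
    return math.prod(step_accuracies)


# Five steps, each right 95% of the time (made-up figure).
five_step = chain_success_probability([0.95] * 5)
```

With these made-up numbers the whole chain succeeds about 77% of the time, even though each individual step looks reliable. A gate that catches a bad intermediate result turns a silent drift into an explicit failure at that step.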
Glossary
See prompt chaining, gate, prompt caching.
Foundational definitions on the sibling reference site: chain of thought.