Prompt Chaining: Production Cost, Drift, and the Three-Step Cap

When to use it

Prompt chaining is the right pattern when the work is genuinely linear: extract, then transform, then summarise. Each step does one thing. Each step has a clear input contract and a clear output contract. The cost is dominated by the most expensive single call, not by the chain.
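
One way to make the input and output contracts concrete is to give each step a typed signature. The sketch below is illustrative only: the `Entities` and `Normalised` types and the three step functions are hypothetical stand-ins for model calls, not code from our pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entities:
    """Output contract of the extract step (hypothetical)."""
    names: tuple[str, ...]


@dataclass(frozen=True)
class Normalised:
    """Output contract of the transform step (hypothetical)."""
    names: tuple[str, ...]


def extract(text: str) -> Entities:
    # Stand-in for a model call: treat capitalised words as entities.
    words = [w.strip(".,") for w in text.split()]
    return Entities(tuple(w for w in words if w[:1].isupper()))


def transform(entities: Entities) -> Normalised:
    # Stand-in for a model call: lower-case and de-duplicate, keeping order.
    seen = dict.fromkeys(n.lower() for n in entities.names)
    return Normalised(tuple(seen))


def summarise(normalised: Normalised) -> str:
    # Stand-in for a model call: produce a one-line summary.
    return f"{len(normalised.names)} entities: {', '.join(normalised.names)}"


def run_chain(text: str) -> str:
    # Input contract of the first step: non-empty text.
    if not text.strip():
        raise ValueError("extract step requires non-empty text")
    return summarise(transform(extract(text)))
```

Because each step accepts only the previous step's output type, a step that starts needing something else shows up as a signature change rather than as silent drift.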

In our pipeline this is the workhorse for content extraction and normalisation. It is also the right starting pattern when prototyping; chain the prompts before you reach for an orchestrator.

When not to use it

Past three steps. Drift compounds super-linearly: by step four it starts to dominate, and the output is increasingly about the previous step's mistakes rather than the original input. The error budget runs out before the cost budget does, which is a particularly demoralising kind of overspend.

Tasks where the steps need to share state more complex than a string. Once you are passing structured state between steps, you are reaching for an orchestrator-worker pattern with shared context, even if you have not named it that way yet.

Production cost data

We have telemetry on chained prompts across the bulk of our content pipeline. At depth 3 the cost-per-task is the cheapest of the five patterns by a meaningful margin: roughly 60% of the cost of an equivalent orchestrator-worker pattern for the same task. At depth 5 the cost is roughly the same as orchestrator-worker, and the quality is worse. Beyond depth 5 prompt chaining is dominated on both cost and quality.

The drift signal: at depth 3 we observed quality degradation in roughly 4% of runs. At depth 5 it was 18%. At depth 8 it was 54%. The numbers are pipeline-specific but the shape is universal: drift is super-linear.
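
The super-linear shape can be checked directly from the three figures above. The snippet below is a small arithmetic sketch using only those quoted rates; it computes how many percentage points of degradation each additional step adds within each interval.

```python
# Degraded-run rates quoted above, keyed by chain depth.
observed = {3: 0.04, 5: 0.18, 8: 0.54}

depths = sorted(observed)
slopes = []
for lo, hi in zip(depths, depths[1:]):
    # Percentage points of extra degradation per additional step in this interval.
    slope = (observed[hi] - observed[lo]) / (hi - lo) * 100
    slopes.append(round(slope, 1))
    print(f"depth {lo} -> {hi}: +{slopes[-1]} points per step")

# Linear drift would give a constant slope; here the slope increases with depth.
is_super_linear = all(a < b for a, b in zip(slopes, slopes[1:]))
```

Between depth 3 and 5 each extra step adds about 7 points; between depth 5 and 8 each extra step adds about 12.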

Anti-patterns

Hidden depth. Chaining inside a sub-routine that is itself called from another chain. Suddenly you are at depth 8 and you did not realise. Audit total chain depth, not local chain depth.
The verification chain. Adding a final "check the previous output" step is an evaluator-optimiser pattern, not a prompt chain. Be explicit about which pattern you are running.
Re-passing the entire context every step. If your transform step needs only the extracted entities, do not pass the source text again. The full-context-per-step habit kills the cost advantage that prompt chaining is supposed to give you.
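
To see how much the full-context habit costs, it helps to count input tokens both ways. The sketch below is a rough model, not telemetry: the function `input_tokens` and all token counts are hypothetical, and it counts only input tokens.

```python
def input_tokens(source_tokens: int, step_outputs: list[int], repass_source: bool) -> int:
    """Total input tokens across a chain.

    step_outputs[i] is the size of step i's output. The last step's output
    is not fed to another step, so it never counts as input.
    """
    total = source_tokens  # step 1 always reads the source
    for prev_output in step_outputs[:-1]:
        total += prev_output  # each later step reads the previous output
        if repass_source:
            total += source_tokens  # the full-context-per-step habit
    return total


# Hypothetical sizes: a 4,000-token source and three steps with small outputs.
lean = input_tokens(4000, [300, 250, 120], repass_source=False)
full = input_tokens(4000, [300, 250, 120], repass_source=True)
```

With these illustrative sizes the lean chain reads 4,550 input tokens and the full-context chain reads 12,550, so the gap widens with every step added.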

Sample code

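# Linear prompt chain, three steps.
# `model` is any client object exposing extract/transform/summarize.
# Each step receives only the previous step's output, never the source text again.
def chain(model, input_text):
    extracted = model.extract(input_text)      # step 1: pull out what matters
    transformed = model.transform(extracted)   # step 2: normalise it
    return model.summarize(transformed)        # step 3: summarise the result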

Cross-pattern interactions

Prompt chaining nests inside almost every other pattern. An orchestrator-worker pattern often has a small two-step chain inside the synthesis call. A routing pattern often has a chain inside the sub-agent. The discipline is to keep the chain short at every level: three steps maximum at any one nesting level.
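
A nested pipeline can respect the three-step limit at every level and still be deep overall, so it is worth measuring both numbers. The sketch below assumes a hypothetical representation in which a chain is a list whose items are either a step name or another nested chain; the step names are made up.

```python
from typing import Union

# A step is either a single prompt (named by a string) or a nested chain.
Step = Union[str, list]


def total_depth(chain: list[Step]) -> int:
    """Number of sequential model calls once nested chains are flattened."""
    return sum(total_depth(s) if isinstance(s, list) else 1 for s in chain)


def max_local_depth(chain: list[Step]) -> int:
    """Longest chain at any single nesting level."""
    nested = [max_local_depth(s) for s in chain if isinstance(s, list)]
    return max([len(chain), *nested])


# Hypothetical pipeline: three levels, each exactly three steps long.
pipeline = ["extract", ["clean", ["split", "tag", "merge"], "normalise"], "summarise"]
```

Here `max_local_depth(pipeline)` is 3, which passes the per-level rule, while `total_depth(pipeline)` is 7, which is the figure that matters for drift.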

Engineering FAQ

How many steps is too many in a prompt chain?

Three is the cheap window. At four, drift starts to dominate. At five and beyond, the routing or evaluator-optimiser pattern is almost always cheaper for equal output quality, because a long chain pays for every extra call and, if it re-passes the full context at each step, pays for the source text again every time.

Why does drift compound super-linearly?

Each step is conditioned on the previous step's output. A small error in step 1 becomes the input premise for step 2. Step 3 is now solving a problem that has already drifted. The cost grows roughly linearly per step; the accumulated error grows roughly geometrically, so the error budget is exhausted first.

Can I prevent drift with a verification step?

Yes, and at that point you are running an evaluator-optimiser pattern, not a prompt chain. If you find yourself adding a verification step, switch patterns explicitly. The mental model matters.
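
The difference between the two patterns is structural, which is easiest to see side by side. The sketch below is a minimal illustration under assumptions: the function names and signatures are hypothetical, and an empty feedback string is assumed to mean the evaluator accepts the draft.

```python
from typing import Callable


def run_chain(steps: list[Callable[[str], str]], text: str) -> str:
    """Prompt chain: a fixed, straight-line sequence with no feedback."""
    for step in steps:
        text = step(text)
    return text


def run_evaluator_optimiser(
    generate: Callable[[str, str], str],
    evaluate: Callable[[str, str], str],
    text: str,
    max_rounds: int = 3,
) -> str:
    """Evaluator-optimiser: generate, check, and retry with feedback.

    Returns the latest draft when max_rounds is reached, even if the
    evaluator never accepted one.
    """
    feedback = ""
    draft = generate(text, feedback)
    for _ in range(max_rounds):
        feedback = evaluate(text, draft)
        if not feedback:  # empty feedback means the evaluator accepts the draft
            break
        draft = generate(text, feedback)
    return draft
```

The chain has a fixed number of calls known in advance; the evaluator-optimiser has a loop whose call count depends on the evaluator, which is why its cost and failure modes need to be budgeted differently.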
