The Agent Maturity Curve
Five stages of agentic deployment. Most pilots stop at Stage 1. Most production deployments are Stage 2. The interesting work happens between Stage 3 and Stage 5.
Stage 1: Prompt-and-Prayer
A single LLM call. No tool use. No recovery. Manual triggers. Most pilot deployments live here.
Markers:
- Single inference per task
- No instrumentation; cost and quality are estimated, not measured
- Failures are detected by human review or by the absence of output
- Re-runs are manual
How to graduate: Add a tool call. Even a single deterministic tool drags you out of Stage 1. Add cost telemetry per run.
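The graduation step fits in a few lines: one deterministic tool plus per-run cost telemetry. A minimal sketch; the order-lookup tool, token counts, and per-token rates below are all illustrative, not real pricing or a real backend.

```python
# Illustrative sketch: one deterministic tool plus per-run cost
# telemetry. Tool name, token counts, and per-token rates are
# assumptions.

ORDERS = {"A-100": "shipped"}            # stand-in for a real backend

def lookup_order_status(order_id: str) -> str:
    """The single deterministic tool that exits Stage 1."""
    return ORDERS.get(order_id, "unknown")

def run_task(order_id: str, telemetry: list) -> str:
    # In a real system the model decides to call the tool; here we
    # just record the token counts the provider API would report.
    input_tokens, output_tokens = 420, 80               # illustrative
    cost = input_tokens * 3e-6 + output_tokens * 15e-6  # assumed rates
    telemetry.append({"order_id": order_id, "cost_usd": cost})
    return lookup_order_status(order_id)

telemetry: list = []
status = run_task("A-100", telemetry)
```

The point is not the tool; it is that every run now leaves a telemetry record you can sum.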
Stage 2: Tool-Augmented
Function calling, MCP, basic retries. Most production agents in 2026 sit at Stage 2, even the ones that describe themselves as production-grade.
Markers:
- One or more tools are exposed via function calling or MCP
- Retries on tool failures
- Cost is measured per run; quality is sampled
- Patterns are implicit; the engineer does not yet think in pattern names
How to graduate: Adopt explicit pattern vocabulary. Name your orchestrator-worker pattern as such. Instrument cost per pattern, not just per run.
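Per-pattern instrumentation is mostly a tagging discipline: stamp every run with the pattern that produced it, then aggregate. A minimal sketch, with made-up pattern names and costs:

```python
from collections import defaultdict

# Stamp every run with the pattern that produced it, then aggregate.
# Pattern names and costs are illustrative.
runs = [
    {"pattern": "orchestrator-worker", "cost_usd": 0.031},
    {"pattern": "orchestrator-worker", "cost_usd": 0.027},
    {"pattern": "routing",             "cost_usd": 0.004},
]

cost_per_pattern: dict = defaultdict(float)
for run in runs:
    cost_per_pattern[run["pattern"]] += run["cost_usd"]
```

Once costs are keyed by pattern rather than by run, questions like "is the orchestrator worth its fan-out?" become answerable from the logs.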
Stage 3: Pattern-Instrumented
Explicit pattern adoption (orchestrator-worker, evaluator-optimiser). Per-pattern monitoring, telemetry, cost discipline. The pin discipline (pin model versions, test before upgrade) shows up here.
Markers:
- Patterns are named in the code and in the runbook
- Per-pattern cost telemetry is plotted, not just logged
- Model versions are pinned; upgrades are tested
- The team has at least one Operator Note
How to graduate: Implement a Failure Pyramid awareness layer. Detect at least three of the five named failure modes automatically.
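One way to sketch the awareness layer: a registry with one detector per failure mode, run over every completed trace. The mode names and thresholds below are placeholders; substitute the five Failure Pyramid levels and your own trace schema.

```python
# A detector registry: one check per failure mode, run over each
# completed trace. Mode names, trace fields, and thresholds are
# placeholders, not the Failure Pyramid's own definitions.
DETECTORS = {}

def detector(mode):
    def register(fn):
        DETECTORS[mode] = fn
        return fn
    return register

@detector("cost-cliff")
def _cost_cliff(trace):
    return trace["cost_usd"] > trace.get("cost_cap_usd", 0.50)

@detector("schema-violation")
def _schema_violation(trace):
    return not isinstance(trace.get("tool_output"), dict)

@detector("low-confidence-route")
def _low_confidence(trace):
    return trace.get("route_confidence", 1.0) < 0.6

def detect(trace):
    """Return every failure mode that fires on this trace."""
    return [mode for mode, check in DETECTORS.items() if check(trace)]
```

Three registered detectors is exactly the graduation bar: three of five modes detected automatically.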
Stage 4: Self-Correcting
Failure-mode taxonomy implemented (Failure Pyramid awareness). Automatic recovery on the named failures. Cost ceiling enforcement. The team has a recovery pattern for at least three of five Pyramid levels.
Markers:
- Cost-cliff alerts fire automatically and a cap kicks in
- Drift detection on at least one task class
- Confidence Gates on routing patterns
- Schema validation on tool outputs
How to graduate: Make the system propose its own pattern changes. Cross the boundary into Stage 5 in narrow domains first.
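The difference between monitoring a ceiling and enforcing one shows up at the dispatch layer: the dispatcher itself refuses further calls once a run's budget is spent. A minimal sketch, with an assumed per-run cap:

```python
# Enforcement, not just monitoring: the dispatcher refuses further
# calls once a run's budget is spent. The cap value is illustrative.
class BudgetExceeded(Exception):
    pass

class Dispatcher:
    def __init__(self, cost_cap_usd: float):
        self.cost_cap_usd = cost_cap_usd
        self.spent_usd = 0.0

    def dispatch(self, estimated_cost_usd: float) -> str:
        if self.spent_usd + estimated_cost_usd > self.cost_cap_usd:
            raise BudgetExceeded(
                f"run cap {self.cost_cap_usd:.2f} USD would be exceeded"
            )
        self.spent_usd += estimated_cost_usd
        return "dispatched"
```

Because the check sits in front of the call, a runaway loop dies at the cap instead of at the invoice.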
Stage 5: Self-Healing Operations
Agents observe their own failure rate, propose pattern changes, the pipeline evolves without human intervention. Aspirational. We are partially at Stage 5 in narrow domains.
Markers:
- Pipeline auto-tunes thresholds (cost cap, retry, confidence)
- Pattern changes are proposed by an evaluator process
- Human approval is gated, not blocking, for low-stakes pattern updates
- The system documents its own changes in a Change Log Note
How to graduate: Stage 5 is the asymptote. Treat it as a direction, not a destination.
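Gated-but-not-blocking approval can be sketched as a two-way split: low-stakes proposals apply immediately and are written to the change log; everything else queues for a human. The proposal shape here is an assumption.

```python
# Gated, not blocking: low-stakes proposals apply immediately and are
# written to the change log; anything else queues for human review.
# The proposal dict shape is an assumption for this sketch.
def handle_proposal(proposal: dict, change_log: list,
                    review_queue: list) -> str:
    if proposal.get("stakes") == "low":
        change_log.append(proposal)  # the system documents its own change
        return "applied"
    review_queue.append(proposal)
    return "queued"
```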
Where on the curve are you?
Seven questions, three answers each. Answer honestly.
1. Do you measure cost per run for your agent system?
2. Do you use named patterns (orchestrator-worker, prompt-chaining, routing) in code or runbook?
3. Are model versions pinned, with upgrades tested before deploying?
4. Do you have automatic detection for any of the five Failure Pyramid modes?
5. Are cost ceilings enforced at the dispatch layer (not just monitored)?
6. Does any part of the system propose its own pattern changes?
7. Are you running at least one tool through MCP or function calling?
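One plausible way to score the seven questions, assuming no/partially/yes answers worth 0, 1, and 2 points. The rubric and the score-to-stage cut-offs are illustrative, not the handbook's.

```python
# One plausible scoring rubric, not the handbook's: no/partially/yes
# are worth 0/1/2 points, and the 0..14 total maps onto Stages 1-5
# with illustrative cut-offs.
POINTS = {"no": 0, "partially": 1, "yes": 2}

def stage_estimate(answers: list) -> int:
    score = sum(POINTS[a] for a in answers)  # 0..14 for seven questions
    return min(5, 1 + score // 3)
```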
Where this fits with the rest of the handbook: the Failure Pyramid is the failure-mode taxonomy that Stage-3 and Stage-4 systems implement. The five patterns are the vocabulary Stage-3 systems start using explicitly. The Operator Notes are the recurring artefact a Stage-3-or-better team produces.

Oliver runs Digital Signet, a research and product studio that operates ~500 production sites with AI agents as the engineering layer. The Digital Signet portfolio runs on a continuous AI-agent build pipeline, one of the largest agent-operated publishing operations on the open web. The handbook draws directly from those deployments: real cost data, real failure modes, real recovery patterns.