The Agent Maturity Curve
Five stages of agentic deployment. Most pilots stop at Stage 1. Most production deployments are Stage 2. The interesting work happens between Stage 3 and Stage 5.
Stage 1: Prompt-and-Prayer
A single LLM call. No tool use. No recovery. Manual triggers. Most pilot deployments live here.
Markers:
- Single inference per task
- No instrumentation; cost and quality are estimated, not measured
- Failures are detected by human review or by the absence of output
- Re-runs are manual
How to graduate: Add a tool call. Even a single deterministic tool drags you out of Stage 1. Add cost telemetry per run.
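The graduation step fits in a few lines: one deterministic tool plus per-run cost telemetry. A minimal sketch; the order-lookup tool, token counts, and per-token rates below are all illustrative, not real pricing or a real backend.

```python
# Illustrative sketch: one deterministic tool plus per-run cost
# telemetry. Tool name, token counts, and per-token rates are
# assumptions.

ORDERS = {"A-100": "shipped"}            # stand-in for a real backend

def lookup_order_status(order_id: str) -> str:
    """The single deterministic tool that exits Stage 1."""
    return ORDERS.get(order_id, "unknown")

def run_task(order_id: str, telemetry: list) -> str:
    # In a real system the model decides to call the tool; here we
    # just record the token counts the provider API would report.
    input_tokens, output_tokens = 420, 80               # illustrative
    cost = input_tokens * 3e-6 + output_tokens * 15e-6  # assumed rates
    telemetry.append({"order_id": order_id, "cost_usd": cost})
    return lookup_order_status(order_id)

telemetry: list = []
status = run_task("A-100", telemetry)
```

The point is not the tool; it is that every run now leaves a telemetry record you can sum.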
Stage 2: Tool-Augmented
Function calling, MCP, basic retries. Most production agents in 2026 sit at Stage 2, even the ones that describe themselves as production-grade.
Markers:
- One or more tools are exposed via function calling or MCP
- Retries on tool failures
- Cost is measured per run; quality is sampled
- Patterns are implicit; the engineer does not yet think in pattern names
How to graduate: Adopt explicit pattern vocabulary. Name your orchestrator-worker pattern as such. Instrument cost per pattern, not just per run.
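Per-pattern instrumentation is mostly a tagging discipline: stamp every run with the pattern that produced it, then aggregate. A minimal sketch, with made-up pattern names and costs:

```python
from collections import defaultdict

# Stamp every run with the pattern that produced it, then aggregate.
# Pattern names and costs are illustrative.
runs = [
    {"pattern": "orchestrator-worker", "cost_usd": 0.031},
    {"pattern": "orchestrator-worker", "cost_usd": 0.027},
    {"pattern": "routing",             "cost_usd": 0.004},
]

cost_per_pattern: dict = defaultdict(float)
for run in runs:
    cost_per_pattern[run["pattern"]] += run["cost_usd"]
```

Once costs are keyed by pattern rather than by run, questions like "is the orchestrator worth its fan-out?" become answerable from the logs.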
Stage 3: Pattern-Instrumented
Explicit pattern adoption (orchestrator-worker, evaluator-optimiser). Per-pattern monitoring, telemetry, cost discipline. The pin discipline (pin model versions, test before upgrade) shows up here.
Markers:
- Patterns are named in the code and in the runbook
- Per-pattern cost telemetry is plotted, not just logged
- Model versions are pinned; upgrades are tested
- The team has at least one Operator Note
How to graduate: Implement a Failure Pyramid awareness layer. Detect at least three of the five named failure modes automatically.
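One way to sketch the awareness layer: a registry with one detector per failure mode, run over every completed trace. The mode names and thresholds below are placeholders; substitute the five Failure Pyramid levels and your own trace schema.

```python
# A detector registry: one check per failure mode, run over each
# completed trace. Mode names, trace fields, and thresholds are
# placeholders, not the Failure Pyramid's own definitions.
DETECTORS = {}

def detector(mode):
    def register(fn):
        DETECTORS[mode] = fn
        return fn
    return register

@detector("cost-cliff")
def _cost_cliff(trace):
    return trace["cost_usd"] > trace.get("cost_cap_usd", 0.50)

@detector("schema-violation")
def _schema_violation(trace):
    return not isinstance(trace.get("tool_output"), dict)

@detector("low-confidence-route")
def _low_confidence(trace):
    return trace.get("route_confidence", 1.0) < 0.6

def detect(trace):
    """Return every failure mode that fires on this trace."""
    return [mode for mode, check in DETECTORS.items() if check(trace)]
```

Three registered detectors is exactly the graduation bar: three of five modes detected automatically.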
Stage 4: Self-Correcting
Failure-mode taxonomy implemented (Failure Pyramid awareness). Automatic recovery on the named failures. Cost ceiling enforcement. The team has a recovery pattern for at least three of five Pyramid levels.
Markers:
- Cost-cliff alerts fire automatically and a cap kicks in
- Drift detection on at least one task class
- Confidence Gates on routing patterns
- Schema validation on tool outputs
How to graduate: Make the system propose its own pattern changes. Cross the boundary into Stage 5 in narrow domains first.
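The difference between monitoring a ceiling and enforcing one shows up at the dispatch layer: the dispatcher itself refuses further calls once a run's budget is spent. A minimal sketch, with an assumed per-run cap:

```python
# Enforcement, not just monitoring: the dispatcher refuses further
# calls once a run's budget is spent. The cap value is illustrative.
class BudgetExceeded(Exception):
    pass

class Dispatcher:
    def __init__(self, cost_cap_usd: float):
        self.cost_cap_usd = cost_cap_usd
        self.spent_usd = 0.0

    def dispatch(self, estimated_cost_usd: float) -> str:
        if self.spent_usd + estimated_cost_usd > self.cost_cap_usd:
            raise BudgetExceeded(
                f"run cap {self.cost_cap_usd:.2f} USD would be exceeded"
            )
        self.spent_usd += estimated_cost_usd
        return "dispatched"
```

Because the check sits in front of the call, a runaway loop dies at the cap instead of at the invoice.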
Stage 5: Self-Healing Operations
Agents observe their own failure rate, propose pattern changes, the pipeline evolves without human intervention. Aspirational. We are partially at Stage 5 in narrow domains.
Markers:
- Pipeline auto-tunes thresholds (cost cap, retry, confidence)
- Pattern changes are proposed by an evaluator process
- Human approval is gated, not blocking, for low-stakes pattern updates
- The system documents its own changes in a Change Log Note
How to graduate: Stage 5 is the asymptote. Treat it as a direction, not a destination.
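Gated-but-not-blocking approval can be sketched as a two-way split: low-stakes proposals apply immediately and are written to the change log; everything else queues for a human. The proposal shape here is an assumption.

```python
# Gated, not blocking: low-stakes proposals apply immediately and are
# written to the change log; anything else queues for human review.
# The proposal dict shape is an assumption for this sketch.
def handle_proposal(proposal: dict, change_log: list,
                    review_queue: list) -> str:
    if proposal.get("stakes") == "low":
        change_log.append(proposal)  # the system documents its own change
        return "applied"
    review_queue.append(proposal)
    return "queued"
```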
Where on the curve are you?
Seven questions, three answers each. Answer honestly.
1. Do you measure cost per run for your agent system?
2. Do you use named patterns (orchestrator-worker, prompt-chaining, routing) in code or runbook?
3. Are model versions pinned, with upgrades tested before deploying?
4. Do you have automatic detection for any of the five Failure Pyramid modes?
5. Are cost ceilings enforced at the dispatch layer (not just monitored)?
6. Does any part of the system propose its own pattern changes?
7. Are you running at least one tool through MCP or function calling?
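One plausible way to score the seven questions, assuming no/partially/yes answers worth 0, 1, and 2 points. The rubric and the score-to-stage cut-offs are illustrative, not the handbook's.

```python
# One plausible scoring rubric, not the handbook's: no/partially/yes
# are worth 0/1/2 points, and the 0..14 total maps onto Stages 1-5
# with illustrative cut-offs.
POINTS = {"no": 0, "partially": 1, "yes": 2}

def stage_estimate(answers: list) -> int:
    score = sum(POINTS[a] for a in answers)  # 0..14 for seven questions
    return min(5, 1 + score // 3)
```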
Where this fits with the rest of the handbook: the Failure Pyramid is the failure-mode taxonomy that Stage-3 and Stage-4 systems implement. The five patterns are the vocabulary Stage-3 systems start using explicitly. The Operator Notes are the recurring artefact a Stage-3-or-better team produces.

Oliver runs Digital Signet, a research and product studio that operates ~500 production sites with AI agents as the engineering layer. The Digital Signet portfolio runs on a continuous AI-agent build pipeline, one of the largest agent-operated publishing operations on the open web. The handbook draws directly from those deployments: real cost data, real failure modes, real recovery patterns.