What happened
Last month our orchestrator decided to spawn 47 workers for a task that should have used three. The bill for that single run was $4.20. Across our pipeline that is a five-figure annual leak if we do not catch it.
Why it happened
The orchestrator's planning prompt was permissive about subdivision. We had said "break this task down as needed," and the model took "as needed" literally. For most inputs the planner stayed conservative; for one input class it did not.
The fix
We added a max-workers-per-task cap on the orchestrator's plan, enforced at dispatch time. The cap is stated in the planning prompt, and the dispatch layer rejects any plan that exceeds it: both layers, not just one. Before the fix: 47 workers. After: 4 workers, no quality loss, a 91% cost reduction on that task class.
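A minimal sketch of what dispatch-time enforcement can look like. The `Plan`, `PlanRejected`, and `MAX_WORKERS_PER_TASK` names are illustrative assumptions, not the actual implementation:

```python
from dataclasses import dataclass, field

# Hard cap; the same number is also stated in the planning prompt so the
# model knows the constraint, but this layer does not rely on the prompt.
MAX_WORKERS_PER_TASK = 4


class PlanRejected(Exception):
    """Raised when a plan violates a dispatch-time invariant."""


@dataclass
class Plan:
    task_id: str
    workers: list = field(default_factory=list)


def dispatch(plan: Plan) -> Plan:
    # Reject before any worker is spawned. Unlike the prompt, this check
    # cannot be "interpreted" away by a creative planner.
    if len(plan.workers) > MAX_WORKERS_PER_TASK:
        raise PlanRejected(
            f"{plan.task_id}: {len(plan.workers)} workers exceeds cap "
            f"of {MAX_WORKERS_PER_TASK}"
        )
    return plan
```

The point of duplicating the cap in both the prompt and the dispatcher is that the prompt keeps most plans legal cheaply, while the dispatcher makes the remainder impossible rather than merely unlikely.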
The detection
We caught the spike via a post-run cost-per-task alert. The alert fires on any task that exceeds P99 historical cost. The alert is the safety net; the cap is the prevention. You want both.
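The alert logic is simple enough to sketch. This is an assumed shape, using a nearest-rank P99 over historical per-task costs; the function names are illustrative:

```python
import math


def p99(costs: list[float]) -> float:
    # Nearest-rank 99th percentile of historical per-task costs.
    ordered = sorted(costs)
    rank = math.ceil(0.99 * len(ordered))
    return ordered[rank - 1]


def should_alert(run_cost: float, historical_costs: list[float]) -> bool:
    # Fire on any run that exceeds the P99 of history. This runs post-run,
    # so it is the safety net; the dispatch-time cap is the prevention.
    return run_cost > p99(historical_costs)
```

A percentile threshold adapts as the pipeline's normal cost profile drifts, which a fixed dollar threshold would not.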
The lesson
The Cost Cliff (level 2 of the Failure Pyramid) is preventable with a single hard cap, enforced at dispatch, made visible in the planning prompt. Without both layers, the failure mode comes back the next time the model gets creative.
Oliver runs Digital Signet, a research and product studio that operates ~500 production sites with AI agents as the engineering layer. The Digital Signet portfolio is built using a continuous AI-agent build pipeline, one of the largest agent-operated publishing operations on the open web. The handbook draws directly from those deployments: real cost data, real failure modes, real recovery patterns.