What it actually does
CrewAI lets you define agents in role-shaped abstractions (a researcher, a writer, a critic) and have them collaborate on a task. The mental model is clean and the prototyping ergonomics are excellent. For a working multi-agent prototype in an afternoon, CrewAI is the fastest path.
We use LangGraph in production. We tried CrewAI for two months. We moved off CrewAI because, around five concurrent agents, the role-based coordination overhead becomes the bottleneck. Your project may not hit that ceiling. If it will not, CrewAI is the faster path to a working prototype, and that is genuinely valuable.
What is good
- Role-thinking ergonomics. The agent abstractions match how engineers think about teams; that fit accelerates prototyping.
- Documentation is approachable. The on-ramp is the gentlest of the four major frameworks.
- Production-grade for small crews. 2-3 agent setups run without drama.
What is broken or surprising
- The five-agent ceiling. Past five concurrent agents, the coordination overhead grows faster than the work. We measured this consistently across two months in production.
- State management is implicit; for failure-recovery the implicit state is harder to checkpoint than LangGraph's explicit graph state.
- Performance at scale trails LangGraph because of the role-coordination layer.
When you would choose it
Pick CrewAI if you expect to ship fast and your agent count stays below 5. Pick LangGraph if you expect to scale. The honest comparison lives at langgraph-vs-crewai; against AutoGen, at autogen-vs-crewai.
Cost at scale
Open source. Cost is model passthrough. Coordination overhead at scale shows up as additional model calls, which shows up as cost; budget for it.
Read next

Oliver runs Digital Signet, a research and product studio that operates ~500 production sites with AI agents as the engineering layer. The Digital Signet portfolio is built using a continuous AI-agent build pipeline, one of the largest agent-operated publishing operations on the open web. The handbook draws directly from those deployments: real cost data, real failure modes, real recovery patterns.