What it actually does
AutoGen models multi-agent collaboration as a conversation: agents send messages to each other until the task is done. The pattern is expressive and matches a lot of academic research on multi-agent systems. The cost economics differ from LangGraph or CrewAI because each conversational turn is a model call.
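The turn-equals-call economics can be sketched in plain Python. This is a hypothetical toy, not AutoGen's real API: two agents alternate replies until a turn limit, and every reply stands in for one model call.

```python
from dataclasses import dataclass

# Hypothetical sketch (not AutoGen's API): two agents exchange messages
# until a turn limit; in a real framework every reply is one model call.

@dataclass
class Agent:
    name: str
    calls: int = 0  # counts what would be LLM calls

    def reply(self, message: str) -> str:
        self.calls += 1
        # A real agent would call a model here; we echo for illustration.
        return f"{self.name} responding to: {message!r}"

def converse(a: Agent, b: Agent, task: str, max_turns: int) -> int:
    """Alternate replies between a and b; return total model calls."""
    message = task
    for turn in range(max_turns):
        message = a.reply(message) if turn % 2 == 0 else b.reply(message)
    return a.calls + b.calls

writer, critic = Agent("writer"), Agent("critic")
total_calls = converse(writer, critic, "draft a summary", max_turns=10)
print(total_calls)  # 10 turns -> 10 model calls
```

The point of the toy: the conversation's length, not the task's difficulty, sets the bill.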
What is good
- Expressive multi-agent shapes. Patterns that are awkward in role-based or graph-based frameworks come naturally as conversations.
- Microsoft Research backing, so the coordination patterns track the academic literature on multi-agent systems rather than ad hoc design.
- Code execution agent is competent and configurable.
What is broken or surprising
- Cost. Conversations have many turns; many turns are many model calls. We have observed AutoGen run 20+ LLM calls per task on workloads where LangGraph would use 3-5. The expressiveness has a price.
- Determinism. Conversations are less deterministic than graphs; the same task can take a different path on each run. For production that is a debugging tax.
- Scaling production deployments is the part the literature does not cover.
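The 20+ versus 3-5 call gap above compounds at volume. A back-of-envelope sketch, using an assumed per-call price and task volume (illustrative numbers, not measured rates):

```python
# Back-of-envelope cost comparison. Price and volume are assumptions
# chosen for illustration, not quoted rates.
PRICE_PER_CALL = 0.03   # assumed average frontier-model call cost (USD)
TASKS_PER_DAY = 10_000  # assumed workload

def daily_cost(calls_per_task: int) -> float:
    return calls_per_task * PRICE_PER_CALL * TASKS_PER_DAY

autogen_style = daily_cost(20)   # conversational: 20+ calls per task
graph_style = daily_cost(4)      # graph-based: 3-5 calls per task
print(autogen_style, graph_style)  # 6000.0 vs 1200.0 per day
```

Swap in your own rates; the ratio, not the absolute numbers, is the argument.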
When you would choose it
Pick AutoGen for research-shaped problems where conversational coordination is the natural model, and for prototyping multi-agent shapes that do not fit a graph cleanly. Skip AutoGen for cost-sensitive production workloads at scale; the per-task cost will compound. The honest comparison lives at autogen-vs-crewai and autogen-vs-langgraph.
Cost at scale
Open source; the cost is pure model passthrough, and that is the catch. 20+ calls per task at frontier-model rates compounds fast. Cap conversation length explicitly; the framework will not enforce a budget for you.
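One framework-agnostic way to enforce that cap yourself is a hard call budget wrapped around every model call. A minimal sketch; the class and names here are hypothetical, not part of AutoGen:

```python
class TurnBudgetExceeded(RuntimeError):
    """Raised when a task burns more model calls than allowed."""

class TurnBudget:
    """Hard per-task cap on model calls. Hypothetical sketch: call
    charge() immediately before every model call your agents make."""

    def __init__(self, max_calls: int):
        self.max_calls = max_calls
        self.used = 0

    def charge(self) -> None:
        self.used += 1
        if self.used > self.max_calls:
            raise TurnBudgetExceeded(f"exceeded {self.max_calls} calls")

budget = TurnBudget(max_calls=8)
try:
    for _ in range(20):   # a runaway conversation
        budget.charge()   # would wrap each real model call
except TurnBudgetExceeded:
    pass                  # abort or summarize-and-stop here
print(budget.used)  # stops on the 9th attempted call
```

Failing loudly at the cap beats silently truncating: you want runaway conversations in your error logs, not averaged into your bill.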
Read next

Oliver runs Digital Signet, a research and product studio that operates ~500 production sites with AI agents as the engineering layer. The Digital Signet portfolio is built using a continuous AI-agent build pipeline, one of the largest agent-operated publishing operations on the open web. The handbook draws directly from those deployments: real cost data, real failure modes, real recovery patterns.