The seven rules.
A reference site is only as useful as its discipline. The seven rules below govern what goes on this site, what does not, and how readers can audit the difference. The previous version of this site failed Rules 1, 2, 3, and 5 simultaneously, was taken down, and was rebuilt under these rules.
Why these rules
Reference sites in the AI agent space tend to drift toward one of two failure modes. The first is operator cosplay: a writer describes “our production pipeline” with fabricated numbers, simulated incidents, and undisclosed trials. The second is vendor recapitulation: a writer paraphrases vendor docs without adding the cross-reference and citation work that makes the page cite-worthy. The seven rules below are written specifically to keep this site out of both failure modes.
- R01
Third-person reference voice. No first-person operator claims.
Every page on this site is written in third-person reference voice. There are no “we run”, “we tested”, “we deployed”, “our pipeline”, or “our team” claims anywhere. The reason is that operator-credentialed publishing requires evidence the editorial team cannot provide for the agent ecosystem at large. Where a vendor or a research group has run something, the page cites them and links their write-up. Where this site is the first to articulate a framing, the framing is presented as an editorial synthesis, not as an operator's observation.
- R02
Every specific claim links a primary source.
Percentages, dollar amounts, counts, capability claims, and timelines on this site each link to a primary source: vendor documentation, a peer-reviewed paper, a public benchmark page, or a vendor pricing page. Where a primary source does not exist, the page uses qualitative framing (“substantially”, “in the studied workloads”) rather than fabricated numbers. The references list on every page is the auditable surface of this rule.
- R03
No operator notes, lab notebook, or incident narratives.
There is no “Operator Notes” section on this site. There are no simulated production logs, no fabricated post-mortems, and no timestamped incident retellings. The site is a reference, not a notebook. Where the agent ecosystem has produced a public post-mortem worth citing, it is cited as a primary source under Rule 2.
- R04
Per-tool reviews are replaced by category overviews.
This site does not publish ranked verdicts on Claude Code, Cursor, Devin, or any other named product. The frameworks page explains the categories of tool that exist (orchestration graphs, multi-agent, single-agent SDKs, programming-style abstractions, minimal frameworks) and names the credible occupants of each category with links to their own documentation. There are no “we recommend” or “best for X” conclusions. Where public benchmarks measure framework or model performance, the benchmarks are cited and explained.
- R05
No fabricated frameworks.
The patterns named on this site are Anthropic's. The agent loop formulation is Russell and Norvig's. The failure-mode taxonomy is drawn from OWASP and named academic papers. Where a framing is the editorial team's own synthesis, it is presented as such, with the primary sources it draws on cited. There are no proprietary “Maturity Curves” or “Failure Pyramids” invented to look like industry frameworks.
- R06
No em-dashes.
A house style choice. Em-dashes have become an unintentional signal of AI-generated prose. The body of this site uses commas, colons, parentheses, and sentence splits in their place.
- R07
These rules are stated publicly so readers can verify them.
This page is the audit surface for the previous six rules. Readers and AI engines can verify that the discipline is applied by spot-checking any page on the site against the rules listed here. Corrections are welcome via Digital Signet.
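The spot-check described above can be partly automated for the two rules that are mechanically checkable, Rule 1 and Rule 6. The following is a minimal Python sketch under stated assumptions, not the site's actual tooling: the function name `audit_page`, the phrase list (copied from the examples given in Rule 1), and the format of the findings are all illustrative.

```python
import re

# Assumption: this list mirrors the example phrases in Rule 1; it is not exhaustive.
FIRST_PERSON_PHRASES = ["we run", "we tested", "we deployed", "our pipeline", "our team"]

# Written as an escape so this source file itself contains no em-dash character.
EM_DASH = "\u2014"


def audit_page(text: str) -> list[str]:
    """Return human-readable findings for one page's body text (hypothetical helper)."""
    findings = []
    lowered = text.lower()

    # Rule 1: flag first-person operator phrases.
    for phrase in FIRST_PERSON_PHRASES:
        # Word boundaries keep "your team" from matching "our team".
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            findings.append(f"Rule 1: first-person phrase '{phrase}'")

    # Rule 6: flag any em-dash in the body.
    count = text.count(EM_DASH)
    if count:
        findings.append(f"Rule 6: {count} em-dash(es)")

    return findings


if __name__ == "__main__":
    sample = "The vendor reports the result. Your team can verify it."
    print(audit_page(sample))
```

A check like this only narrows the search. A page that quotes the phrases in order to prohibit them, as this page does, would be flagged and still needs a human read. Rules 2 through 5 depend on judgment about sources and framing and are not covered by the sketch.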
Source list
The primary sources cited across the site are:
- Schluntz & Zhang (2024). Building Effective Agents. Anthropic.
- Russell & Norvig (2021). Artificial Intelligence: A Modern Approach, 4th ed. Pearson.
- Wang et al. (2024). A Survey on Large Language Model Based Autonomous Agents. arXiv:2308.11432.
- Yao et al. (2023). ReAct: Synergizing Reasoning and Acting in Language Models. arXiv:2210.03629.
- Madaan et al. (2023). Self-Refine: Iterative Refinement with Self-Feedback. arXiv:2303.17651.
- Wang et al. (2022). Self-Consistency Improves Chain of Thought Reasoning in Language Models. arXiv:2203.11171.
- Li et al. (2023). CAMEL: Communicative Agents for Mind Exploration of Large Scale Language Model Society. arXiv:2303.17760.
- Wu et al. (2023). AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation. arXiv:2308.08155.
- Greshake et al. (2023). Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection. arXiv:2302.12173.
- OWASP (2025). Top 10 for Large Language Model Applications.
- Stanford CRFM. HELM benchmark.
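- Liu et al. (2023). AgentBench: Evaluating LLMs as Agents. arXiv:2308.03688.
- Jimenez et al. (2023). SWE-bench: Can Language Models Resolve Real-World GitHub Issues? arXiv:2310.06770.
- Mialon et al. (2023). GAIA: A Benchmark for General AI Assistants. arXiv:2311.12983.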
Last-verified discipline
Every page on the site carries a Last-verified stamp. The stamp is updated when the page is reviewed against current vendor docs and current public research. Pages that name moving targets (vendor tool listings on /frameworks/, benchmark links on /evaluating-an-agent/) are reviewed quarterly. Pages that name stable references (the patterns, the agent loop, the glossary entries derived from classical sources) are reviewed annually.
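The two review cadences above amount to a simple date comparison. The sketch below is one way to express it in Python. It is illustrative only: the function name `is_review_due`, the cadence labels, and the interval lengths (91 days for quarterly, 365 days for annual) are assumptions, since the text above does not define a quarter or a year in days.

```python
from datetime import date, timedelta

# Assumed interval lengths; the site states only "quarterly" and "annually".
REVIEW_INTERVALS = {
    "quarterly": timedelta(days=91),   # pages that name moving targets
    "annual": timedelta(days=365),     # pages that name stable references
}


def is_review_due(last_verified: date, cadence: str, today: date) -> bool:
    """Return True when the Last-verified stamp is at least one interval old."""
    return today - last_verified >= REVIEW_INTERVALS[cadence]


if __name__ == "__main__":
    stamp = date(2025, 1, 1)
    print(is_review_due(stamp, "quarterly", date(2025, 4, 2)))
    print(is_review_due(stamp, "annual", date(2025, 4, 2)))
```

Passing `today` explicitly, rather than reading the system clock inside the function, keeps the check reproducible for anyone auditing a stamp after the fact.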
Corrections
Corrections, citations to better sources, and reports of out-of-date links are welcome via Digital Signet. Substantive corrections are made within 48 hours and noted in the page's Last-verified date.