0.001% agent operator sheet
The frontier is not one framework. It is control over failure modes.
Use this page when building or studying agents. The strongest pattern is to begin with the low-level failure, then choose the smallest architecture that can control it.
Design laws
| Pattern | When it matters |
|---|---|
| Workflow before agent | If the path is known, use deterministic code plus model calls. Add autonomy only where observations change the route. |
| Orchestrator-worker | Use for breadth-heavy work where subproblems are independent and can return structured artifacts. |
| Evaluator-optimizer | Use when there is a clear rubric, test, verifier, or judge that can drive iteration. |
| Tool ergonomics | A tool description is an interface contract. Bad schemas create bad reasoning. |
| Memory hygiene | Store facts, plans, and artifacts separately. Treat retrieved memory as untrusted until checked. |
| End-state evals | Judge whether the environment reached the desired state, not whether the agent followed your expected path. |
| Security envelope | Enforce least privilege, approval gates, prompt-injection boundaries, traceability, and rollback. |
| Cost-aware scaling | Parallel agents buy performance with tokens. Use them only when the task value justifies it. |
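
The evaluator-optimizer row can be sketched as a bounded loop: a proposer produces a candidate, an evaluator returns pass/fail plus feedback, and the feedback drives the next attempt. This is a minimal sketch, not a reference implementation. The names `evaluator_optimizer`, `toy_propose`, and `toy_evaluate` are hypothetical, and the proposer is a deterministic stand-in for what would normally be a model call.

```python
from typing import Callable, Optional, Tuple

def evaluator_optimizer(
    propose: Callable[[Optional[str], Optional[str]], str],
    evaluate: Callable[[str], Tuple[bool, str]],
    max_rounds: int = 5,
) -> Tuple[Optional[str], int]:
    """Alternate propose and evaluate until the evaluator passes or the budget ends.

    Returns (accepted_candidate, rounds_used), or (None, max_rounds) on failure.
    """
    candidate: Optional[str] = None
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        candidate = propose(candidate, feedback)
        passed, feedback = evaluate(candidate)
        if passed:
            return candidate, round_no
    return None, max_rounds

# Hypothetical stand-in for a model call: revises the draft based on feedback.
def toy_propose(previous: Optional[str], feedback: Optional[str]) -> str:
    if previous is None:
        return "agents fail"
    if feedback == "too short":
        return previous + " quietly"
    if feedback == "missing period":
        return previous + "."
    return previous

# Hypothetical rubric: at least three words and a trailing period.
def toy_evaluate(text: str) -> Tuple[bool, str]:
    if len(text.split()) < 3:
        return False, "too short"
    if not text.endswith("."):
        return False, "missing period"
    return True, "ok"

result, rounds = evaluator_optimizer(toy_propose, toy_evaluate)
print(result, rounds)
```

The `max_rounds` cap is the design choice that matters here: without a clear rubric the loop has no stopping signal, and without a budget it has no stopping guarantee.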
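
The end-state evals row can be illustrated with a toy environment. In this sketch, which assumes the environment is a plain dictionary and that actions are simple key/value writes, two different action sequences both pass because they reach the same final state, while a shorter sequence that skips the refund fails. `apply_actions` and `end_state_passes` are hypothetical helper names.

```python
from typing import Any, Dict, List, Tuple

Action = Tuple[str, Any]  # (key, new_value): an assumed, simplified action format

def apply_actions(env: Dict[str, Any], actions: List[Action]) -> Dict[str, Any]:
    """Return a copy of env after applying each (key, value) write in order."""
    final = dict(env)
    for key, value in actions:
        final[key] = value
    return final

def end_state_passes(env: Dict[str, Any], expected: Dict[str, Any]) -> bool:
    """Pass if every expected key is present with the expected value."""
    return all(key in env and env[key] == value for key, value in expected.items())

start = {"ticket": "open", "refund": 0}
goal = {"ticket": "closed", "refund": 20}

# Two different routes to the same outcome, plus one incomplete route.
direct_path = [("refund", 20), ("ticket", "closed")]
detour_path = [("ticket", "pending"), ("refund", 20), ("ticket", "closed")]
incomplete_path = [("ticket", "closed")]

direct_ok = end_state_passes(apply_actions(start, direct_path), goal)
detour_ok = end_state_passes(apply_actions(start, detour_path), goal)
incomplete_ok = end_state_passes(apply_actions(start, incomplete_path), goal)
print(direct_ok, detour_ok, incomplete_ok)
```

A path-matching grader keyed to `direct_path` would reject `detour_path` even though the environment ends up in the desired state.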
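
Three items from the security envelope row (least privilege, approval gates, traceability) can be sketched as a thin wrapper around tool calls. This is an illustration under stated assumptions: `ToolGate` is a hypothetical class, the tools are in-memory lambdas, and the approval callback always refuses. Prompt-injection boundaries and rollback are out of scope for this sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set, Tuple

@dataclass
class ToolGate:
    tools: Dict[str, Callable[..., Any]]
    allowed: Set[str]                                # least privilege: explicit allowlist
    needs_approval: Set[str]                         # calls that require a human decision
    approve: Callable[[str, Dict[str, Any]], bool]   # approval callback (assumed)
    trace: List[Tuple[str, str]] = field(default_factory=list)  # audit log

    def call(self, name: str, args: Dict[str, Any]) -> Any:
        """Run a tool only if it is allowlisted and, where required, approved."""
        if name not in self.allowed or name not in self.tools:
            self.trace.append((name, "denied"))
            raise PermissionError(f"tool {name!r} is not allowed")
        if name in self.needs_approval and not self.approve(name, args):
            self.trace.append((name, "rejected"))
            raise PermissionError(f"tool {name!r} was not approved")
        self.trace.append((name, "executed"))
        return self.tools[name](**args)

files = {"notes.txt": "draft"}
gate = ToolGate(
    tools={
        "read_file": lambda path: files[path],
        "delete_file": lambda path: files.pop(path),
        "send_email": lambda to, body: f"sent to {to}",
    },
    allowed={"read_file", "delete_file"},   # send_email is registered but not granted
    needs_approval={"delete_file"},
    approve=lambda name, args: False,        # stand-in reviewer that always refuses
)

content = gate.call("read_file", {"path": "notes.txt"})
print(content)
```

Every call, including refused ones, lands in `gate.trace`, so a reviewer can see what the agent attempted as well as what it actually did.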
Frontier sources to keep current
| Source | What it covers |
|---|---|
| UC Berkeley Agentic AI Fall 2025 | Course syllabus for agentic AI foundations, applications, evals, and safety. |
| UC Berkeley Advanced LLM Agents Spring 2025 | Advanced course on reasoning, planning, code, math, and verification. |
| OpenAI Agents SDK tools | Agent definitions, tools, MCP, specialist-as-tool, orchestration, guardrails, and observability. |
| Anthropic multi-agent research system | Production lessons for orchestrator-worker research agents, token scaling, and evals. |
| Anthropic trustworthy agents | Current product and research view on trustworthy agent deployment. |
| OpenAI BrowseComp | Benchmark for persistent web-browsing research agents. |
| OSWorld | Real desktop-environment benchmark for multimodal computer-use agents. |
| Model Context Protocol | Open protocol for connecting models to tools, resources, prompts, and systems. |