Agent Evaluation

Concepts, patterns, and practical guidance on Agent Evaluation within Agents and Orchestration.

4 articles, 0 subtopics, 1 topic

Articles in This Topic

Agent Evaluation: Task Success, Cost, Latency
Agent systems can look impressive in a demo while failing quietly in production. The gap is not only model quality. It is evaluation discipline. A deployed agent is a workflow engine that reads, plans, calls tools, and produces outcomes under constraints. Evaluating an agent means evaluating the workflow, not […]
Agent Handoff Design: Clarity of Responsibility
Handoffs are where agent systems either become trustworthy infrastructure or become a source of quiet risk. A handoff happens whenever responsibility moves from one actor to another: from agent to human, from agent to another service, from agent to a different role, or from one stage of a workflow […]
Memory Systems: Short-Term, Long-Term, Episodic, Semantic
Memory is the difference between an agent that answers questions and an agent that can carry work across time. It is also the difference between a system that quietly accumulates risk and a system that stays accountable. “Memory” is not a single feature. It is a set of storage […]
Planning Patterns: Decomposition, Checklists, Loops
An agent that takes action without a plan is fast until it is wrong. An agent that plans without acting is safe until it is useless. The practical craft is not “planning” as a philosophical concept, but planning as a set of patterns that keep multi-step work inside budgets while […]

Core Topics

Related Topics

Agents and Orchestration
Tool-using systems, planning, memory, orchestration, and operational guardrails.
Failure Recovery Patterns
Concepts, patterns, and practical guidance on Failure Recovery Patterns within Agents and Orchestration.
Guardrails and Policies
Concepts, patterns, and practical guidance on Guardrails and Policies within Agents and Orchestration.
Human-in-the-Loop Design
Concepts, patterns, and practical guidance on Human-in-the-Loop Design within Agents and Orchestration.
Memory and State
Concepts, patterns, and practical guidance on Memory and State within Agents and Orchestration.
Multi-Agent Coordination
Concepts, patterns, and practical guidance on Multi-Agent Coordination within Agents and Orchestration.
Multi-Step Reliability
Concepts, patterns, and practical guidance on Multi-Step Reliability within Agents and Orchestration.
Planning and Task Decomposition
Concepts, patterns, and practical guidance on Planning and Task Decomposition within Agents and Orchestration.
Sandbox and Permissions
Concepts, patterns, and practical guidance on Sandbox and Permissions within Agents and Orchestration.
Tool Use Patterns
Concepts, patterns, and practical guidance on Tool Use Patterns within Agents and Orchestration.
AI Foundations and Concepts
Core concepts and measurement discipline that keep AI claims grounded in reality.
AI Product and UX
Design patterns that turn capability into useful, trustworthy user experiences.
Business, Strategy, and Adoption
Adoption strategy, economics, governance, and organizational change driven by AI.
Data, Retrieval, and Knowledge
Data pipelines, retrieval systems, and grounding techniques for trustworthy outputs.
Hardware, Compute, and Systems
Compute, hardware constraints, and systems engineering behind AI at scale.
AI
A structured directory of AI topics, organized around innovation and the infrastructure shift shaping what comes next.