EPISODE · Apr 17, 2026 · 30 MIN
Episode 49: Rethinking AI Agent Evaluations
from AWS re:Think Podcast
In this episode we explore how companies should evaluate AI agents across multiple dimensions, including correctness, tool selection, multi-turn reasoning, and safety. The conversation covers building reliable evaluation frameworks, balancing automated vs. human-in-the-loop testing, and leveraging observability to debug agent behavior in production.

Links from the Show
AgentCore Evaluation: https://github.com/awslabs/agentcore-samples/tree/main/01-tutorials/07-AgentCore-evaluations
Strands Evaluation: https://strandsagents.com/docs/user-guide/evals-sdk/quickstart/

AWS Hosts: Nolan Chen & Malini Chatterjee
Email Your Feedback: [email protected]