EPISODE · Aug 26, 2025 · 30 MIN
The Judge Model Diaries: Judging the Judges
from YAAP (Yet Another AI Podcast) · host AI21 Labs
Your LLM gave a great answer. But who decides what “great” means? In this episode, Yuval talks with Noam Gat about judge language models (reward models and critic models) and how LLMs can be trained to rate, rank, and critique each other. They dive into the difference between scoring and feedback, how to use judge models during inference, and why most evaluation benchmarks don’t tell the full story. Turns out, getting a good answer is easy. Knowing it’s good? That’s the hard part.