EPISODE · Oct 16, 2024 · 16 MIN
(Voiceover) Building on evaluation quicksand
from Interconnects · host Nathan Lambert
Read the full post here: https://www.interconnects.ai/p/building-on-evaluation-quicksand

Chapters
00:00 Building on evaluation quicksand
01:26 The causes of closed evaluation silos
06:35 The challenge facing open evaluation tools
10:47 Frontiers in evaluation
11:32 New types of synthetic data contamination
13:57 Building harder evaluations

Figures
Fig 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/manual/openai-predictions.webp

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.interconnects.ai/subscribe