
PODCAST · science

BenchTalks

BenchTalks is Snorkel AI's podcast series at the intersection of AI evaluation, data quality, and real-world impact. Hosted by the Snorkel team, each episode brings together researchers, practitioners, and leaders to dig into the questions that matter most as AI benchmarks grow more sophisticated, dynamic, and reflective of the complexity found in real-world deployments.

We explore the full stack of what it takes to build AI that actually works: from the design of rigorous, open benchmarks that close the gap between what we measure and what we encounter in production, to the expert-in-the-loop data creation and curation pipelines that make reliable evaluation possible. Along the way, we get into reinforcement learning, reward modeling, and the evolving science of data quality that underpins it all.

Whether you're building agents that operate over long horizons, crafting rubrics that go beyond pass/fail, or trying to understand what "g


    Benchtalks #1: Alex Shaw (Terminal-Bench, Harbor) - Building the benchmark factory

    In this inaugural episode of BenchTalks, Snorkel AI co-founder Vincent Chen sits down with Alex Shaw, MTS at Laude Institute and co-creator of Terminal-Bench, to unpack what the rapid hill-climbing on Terminal-Bench 2 (TB2) reveals about the state of AI agent evaluation, and where the field needs to go.

    This interview covers:
    - Why TB2 went from 20–30% during development to 75–80% at the frontier today
    - The bet on the terminal as the right abstraction for general computer use
    - How Harbor became a benchmark factory, and why that matters for RL post-training
    - The "benchmaxxing" problem and how the community is keeping TB2 honest
    - What Terminal-Bench 3 needs from expert contributors to shape model development for the next year

    Full interview/transcript: https://snorkel.ai/blog/benchtalks-alex-shaw-terminal-bench-harbor-building-the-benchmark-factory/



HOSTED BY

Snorkel AI


Frequently Asked Questions

How many episodes does BenchTalks have?

BenchTalks currently has 1 episode available on PodParley. New episodes are automatically indexed when they're published to the podcast feed.

What is BenchTalks about?

BenchTalks is Snorkel AI's podcast series at the intersection of AI evaluation, data quality, and real-world impact. Hosted by the Snorkel team, each episode brings together researchers, practitioners, and leaders to dig into the questions that matter most as AI benchmarks grow more sophisticated,...

How often does BenchTalks release new episodes?

BenchTalks has 1 episode so far. Check the episode list to see recent publication dates and frequency.

Where can I listen to BenchTalks?

You can listen to BenchTalks on PodParley by clicking any episode. We provide an embedded audio player for direct listening, and you can also subscribe via your preferred podcast app using the RSS feed.

Who hosts BenchTalks?

BenchTalks is created and hosted by Snorkel AI.