EPISODE · May 4, 2026 · 1H 53M

The AI Models Smart Enough to Know They're Cheating — Beth Barnes & David Rein [METR]

from Machine Learning Street Talk (MLST)

Beth Barnes and David Rein on the one graph that ate the AI timelines discourse, and why the two people who built it are the most careful about how you read it.

**SPONSOR**
Prolific - Quality data. From real people. For faster breakthroughs.
https://www.prolific.com/?utm_source=mlst

Interview: https://youtu.be/cnxZZTl1tkk

---

Beth founded METR after leaving OpenAI alignment. David is first author on GPQA and co-author on HCAST and the METR Time Horizons paper. Together they built the measurement Daniel Kokotajlo called the single most important piece of evidence on AI timelines: the log-linear line of "how long a task a frontier model can complete at 50% reliability" vs release date.

The conversation opens on reward hacking. Current models can articulate in chat why a behaviour is undesired and then execute it anyway as agents. From there: construct validity, Melanie Mitchell's four-problem taxonomy, and the ARC-AGI 1-to-2 collapse as a worked example of adversarially-selected benchmarks regressing once labs target them. Beth's counter: METR deliberately does not adversarially select. David's: models do not have to do the right thing for the right reasons.

Methodology, then specification: David's compiler analogy, and Beth on four-month tasks as expensive to evaluate rather than unspecifiable. Then the SWE-bench reality check, the METR finding that half of passing PRs would not be merged, and Beth's horses-versus-bank-tellers analogy for the labour market.

The close: monitorability, the coin-spinning boat, two-year recursive self-improvement, and Beth's line that "overhyped now" and "big deal later" are not correlated claims.

---

TIMESTAMPS:
00:00:00 Intro
00:02:06 Sponsor break: Prolific human-feedback infrastructure
00:02:33 Welcome and the scalable oversight motivation
00:06:02 Construct validity, benchmark pathologies and the Chollet worry
00:15:45 Time Horizons: human time, HCAST tasks and the 50% logistic
00:24:50 Is human difficulty really one variable?
00:33:05 Agent harness evolution and the inference-compute dividend
00:40:00 Scaffolding bells, token budgets and the credit-assignment problem
00:44:15 Look at the damn graph: regularisation bug and reliability nuance
00:50:00 Why 50%? Reliability, reward hacking and pizza-party transcripts
00:55:20 Extrapolation risk and straight lines on graphs
00:59:25 Software engineering as a specification acquisition problem
01:07:40 Compilers also made ugly code: vibe-coding quality and Claude on METR Slack
01:15:15 Strongest defensible claim, Carlini's compiler swarm and AI 2027
01:23:45 SWE-bench merge rates, the bank-teller analogy and horses
01:31:45 Scheming, alignment faking and the mentalistic vocabulary problem
01:40:45 Reward hacking, monitorability and chain-of-thought faithfulness
01:45:25 Recursive self-improvement, knowledge vs intelligence and closing

ReScript: https://app.rescript.info/public/share/de3bb40cc02ee39fdf36e2c60366eb4d (PDF, refs, transcript, etc.)
