Interviewing Finbarr Timbers on the "We are So Back" Era of Reinforcement Learning

Q: How long is this episode of Interconnects?

This episode is 1 hour and 8 minutes long.

Q: When was this Interconnects episode published?

This episode was published on December 5, 2024.

from Interconnects · host Nathan Lambert, guest Finbarr Timbers

Finbarr Timbers is an AI researcher who writes Artificial Fintelligence, one of the technical AI blogs I’ve been recommending for a long time, and has experience at top AI labs including DeepMind and Midjourney. The goal of this interview was to do a few things:

* Revisit what reinforcement learning (RL) actually is, its origins, and its motivations.
* Contextualize the major breakthroughs of deep RL in the last decade, from DQN for Atari to AlphaZero to ChatGPT. How could we have seen the resurgence coming? (See the timeline below for the major events we cover.)
* Modern uses for RL: o1, RLHF, and the future of finetuning all ML models.
* Address some of the critiques, like “RL doesn’t work yet.”

It was a fun one. Listen on Apple Podcasts, Spotify, YouTube, and wherever you get your podcasts. For other Interconnects interviews, go here.

Timeline of RL and what was happening at the time

In the last decade of deep RL, there have been a few phases.

* Era 1: Deep RL fundamentals, when modern algorithms were designed and proven.
* Era 2: Major projects, such as AlphaZero, OpenAI Five, and all the projects that put RL on the map.
* Era 3: Slowdown, when DeepMind and OpenAI no longer had major RL projects and cultural relevance declined.
* Era 4: RLHF and widening success, RL’s new life post-ChatGPT.

The following events span these eras. The list is incomplete, but enough to inspire a conversation.

* Early era: TD-Gammon, REINFORCE, etc.
* 2013: Deep Q-Learning (Atari)
* 2014: Google acquires DeepMind
* 2016: AlphaGo defeats Lee Sedol
* 2017: PPO paper, AlphaZero (no human data)
* 2018: OpenAI Five, GPT-2
* 2019: AlphaStar, early papers on robotic sim2real with RL (see blog post)
* 2020: MuZero
* 2021: Decision Transformer
* 2022: ChatGPT, sim2real continues
* 2023: Scaling laws for RL (blog post), doubt of RL
* 2024: o1, post-training, RL’s bloom

Interconnects is a reader-supported publication. Consider becoming a subscriber.

Chapters

* [00:00:00] Introduction
* [00:02:14] Reinforcement Learning Fundamentals
* [00:09:03] The Bitter Lesson
* [00:12:07] Reward Modeling and Its Challenges in RL
* [00:16:03] Historical Milestones in Deep RL
* [00:21:18] OpenAI Five and Challenges in Complex RL Environments
* [00:25:24] Recent-ish Developments in RL: MuZero, Decision Transformer, and RLHF
* [00:30:29] OpenAI's o1 and Exploration in Language Models
* [00:40:00] Tülu 3 and Challenges in RL Training for Language Models
* [00:46:48] Comparing Different AI Assistants
* [00:49:44] Management in AI Research
* [00:55:30] Building Effective AI Teams
* [01:01:55] The Need for Personal Branding

We mention

* o1 (OpenAI model)
* Rich Sutton
* University of Alberta
* London School of Economics
* IBM’s Deep Blue
* Alberta Machine Intelligence Institute (AMII)
* John Schulman
* Claude (Anthropic's AI assistant)
* Logan Kilpatrick
* Bard (Google's AI assistant)
* DeepSeek R1 Lite
* Scale AI
* OLMo (AI2's language model)
* Golden Gate Claude

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.interconnects.ai/subscribe
