DeepMind Genie 3 [World Exclusive] (Jack Parker Holder, Shlomi Fruchter) episode artwork

EPISODE · Aug 5, 2025 · 58 MIN

DeepMind Genie 3 [World Exclusive] (Jack Parker Holder, Shlomi Fruchter)

from Machine Learning Street Talk (MLST)

This episode features Shlomi Fuchter and Jack Parker Holder from Google DeepMind, who are unveiling a new AI called Genie 3. The host, Tim Scarfe, describes it as the most mind-blowing technology he has ever seen. We were invited to their offices to conduct the interview (not sponsored).Imagine you could create a video game world just by describing it. That's what Genie 3 does. It's an AI "world model" that learns how the real world works by watching massive amounts of video. Unlike a normal video game engine (like Unreal or the one for Doom) that needs to be programmed manually, Genie generates a realistic, interactive, 3D world from a simple text prompt.**SPONSOR MESSAGES***Prolific: Quality data. From real people. For faster breakthroughs.https://prolific.com/mlst?utm_campaign=98404559-MLST&utm_source=youtube&utm_medium=podcast&utm_content=script-gen***Here’s a breakdown of what makes it so revolutionary:From Text to a Virtual World: You can type "a drone flying by a beautiful lake" or "a ski slope," and Genie 3 creates that world for you in about three seconds. You can then navigate and interact with it in real-time.It's Consistent: The worlds it creates have a reliable memory. If you look away from an object and then look back, it will still be there, just as it was. The guests explain that this consistency isn't explicitly programmed in; it's a surprising, "emergent" capability of the powerful AI model.A Huge Leap Forward: The previous version, Genie 2, was a major step, but it wasn't fast enough for real-time interaction and was much lower resolution. Genie 3 is 720p, interactive, and photorealistic, running smoothly for several minutes at a time.The Killer App - Training Robots: Beyond entertainment, the team sees Genie 3 as a game-changer for training AI. Instead of training a self-driving car or a robot in the real world (which is slow and dangerous), you can create infinite simulations. You can even prompt rare events to happen, like a deer running across the road, to teach an AI how to handle unexpected situations safely.The Future of Entertainment: this could lead to a "YouTube version 2" or a new form of VR, where users can create and explore endless, interconnected worlds together, like the experience machine from philosophy.While the technology is still a research prototype and not yet available to the public, it represents a monumental step towards creating true artificial worlds from the ground up.Jack Parker Holder [Research Scientist at Google DeepMind in the Open-Endedness Team]https://jparkerholder.github.io/Shlomi Fruchter [Research Director, Google DeepMind]https://shlomifruchter.github.io/TOC:[00:00:00] - Introduction: "The Most Mind-Blowing Technology I've Ever Seen"[00:02:30] - The Evolution from Genie 1 to Genie 2[00:04:30] - Enter Genie 3: Photorealistic, Interactive Worlds from Text[00:07:00] - Promptable World Events & Training Self-Driving Cars[00:14:21] - Guest Introductions: Shlomi Fuchter & Jack Parker Holder[00:15:08] - Core Concepts: What is a "World Model"?[00:19:30] - The Challenge of Consistency in a Generated World[00:21:15] - Context: The Neural Network Doom Simulation[00:25:25] - How Do You Measure the Quality of a World Model?[00:28:09] - The Vision: Using Genie to Train Advanced Robots[00:32:21] - Open-Endedness: Human Skill and Prompting Creativity[00:38:15] - The Future: Is This the Next YouTube or VR?[00:42:18] - The Next Step: Multi-Agent Simulations[00:52:51] - Limitations: Thinking, Computation, and the Sim-to-Real Gap[00:58:07] - Conclusion & The Future of Game EnginesREFS:World Models [David Ha, Jürgen Schmidhuber]https://arxiv.org/abs/1803.10122POEThttps://arxiv.org/abs/1901.01753[Akarsh Kumar, Jeff Clune, Joel Lehman, Kenneth O. Stanley]The Fractured Entangled Representation Hypothesishttps://arxiv.org/pdf/2505.11581TRANSCRIPT:https://app.rescript.info/public/share/Zk5tZXk6mb06yYOFh6nSja7Lg6_qZkgkuXQ-kl5AJqM

This episode features Shlomi Fuchter and Jack Parker Holder from Google DeepMind, who are unveiling a new AI called Genie 3. The host, Tim Scarfe, describes it as the most mind-blowing technology he has ever seen. We were invited to their offices to conduct the interview (not sponsored).Imagine you could create a video game world just by describing it. That's what Genie 3 does. It's an AI "world model" that learns how the real world works by watching massive amounts of video. Unlike a normal video game engine (like Unreal or the one for Doom) that needs to be programmed manually, Genie generates a realistic, interactive, 3D world from a simple text prompt.**SPONSOR MESSAGES***Prolific: Quality data. From real people. For faster breakthroughs.https://prolific.com/mlst?utm_campaign=98404559-MLST&utm_source=youtube&utm_medium=podcast&utm_content=script-gen***Here’s a breakdown of what makes it so revolutionary:From Text to a Virtual World: You can type "a drone flying by a beautiful lake" or "a ski slope," and Genie 3 creates that world for you in about three seconds. You can then navigate and interact with it in real-time.It's Consistent: The worlds it creates have a reliable memory. If you look away from an object and then look back, it will still be there, just as it was. The guests explain that this consistency isn't explicitly programmed in; it's a surprising, "emergent" capability of the powerful AI model.A Huge Leap Forward: The previous version, Genie 2, was a major step, but it wasn't fast enough for real-time interaction and was much lower resolution. Genie 3 is 720p, interactive, and photorealistic, running smoothly for several minutes at a time.The Killer App - Training Robots: Beyond entertainment, the team sees Genie 3 as a game-changer for training AI. Instead of training a self-driving car or a robot in the real world (which is slow and dangerous), you can create infinite simulations. You can even prompt rare events to happen, like a deer running across the road, to teach an AI how to handle unexpected situations safely.The Future of Entertainment: this could lead to a "YouTube version 2" or a new form of VR, where users can create and explore endless, interconnected worlds together, like the experience machine from philosophy.While the technology is still a research prototype and not yet available to the public, it represents a monumental step towards creating true artificial worlds from the ground up.Jack Parker Holder [Research Scientist at Google DeepMind in the Open-Endedness Team]https://jparkerholder.github.io/Shlomi Fruchter [Research Director, Google DeepMind]https://shlomifruchter.github.io/TOC:[00:00:00] - Introduction: "The Most Mind-Blowing Technology I've Ever Seen"[00:02:30] - The Evolution from Genie 1 to Genie 2[00:04:30] - Enter Genie 3: Photorealistic, Interactive Worlds from Text[00:07:00] - Promptable World Events & Training Self-Driving Cars[00:14:21] - Guest Introductions: Shlomi Fuchter & Jack Parker Holder[00:15:08] - Core Concepts: What is a "World Model"?[00:19:30] - The Challenge of Consistency in a Generated World[00:21:15] - Context: The Neural Network Doom Simulation[00:25:25] - How Do You Measure the Quality of a World Model?[00:28:09] - The Vision: Using Genie to Train Advanced Robots[00:32:21] - Open-Endedness: Human Skill and Prompting Creativity[00:38:15] - The Future: Is This the Next YouTube or VR?[00:42:18] - The Next Step: Multi-Agent Simulations[00:52:51] - Limitations: Thinking, Computation, and the Sim-to-Real Gap[00:58:07] - Conclusion & The Future of Game EnginesREFS:World Models [David Ha, Jürgen Schmidhuber]https://arxiv.org/abs/1803.10122POEThttps://arxiv.org/abs/1901.01753[Akarsh Kumar, Jeff Clune, Joel Lehman, Kenneth O. Stanley]The Fractured Entangled Representation Hypothesishttps://arxiv.org/pdf/2505.11581TRANSCRIPT:https://app.rescript.info/public/share/Zk5tZXk6mb06yYOFh6nSja7Lg6_qZkgkuXQ-kl5AJqM

NOW PLAYING

DeepMind Genie 3 [World Exclusive] (Jack Parker Holder, Shlomi Fruchter)

0:00 58:22

No transcript for this episode yet

We transcribe on demand. Request one and we'll notify you when it's ready — usually under 10 minutes.

French Your Way Jessica: Native French teacher founder of French Your Way Boost your French listening skills and test your comprehension with this one of a kind series of podcasts. Get the chance to listen to a real conversation between native speakers talking at normal speed AND customise your learning experience through carefully designed sets of questions (2 levels of difficulty) available for download at www.frenchvoicespodcast.com. All interviews also come with the transcript. French teacher Jessica interviews native speakers of French from around the world who share a bit of their life and passion. Where else would you meet in one same place a French yoga teacher based in Melbourne, a soap manufacturer from Provence, or a couple cycling around the world? Kaizen Blueprint Aldo Chandra "Kaizen" is a Japanese term for continuous improvement. This podcast provides a blueprint to learn about health, wealth, relationships and everything else in between. Through our podcast, we strive to inspire, educate, and motivate our audience to cultivate a mindset of lifelong learning, productivity, and personal development. By sharing insights, strategies, and practical tips, we aim to guide listeners on their journey towards realizing their fullest potential, fostering success, and creating lasting positive change. One Man Went To Row PepperDawesMedia Follow the journey, from training to finish line, of a man from Derby, UK who is going from having only ever rowed on a machine to rowing 3000 miles solo across the Atlantic...just after his 70th birthday! Humanizing Change Tremendousness Join us each episode as we talk with innovators in their respective fields about their unique journeys and how they humanize change in their own work, right here, on Humanizing Change.

Frequently Asked Questions

How long is this episode of Machine Learning Street Talk (MLST)?

This episode is 58 minutes long.

When was this Machine Learning Street Talk (MLST) episode published?

This episode was published on August 5, 2025.

What is this episode about?

This episode features Shlomi Fuchter and Jack Parker Holder from Google DeepMind, who are unveiling a new AI called Genie 3. The host, Tim Scarfe, describes it as the most mind-blowing technology he has ever seen. We were invited to their offices to...

Can I download this Machine Learning Street Talk (MLST) episode?

Yes, you can download this episode by clicking the download button on the episode player, or subscribe to the podcast in your preferred podcast app for automatic downloads.
URL copied to clipboard!