EPISODE · Sep 13, 2024 · 25 MIN

Ashley Edwards - Genie Paper (DeepMind/Runway)

from Machine Learning Street Talk (MLST)

Ashley Edwards, who was working at DeepMind when she co-authored the Genie paper and is now at Runway, covered several key aspects of the Genie AI system and its applications in video generation, robotics, and game creation.

MLST is sponsored by Brave: The Brave Search API covers over 20 billion webpages and is built from scratch, without Big Tech biases or the recent extortionate price hikes on search API access. It's perfect for AI model training and retrieval-augmented generation. Try it now: get 2,000 free queries monthly at http://brave.com/api.

Topics covered:
- Genie's approach to learning interactive environments, balancing compression and fidelity.
- The use of latent action models and VQ-VAE models for video processing and tokenization.
- Challenges in maintaining action consistency across frames and integrating text-to-image models.
- Evaluation metrics for AI-generated content, such as FID and PS&R diff metrics.

The discussion also explored broader implications and applications:
- The potential impact of AI video generation on content creation jobs.
- Applications of Genie in game generation and robotics.
- The use of foundation models in robotics and the differences between internet video data and specialized robotics data.
- Challenges in mapping AI-generated actions to real-world robotic actions.

Ashley Edwards: https://ashedwards.github.io/

TOC (entries marked * are the best bits):
00:00:00 1. Intro to Genie & Brave Search API: Trade-offs & limitations *
00:02:26 2. Genie's Architecture: Latent action, VQ-VAE, video processing *
00:05:06 3. Genie's Constraints: Frame consistency & image model integration
00:07:26 4. Evaluation: FID, PS&R diff metrics & latent induction methods
00:09:44 5. AI Video Gen: Content creation impact, depth & parallax effects
00:11:39 6. Model Scaling: Training data impact & computational trade-offs
00:13:50 7. Game & Robotics Apps: Gamification & action mapping challenges *
00:16:16 8. Robotics Foundation Models: Action space & data considerations *
00:19:18 9. MaskGIT & Video Frames: Real-time optimization, RL from videos
00:20:34 10. Research Challenges: AI value, efficiency vs. quality, safety
00:24:20 11. Future Dev: Efficiency improvements & fine-tuning strategies

Refs:
1. Genie: Generative Interactive Environments (learning interactive environments from videos) / Ashley Edwards and DeepMind colleagues [00:01] https://arxiv.org/abs/2402.15391
2. VQ-VAE (Vector Quantized Variational Autoencoder) / Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu [02:43] https://arxiv.org/abs/1711.00937
3. FID (Fréchet Inception Distance) metric / Martin Heusel et al. [07:37] https://arxiv.org/abs/1706.08500
4. PS&R (Precision and Recall) metric / Mehdi S. M. Sajjadi et al. [08:02] https://arxiv.org/abs/1806.00035
5. Vision Transformer (ViT) architecture / Alexey Dosovitskiy et al. [12:14] https://arxiv.org/abs/2010.11929
6. Genie (robotics foundation models) / Google DeepMind [17:34] https://deepmind.google/research/publications/60474/
7. Chelsea Finn's lab work on robotics datasets / Chelsea Finn [17:38] https://ai.stanford.edu/~cbfinn/
8. Imitation from observation in reinforcement learning / YuXuan Liu et al. [20:58] https://arxiv.org/abs/1707.03374
9. Waymo's autonomous driving technology / Waymo [22:38] https://waymo.com/
10. Gen-3 model release by Runway / Runway [23:48] https://runwayml.com/
11. Classifier-free guidance technique / Jonathan Ho and Tim Salimans [24:43] https://arxiv.org/abs/2207.12598
