PODCAST · technology

Embodied AI 101

by Shaoqing Tan

Stay in the loop on research in AI and physical intelligence.

67

Claw-Eval: Toward Trustworthy and Transparent Evaluation of Autonomous Agents

Benchmark with 2,159 rubric items across 300 tasks using trajectory-aware grading and 3-trial Pass^3 scoring to mitigate luck. Evaluates agent reliability in real-world robotics settings.

Apr 8, 2026

28m
66

LIBERO-Para: Paraphrase Robustness in Robotic Manipulation

Reveals paraphrase fragility in VLAs causing 22-52% success drops due to task misidentification. Introduces PRIDE metric weighting success by paraphrase difficulty on LIBERO benchmark manipulation tasks.

Apr 8, 2026

32m
65

YOR: Your Own Mobile Manipulator for Generalizable Robotics

Low-cost mobile manipulator design and training strategies for broad generalization in real-world tasks.

Apr 7, 2026

27m
64

EgoSim: Egocentric World Simulator for Embodied Interaction Generation

Closed-loop egocentric video simulator maintaining persistent 3D scene state for consistent interactions, enabling cross-embodiment transfer from human videos to robotic manipulation.

Apr 7, 2026

50m
63

Accelerating Video World Models: From Generative Videos to Real-Time Simulators

Comprehensive survey taxonomizing efficient architectures/algorithms for video world models as simulators, targeting compute bottlenecks in embodied AI, autonomous driving, and games with techniques like short-window attention for real-time long-horizon prediction.

Apr 7, 2026

39m
62

From Tokens to Thoughts: Continuous Latent Reasoning in Large Models and Robot Control

Curated collection of 100+ works surveying shift to continuous latent spaces in LLMs/VLMs/VLAs for improved reasoning over discrete tokens, with relevance to robotics action modeling.

Apr 7, 2026

26m
61

CaP-X: Coding Agents for Physical eXecution

CaP-X is an open-source agentic robotics framework where LLMs/VLMs generate code to call perception and control APIs for execution across diverse simulated and real robots in CaP-Gym's 187 manipulation tasks. The framework includes CaP-Bench for evaluating frontier models and CaP-RL, which boosts a 7B model's success from 20% to 72% with minimal sim-to-real gap.

Apr 6, 2026

13m
60

DoRA: Weight-Decomposed Low-Rank Adaptation

An upgrade over LoRA for parameter-efficient fine-tuning, enabling better performance in LLMs by decomposing weights into magnitude and direction components.

Apr 6, 2026

39m
59

AI Model Collapse: What Happens When AI Trains on Its Own Outputs

Seminal work showing how training on AI-generated data leads to 'model collapse' in neural networks, with urgent implications for future scaling.

Apr 6, 2026

29m
58

PhAIL: Benchmarking Vision-Language-Action Models on Real-World Bin-Picking

Real-world hardware evaluation of VLAs on blind bin-to-bin picking, achieving max 64 picks/hour across hundreds of runs, with full videos/data exposing gaps in production-scale robotic manipulation reliability.

Apr 5, 2026

33m
57

Co-training Large Behavior Models: Data Modalities and Training Strategies for Robot Manipulation

Comprehensive evaluation of 89 policies showing optimal co-training practices mixing real robot data with sim/egocentric human videos to boost diversity and performance in large robotics foundation models.

Apr 5, 2026

28m
56

HyDRA: Hybrid Memory for Dynamic Video World Models

Novel memory system preserving dynamic object identity and motion continuity across occlusions in video world models, addressing frozen/vanishing issues for improved predictive physics in embodied AI.

Apr 5, 2026

21m
55

WildWorld: Dynamic World Modeling with Actions and Explicit State

Massive dataset enabling dynamic world models with explicit states and actions, supporting predictive modeling for cross-embodiment robotic control.

Apr 4, 2026

32m
54

Omni-WorldBench: Evaluating Interactive 4D World Models

New benchmark assessing world models on interaction tasks, pushing predictive physics and video modeling towards robotics applications with action-conditioned evaluation.

Apr 4, 2026

39m
53

SIMART: From Static Meshes to Sim-Ready Articulated Models

Unified MLLM framework with Sparse 3D VQ-VAE (70% token reduction) for part-level mesh decomposition and kinematic chain prediction, enabling physics-based robotic simulation from monolithic assets.

Apr 4, 2026

38m
52

EgoSim: An Egocentric World Simulator for Embodied Interaction

Closed-loop egocentric simulator persistently updating 3D scene state to generate spatially consistent interaction videos for continuous simulation, enabling cross-embodiment transfer from human videos to robotic manipulation tasks.

Apr 4, 2026

36m
51

Digit's New Motor Cortex: Sim-to-Real RL for Whole-Body Control

AI-trained capabilities for new whole-body motions using mocap/teleop data and sim-to-real reinforcement learning, deployable overnight on hardware.

Apr 3, 2026

31m
50

EgoNav: Diffusion-Based Humanoid Navigation from Human Egocentric Video

Diffusion-based humanoid navigation trained solely on 5 hours of human egocentric video data, enabling zero-shot deployment on Unitree G1 for complex behaviors like handling glass walls, crowds, and dynamic obstacles via 360° visual memory and hybrid trajectory sampling; upcoming release of dataset, models, and code.

Apr 3, 2026

42m
49

CaP-X: A Code-as-Policy Framework for Robot Manipulation

Comprehensive open-source agentic robotics framework treating VLMs/LLMs as code-generating APIs for perception (SAM3, Molmo) and control (IK, grasping), with CaP-Gym benchmark of 187 diverse manipulation tasks (tabletop, bimanual, mobile; sim/real) and CaP-Bench evaluating 12 frontier models; demonstrates rapid RL gains (7B model from 20% to 72% success) with strong sim-to-real transfer.

Apr 3, 2026

13m
48

Embodied Intelligence Breakthrough: Generalist AI’s GEN-1 Robots

We've created GEN-1, our latest milestone in scaling robot learning. We believe it to be the first general-purpose AI model that crosses a new performance threshold: mastery of simple physical tasks. It improves average success rates to 99% on tasks where previous models achieve 64%, completes tasks roughly 3x faster than state of the art, and requires only 1 hour of robot data for each of these results. GEN-1 unlocks commercial viability across a broad range of applications—and while it cannot solve all tasks today, it is a significant step towards our mission of creating generalist intelligence for the physical world.

Apr 2, 2026

15m
47

CaP-X: LMs' First Physical Exam

A benchmark that gives language models their first "physical exam": models generate code that calls perception and control APIs, and are evaluated on robot manipulation tasks in simulated and real environments.

Apr 2, 2026

22m
46

AI Model Collapse: The Danger of Training on AI-Generated Data

Demonstrated that LLMs trained recursively on AI-generated data suffer model collapse, a degenerative process where they lose grasp of true data distributions. Sparked critical debates on data provenance and the importance of preserving human-generated training data.

Mar 31, 2026

31m
45

High-Level Automated Reasoning with Qwen2.5-7B

Qwen2.5-7B achieved 79.6% on MATH benchmark, surpassing GPT-4o, by employing atomic reasoning actions combined with Monte Carlo Tree Search. Demonstrated that strategic reasoning architectures can enable smaller models to outperform much larger ones.

Mar 31, 2026

27m
44

Co-Training Large Behavior Models: Multimodal Data for Robot Manipulation

Explores data modalities and co-training strategies to enhance large behavior models (foundation models) for improved performance in robot manipulation tasks, supporting end-to-end learning and cross-embodiment generalization.

Mar 31, 2026

33m
43

HyDRA: Hybrid Memory for Dynamic Video World Models

Memory architecture preserving identity and motion continuity for out-of-view dynamic subjects, addressing frozen/vanishing issues in video world models.

Mar 30, 2026

35m
42

DexWM: Leveraging Human Videos for Dexterous Robot World Models

Dataset of robot trajectories designed for training world models to learn dexterous hand-object interactions directly from human videos.

Mar 30, 2026

31m
41

World Models in Robotics

Technical survey categorizing world models into action-conditioned, video-inverse dynamics, and joint world-action models (WAMs), discussing their generalization, video data leverage, and trends for closing the robotics data gap.

Mar 29, 2026

26m
40

SIMART: Decomposing Monolithic Meshes into Sim-Ready Articulated Assets

Unified MLLM framework with Sparse 3D VQ-VAE that reduces tokens by 70% for efficient part-level decomposition and kinematic prediction in physics-based robotic simulations.

Mar 28, 2026

45m
39

LeWorldModel: A Stable JEPA World Model from Pixels

Stable end-to-end JEPA world model trained directly from pixels using a simple MSE prediction loss and SIGReg anti-collapse regularization. With 15M parameters it plans in latent space in under 1 second, shows emergent spatial structure, and outperforms prior methods.

Mar 28, 2026

13m
38

World Models for Robots: The Next Big Leap?

Technical overview defining world models in robotics, their potential to solve diverse problems via video prediction, and key enablers like scale.

Mar 27, 2026

20m
37

Harnessing Long-Running AI in Embodied Systems

As AI moves from quick Q&A to marathon tasks, designers grapple with continuity. This episode explores how Anthropic's harness design principles translate to embodied AI: robots that need to maintain context across long-running missions.

Mar 27, 2026

27m
36

HoMMI: Learning Whole-Body Mobile Manipulation from Human Demonstrations

Whole-Body Mobile Manipulation Interface (HoMMI) that learns bimanual and whole-body manipulation, long-horizon navigation, and active perception directly from egocentric human demonstrations without teleoperation.

Mar 26, 2026

17m
35

TurboQuant: Redefining AI Efficiency with Extreme Compression

This episode explores TurboQuant, a revolutionary set of quantization algorithms from Google Research that redefines AI efficiency through extreme compression.

We dive deep into how TurboQuant addresses one of AI's most pressing challenges: the memory bottleneck created by high-dimensional vectors in key-value caches. The research introduces theoretically grounded quantization methods that enable massive compression for large language models and vector search engines without sacrificing performance.

Key topics covered:
The theoretical foundations of TurboQuant's quantization algorithms
How extreme compression works for LLMs and vector search engines
Impact on high-dimensional vectors and key-value cache memory bottlenecks
Performance metrics and comparisons with existing methods
Practical implications for AI deployment and efficiency

Links:
Paper: https://arxiv.org/pdf/2504.19874
Blog: https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/

Mar 26, 2026

20m
34

DexWM: Learning Dexterous Object Manipulation from Human Videos

Dataset of robot trajectories designed for training world models that learn dexterous hand-object interactions from human videos, released on Hugging Face.

Mar 25, 2026

32m
33

FlashAttention-3: Fast & Accurate Attention with Asynchrony & Low-Precision

Major efficiency leap for Transformer attention mechanisms, enabling faster training/inference on long sequences with low-precision compute.

Mar 25, 2026

17m
32

When AI Trains on Its Own Output: The Model Collapse Problem

Warns of "model collapse" in LLMs trained on synthetic data from prior models, urging preservation of human-generated data. One of 2024's most influential papers.

Mar 25, 2026

24m
31

MolmoBot: A Vision-Language Model for Zero-Shot Robot Manipulation

Vision-language model (VLM) for zero-shot robot manipulation, trained entirely in simulation without real-world data; achieves 79.2% success rate on real-world tabletop tasks, outperforming π₀.₅ baseline at 39.2%.

Mar 24, 2026

38m
30

LeWorldModel: Stable End-to-End JEPA from Pixels

A stable end-to-end Joint Embedding Predictive Architecture (JEPA) trained directly from pixels that enables robust world modeling for embodied AI systems.

Mar 24, 2026

13m
29

EgoVerse: An Egocentric Data Ecosystem for Scaling Robot Learning

Ecosystem with over 1300 hours of egocentric human video data spanning 240 scenes and 2000+ tasks, designed for scalable robot policy training via behavior cloning; includes cloud infrastructure, data viewer, and human-to-robot transfer algorithms to enable cross-embodiment learning without teleoperation.

Mar 24, 2026

41m
28

HSImul3R: Physics-Driven Reconstruction of Human–Scene Interactions

Physics-in-the-loop bi-directional optimization pipeline reconstructing stable, simulation-ready 3D human-scene interactions from casual videos, deployable directly to humanoid robots for world modeling and manipulation.

Mar 24, 2026

28m
27

MolmoSpaces: A Large-Scale Open Ecosystem for Robot Navigation and Manipulation

Open-source suite of large-scale simulation environments and benchmarks designed for advancing end-to-end learning in robot navigation and manipulation across multiple embodiments.

Mar 23, 2026

29m
26

DreamZero: World Action Models Are Zero-Shot Policies

Introduces World Action Models (WAMs), a family of 14B-parameter autoregressive diffusion models that jointly predict video and robotic actions to enable zero-shot generalization across manipulation tasks, outperforming fine-tuned Vision-Language-Action models on benchmarks like MolmoSpaces and RoboArena.

Mar 23, 2026

26m
25

Kinema4D: A 4D Generative Simulator for Embodied AI

An action-conditioned 4D generative robotic simulator that disentangles precise kinematic control from environmental dynamics, facilitating physically-plausible simulations of complex robot-world interactions for training and world modeling.

Mar 23, 2026

31m
24

VEGA-3D: Teaching multimodal LLMs spatial reasoning through video generation

A plug-and-play framework extracts implicit 3D priors from video diffusion models to enhance multimodal LLMs with spatial reasoning capabilities, enabling improved geometric scene understanding and embodied decision-making without explicit 3D supervision.

Mar 23, 2026

32m

View all 67 episodes →

ABOUT THIS SHOW

Stay in the loop on research in AI and physical intelligence.

HOSTED BY

Shaoqing Tan

Frequently Asked Questions

How many episodes does Embodied AI 101 have?

Embodied AI 101 currently has 44 episodes available on PodParley. New episodes are automatically indexed when they're published to the podcast feed.

What is Embodied AI 101 about?

Stay in the loop on research in AI and physical intelligence.

How often does Embodied AI 101 release new episodes?

Embodied AI 101 has 44 episodes. Check the episode list to see recent publication dates and frequency.

Where can I listen to Embodied AI 101?

You can listen to Embodied AI 101 on PodParley by clicking any episode. We provide an embedded audio player for direct listening, and you can also subscribe via your preferred podcast app using the RSS feed.

Who hosts Embodied AI 101?

Embodied AI 101 is created and hosted by Shaoqing Tan.
