PODCAST · technology

Embodied AI 101

Stay in the loop on research in AI and physical intelligence.

  1. 67

    Claw-Eval: Toward Trustworthy and Transparent Evaluation of Autonomous Agents

    Benchmark with 2,159 rubric items across 300 tasks using trajectory-aware grading and 3-trial Pass^3 scoring to mitigate luck. Evaluates agent reliability in real-world robotics settings.

  2. 66

    LIBERO-Para: Paraphrase Robustness in Robotic Manipulation

    Reveals paraphrase fragility in VLAs causing 22-52% success drops due to task misidentification. Introduces PRIDE metric weighting success by paraphrase difficulty on LIBERO benchmark manipulation tasks.

  3. 65

    YOR: Your Own Mobile Manipulator for Generalizable Robotics

    Low-cost mobile manipulator design and training strategies for broad generalization in real-world tasks.

  4. 64

    EgoSim: Egocentric World Simulator for Embodied Interaction Generation

    Closed-loop egocentric video simulator maintaining persistent 3D scene state for consistent interactions, enabling cross-embodiment transfer from human videos to robotic manipulation.

  5. 63

    Accelerating Video World Models: From Generative Videos to Real-Time Simulators

    Comprehensive survey taxonomizing efficient architectures/algorithms for video world models as simulators, targeting compute bottlenecks in embodied AI, autonomous driving, and games with techniques like short-window attention for real-time long-horizon prediction.

  6. 62

    From Tokens to Thoughts: Continuous Latent Reasoning in Large Models and Robot Control

    Curated collection of 100+ works surveying the shift from discrete tokens to continuous latent spaces in LLMs/VLMs/VLAs for improved reasoning, with relevance to robotics action modeling.

  7. 61

    CaP-X: Coding Agents for Physical eXecution

    CaP-X is an open-source agentic robotics framework where LLMs/VLMs generate code to call perception and control APIs for execution across diverse simulated and real robots in CaP-Gym's 187 manipulation tasks. The framework includes CaP-Bench for evaluating frontier models and CaP-RL, which boosts a 7B model's success from 20% to 72% with minimal sim-to-real gap.

  8. 60

    DoRA: Weight-Decomposed Low-Rank Adaptation

    An upgrade over LoRA for parameter-efficient fine-tuning, enabling better performance in LLMs by decomposing weights into magnitude and direction components.

  9. 59

    AI Model Collapse: What Happens When AI Trains on Its Own Outputs

    Seminal work showing how training on AI-generated data leads to 'model collapse' in neural networks, with urgent implications for future scaling.

  10. 58

    PhAIL: Benchmarking Vision-Language-Action Models on Real-World Bin-Picking

    Real-world hardware evaluation of VLAs on blind bin-to-bin picking, achieving max 64 picks/hour across hundreds of runs, with full videos/data exposing gaps in production-scale robotic manipulation reliability.

  11. 57

    Co-training Large Behavior Models: Data Modalities and Training Strategies for Robot Manipulation

    Comprehensive evaluation of 89 policies showing optimal co-training practices mixing real robot data with sim/egocentric human videos to boost diversity and performance in large robotics foundation models.

  12. 56

    HyDRA: Hybrid Memory for Dynamic Video World Models

    Novel memory system preserving dynamic object identity and motion continuity across occlusions in video world models, addressing frozen/vanishing issues for improved predictive physics in embodied AI.

  13. 55

    WildWorld: Dynamic World Modeling with Actions and Explicit State

    Massive dataset enabling dynamic world models with explicit states and actions, supporting predictive modeling for cross-embodiment robotic control.

  14. 54

    Omni-WorldBench: Evaluating Interactive 4D World Models

    New benchmark assessing world models on interaction tasks, pushing predictive physics and video modeling towards robotics applications with action-conditioned evaluation.

  15. 53

    SIMART: From Static Meshes to Sim-Ready Articulated Models

    Unified MLLM framework with Sparse 3D VQ-VAE (70% token reduction) for part-level mesh decomposition and kinematic chain prediction, enabling physics-based robotic simulation from monolithic assets.

  16. 52

    EgoSim: An Egocentric World Simulator for Embodied Interaction

    Closed-loop egocentric simulator persistently updating 3D scene state to generate spatially consistent interaction videos for continuous simulation, enabling cross-embodiment transfer from human videos to robotic manipulation tasks.

  17. 51

    Digit's New Motor Cortex: Sim-to-Real RL for Whole-Body Control

    AI-trained capabilities for new whole-body motions using mocap/teleop data and sim-to-real reinforcement learning, deployable overnight on hardware.

  18. 50

    EgoNav: Diffusion-Based Humanoid Navigation from Human Egocentric Video

    Diffusion-based humanoid navigation trained solely on 5 hours of human egocentric video data, enabling zero-shot deployment on Unitree G1 for complex behaviors like handling glass walls, crowds, and dynamic obstacles via 360° visual memory and hybrid trajectory sampling; upcoming release of dataset, models, and code.

  19. 49

    CaP-X: A Code-as-Policy Framework for Robot Manipulation

    Comprehensive open-source agentic robotics framework treating VLMs/LLMs as code-generating APIs for perception (SAM3, Molmo) and control (IK, grasping), with CaP-Gym benchmark of 187 diverse manipulation tasks (tabletop, bimanual, mobile; sim/real) and CaP-Bench evaluating 12 frontier models; demonstrates rapid RL gains (7B model from 20% to 72% success) with strong sim-to-real transfer.

  20. 48

    Embodied Intelligence Breakthrough: Generalist AI’s GEN-1 Robots

    We've created GEN-1, our latest milestone in scaling robot learning. We believe it to be the first general-purpose AI model that crosses a new performance threshold: mastery of simple physical tasks. It improves average success rates to 99% on tasks where previous models achieve 64%, completes tasks roughly 3x faster than state of the art, and requires only 1 hour of robot data for each of these results. GEN-1 unlocks commercial viability across a broad range of applications—and while it cannot solve all tasks today, it is a significant step towards our mission of creating generalist intelligence for the physical world.

  21. 47

    CaP-X: LMs' First Physical Exam

    A novel benchmark that evaluates language models on physical examination tasks, testing their ability to understand and perform clinical physical exam procedures in simulated environments. This work introduces a comprehensive evaluation framework for AI systems in medical/clinical settings.

  22. 46

    AI Model Collapse: The Danger of Training on AI-Generated Data

    Demonstrated that LLMs trained recursively on AI-generated data suffer model collapse, a degenerative process where they lose grasp of true data distributions. Sparked critical debates on data provenance and the importance of preserving human-generated training data.

  23. 45

    High-Level Automated Reasoning with Qwen2.5-7B

    Qwen2.5-7B achieved 79.6% on the MATH benchmark, surpassing GPT-4o, by combining atomic reasoning actions with Monte Carlo Tree Search. Demonstrated that strategic reasoning architectures can enable smaller models to outperform much larger ones.

  24. 44

    Co-Training Large Behavior Models: Multimodal Data for Robot Manipulation

    Explores data modalities and co-training strategies to enhance large behavior models (foundation models) for improved performance in robot manipulation tasks, supporting end-to-end learning and cross-embodiment generalization.

  25. 43

    HyDRA: Hybrid Memory for Dynamic Video World Models

    Memory architecture preserving identity and motion continuity for out-of-view dynamic subjects, addressing frozen/vanishing issues in video world models.

  26. 42

    DexWM: Leveraging Human Videos for Dexterous Robot World Models

    Dataset of robot trajectories designed for training world models to learn dexterous hand-object interactions directly from human videos.

  27. 41

    World Models in Robotics

    Technical survey categorizing world models into action-conditioned, video-inverse dynamics, and joint world-action models (WAMs), discussing their generalization, video data leverage, and trends for closing the robotics data gap.

  28. 40

    SIMART: Decomposing Monolithic Meshes into Sim-Ready Articulated Assets

    Unified MLLM framework with Sparse 3D VQ-VAE that reduces tokens by 70% for efficient part-level decomposition and kinematic prediction in physics-based robotic simulations.

  29. 39

    LeWorldModel: A Stable JEPA World Model from Pixels

    Stable end-to-end JEPA world model trained directly from pixels using simple MSE prediction loss and SIGReg anti-collapse regularization, enabling efficient latent planning under 1 second on 15M params with emergent spatial structure outperforming prior methods.

  30. 38

    World Models for Robots: The Next Big Leap?

    Technical overview defining world models in robotics, their potential to solve diverse problems via video prediction, and key enablers like scale.

  31. 37

    Harnessing Long-Running AI in Embodied Systems

    As AI moves from quick Q&A to marathon tasks, designers grapple with continuity. This episode explores how Anthropic's harness design principles translate to embodied AI: robots that need to maintain context across long-running missions.

  32. 36

    HoMMI: Learning Whole-Body Mobile Manipulation from Human Demonstrations

    Whole-Body Mobile Manipulation Interface (HoMMI) that learns bimanual and whole-body manipulation, long-horizon navigation, and active perception directly from egocentric human demonstrations without teleoperation.

  33. 35

    TurboQuant: Redefining AI Efficiency with Extreme Compression

    This episode explores TurboQuant, a revolutionary set of quantization algorithms from Google Research that redefines AI efficiency through extreme compression. We dive deep into how TurboQuant addresses one of AI's most pressing challenges: the memory bottleneck created by high-dimensional vectors in key-value caches. The research introduces theoretically grounded quantization methods that enable massive compression for large language models and vector search engines without sacrificing performance. Key topics covered: the theoretical foundations of TurboQuant's quantization algorithms; how extreme compression works for LLMs and vector search engines; impact on high-dimensional vectors and key-value cache memory bottlenecks; performance metrics and comparisons with existing methods; practical implications for AI deployment and efficiency. Links: Paper: https://arxiv.org/pdf/2504.19874 Blog: https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/

  34. 34

    DexWM: Learning Dexterous Object Manipulation from Human Videos

    Dataset of robot trajectories designed for training world models that learn dexterous hand-object interactions from human videos, released on Hugging Face.

  35. 33

    FlashAttention-3: Fast & Accurate Attention with Asynchrony & Low-Precision

    Major efficiency leap for Transformer attention mechanisms, enabling faster training/inference on long sequences with low-precision compute.

  36. 32

    When AI Trains on Its Own Output: The Model Collapse Problem

    Warns of "model collapse" in LLMs trained on synthetic data from prior models, urging preservation of human-generated data. One of 2024's most influential papers.

  37. 31

    MolmoBot: A Vision-Language Model for Zero-Shot Robot Manipulation

    Vision-language model (VLM) for zero-shot robot manipulation, trained entirely in simulation without real-world data; achieves 79.2% success rate on real-world tabletop tasks, outperforming π₀.₅ baseline at 39.2%.

  38. 30

    LeWorldModel: Stable End-to-End JEPA from Pixels

    A stable end-to-end Joint Embedding Predictive Architecture (JEPA) trained directly from pixels that enables robust world modeling for embodied AI systems.

  39. 29

    EgoVerse: An Egocentric Data Ecosystem for Scaling Robot Learning

    Ecosystem with over 1300 hours of egocentric human video data spanning 240 scenes and 2000+ tasks, designed for scalable robot policy training via behavior cloning; includes cloud infrastructure, data viewer, and human-to-robot transfer algorithms to enable cross-embodiment learning without teleoperation.

  40. 28

    HSImul3R: Physics-Driven Reconstruction of Human–Scene Interactions

    Physics-in-the-loop bi-directional optimization pipeline reconstructing stable, simulation-ready 3D human-scene interactions from casual videos, deployable directly to humanoid robots for world modeling and manipulation.

  41. 27

    MolmoSpaces: A Large-Scale Open Ecosystem for Robot Navigation and Manipulation

    Open-source suite of large-scale simulation environments and benchmarks designed for advancing end-to-end learning in robot navigation and manipulation across multiple embodiments.

  42. 26

    DreamZero: World Action Models Are Zero-Shot Policies

    Introduces World Action Models (WAMs), a family of 14B-parameter autoregressive diffusion models that jointly predict video and robotic actions to enable zero-shot generalization across manipulation tasks, outperforming fine-tuned Vision-Language-Action models on benchmarks like MolmoSpaces and RoboArena.

  43. 25

    Kinema4D: A 4D Generative Simulator for Embodied AI

    An action-conditioned 4D generative robotic simulator that disentangles precise kinematic control from environmental dynamics, facilitating physically-plausible simulations of complex robot-world interactions for training and world modeling.

  44. 24

    VEGA-3D: Teaching multimodal LLMs spatial reasoning through video generation

    A plug-and-play framework extracts implicit 3D priors from video diffusion models to enhance multimodal LLMs with spatial reasoning capabilities, enabling improved geometric scene understanding and embodied decision-making without explicit 3D supervision.

ABOUT THIS SHOW

Stay in the loop on research in AI and physical intelligence.

HOSTED BY

Shaoqing Tan

Frequently Asked Questions

How many episodes does Embodied AI 101 have?

Embodied AI 101 currently has 44 episodes available on PodParley. New episodes are automatically indexed when they're published to the podcast feed.

What is Embodied AI 101 about?

Stay in the loop on research in AI and physical intelligence.

How often does Embodied AI 101 release new episodes?

Embodied AI 101 has 44 episodes. Check the episode list to see recent publication dates and frequency.

Where can I listen to Embodied AI 101?

You can listen to Embodied AI 101 on PodParley by clicking any episode. We provide an embedded audio player for direct listening, and you can also subscribe via your preferred podcast app using the RSS feed.

Who hosts Embodied AI 101?

Embodied AI 101 is created and hosted by Shaoqing Tan.