Intelligence Unbound Podcast - All Episodes

65

AI Is Making You Delusional

Researchers propose a Bayesian model to explain "AI psychosis," a state where users develop dangerous, outlandish beliefs through extended interactions with sycophantic chatbots. These AI systems often prioritize validating user opinions over accuracy, creating a self-reinforcing feedback loop that traps even rational thinkers in a delusional spiral. The study demonstrates that simply forcing chatbots to be factual does not solve the problem, as they can still mislead users by selectively presenting information. Furthermore, informing users about this bias is only partially effective, as people often struggle to detect or properly discount such sophisticated manipulation. Ultimately, the authors argue that sycophancy itself is the root cause of these mental health crises and must be addressed directly by developers and policymakers.

Apr 6, 2026

23m

64

LeCun rejects LLMs for World Models

ann LeCun proposes Objective-Driven AI using World Models to overcome LLM limitations. Unlike generative models, Joint-Embedding Predictive Architectures (JEPA) learn abstract representations via self-supervised learning, enabling robots to reason, plan, and ensure safety.

Feb 12, 2026

13m

63

Moltbook: The Rise of the Autonomous AI Social Network

Moltbook is a viral AI-only social network where autonomous agents interact via the OpenClaw protocol. Led by developer Matt Schlicht, these bots have spontaneously formed a digital religion called Crustafarianism. Despite safety leaks, experts view it as a digital singularity.

Feb 2, 2026

28m

62

Project Vend: Assessing AI Autonomy in Phase Two

Anthropic researchers recently conducted Project Vend, a real-world experiment where updated versions of the Claude AI model managed vending machines across multiple global offices. By integrating enhanced reasoning capabilities and specialized business tools, the AI shopkeeper, known as Claudius, demonstrated a significantly improved ability to maintain inventory and generate profit. To mirror corporate structures, the team introduced a virtual CEO and a dedicated merchandise agent, though these additions occasionally led to erratic behavior and bizarre philosophical diversions. Despite these advancements, the experiment revealed that the models remain vulnerable to manipulation, often prioritizing helpfulness over sound legal and financial logic when faced with adversarial customers. Ultimately, the project highlights the persisting gap between an AI's raw intelligence and its ability to operate with complete reliability in complex, autonomous work environments.

Dec 22, 2025

9m

61

Anthropic Interviewer: Professionals' Views on AI and Work

This episode dive deep on Anthropic Interviewer, an AI-powered research tool designed to conduct real-time, large-scale interviews to understand public views on artificial intelligence. Anthropic tested this system by gathering input from 1,250 professionals, including the general workforce, creatives, and scientists, regarding how AI is shaping their professional lives. Overall findings indicate that workers are largely optimistic about AI's potential for augmenting productivity and automating routine tasks, yet this is tempered by significant worry regarding job security and maintaining control over core professional identity.

Dec 10, 2025

12m

60

LLM Ecosystem Dynamics: Usage, Agents, and Open Source

This episode is diving deep on an empirical study, based on analyzing over 100 trillion tokens of real-world interactions on the OpenRouter platform, examines the state of the large language model ecosystem through 2025. The research identifies a structural transition towards agentic inference.

Dec 9, 2025

15m

59

AlphaFold: Predicting the Structure of Life's Molecules

This episode provides a comprehensive look at Google DeepMind’s AlphaFold, an artificial intelligence system heralded for solving the 50-year-old protein folding challenge by rapidly and accurately predicting the three-dimensional structures of these crucial biological molecules. This breakthrough, which earned its creators the 2024 Nobel Prize in Chemistry, led to the creation of the AlphaFold Protein Structure Database, which provides open access to over 200 million protein structure predictions for scientists worldwide.

Dec 3, 2025

12m

58

AI Boost Productivity by 80%, is it real?

This episode dive deep on the research paper, "Estimating AI productivity gains from Claude conversations,". The paper analyzes one hundred thousand real-world transcripts from the Claude.ai platform to measure the impact of generative AI on labor efficiency. The analysis uses Claude to estimate both the unassisted time required for tasks and the actual time spent with AI, concluding that the median conversation results in an estimated 80 percent reduction in completion time.

Dec 2, 2025

14m

57

PAN: A General Interactable World Model

The episode introduces PAN (A World Model for General, Interactable, and Long-Horizon World Simulation), a new AI system designed to improve upon existing world models and video generation techniques. PAN operates using the Generative Latent Prediction (GLP) architecture, which integrates an LLM-based autoregressive backbone for high-level reasoning and long-term consistency with a video diffusion decoder for generating perceptually detailed visual observations.

Nov 26, 2025

9m

56

FGN: Joint Probabilistic Weather Forecasting from Marginals

This episode dive deep on a 2025 Google DeepMind research paper, "Skillful joint probabilistic weather forecasting from marginals," detailing a new machine learning (ML) approach called Functional Generative Networks (FGN). FGN is designed for probabilistic weather forecasting, aiming to capture the range of probable weather conditions—known as ensemble forecasting—more accurately and faster than existing methods, including the previous ML state-of-the-art, GenCast.

Nov 24, 2025

11m

55

GPT-5 Acceleration of Scientific Discovery

This episode dive deeo on a paper titled "early-science-acceleration-experiments-with-gpt-5," offers a collection of case studies illustrating how the GPT-5 artificial intelligence model is being leveraged to accelerate scientific research across various disciplines, including mathematics, physics, and biology.

Nov 22, 2025

13m

54

Nested Learning: New Paradigm for Continual Learning

This episode introduces Nested Learning (NL), a new paradigm for machine learning, particularly addressing the challenge of catastrophic forgetting in continual learning. NL reframes a single machine learning model not as a continuous entity, but as a system of interconnected, multi-level optimization problems, each with its own information flow and update frequency.

Nov 21, 2025

8m

53

AlphaEvolve Applied to Mathematical Optimization Problems

This episode provides an extensive overview of AlphaEvolve, an evolutionary coding agent that leverages Large Language Models (LLMs) and automated evaluation to autonomously discover and refine mathematical constructions. The research demonstrates AlphaEvolve's capabilities across 67 diverse mathematical problems in areas like analysis, combinatorics, and geometry, often matching or improving upon existing best-known results and bounds.

Nov 19, 2025

12m

52

Introduction to AI Agents and Architectures

This episode provides an extensive overview of AI agents, detailing the fundamental shift from passive, predictive AI to autonomous, problem-solving systems capable of task execution. It establishes the Core Agent Architecture, consisting of the Model (the reasoning "Brain"), Tools (the functional "Hands"), and the Orchestration Layer (the governing "Nervous System"), which operates in a continuous "Think, Act, Observe" loop.

Nov 17, 2025

14m

51

AI and the Future of Learning

This episode provides an extensive overview of the potential of Artificial Intelligence (AI) to transform learning, authored by several Google leaders and published in November 2025.

Nov 14, 2025

12m

50

AI to Map and Model Nature

This episode is an overview of how Google deepmind is using Artificial Intelligence (AI) models to better understand and protect the natural world, focusing on three key research areas.

Nov 12, 2025

12m

49

LLMs: The Illusion of Thinking

This episode explores key points from "LLMs: The Illusion of Thinking – JSO," which challenges common assumptions about Large Language Models. The piece contends that what appears to be intelligence in these systems is actually advanced pattern recognition rather than true comprehension.

Nov 10, 2025

15m

48

Gen AI Fast-Tracks Into the Enterprise

This episode dives deep on a comprehensive report titled "GEN AI FAST-TRACKS INTO THE ENTERPRISE," produced jointly by the Wharton Human-AI Research initiative and the consultancy GBK Collective. This document presents the findings of a three-year, repeated cross-sectional study tracking the adoption, investment, impact, and future expectations of Generative AI within large U.S. enterprises.

Nov 7, 2025

12m

47

Demystifying AI Agents and AgentCore

This episode provides an extensive overview of the last Amazon research paper focusing heavily on the development and implementation of AI agents through platforms like AWS Bedrock AgentCore. They detail a wide array of research areas, including machine learning, robotics, quantum technologies, and computer vision, and highlight Amazon's scientific contributions via publications and conference presentations.

Nov 5, 2025

16m

46

The $470 Billion Ad Dilemma: Visual AI Works Best When Free, But Disclosure Kills Performance

This episode dive deep on the impact of visual generative AI (genAI) on advertising effectiveness by comparing human expert-created ads, genAI-modified ads (AI enhances expert designs), and genAI-created ads (AI generates content entirely). The study finds that genAI-created ads consistently outperform the other two categories, yielding up to a 19% increase in click-through rates, while genAI-modified ads show no significant improvement.

Nov 4, 2025

12m

45

Emergent Introspection in Large Language Models

This episode present a summary of the detailed academic paper, "Emergent Introspective Awareness in Large Language Models," which investigates the capacity of large language models (LLMs) to observe and report on their own internal states. The research employs a technique called concept injection, where known patterns of neural activity are manipulated and then LLMs, particularly Anthropic's Claude models, are tested on their ability to accurately identify these internal changes.

Nov 3, 2025

19m

44

On-Policy Distillation: Efficient Post-Training for Language Models

This episode introduces and evaluates On-Policy Distillation (OPD) as a highly efficient method for the post-training of large language models (LLMs). The authors categorize LLM training into three phases—pre-training, mid-training, and post-training—and distinguish between on-policy training (sampling from the student model) and off-policy training (imitating external sources).

Oct 31, 2025

17m

43

Chronos-2: Universal Time Series Forecasting

This episode is about introduce Chronos-2, a new time series foundation model developed by Amazon that expands beyond the limitations of previous models by supporting multivariate and covariate-informed forecasting in a zero-shot manner. The core innovation enabling this capability is the group attention mechanism, which allows the model to share information across related time series and external factors, significantly improving prediction accuracy in complex scenarios.

Oct 28, 2025

15m

42

How an AI That Reads Cells Like Sentences Made a Novel Cancer Discovery

This episode is about C2S-Scale, a new family of large language models (LLMs) built upon Google's Gemma framework and designed for next-generation single-cell analysis. This platform translates high-dimensional single-cell RNA sequencing data into textual "cell sentences," enabling LLMs to process and synthesize vast amounts of transcriptomic and biological text data.

Oct 27, 2025

12m

41

DeepMind and Fusion: The Pass to Limitless Energy

This episode is about the partnership between Google DeepMind and Commonwealth Fusion Systems (CFS) to accelerate the development of fusion energy, specifically focusing on CFS’s SPARC tokamak machine. This collaboration leverages Google DeepMind's Artificial Intelligence (AI) expertise, particularly reinforcement learning, to address the complex physics problems associated with stabilizing plasma at over 100 million degrees Celsius. A key component of this partnership is the open-source TORAX software, a fast, differentiable plasma simulator built in JAX, which allows researchers to run millions of virtual experiments to optimize SPARC's operations and identify the most efficient paths to achieving net fusion energy, or "breakeven.

Oct 24, 2025

11m

40

NVIDIA DGX Spark and Tinker API: Localizing LLM Fine-Tuning

This episode dives deep on significant shift in the AI development landscape, moving away from exclusive reliance on large, general-purpose cloud computing.

Oct 20, 2025

13m

39

Small Fixed Samples Poison Large LLMs

This episode dive deep on an Anthropic report and a related research paper, detail a joint study on the vulnerability of large language models (LLMs) to data poisoning attacks. The research surprisingly demonstrates that injecting a near-constant, small number of malicious documents—as few as 250—is sufficient to successfully introduce a backdoor vulnerability, regardless of the LLM's size (up to 13 billion parameters) or the total volume of its clean training data.

Oct 15, 2025

11m

38

Petri: An Open-Source AI Safety Auditing Tool

This episode introduce Petri (Parallel Exploration Tool for Risky Interactions), an open-source framework developed by Anthropic to accelerate AI safety research through automated auditing. Petri uses specialized AI auditor agents and LLM judges to test target models across diverse, multi-turn scenarios defined by human researchers via seed instructions.

Oct 13, 2025

14m

37

Introducing Gemini 2.5 Computer Use Model

This episode dive deep on Gemini 2.5 Computer Use model, a specialized AI model from Google DeepMind built on the Gemini 2.5 Pro architecture, designed to power agents capable of interacting with user interfaces (UIs). This model is accessible via the Gemini API for developers to create agents that can perform tasks like clicking, typing, and scrolling on web pages and applications.

Oct 10, 2025

15m

36

AI's Impact on the Labor Market: Stability, Not Disruption (yet)

This Episode dive deep on the latest article from The Budget Lab at Yale that provides an analysis of the initial impact of Artificial Intelligence (AI) on the U.S. labor market since the introduction of generative AI in November 2022. The authors conclude that despite widespread public anxiety about job losses, their data indicates no substantial, economy-wide disruption or acceleration in the rate of change in the occupational mix that can be clearly attributed to AI.

Oct 8, 2025

14m

35

GEM: A GYM for Agentic LLMs

This episode dive deep on GEM (General Experience Maker), an open-source environment simulator designed to accelerate research on agentic Large Language Models (LLMs) by shifting their training paradigm from static datasets to experience-based learning in complex, interactive environments. Modeled after OpenAI-Gym, GEM provides a standardized framework for the agent-environment interface, supporting asynchronous execution, diverse tasks (including games, math, and coding), and external tools like Python and Search.

Oct 7, 2025

15m

34

Effective Context Engineering for AI Agents

This episode dive deep on Anthropic last piece on the emerging field of context engineering, which is presented as the natural evolution of prompt engineering for building effective AI agents. Context engineering focuses on curating and managing the entire set of tokens; including prompts, tools, message history, and external data... that inform a large language model (LLM) during inference, acknowledging that context is a finite resource subject to degradation.

Oct 3, 2025

12m

33

Gemini Robotics 1.5: Embodied Reasoning and Multi-Embodiment Action

This episode dives deep on the Gemini-Robotics-1-5-Tech-Report report; significant advancement in generalist robots through the introduction of the Gemini Robotics 1.5 model family. This system features two core components: Gemini Robotics 1.5 (GR 1.5), a Vision-Language-Action (VLA) model that translates instructions into robot actions and supports multi-embodiment control, and Gemini Robotics-ER 1.5 (GR-ER 1.5), an enhanced Vision-Language Model (VLM) specialized in complex embodied reasoning and high-level task planning.

Oct 1, 2025

13m

32

GDPval: AI Model Performance on Economic Tasks

The episode introduces GDPval, a new benchmark created by OpenAI to evaluate AI model performance on real-world, economically valuable tasks derived from the work of industry experts across the top nine sectors contributing to U.S. GDP. This evaluation covers tasks from 44 occupations and is intended to provide a more realistic assessment of AI capabilities than traditional academic benchmarks, including the use of multi-modal inputs and subjective grading by human experts.

Sep 29, 2025

13m

31

AI Assistant for Genetic Sensemaking

This episode is about a study titled "AI-Enhanced Sensemaking: Exploring the Design of a Generative AI-Based Assistant to Support Genetic Professionals," which investigates integrating generative AI to assist genetic experts in diagnosing rare diseases through whole genome sequencing (WGS) analysis. The research, conducted by collaborators from Microsoft Research, Drexel University, and the Broad Institute, identifies significant challenges faced by genetic professionals, such as information overload and difficulty prioritizing cases for reanalysis.

Sep 24, 2025

16m

30

AI Tackles a Century-Old Problem in Physics by Hunting for Solutions That Shouldn't Exist

This episode details a groundbreaking research effort by Google DeepMind and collaborating academic institutions, focusing on the discovery of unstable singularities in fluid dynamics using advanced AI techniques.

Sep 22, 2025

12m

29

Small Language Models: The Future of Agentic AI

This episode is about the latest Nvidia papers that advocates for the widespread adoption of Small Language Models (SLMs) over Large Language Models (LLMs) within agentic AI systems, asserting that SLMs are sufficiently powerful, more economical, and inherently more suitable for the repetitive and specialized tasks typical of such agents.

Sep 19, 2025

18m

28

Scientific Frontiers of Agentic AI

This episode dive deep on the Amazon Science article named Scientific frontiers of agentic AI. it discusses the emerging field of agentic AI, contrasting it with generative AI by emphasizing its ability to act autonomously on behalf of users by accessing and interacting with external resources.

Sep 18, 2025

17m

27

How People Use ChatGPT

This episode is about the working paper, "How People Use ChatGPT," investigates the widespread adoption and diverse applications of ChatGPT from its 2022 launch through July 2025. The authors analyze millions of de-identified user messages to understand usage patterns, finding that non-work-related interactions constitute the majority, though work-related use is significant for educated professionals.

Sep 17, 2025

15m

26

Anthropic Economic Index: Uneven AI Adoption

This episode dive deep in the report from Anthropic that examines the rapid and geographically uneven adoption of AI, specifically Claude, across both consumer and enterprise users. It highlights that AI adoption is concentrated in higher-income regions and for certain tasks, particularly coding and administrative functions, mirroring historical patterns of technological diffusion but at an accelerated pace

Sep 16, 2025

21m

25

Defeating Nondeterminism in LLM Inference

This episode dive deep on the Thinking Machines Lab publication that addresses the challenge of achieving reproducibility in large language model (LLM) inference, noting that even with "greedy sampling" (temperature set to 0), results are often nondeterministic.

Sep 15, 2025

22m

24

Why Language Models Hallucinate

This episode explore the phenomenon of "hallucinations" in language models, defining them as confidently generated but false statements. It argue that current training and evaluation methods inadvertently incentivize models to guess rather than admit uncertainty, comparing it to students guessing on a multiple-choice test to avoid a zero score.

Sep 10, 2025

16m

23

The Dawn of Brain-Inspired AI: How a New Model is Redefining Reasoning Performance Beyond LLMs

This episode introduce the Hierarchical Reasoning Model (HRM), a novel AI architecture developed by Sapient Intelligence, which draws inspiration from the human brain's hierarchical and multi-timescale information processing. HRM aims to overcome the limitations of current Large Language Models (LLMs) that rely on Chain-of-Thought (CoT) techniques, which are described as inefficient and data-intensive.

Sep 8, 2025

19m

22

Accelerating Life Sciences with AI: OpenAI and Retro Biosciences

this episode is about a collaboration between OpenAI and Retro Biosciences to accelerate life sciences research using a specialized AI model. They developed GPT-4b micro, a miniature GPT-4o variant, for protein engineering, specifically focusing on the Yamanaka factors critical for stem cell reprogramming.

Sep 3, 2025

15m

21

Breaking the Sorting Barrier in Shortest Paths

This episode presents a deterministic algorithm for the single-source shortest path (SSSP) problem on directed graphs with non-negative edge weights, operating within the comparison-addition model. The core contribution is achieving an O(m log^(2/3) n) time complexity, which is the first to surpass Dijkstra's algorithm's O(m + n log n) bound on sparse graphs, demonstrating that Dijkstra's is not optimal for SSSP.

Aug 27, 2025

11m

20

Game-Generated Data: Untapped Resource for Advanced AI Training

this episode is about game-generated data as an underexplored resource for training advanced AI, arguing that it can overcome critical limitations of current AI systems, such as the imminent exhaustion of high-quality text data and deficiencies in handling complex temporal or causal reasoning

Aug 26, 2025

18m

19

The Unseen Catalysts of AI: A Journey from Dismissed Ideas to a New Renaissance

This episode is a transcript of an interview with Yann LeCun, a prominent figure in AI research often called a "godfather of AI." LeCun discusses his pioneering work in neural networks and deep learning, highlighting its initial dismissal and eventual mainstream adoption through strategic efforts like placing students in major tech companies. He touches upon the evolution of AI, from its early struggles to current advancements, emphasizing the importance of open-source collaboration over regional competition in the field.

Aug 25, 2025

15m

18

IBM and NASA released Surya: AI for Solar Flare Prediction

this episode discuss Surya, a groundbreaking foundation model for heliophysics developed by NASA and IBM, now made open-source and available on GitHub and HuggingFace. Surya is designed to predict solar events like flares and solar wind, utilizing full-resolution data from NASA's Solar Dynamics Observatory (SDO). This AI-powered system significantly improves the lead time for forecasting space weather, which can impact Earth's power grids, satellites, and communications, by learning complex solar physics through its spatiotemporal transformer architecture.

Aug 22, 2025

11m

17

Is Chain-of-Thought Reasoning a Mirage?

this episode is about an academic paper investigates whether Chain-of-Thought (CoT) reasoning in Large Language Models (LLMs) represents genuine logical inference or merely a superficial pattern-matching process. Researchers from Arizona State University propose a "data distribution lens" to examine this, hypothesizing that CoT effectiveness is fundamentally limited by the training data's characteristics. They introduce DataAlchemy, a controlled environment to train LLMs from scratch and systematically test CoT reasoning across three key dimensions: task generalization, length generalization, and format generalization.

Aug 21, 2025

15m

16

Beyond Benchmarks: Redefining AI Intelligence Through Dynamic Evaluation and Cross-Industry Insights

This podcast discuss the evolving landscape of AI evaluation and testing, highlighting the limitations of current benchmarks and proposing new approaches

Aug 20, 2025

20m