EPISODE · Jul 14, 2025 · 1H 11M
AI, Reasoning or Rambling?
from muckrAIkers · hosts Jacob Haimes and Igor Krawczuk
In this episode, we redefine AI's "reasoning" as mere rambling, exposing the "illusion of thinking" and "Potemkin understanding" in current models. We contrast the classical definition of reasoning, which requires logic and consistency, with Big Tech's new version: a generic statement about information processing. We explain how Large Rambling Models generate extensive, often irrelevant rambling traces that appear to improve benchmarks, largely due to best-of-N sampling and benchmark gaming.

Words and definitions actually matter! Carelessness leads to misplaced investments and an overestimation of systems that are currently just surprisingly useful autocorrects.

Chapters
(00:00) - Intro
(00:40) - OBB update and Meta's talent acquisition
(03:09) - What are rambling models?
(04:25) - Definitions and polarization
(09:50) - Logic and consistency
(17:00) - Why does this matter?
(21:40) - More likely explanations
(35:05) - The "illusion of thinking" and task complexity
(39:07) - "Potemkin understanding" and surface-level recall
(50:00) - Benchmark gaming and best-of-N sampling
(55:40) - Costs and limitations
(58:24) - Claude's anecdote and the Vending Bench
(01:03:05) - Definitional switch and implications
(01:10:18) - Outro

Links
Apple paper - The Illusion of Thinking
ICML 2025 paper - Potemkin Understanding in Large Language Models
Preprint - Large Language Monkeys: Scaling Inference Compute with Repeated Sampling

Theoretical understanding
Max M. Schlereth manuscript - The limits of AGI part II
Preprint - (How) Do Reasoning Models Reason?
Preprint - A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers
NeurIPS 2024 paper - How Far Can Transformers Reason? The Globality Barrier and Inductive Scratchpad

Empirical explanations
Preprint - How Do Large Language Monkeys Get Their Power (Laws)?
Andon Labs preprint - Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents
LeapLab (Tsinghua University) and Shanghai Jiao Tong University paper - Does Reinforcement Learning Really Incentivize Reasoning Capacity
Preprint - RL in Name Only? Analyzing the Structural Assumptions in RL post-training for LLMs
Preprint - Mind The Gap: Deep Learning Doesn't Learn Deeply
Preprint - Measuring AI Ability to Complete Long Tasks
Preprint - GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models

Other sources
Zuck's Haul webpage - Meta's talent acquisition tracker
Hacker News discussion - Opinions from the AI community
Interconnects blogpost - The rise of reasoning machines
Anthropic blog - Project Vend: Can Claude run a small shop?