PODCAST · science

AI Research Today

by Aaron

AI Research Today unpacks the latest advancements in artificial intelligence, one paper at a time. We go beyond abstracts and headlines, walking through architectures, experiments, training details, ablations, failure modes, and the implications for future work. Each episode will choose between one and three new, impactful research papers and go through them in depth. We will discuss the papers at the level of an industry practitioner or AI researcher. If you want to understand the newest topics in AI research but don't have the time to dig through the papers yourself, this is your solution.


9

OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language Environment Simulation

In this episode, we break down the new paper “OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language Environment Simulation,” which explores how AI agents can be benchmarked across real occupational domains like healthcare, logistics, manufacturing, customs processing, and more.

The paper introduces OccuBench, a large-scale benchmark spanning 100 professional task scenarios across 65 specialized domains. One of the most interesting ideas is the use of Language Environment Simulators (LESs), where LLMs simulate enterprise environments and tool responses for domains that normally have no public APIs or accessible evaluation environments.

We discuss:

• Why current agent benchmarks miss most real-world enterprise work
• How simulated environments can evaluate professional AI agents
• Fault injection testing and robustness evaluation
• Cross-industry capability differences between frontier models
• What this means for autonomous enterprise systems and AI agents in production

Paper: https://arxiv.org/abs/2604.10866
PDF: https://arxiv.org/pdf/2604.10866
Arkitekt AI: arkitekt-ai.com
Contact: [email protected]

May 12, 2026

32m
8

GradMem: Teaching LLMs to Remember (Without Retraining)

In this episode, we break down GradMem, a new approach to memory in large language models: https://arxiv.org/pdf/2603.13875v1

Instead of relying on the transformer KV cache or repeatedly reprocessing documents (like in RAG), GradMem introduces a different idea: learn a compact memory representation at inference time. Using a few steps of gradient descent, the model “writes” important information from a context into a small set of memory tokens, allowing it to answer future queries without needing the original context.

We cover:

• Why KV cache is a brute-force solution to long context
• How test-time optimization turns memory into something learnable
• The difference between storing text vs. storing information
• What this means for agents, RAG systems, and long-horizon tasks

Big takeaway: Instead of reading context over and over, models can learn to compress and reuse it intelligently.

Learn more / build with AI: https://www.arkitekt-ai.com/

Apr 23, 2026

29m
7

Language Models are Injective and Hence Invertible

In this episode, we break down a fascinating new result from recent research: that modern Transformer language models are almost surely injective, meaning different prompts map to unique internal representations, with no information loss.

We dig into the paper, which is available on arXiv.

At the core of the proof is a surprisingly deep mathematical idea: Transformers are real analytic functions of their parameters, which allows researchers to rigorously reason about when “collisions” (two prompts producing the same representation) can occur. The result? Collisions only happen on a measure-zero set: mathematically possible, but practically never observed.

We unpack:

• What it means for a function to be real analytic
• Why this implies near-perfect uniqueness of representations
• How gradient descent preserves this property during training
• What this says about interpretability, privacy, and reversibility of LLMs

We also explore the practical implications: if models are truly invertible, could we reconstruct inputs from activations? What does that mean for safety and data leakage?

About the Host

This episode is brought to you by Arkitekt AI, an automated enterprise software development platform that builds full analytics, ML, and data systems from natural language.

Learn more: https://arkitekt-ai.com

Mar 23, 2026

26m
6

Learning to Reason in 13 Parameters

Link to arxiv: https://arxiv.org/pdf/2602.04118

Large language models have recently shown impressive reasoning abilities, often learned through reinforcement learning and low-rank adaptation techniques like LoRA. But these approaches still assume that effective reasoning requires relatively large adaptation layers. This new paper challenges that assumption by asking a provocative question: how small can a reasoning update really be?

In this episode, we explore Learning to Reason in 13 Parameters, which introduces TinyLoRA, a method that compresses low-rank adapters down to the extreme, in some cases to just a single parameter. Instead of relying on large adaptation matrices, TinyLoRA demonstrates that reasoning behavior can be steered using ultra-minimal parameter updates, dramatically reducing the computational and memory footprint required to teach models new reasoning skills.

We break down:

• Why conventional LoRA and low-rank adapters hit a floor at model dimensionality
• How TinyLoRA scales reasoning adapters down to near-zero parameter counts
• What this reveals about where reasoning ability actually lives inside neural networks
• Why tiny adaptation layers could reshape efficient fine-tuning, on-device intelligence, and rapid deployment

The results suggest that reasoning competence may not require massive structural changes, only precisely targeted parameter nudges. This challenges assumptions about scaling, efficiency, and the true complexity of learned reasoning.

Feb 16, 2026

26m
5

SPIRAL: Symbolic LLM Planning via Grounded and Reflective Search

Large Language Models often struggle with complex planning tasks that require exploration, backtracking, and self-correction. Once an LLM commits to an early mistake, its linear chain-of-thought reasoning makes recovery difficult. While search methods like Monte Carlo Tree Search (MCTS) offer a way to explore alternatives, they typically rely on sparse rewards and fail to fully exploit the semantic strengths of language models.

In this episode, we dive into SPIRAL (Symbolic LLM Planning via Grounded and Reflective Search), a new framework that fundamentally rethinks how planning and search interact in LLM-based agents. Instead of treating MCTS as a brute-force optimizer, SPIRAL embeds a cognitive architecture of three specialized LLM roles directly into the search loop:

• A Planner proposes creative next actions,
• A Simulator grounds those actions by predicting realistic outcomes, and
• A Critic reflects on the results to provide dense, informative reward signals.

This planner–simulator–critic loop transforms search into a guided, self-correcting reasoning process, allowing agents to recover from mistakes, evaluate alternatives more effectively, and plan with far greater robustness.

Paper link: https://arxiv.org/pdf/2512.23167
Repo: https://github.com/IBM/SPIRAL
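To make the planner, simulator, and critic roles concrete, here is a toy sketch. It is not the paper's implementation: the three roles are plain Python functions standing in for LLM calls, the "environment" is a made-up integer puzzle, and a greedy one-step lookahead replaces MCTS to keep the example short. All names (`planner`, `simulator`, `make_critic`, `plan`) are hypothetical.

```python
from typing import Callable, List, Tuple

State = int
Action = Tuple[str, int]  # (operation, operand), e.g. ("add", 3)


def planner(state: State) -> List[Action]:
    """Hypothetical Planner: propose candidate next actions (an LLM call in practice)."""
    return [("add", 1), ("add", 3), ("mul", 2)]


def simulator(state: State, action: Action) -> State:
    """Hypothetical Simulator: predict the state that results from an action."""
    op, k = action
    return state + k if op == "add" else state * k


def make_critic(goal: State) -> Callable[[State], float]:
    """Hypothetical Critic: return a dense score for any predicted state."""
    def critic(state: State) -> float:
        return -abs(goal - state)  # closer to the goal scores higher
    return critic


def plan(start: State, goal: State, max_steps: int = 10) -> Tuple[State, List[Action]]:
    """Greedy one-step lookahead (a stand-in for MCTS, assumed for brevity)."""
    critic = make_critic(goal)
    state, path = start, []
    for _ in range(max_steps):
        if state == goal:
            break
        # Score every proposed action by simulating it and asking the critic.
        scored = [(critic(simulator(state, a)), a) for a in planner(state)]
        _, best_action = max(scored, key=lambda pair: pair[0])
        state = simulator(state, best_action)
        path.append(best_action)
    return state, path


final_state, actions = plan(start=1, goal=10)
print(final_state, actions)
```

The point of the sketch is the division of labor: every candidate gets a score from the critic before the agent commits, so the reward signal is dense rather than arriving only at the end of a rollout. A real tree search would additionally keep alternatives around for backtracking, which this greedy loop does not.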

Jan 26, 2026

28m
4

Meta-RL Induces Exploration In Language Agents

Episode Paper: https://arxiv.org/pdf/2512.16848

In this episode, we dive into a cutting-edge AI research breakthrough that tackles one of the biggest challenges in training intelligent agents: how to explore effectively. Standard reinforcement learning (RL) methods help language model agents learn to interact with environments and solve multi-step tasks, but they often struggle when the tasks require active exploration, that is, learning what to try next when the best strategy isn’t obvious from past experience.

The new paper introduces LaMer, a Meta-Reinforcement Learning (Meta-RL) framework designed to give language agents the ability to learn how to explore. Unlike conventional RL agents that learn a fixed policy, LaMer’s Meta-RL approach encourages agents to flexibly adapt by learning from their own trial-and-error experiences. This means agents can better adapt to novel or more difficult environments without needing massive retraining.

We’ll explain:

• Why exploration is critical for long-horizon tasks with delayed or sparse rewards.
• How Meta-RL shifts the focus from fixed policies to adaptable exploration behavior.
• What LaMer’s results suggest about learned exploration and generalization in AI systems.

Whether you’re into reinforcement learning, multi-agent systems, or the future of adaptive AI, this episode breaks down how Meta-RL could help agents think more like explorers, not just pattern followers.

Jan 12, 2026

29m
3

DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search

In this episode, we unpack DeepSearch, a new paradigm in reinforcement learning with verifiable rewards (RLVR) that aims to overcome one of the biggest bottlenecks in training reasoning-capable AI systems. Traditional reinforcement learning methods often plateau after extensive training because they rely on sparse exploration and limited rollouts, leaving critical reasoning paths undiscovered and unlearned.

DeepSearch turns this model training approach on its head by embedding Monte Carlo Tree Search (MCTS) directly into the training loop, not just at inference time. This fundamentally changes how models explore the space of possible solutions: instead of brute-force parameter scaling or longer training runs, DeepSearch uses structured, systematic exploration to dramatically improve learning efficiency.

We break down how DeepSearch:

• Injects tree search into training, enabling richer exploration of reasoning paths.
• Uses a global frontier strategy to prioritize promising reasoning trajectories.
• Improves training-time credit assignment, so models learn not only from success but from strategic exploration itself.
• Achieves impressive results on benchmarks for mathematical reasoning, setting new state-of-the-art performance and using fewer computational resources.

Whether you’re a machine learning researcher, an AI enthusiast, or just curious about the future of intelligent systems, this episode explores how search-augmented learning could redefine how future AI systems master complex reasoning problems.

Dec 29, 2025

37m
2

Transformer-Squared: Self-Adaptive LLMs

In this episode we’re diving into “Transformer-Squared: Self-Adaptive LLMs”, a new framework for adapting large language models to unseen tasks on the fly by tuning only a small part of their weights. The central idea is Singular Value Fine-Tuning (SVF), a parameter-efficient fine-tuning technique that decomposes each weight matrix with Singular Value Decomposition (SVD) and then only trains a small vector that scales the singular values. These vectors become compact “expert” modules that specialize in different tasks and, unlike traditional methods like LoRA, can be composed, mixed, and reused because they’re in a principled, orthogonal basis.

During inference, Transformer-Squared runs a two-pass process: the first pass identifies the task or context, and the second pass combines the appropriate expert vectors to dynamically adapt the model’s behavior in real time. Across benchmarks and architectures, SVF consistently outperforms LoRA despite requiring orders of magnitude fewer parameters, and the framework even shows versatility on multimodal tasks like vision-language.

If you’re into efficient adaptation, reinforcement-learning optimization of model components, and self-organizing AI systems, this paper is a big step toward real-time adaptive foundation models.

Read the full paper here: https://arxiv.org/pdf/2501.06252

Dec 11, 2025

39m
1

Nested Learning: The Illusion of Deep Learning Architectures

In this episode, we dive into Nested Learning (NL), a new framework that rethinks how neural networks learn, store information, and even modify themselves. While modern language models have made remarkable progress, fundamental questions remain: How do they truly memorize? How do they improve over time? And why does in-context learning emerge at scale?

Nested Learning proposes a bold answer. Instead of viewing a model as a single optimization problem, NL treats it as a hierarchy of nested, multi-level learning processes, each with its own evolving context flow. This perspective sheds new light on how deep models compress information, how in-context learning arises naturally, and how we might build systems with richer, higher-order reasoning abilities.

We explore the paper’s three major contributions:

• Deep Optimizers: A reinterpretation of classic optimizers like Adam and SGD-Momentum as associative memory systems that compress gradients. The authors introduce deeper, more expressive optimizers built directly from NL principles.
• Self-Modifying Titans: A new type of sequence model that learns not just from data, but from its own update rules, enabling it to modify itself during training.
• Continuum Memory System: A unified framework that extends the idea of short- vs long-term memory into a continuous space. Combined with self-modifying models, it leads to HOPE, a learning module showing strong results in language modeling, continual learning, and long-context reasoning.

This episode breaks down what NL means for the future of AI, why it’s mathematically transparent and neuroscientifically inspired, and how it might open a new dimension in deep learning research.

Dec 1, 2025

50m
0

AgentEvolver: An Autonomous Agent Framework

Paper: https://arxiv.org/pdf/2511.10395

What if AI agents could teach themselves? In this episode, we dive into AgentEvolver, a groundbreaking framework from Alibaba's Tongyi Lab that flips the script on how we train autonomous AI agents.

Traditional agent training is brutal: you need manually crafted datasets, expensive random exploration, and mountains of compute. AgentEvolver introduces a self-evolving system with three elegant mechanisms that let the LLM drive its own learning:

• Self-Questioning: The agent explores environments and generates its own tasks through curiosity-driven interaction, eliminating the need for hand-crafted training data.
• Self-Navigating: Instead of random exploration, the agent builds an experience pool, retrieves relevant past solutions, and uses hybrid rollouts that mix experience-guided and vanilla trajectories. They tackle the off-policy learning problem with selective boosting for high-performing trajectories.
• Self-Attributing: Fine-grained credit assignment that goes beyond simple trajectory-level rewards, using step-level attribution to figure out which specific actions and states actually contributed to success.

We break down the advantage calculation mechanics, discuss how they handle the inference/learning sample mismatch through experience stripping, and explore why broadcasting trajectory advantages to token-level might be leaving performance on the table.

The results are compelling: their 7B model outperforms much larger baselines on AppWorld and BFCL-v3 benchmarks while reducing training steps by up to 67%. This isn't just another incremental improvement; it's a fundamental shift from human-engineered training pipelines to LLM-guided self-improvement.

Key topics: reinforcement learning for LLMs, experience replay, credit assignment, autonomous task generation, agent systems, GRPO/PPO optimization
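For listeners who want to see what "broadcasting trajectory advantages to token-level" means in practice, here is a minimal sketch. It assumes a GRPO-style setup in which each rollout in a group receives one scalar reward, the advantage is that reward normalized against the group, and every token in the rollout inherits the same value. The function names, rewards, and token counts are made up for illustration and are not taken from the AgentEvolver code.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantage (assumed form): normalize each reward against its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group of rollouts
    return [(r - mu) / (sigma + eps) for r in rewards]


def broadcast_to_tokens(advantages: List[float], token_counts: List[int]) -> List[List[float]]:
    """Copy each trajectory-level advantage onto every token of that trajectory."""
    return [[a] * n for a, n in zip(advantages, token_counts)]


# Four hypothetical rollouts of the same task: two succeed, two fail.
rewards = [1.0, 0.0, 0.0, 1.0]
token_counts = [3, 2, 4, 1]  # number of generated tokens per rollout

advs = group_relative_advantages(rewards)
token_advs = broadcast_to_tokens(advs, token_counts)
print(token_advs)
```

Every token in a successful rollout gets the same positive credit, including tokens from steps that did not help. That uniformity is the limitation step-level attribution is meant to address.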

Nov 24, 2025

41m


ABOUT THIS SHOW

HOSTED BY

Aaron
