All Episodes
Best AI papers explained — 739 episodes
EVOLM: Self-Evolving Language Models through Co-Evolved Discriminative Rubrics
Personalized Alignment Revisited: The Necessity and Sufficiency of User Diversity
OGPO: Sample Efficient Full-Finetuning of Generative Control Policies
Adaptive Querying with AI Persona Priors
Rethinking the Role of LLMs in Time Series Forecasting
Robust Representation Learning through Explicit Environment Modeling
Magentic Marketplace: An Open-Source Environment for studying Agentic Markets
Hyperloop Transformers
Scaling Self-Play with Self-Guidance
RL Token: Bootstrapping Online RL with Vision-Language-Action Models
Agentic Data Environments
AI organizations are more effective but less aligned than individual agents
Text-to-Distribution Prediction with Quantile Tokens and Neighbor Context
Distortion of AI alignment revisited: RLHF is a decent utilitarian aligner
LLMs get lost in multi-turn conversation
Transformers are inherently succinct
The Coasean Singularity? Demand, Supply, and Market Design with AI Agents
Demystifying the unreasonable effectiveness of online alignment methods
Specialization after generalization: towards understanding test-time training in foundation models
Exploration and Exploitation Errors Are Measurable for Language Model Agents
A Mechanistic Analysis of Looped Reasoning Language Models
Sample Complexity of Autoregressive Reasoning: Chain-of-Thought vs. End-to-End
Why AI systems don’t learn and what to do about it
The Illusion of Learning from Observational Data: An Empirical Bayes Perspective
Ads in AI chatbots? An analysis of how large language models navigate conflicts of interest
Beyond Semantic Manipulation: Token-Space Attacks on Reward Models
LLM Evaluation as Tensor Completion: Low-Rank Efficiency and Uncertainty Quantification
Neural Computers
How AI Aggregation Affects Knowledge
World Action Verifier: Self-Improving World Models via Forward-Inverse Asymmetry
In-Place Test-Time Training
Test-Time Scaling Makes Overtraining Compute-Optimal
AI Agent Prevalence and Data Quality Across Multiple Online Sample Providers
POLCA: Stochastic Generative Optimization with LLM
Agentic Markets: Equilibrium Effects of Improving Consumer Search
One Model, Two Markets: Bid-Aware Generative Recommendation
How Well Do LLMs Predict Human Behavior? A Measure of their Pretrained Knowledge
Learning to Reason with Curriculum I: Provable Benefits of Autocurriculum
Agentic AI and the next intelligence explosion
Understanding Behavior Cloning with Action Quantization
HyperAgents: Open-Ended Metacognitive Self-Improvement for Any Computable Task
Harness design for long-running application development / Anthropic
Reasonably reasoning AI agents can avoid game-theoretic failures in zero-shot, provably
How Log-Barrier Helps Exploration in Policy Optimization
The Finetuner’s Fallacy: When to Pretrain with Your Finetuning Data
TURNWISE: The Gap between Single- and Multi-turn Language Model Capabilities
Temporal Straightening for Latent Planning
Fine-Tuning Strategies for Preserving In-Context Learning in Linear Attention
LLMs Can Learn to Reason Via Off-Policy RL
Simple Recipe Works: Vision-Language-Action Models are Natural Continual Learners with Reinforcement Learning
Provable and practical in-context policy optimization for self-improvement
Matching Features, Not Tokens: Energy-Based Fine-Tuning of Language Models
Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights
AdaEvolve: Adaptive LLM Driven Zeroth-Order Optimization
∇-reasoner: LLM reasoning via test-time gradient descent in latent space
Inference for Regression with Variables Generated by AI or Machine Learning
Fast KV Compaction via Attention Matching
Position: stop anthropomorphizing intermediate tokens as reasoning/thinking traces!
Code World Models for General Game Playing
Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought
Task Descriptors Help Transformers Learn Linear Models In-Context
Equivalence of Context and Parameter Updates in Modern Transformer Blocks
Learning without training: The implicit dynamics of in-context learning
Causal Identification from Counterfactual Data: Completeness and Bounding Results
Is Cosine-Similarity of Embeddings Really About Similarity?
Diffusion LLMs are Natural Adversaries for any LLM
Are you going to finish that? A Practical Study of the Partial Token Problem
Language Models Struggle to Use Representations Learned In-Context
LLMs are Bayesian, In Expectation, Not in Realization
Learning from Trials and Errors: Reflective Test-Time Planning for Embodied LLMs
LLMs Can Learn to Reason Via Off-Policy RL
Test-Time Training with KV Binding Is Secretly Linear Attention
Unified Latents (UL): How to train your latents
Spectral Bellman Method: Unifying RL Representation and Exploration
Prescriptive Scaling Reveals the Evolution of Language Model Capabilities
Experiential Reinforcement Learning
Learning Personalized Agents from Human Feedback
Learning to summarize user information for personalized RLHF
Intrinsic Credit Assignment for Long Horizon Interaction
Learning to Continually Learn via Meta-learning Agentic Memory Designs
Why Self-Rewarding Works: Theoretical Guarantees for Iterative Alignment of Language Models
PAD: Personalized Alignment of LLMs at Decoding-Time
The Reward Model Selection Crisis in Personalized Alignment
Causal-JEPA: Learning World Models through Object-Level Latent Interventions
How Sampling Shapes LLM Alignment: From One-Shot Optima to Iterative Dynamics
Deriving neural scaling laws from the statistics of natural language
Reasoning Cache: Continual Improvement Over Long Horizons via Short-Horizon RL
Scaling In-Context Online Learning Capability of LLMs via Cross-Episode Meta-RL
Divide-and-Conquer CoT: RL for Reducing Latency via Parallel Reasoning
Owning the AI Pareto Frontier — Jeff Dean
Learning to Reason in 13 Parameters
Nearly Optimal Active Preference Learning and Its Application to LLM Alignment
Language Model Circuits Are Sparse in the Neuron Basis
Rethinking the Trust Region in LLM Reinforcement Learning
Principled Fine-tuning of LLMs from User-Edits: A Medley of Preference, Supervision, and Reward
Self-distillation enables continual learning
Maximum Likelihood Reinforcement Learning
In-Context Algorithm Emulation in Fixed-Weight Transformers
PPI-SVRG: Unifying Prediction-Powered Inference and Variance Reduction for Semi-Supervised Optimization
When Models Don’t Collapse: On the Consistency of Iterative MLE
An orthogonal learner for individualized outcomes in Markov decision processes
Shaping capabilities with token-level data filtering
Self-Improving Pretraining: using post-trained models to pretrain better models
Success Conditioning as Policy Improvement: The Optimization Problem Solved by Imitating Success
Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning
GameTalk: Training LLMs for Strategic Multi-Turn Conversation
Reinforcement Learning via Self-Distillation
Self-Supervised Contrastive Learning is Approximately Supervised Contrastive Learning
On the alignment between supervised and self-supervised contrastive learning
Rethinking the value of multi-agent work-flow: a strong single agent baseline
Greedy Sampling Is Provably Efficient for RLHF
A Generalization Theory for Zero-Shot Prediction
Learning to Discover at Test Time
How Does the Pretraining Distribution Shape In-Context Learning? Task Selection, Generalization, and Robustness
Highlighting What Matters: Promptable Embeddings for Attribute-Focused Retrieval
Activation Reward Models for Few-Shot Model Alignment
Reward is enough: LLMs are in-context reinforcement learners
Understanding the Performance Gap in Preference Learning: A Dichotomy of RLHF and DPO
The End of Reward Engineering: How LLMs Are Redefining Multi-Agent Coordination
PRL: Process Reward Learning Improves LLMs’ Reasoning Ability and Broadens the Reasoning Boundary
Coverage Improvement and Fast Convergence of On-policy Preference Learning
Stagewise Reinforcement Learning and the Geometry of the Regret Landscape
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
Learning Latent Action World Models In The Wild
From Unstructured Data to Demand Counterfactuals: Theory and Practice
In-context reinforcement learning through Bayesian fusion of context and value prior
Digital RedQueen: Adversarial Program Evolution in Core War with LLMs
Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings
Representation-Based Exploration for Language Models: from test-time to post-training
NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation
RelayLLM: Efficient Reasoning via Collaborative Decoding
A Unified Definition of Hallucination, Or: It’s the World Model, Stupid
Deep sequence models tend to memorize geometrically; it is unclear why.
From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence
Diffusion Language Models are Provably Optimal Parallel Samplers
Universal Reasoning Model
Recursive language models
Adapting fast and slow: transportable circuits for few shot learning
Position: Probabilistic Modelling is Sufficient for Causal Inference
End-to-End Test-Time Training for Long Context
Parallel Token Generation for Language Models
Posterior Behavioral Cloning: Pretraining BC Policies for Efficient RL Finetuning
Activation oracles: training and evaluating LLMs as general-purpose activation explainers
Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning
Joint-Embedding vs Reconstruction: Provable Benefits of Latent Space Prediction
Monitoring Monitorability / OpenAI
Detailed Balance in Large Language Model-Driven Agents
Learning to reason in LLMs by expectation maximization
Exploratory Causal Inference in SAEnce
Detailed balance in large language model-driven agents
The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding
Adaptation of Agentic AI
Posterior Behavioral Cloning: Pretraining BC Policies for Efficient RL Finetuning
Let’s (not) just put things in Context: Test-Time Training for Long-Context LLMs
TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models
What’s In My Human Feedback? Learning Interpretable Descriptions of Preference Data
Bolmo: Byteifying the Next Generation of Language Models
What happened with sparse autoencoders?
What Matters Right Now in Mechanistic Interpretability
CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning
Self-Improving AI and Human Co-Improvement for Safer Co-Superintelligence
Towards a Science of Scaling Agent Systems / Google DeepMind
Emergent hierarchical reasoning in LLMs through reinforcement learning
AI revolution finally comes to Relational foundational models for structured data
REFRAG: Rethinking RAG based Decoding
Provable Long-Range Benefits of Next-Token Prediction
Jeff Dean on TPUs, AI Research, and Funding
Latent Debate: surrogate framework for Interpreting LLM Thinking
Distribution-calibrated inference time compute for thinking LLM-as-a-judge
Principled RL for diffusion LLMs emerges from sequence level perspective
Algorithmic Thinking Theory
On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models
Natural language actor-critic: Scalable off-policy learning in language space
Beyond the Transformer: Titans, MIRAS, and the Future of Infinite Context
On the Limits of Test-Time Compute: Sequential Reward Filtering for Better Inference
The Universal Weight Subspace Hypothesis
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
Benchmarking In-context Experiential Learning Through Repeated Product Recommendations
Training LLMs for Honesty via Confessions
STOIC REASONER: Dual-Mode Transformers that Compress to Think and Decompress to Speak
E-GEO: A Testbed for Generative Engine Optimization in E-Commerce
1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities
Treatment Effect Estimation for Optimal Decision-Making
Pass@K Policy Optimization: Solving Harder Reinforcement Learning Problems
Debugging misaligned completions with sparse-autoencoder latent attribution
Building Effective AI Agents / Anthropic
How to Correctly Report LLM-as-a-Judge Evaluations
In-Context Learning with Hypothesis-Class Guidance
Selecting Belief-State Approximations in Simulators with Latent States
Latent Collaboration in Multi-Agent Systems
CausalPFN: Amortized Causal Effect Estimation via In-Context Learning
DELTA: How Does RL Unlock and Transfer New Algorithms in LLMs?
Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing
Prompted Policy Search: Reinforcement Learning through Linguistic and Numerical Reasoning in LLMs
Ilya Sutskever – We're moving from the age of scaling to the age of research
Cognitive Foundations for Reasoning and Their Manifestation in LLMs
Natural emergent misalignment from reward hacking in production RL
Evolution Strategies at the Hyperscale
The Path Not Taken: RLVR Provably Learns Off the Principals
Back to Basics: Let Denoising Generative Models Denoise
LLM Prompt Duel Optimizer: Efficient Label-Free Prompt Optimization
Black-Box On-Policy Distillation of Large Language Models
Solving a million step LLM task with zero errors
Not All Thoughts Matter: Selective Attention for Efficient Reasoning
Sample-Efficient Parametric Learning from Natural Language
Bayesian Optimization in Language space: An Eval-Efficient AI Self-Improvement Framework
Context Engineering: Sessions, Memory
The Era of Agentic Organization: Learning to Organize with Language Models
Understanding neural networks through sparse circuits
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning
Multi-Agent Evolve: LLM Self-Improvement Through Co-Evolution
LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics
PREFDISCO: Evaluating Proactive Personalization through Interactive Preference Discovery
Reusing pre-training data at test time is a compute multiplier
Scaling Agent Learning via Experience Synthesis
Continuous Autoregressive Language Models
Toward a Theory of Agents as Tool-Use Decision-Makers
Nested Learning: The Illusion of Deep Learning Architectures
GST-UNet: A Neural Framework for Spatiotemporal Causal Inference with Time-Varying Confounding
Beyond a million tokens: benchmarking and enhancing long-term memory in LLMs
Agentic Economic Modeling
Emergent Introspective Awareness in Large Language Models
Can Large reasoning models self-train?
ALITA-G: Self-Evolving Generative Agent for Agent Generation
Self-improving LLM agents at test-time
Offline RL by Reward-Weighted Fine-Tuning for Conversation Optimization
Language models are injective and hence invertible
ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory
RLAD: Training LLMs to Discover Abstractions
How to Train Your Advisor: Steering Black-Box LLMs with ADVISOR MODELS
Self-improving LLM agents at Test-Time
KL-Regularized Reinforcement Learning is designed to Mode Collapse
How do LLMs use their depth?
Thought Communication in Multiagent Collaboration
Reasoning with Sampling: Base Models Outperform RL
Continual Learning via Sparse Memory Finetuning
Direct Preference Optimization with Unobserved Preference Heterogeneity: The Necessity of Ternary Preferences
The Coverage Principle: How Pre-Training Enables Post-Training
The Era of Real-World Human Interaction: RL from User Conversations
Agent Learning via Early Experience
Demystifying the Mechanisms Behind Emergent Exploration in Goal-conditioned RL
Rewriting History: A Recipe for Interventional Analyses to Study Data Effects on Model Behavior
A Definition of AGI
Provably Learning from Language Feedback
In-Context Learning for Pure Exploration
On the Role of Preference Variance in Preference Optimization
Training LLM Agents to Empower Humans
Richard Sutton Declares LLMs a Dead End
Demystifying Reinforcement Learning in Agentic Reasoning
Emergent coordination in multi-agent language models
Learning-to-measure: in-context active feature acquisition
Andrej Karpathy's insights: AGI, Intelligence, and Evolution
Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data
Representation-Based Exploration for Language Models: From Test-Time to Post-Training
The attacker moves second: stronger adaptive attacks bypass defenses against LLM jailbreaks and prompt injections
When can in-context learning generalize out of task distribution?
The Art of Scaling Reinforcement Learning Compute for LLMs
A small number of samples can poison LLMs of any size
Dual Goal Representations
Welcome to the Era of Experience
Value Flows: Flow-Based Distributional Reinforcement Learning
Self-Adapting Language Models
The Markovian Thinker
Moloch’s Bargain: emergent misalignment when LLMs compete for audiences
Transformer Predictor Dynamics and Task Diversity
Base models know how to reason, thinking models learn when
Spectrum tuning: Post-training for distributional coverage and in-context steerability
Understanding Prompt Tuning and In-Context Learning via Meta-Learning
MLPs Learn In-Context on Regression and Classification tasks
Is Pre-Training Truly Better than Meta-Learning?
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs
Learning dynamics of LLM finetuning
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
OpenAI Agent Builder and n8n: Orchestrating Reasoning Versus Automating Process
Training Agents Inside of Scalable World Models
Small Language Models are the Future of Agentic AI
Activation Steering in Generative Settings via Contrastive Causal Mediation Analysis
Eliciting Secret Knowledge from Language Models
Temporal difference flow
Personalized reasoning: just-in-time personalization and why LLMs fail at it
Prompt Curriculum Learning for Efficient LLM Post-Training
Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning
Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward
Learning to summarize user information for personalized reinforcement learning from human feedback
Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF
LIMI: Less is More for Agency
LoRA Without Regret
Actor-Critic without Actor: Critic-Guided Denoising for RL
DELTA-Code: How Does RL Unlock and Transfer New Programming Algorithms in LLMs?
Linear Transformers Implicitly Discover Unified Numerical Algorithms
Regularizing Extrapolation in Causal Inference
DoubleGen - Debiased Generative Modeling of Counterfactuals
What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT
Compute as Teacher: Turning Inference Compute Into Reference-Free Supervision
Learning without training: The implicit dynamics of in-context learning
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Open Problems in Mechanistic Interpretability
Maestro: Joint Graph & Config Optimization for Reliable AI Agents
Thought Anchors: Which LLM Reasoning Steps Matter?
RL's Razor: Why Online RL Forgets Less
Why Language Models Hallucinate
ALFA: Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning
Sample Efficient Preference Alignment in LLMs via Active Exploration
Adventures in Demand Analysis Using AI
Memento: Fine-tuning LLM Agents without Fine-tuning LLMs
On the Theoretical Limitations of Embedding-Based Retrieval
Performance Prediction for Large Systems via Text-to-Text Regression
Demystifying the Visual Quality Paradox in Multimodal Large Language Models
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
Compute-Optimal Scaling for Value-Based Deep RL
LLM-based Conversational Recommendation Agents with Collaborative Verbalized Experience
Signal and Noise: Evaluating Language Model Benchmarks
Breaking Feedback Loops in Recommender Systems with Causal Inference
RAG is Dead, Context Engineering is King: Building Reliable AI Systems
A Survey of Personalization: From RAG to Agent
Facilitating the Adoption of Causal Inference Methods Through LLM-Empowered Co-Pilot
Performance Prediction for Large Systems via Text-to-Text Regression
Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning
DINOv3: Vision Models for Self-Supervised Learning
Agent Lightning: Training Any AI Agents with Reinforcement Learning
Computational-Statistical Tradeoffs at the Next-Token Prediction Barrier
From Model Weights to Agent Workflows: Charting the New Frontier of Optimization in Large Language Models
Is Chain-of-Thought Reasoning a Mirage?
Agentic Web: Weaving the Next Web with AI Agents
The Assimilation-Accommodation Gap in LLM Intelligence
The Minimalist AI Kernel: A New Frontier in Reasoning
Statistical Rigor for Interpretable AI
Full-Stack Alignment: Co-Aligning AI and Institutions with Thick Models of Value
A foundation model to predict and capture human cognition
Generative Recommendation with Semantic IDs: A Practitioner’s Handbook
Hierarchical Reasoning Model
Test-time Offline Reinforcement Learning on Goal-related Experience
Interpreting Chain of Thought: A Walkthrough and Discussion
The wall confronting large language models
COLLABLLM: LLMs From Passive to Collaborative
A decade's battle on dataset bias: are we there yet?
GEPA: Generative Feedback for AI System Optimization
From AI-Curious to AI-First: Engineering Production AI Systems
Context Engineering: Beyond Simple Prompting to LLM Architecture
Agentic Misalignment: LLMs as Insider Threats
Small Language Models: Future of Agentic AI
Learning without training: The implicit dynamics of in-context learning
Inverse Scaling in Test-Time Compute
LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra
Microsoft's Blueprint: AI, Quantum, and the Agentic Future
Zuckerberg's AI Vision Analyzed
Inside Claude: Scaling, Agency, and Interpretability
Personalized language modeling from personalized human feedback
Position: Empowering Time Series Reasoning with Multimodal LLMs
An empirical risk minimization approach for offline inverse RL and Dynamic Discrete Choice models
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities
The Invisible Leash: Why RLVR May Not Escape Its Origin
Language Model Personalization via Reward Factorization
Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions
Do We Need to Verify Step by Step? Rethinking Process Supervision from a Theoretical Perspective
Soft Best-of-n Sampling for Model Alignment
On Temporal Credit Assignment and Data-Efficient Reinforcement Learning
Bradley–Terry and Multi-Objective Reward Modeling Are Complementary
Probing Foundation Models for World Models
GenAI-Powered Statistical Inference (with Unstructured Data)
Interpretable Reward Modeling with Active Concept Bottlenecks
PrefillOnly: An Inference Engine for Prefill-only Workloads in Large Language Model Applications
A Collectivist, Economic Perspective on AI
Textual Bayes: Quantifying Uncertainty in LLM-Based Systems
The Winner's Curse in Data-Driven Decisions
SPIRAL: Self-Play for Reasoning Through Zero-Sum Games
Beyond Statistical Learning: Exact Learning Is Essential for General Intelligence
Aligning Learning and Endogenous Decision-Making
Reliable Statistical Inference with Synthetic Data from Large Language Models
Multi-Turn Reinforcement Learning from Human Preference Feedback
Provably Learning from Language Feedback
Markets with Heterogeneous Agents: Dynamics and Survival of Bayesian vs. No-Regret Learners
Why Neural Network Can Discover Symbolic Structures with Gradient-based Training: An Algebraic and Geometric Foundation
Causal Abstraction with Lossy Representations
The Winner's Curse in Data-Driven Decisions
Embodied AI Agents: Modeling the World
Beyond Statistical Learning: Exact Learning Is Essential for General Intelligence
What Has a Foundation Model Found? Inductive Bias Reveals World Models
Language Bottleneck Models: A Framework for Interpretable Knowledge Tracing and Beyond
Learning to Explore: An In-Context Learning Approach for Pure Exploration
Human-AI Matching: The Limits of Algorithmic Search
Uncertainty Quantification Needs Reassessment for Large-language Model Agents
Bayesian Meta-Reasoning for Robust LLM Generalization
General Intelligence Requires Reward-based Pretraining
Deep Learning is Not So Mysterious or Different
AI Agents Need Authenticated Delegation
Probabilistic Modelling is Sufficient for Causal Inference
Not All Explanations for Deep Learning Phenomena Are Equally Valuable
e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs
Extrapolation by Association: Length Generalization Transfer in Transformers
Uncovering Causal Hierarchies in Language Model Capabilities
Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers
Improving Treatment Effect Estimation with LLM-Based Data Augmentation
LLM Numerical Prediction Without Auto-Regression
Why in-context learning models are good few-shot learners?
Take Caution in Using LLMs as Human Surrogates: Scylla Ex Machina
The Logic of Machines: The AI Reasoning Debate
Layer by Layer: Uncovering Hidden Representations in Language Models
Causal Attribution Analysis for Continuous Outcomes
Training a Generally Curious Agent
Estimation of Treatment Effects Under Nonstationarity via Truncated Difference-in-Q’s
Strategy Coopetition Explains the Emergence and Transience of In-Context Learning
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
Agentic Supernet for Multi-agent Architecture Search
Sample Complexity and Representation Ability of Test-time Scaling Paradigms
Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators
LLMs Get Lost In Multi-Turn Conversation
PromptPex: Automatic Test Generation for Prompts
General Agents Need World Models
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models
Decisions With Algorithms
Adapting, fast and slow: Causal Approach to Few-Shot Sequence Learning
Conformal Arbitrage for LLM Objective Balancing
Simulation-Based Inference for Adaptive Experiments
Agents as Tool-Use Decision-Makers
Quantitative Judges for Large Language Models
Self-Challenging Language Model Agents
Learning to Explore: An In-Context Learning Approach for Pure Exploration
How Bidirectionality Helps Language Models Learn Better via Dynamic Bottleneck Estimation
A Closer Look at Bias and Chain-of-Thought Faithfulness of Large (Vision) Language Models
Simplifying Bayesian Optimization Via In-Context Direct Optimum Sampling
Bayesian Teaching Enables Probabilistic Reasoning in Large Language Models
IPO: Interpretable Prompt Optimization for Vision-Language Models
Evolutionary Prompt Optimization discovers emergent multimodal reasoning strategies
Evaluating the Unseen Capabilities: How Many Theorems Do LLMs Know?
Diffusion Guidance Is a Controllable Policy Improvement Operator
Alita: Generalist Agent With Self-Evolution
A Snapshot of Influence: A Local Data Attribution Framework for Online Reinforcement Learning
Learning Compositional Functions with Transformers from Easy-to-Hard Data
Preference Learning with Response Time
Accelerating RL for LLM Reasoning with Optimal Advantage Regression
Algorithms for reliable decision-making need causal reasoning
Belief Attribution as Mental Explanation: The Role of Accuracy, Informativity, and Causality
Distances for Markov chains from sample streams
When and Why LLMs Fail to Reason Globally
IDA-Bench: Evaluating LLMs on Interactive Guided Data Analysis
No Free Lunch: Non-Asymptotic Analysis of Prediction-Powered Inference
Accelerating RL for LLM Reasoning with Optimal Advantage Regression
Statistical Inference for Online Algorithms
Prismatic Synthesis for Diverse LLM Reasoning Data
Position: Uncertainty Quantification Needs Reassessment for Large-language Model Agents
The Agentic Economy
Statistics for Large Language Models
Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search
Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning
Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL
Value-Guided Search for Efficient Chain-of-Thought Reasoning
Shallow Preference Signals: Large Language model aligns even better without truncated data?
Gaming Tool Preferences in Agentic LLMs
Partner Modelling Emerges in Recurrent Agents (But Only When It Matters)
LLM Populations Form Social Conventions and Collective Bias
LLM Generated Persona is a Promise with a Catch
Large Language Models for Digital Twin Simulation
From RL Distillation to Autonomous LLM Agents
Prompting, Auto-Prompting, and Human-AI Communication
Textual Gradients for LLM Optimization
Large Language Models as Markov Chains
Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation
Selective induction heads: how transformers select causal structures in context
The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains
How Transformers Learn Causal Structure with Gradient Descent
Planning anything with rigor: general-purpose zero-shot planning with LLM-based formalized programming
Automated Design of Agentic Systems
What’s the Magic Word? A Control Theory of LLM Prompting
BoNBoN Alignment for Large Language Models and the Sweetness of Best-of-n Sampling
RL with KL penalties is better viewed as Bayesian inference
Asymptotics of Language Model Alignment
Qwen 2.5, RL, and Random Rewards
Theoretical guarantees on the best-of-n alignment policy
Score Matching Enables Causal Discovery of Nonlinear Additive Noise Models
Improved Techniques for Training Score-Based Generative Models
Your Pre-trained LLM is Secretly an Unsupervised Confidence Calibrator
AlphaEvolve: A coding agent for scientific and algorithmic discovery
Harnessing the Universal Geometry of Embeddings
Goal Inference using Reward-Producing Programs in a Novel Physics Environment
Trial-Error-Explain In-Context Learning for Personalized Text Generation
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Test-Time Reinforcement Learning (TTRL)
Interpreting Emergent Planning in Model-Free Reinforcement Learning
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment
Learning How Hard to Think: Input-Adaptive Allocation of LM Computation
Highlighting What Matters: Promptable Embeddings for Attribute-Focused Image Retrieval
UFT: Unifying Supervised and Reinforcement Fine-Tuning
Understanding High-Dimensional Bayesian Optimization
Inference time alignment in continuous space
Efficient Test-Time Scaling via Self-Calibration
Conformal Prediction via Bayesian Quadrature
Predicting from Strings: Language Model Embeddings for Bayesian Optimization
Self-Evolving Curriculum for LLM Reasoning
Online Decision-Focused Learning in Dynamic Environments
FisherSFT: Data-Efficient Supervised Fine-Tuning of Language Models Using Information Gain
Reward Shaping from Confounded Offline Data
Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning
Understanding Best-of-N Language Model Alignment
Maximizing Acquisition Functions for Bayesian Optimization - and its relation to Gradient Descent
Bayesian Prompt Ensembles: Model Uncertainty Estimation for Black-Box Large Language Models
Prompting Strategies for Enabling Large Language Models to Infer Causation from Correlation
The Parallel Knowledge Gradient Method for Batch Bayesian Optimization
FunBO: Discovering Acquisition Functions for Bayesian Optimization with FunSearch
Automated Social Science: A Structural Causal Model-Based Approach
Causal Interpretation of Transformer Self-Attention
A Causal World Model Underlying Next Token Prediction: Exploring GPT in a Controlled Environment
Trace is the Next AutoDiff: Generative Optimization with Rich Feedback, Execution Traces, and LLMs
Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation
Prompts from Reinforcement Learning (PRL)
Logits are All We Need to Adapt Closed Models
Large Language Models Are (Bayesian) Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models
LLM In-Context Learning as Kernel Regression
Personalizing LLMs via Decode-Time Human Preference Optimization
Almost Surely Safe LLM Inference-Time Alignment
Survey of In-Context Learning Interpretation and Analysis
From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models
LLM In-Context Learning as Kernel Regression
Where does In-context Learning Happen in Large Language Models?
Auto-Differentiating Any LLM Workflow: A Farewell to Manual Prompting
metaTextGrad: Learning to learn with language models as optimizers
Semantic Operators: A Declarative Model for Rich, AI-based Data Processing
Isolated Causal Effects of Language
Sleep-time Compute: Beyond Inference Scaling at Test-time
J1: Incentivizing Thinking in LLM-as-a-Judge
ShiQ: Bringing back Bellman to LLMs
Policy Learning with a Natural Language Action Space: A Causal Approach
Multi-Objective Preference Optimization: Improving Human Alignment of Generative Models
End-to-End Learning for Stochastic Optimization: A Bayesian Perspective
TEXTGRAD: Automatic Differentiation via Text
Steering off Course: Reliability Challenges in Steering Language Models
Past-Token Prediction for Long-Context Robot Policies
Recovering Coherent Event Probabilities from LLM Embeddings
Systematic Meta-Abilities Alignment in Large Reasoning Models
Predictability Shapes Adaptation: An Evolutionary Perspective on Modes of Learning in Transformers
Efficient Exploration for LLMs
Rankers, Judges, and Assistants: Towards Understanding the Interplay of LLMs in Information Retrieval Evaluation
Bayesian Concept Bottlenecks with LLM Priors
Transformers for In-Context Reinforcement Learning
Evaluating Large Language Models Across the Lifecycle
Active Ranking from Human Feedback with DopeWolfe
Optimal Designs for Preference Elicitation
Dual Active Learning for Reinforcement Learning from Human Feedback
Active Learning for Direct Preference Optimization
Active Preference Optimization for RLHF
Test-Time Alignment of Diffusion Models without reward over-optimization
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback
GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment
Advantage-Weighted Regression: Simple and Scalable Off-Policy RL
Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective
Transformers can be used for in-context linear regression in the presence of endogeneity
Bayesian Concept Bottlenecks with LLM Priors
In-Context Parametric Inference: Point or Distribution Estimators?
Enough Coin Flips Can Make LLMs Act Bayesian
Bayesian Scaling Laws for In-Context Learning
Posterior Mean Matching Generative Modeling
Can Generative AI Solve Your In-Context Learning Problem? A Martingale Perspective
Dynamic Search for Inference-Time Alignment in Diffusion Models
Is In-Context Learning in Large Language Models Bayesian? A Martingale Perspective
Leaked Claude Sonnet 3.7 System Instruction tuning
Converging Predictions with Shared Information
Test-Time Alignment Via Hypothesis Reweighting
Rethinking Diverse Human Preference Learning through Principal Component Analysis
Active Statistical Inference
Data Mixture Optimization: A Multi-fidelity Multi-scale Bayesian Framework
AI-Powered Bayesian Inference
Can Unconfident LLM Annotations Be Used for Confident Conclusions?
Predictions as Surrogates: Revisiting Surrogate Outcomes in the Age of AI
Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control
How to Evaluate Reward Models for RLHF
LLMs as Judges: Survey of Evaluation Methods
The Alternative Annotator Test for LLM-as-a-Judge: How to Statistically Justify Replacing Human Annotators with LLMs
Limits to scalable evaluation at the frontier: LLM as Judge won’t beat twice the data
Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation
Accelerating Unbiased LLM Evaluation via Synthetic Feedback
Prediction-Powered Statistical Inference Framework
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
RM-R1: Reward Modeling as Reasoning
Reexamining the Aleatoric and Epistemic Uncertainty Dichotomy
Decoding Claude Code: Terminal Agent for Developers
Emergent Strategic AI Equilibrium from Pre-trained Reasoning
Benefiting from Proprietary Data with Siloed Training
Advantage Alignment Algorithms
Asymptotic Safety Guarantees Based On Scalable Oversight
What Makes a Reward Model a Good Teacher? An Optimization Perspective
Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
Identifiable Steering via Sparse Autoencoding of Multi-Concept Shifts
You Are What You Eat - AI Alignment Requires Understanding How Data Shapes Structure and Generalisation
Interplay of LLMs in Information Retrieval Evaluation
Trade-Offs Between Tasks Induced by Capacity Constraints Bound the Scope of Intelligence
Toward Efficient Exploration by Large Language Model Agents
Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT
Self-Consuming Generative Models with Curated Data
Bootstrapping Language Models with DPO Implicit Rewards
DeepSeek-Prover-V2: Advancing Formal Reasoning
THINKPRM: Data-Efficient Process Reward Models
Societal Frameworks and LLM Alignment
Risks from Multi-Agent Advanced AI
Causality-Aware Alignment for Large Language Model Debiasing
Reward Models Evaluate Consistency, Not Causality
Causal Rewards for Large Language Model Alignment
Sycophancy to subterfuge: Investigating reward-tampering in large language models
Bidirectional AI Alignment
Why Do Multi-Agent LLM Systems Fail?
LLMs as Greedy Agents: RL Fine-tuning for Decision-Making
LLM Feedback Loops and the Lock-in Hypothesis
Representational Alignment Drives Effective Teaching and Learning
Adaptive Parallel Reasoning with Language Models
AI: Rewiring the Flow of Ideas and Human Knowledge
Learning and Equilibrium with Ranking Feedback
Designing Human-AI Collaboration: A Sufficient-Statistic Approach
GOAT: Generative Adversarial Training for Human-AI Coordination
π0.5: Generalization in Robotic Manipulation via Diverse Data
NoWag: Unified Compression for Large Language Models
Optimal Tool Calls in Language Model Reasoning
Data Selection for Empirical Risk Minimization
LoRe: Low-Rank Reward Modeling for Personalized LLMs
ParaPO: Reducing Language Model Verbatim Reproduction
Test-Time RL: Self-Evolving LLMs via Majority Voting Rewards
Tina: Tiny LoRA Reasoning Models
Evaluating large language models in theory of mind tasks
QUEST: Quality Sampling for Machine Translation
Offline Preference Learning via Simulated Trajectory Feedback
Reasoning Elicitation in Language Models via Counterfactual Feedback
Eliciting Human Preferences with Language Models
Sub-Optimal Data for Human-in-the-Loop Reinforcement Learning
γ-Bench: Evaluating LLMs in Multi-Agent Games
DRAFT: Self-Driven LLM Tool Mastery via Documentation Refinement
Optimal Prediction Sets for Enhanced Human-AI Accuracy
Self-Correction via Reinforcement Learning for Language Models
Tractable Multi-Agent Reinforcement Learning through Behavioral Economics
Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement
Iterative Nash Policy Optimization for Language Model Alignment
SycEval: Benchmarking LLM Sycophancy in Mathematics and Medicine
Stack AI: Democratizing Enterprise AI Development
Evaluating Modern Recommender Systems: Challenges and Future Directions
AI in the Enterprise: Seven Lessons from Frontier Companies by OpenAI
Discussion: Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
AI Agent Protocols and Human Preference
Cross-Environment Cooperation for Zero-Shot Multi-Agent Coordination
Sutton and Silver: The Era of Experience: Learning Beyond Human Data
Sample, Don't Search: Rethinking Test-Time Alignment for Language Models
AI Agents: Echoes of Past Technology Pivots?
Minimalist LLM Reasoning: Rejection Sampling to Reinforcement
Securing the Model Context Protocol in Enterprise Environments
Improving Multi-Turn Tool Use with Reinforcement Learning
Cultural Knowledge Conservation and Control in Large Language Models
Data Quality, Repetition, and Scaling of Language Models
Compute-Optimal Scaling Laws for Language Models Revisited
Concise Reasoning via Reinforcement Learning
Throughput Limits for LLM Inference and AI Agent Scheduling
RL Post-training Amplifies Pretraining Behaviors in Language Models
Fast Adaptation of Behavioral Foundation Models
Proprietary Reward Models: Sustaining Advantage in Agentic AI
Why Multi-Agent LLM Systems Fail: A Comprehensive Study
Play2Prompt: Zero-Shot Tool Instruction Optimization via Tool Play
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
API and GUI Agents: Divergence, Convergence, and Hybrid Approaches
AI, Chess, and Competitive Advantage: Substitution and Complementation
Knowledge of the Firm and Replication of Technology
Firm Resources and Sustained Competitive Advantage
Evaluating Pharmaceutical Marketing to Physicians with Panel Data
Theory of the firm in the era of Agents
Large Language Models: An Applied Econometric Framework
Evaluating the World Model Implicit in a Generative Model
Machine Learning for Hypothesis Generation in Social Science
Active Learning for Moral Preference Elicitation: Challenges and Nuances
Gradient-Based Surveys for Nonparametric Discrete Choice Experiments
Explainable Data-driven Share-of-choice Product Line Design Optimization
The More You Ask, the Less You Get: When Additional Questions Hurt External Validity
Conjoint topics from Handbook of Marketing Analytics: Methods and Applications
Choice-Based Conjoint Analysis: Methods and Applications
Beyond Conjoint Analysis: The Future of Preference Measurement
An Optimization Framework for Adaptive Questionnaire Design
Adaptive Self-Explication of Multiattribute Preferences
Conjoint Analysis: Methods, Applications, and Recent Developments
Current Issues and a “Wish List” for Conjoint Analysis
Ellipsoidal Methods for Adaptive Choice-Based Conjoint Analysis
Adaptive Polyhedral Methods for Conjoint Analysis
MSL: Enhancing LLM Recommenders via Masked Softmax Loss
Self-Supervised Deep Reinforcement Learning for Optimal Question Ranking
Adaptive Language Elicitation for Latent Information Discovery
LLM Persona Bias: Promise and Peril in Simulation
AutoTools: Automating Tool Use for Large Language Models
Tool Learning with Large Language Models: A Comprehensive Survey
All Roads Lead to Likelihood: RL for Fine-Tuning Value
ATLAS: Tuning Agents via Critical Step Learning
Thinking Faster by Writing Less: Chain of Draft Reasoning
Meta Plan Optimization for Boosting LLM Agents
L1: Length Controlled Reasoning with Reinforcement Learning
WikiBigEdit: Benchmarking Lifelong Knowledge Editing in LLMs
PLAN-AND-ACT: LLM Agent Planning with Synthetic Data
SEARCH-R1: LLMs Learn to Reason and Search via Reinforcement Learning
The Theory of the Firm: Information, Incentives, and Organization
Four Formalizable Theories of the Firm
Efficient Tool Use with Chain-of-Abstraction Reasoning
CodeTool: Process Supervision for Enhanced LLM Tool Invocation
Evaluating LLM Agents in Multi-Turn Conversations: A Survey
Epistemic Alignment in User-LLM Knowledge Delivery
MCP is (not) all you need
AI, Human Skills, and Competitive Advantage in Chess
Inference-Time Scaling for Generalist Reward Modeling
Optimal Pure Exploration in Linear Bandits via Sampling
Presidential Address: The Economist as Designer in the Innovation Process for Socially Impactful Digital Products
Emergent Symbolic Mechanisms for Reasoning in Large Language Models
Inference-Time Alignment: Coverage, Scaling, and Optimality
Sharpe Ratio-Guided Active Learning for Preference Optimization
Active Learning for Adaptive In-Context Prompt Design
Visual Chain-of-Thought Reasoning for Vision-Language-Action Models
On the Biology of a Large Language Model
Async-TB: Asynchronous Trajectory Balance for Scalable LLM RL
Instacart's Economics Team: A Hybrid Role in Tech
Data Mixture Optimization: A Multi-fidelity Multi-scale Bayesian Framework
Why MCP won
SWEET-RL: Training LLM Agents for Collaborative Reasoning
TheoryCoder: Bilevel Planning with Synthesized World Models
Driving Forces in AI: Scaling to 2025 and Beyond (Jason Wei, OpenAI)
Expert Demonstrations for Sequential Decision Making under Heterogeneity
TextGrad: Backpropagating Language Model Feedback for Generative AI Optimization
MemReasoner: Generalizing Language Models on Reasoning-in-a-Haystack Tasks
RAFT: In-Domain Retrieval-Augmented Fine-Tuning for Language Models
Inductive Biases for Exchangeable Sequence Modeling
InverseRLignment: LLM Alignment via Inverse Reinforcement Learning
Prompt-OIRL: Offline Inverse RL for Query-Dependent Prompting
Alignment from Demonstrations for Large Language Models
Q♯: Distributional RL for Optimal LLM Post-Training
Scaling Test-Time Compute Without Verification or RL is Suboptimal
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Revisiting Superficial Alignment Hypothesis
Diagnostic uncertainty: teaching language models to describe open-ended uncertainty
Language Model Personalization via Reward Factorization
How Well do LLMs Compress Their Own Chain-of-Thought? A Token Complexity Approach
Can Large Language Models Extract Customer Needs as well as Professional Analysts?
SpurLens: finding spurious correlations in multimodal LLMs
Improving test-time search with backtracking against in-context value verifiers
Adaptive elicitation of latent information using natural language
Document Valuation in LLM Summaries: A Cluster Shapley Approach
s1: simple test time scaling