PodParley - Discover, Search, and Explore Podcasts

1

Position: Interpretability can be actionable

Jul 17, 2026

24:42

2

High-accuracy sampling for diffusion models and log-concave distributions

Jul 17, 2026

22:29

3

Causal Inference with Video Features as Treatments

Jul 15, 2026

22:13

4

What Does Thompson Sampling Optimize?

Jul 15, 2026

22:18

5

Globally Convergent Offline Reinforcement Learning with Smoothed Bellman Residual Minimization

Jul 13, 2026

12:25

6

LLM-as-a-Verifier: A General-Purpose Verification Framework

Jul 10, 2026

20:02

7

How Much Do Language Models Memorize?

Jul 9, 2026

23:52

8

Position: Uncertainty Quantification in LLMs is Just Unsupervised Clustering

Jul 7, 2026

21:47

9

Position: Agents Should Invoke External Tools ONLY When Epistemically Necessary

Jul 6, 2026

12:08

10

From conversations to mechanisms: aligning advertiser incentives in AI-powered product recommendations

Jul 5, 2026

22:02

11

Is one layer enough? Training a single transformer layer can match full-parameter RL training

Jul 4, 2026

23:05

12

RL Excursions during Pre-Training: Re-examining Policy Optimization for LLM training

Jul 2, 2026

21:47

13

Language Generation with Feedback: Queries and Mistakes

Jul 1, 2026

20:07

14

Quantifying Theoretical AI Alignment Guarantees: Receiver-Utility Bounds in Bayesian Persuasion

Jul 1, 2026

22:18

15

SPIRAL: Learning to search and aggregate

Jun 29, 2026

22:15

16

Qwen-AgentWorld: Language World Models for General Agents

Jun 27, 2026

20:44

17

When Does Trajectory-Level Supervision Permit Efficient Offline Reinforcement Learning?

Jun 27, 2026

18:56

18

SuperThoughts: Reasoning Tokens in Superposition

Jun 26, 2026

19:00

19

First-Explore PPO: Learning Meta-Exploration with Proximal Policy Optimization

Jun 25, 2026

22:35

20

Self-Distillation for Data-Scarce Language Model Pretraining

Jun 24, 2026

21:45

21

Meta-Harness for Agent-State Construction

Jun 21, 2026

23:02

22

ExpRL: Using Reference Solutions as Rewards for LLM Mid-Training

Jun 21, 2026

21:03

23

Valid Inference with Synthetic Data via Task Exchangeability

Jun 18, 2026

13:08

24

GRPO is Secretly a Process Reward Model

Jun 17, 2026

20:33

25

Agentic Interactions

Jun 17, 2026

19:05

26

A Unifying View of Attention Sinks: Two Algorithms, Two Solutions

Jun 16, 2026

22:35

27

From AGI to ASI

Jun 14, 2026

23:37

28

Correct Looks Better: Pairwise Comparisons Reveal Accuracy Rankings

Jun 13, 2026

19:52

29

Critical Batch Size for LLM Policy Optimization

Jun 11, 2026

18:41

30

Self-supervised User Profile Generation for Personalization

Jun 9, 2026

22:05

31

From Augmentation to Reconstruction: Guiding the AI Disruption to the Good Place

Jun 7, 2026

22:10

32

Self-Distilled Agentic Reinforcement Learning

Jun 7, 2026

22:14

33

Subliminal Learning Is Steering Vector Distillation

Jun 5, 2026

23:29

34

Subsidizing Sequential Search

Jun 5, 2026

20:20

35

Meta-Harness: End-to-End Optimization of Model Harnesses

Jun 2, 2026

17:39

36

Self-Improving Language Models with Bidirectional Evolutionary Search

Jun 1, 2026

20:59

37

Generative Modeling via Drifting

May 31, 2026

21:53

38

Instance-Optimal Estimation with Multiple LLM Judges on a Budget

May 31, 2026

21:23

39

Robust AI Personalization Will Require a Human Context Protocol

May 29, 2026

22:59

40

Equilibrium Reasoners: Learning Attractors Enables Scalable Reasoning

May 27, 2026

17:42

41

Position: The Pre/Post-Training Boundary Should Govern IP in Industry–Academia ML Collaborations

May 25, 2026

12:57

42

MEMO: Memory as a Model

May 24, 2026

17:49

43

Agent Bazaar: Enabling Economic Alignment in Multi-Agent Marketplaces

May 23, 2026

23:03

44

General Preference Reinforcement Learning

May 23, 2026

21:50

45

Explaining and Preventing Alignment Collapse in Iterative RLHF

May 21, 2026

20:47

46

Curriculum Learning-Guided Progressive Distillation in Large Language Models

May 19, 2026

16:10

47

Think Twice, Act Once: Verifier-Guided Action Selection For Embodied Agents

May 19, 2026

25:30

48

How Much Should a Conversational Recommender System Converse?

May 17, 2026

21:41

49

FUSE: Ensembling Verifiers with Zero Labeled Data

May 14, 2026

20:15

50

EVOLM: Self-Evolving Language Models through Co-Evolved Discriminative Rubrics

May 14, 2026

23:10

51

Personalized Alignment Revisited: The Necessity and Sufficiency of User Diversity

May 12, 2026

22:13

52

OGPO: Sample Efficient Full-Finetuning of Generative Control Policies

May 11, 2026

22:02

53

Adaptive Querying with AI Persona Priors

May 9, 2026

22:30

54

Rethinking the Role of LLMs in Time Series Forecasting

May 8, 2026

21:48

55

Robust Representation Learning through Explicit Environment Modeling

May 7, 2026

23:01

56

Magentic Marketplace: An Open-Source Environment for studying Agentic Markets

May 5, 2026

22:09

57

Hyperloop Transformers

May 5, 2026

22:01

58

Scaling Self-Play with Self-Guidance

May 4, 2026

20:12

59

RL Token: Bootstrapping Online RL with Vision-Language-Action Models

May 3, 2026

22:26

60

Agentic Data Environments

May 3, 2026

24:42

61

AI organizations are more effective but less aligned than individual agents

May 1, 2026

20:09

62

Text-to-Distribution Prediction with Quantile Tokens and Neighbor Context

Apr 28, 2026

22:41

63

Distortion of AI alignment revisited: RLHF is a decent utilitarian aligner

Apr 27, 2026

17:53

64

LLMs get lost in multi-turn conversation

Apr 25, 2026

21:23

65

Transformers are inherently succinct

Apr 23, 2026

20:43

66

The Coasean Singularity? Demand, Supply, and Market Design with AI Agents

Apr 23, 2026

21:34

67

Demystifying the unreasonable effectiveness of online alignment methods

Apr 21, 2026

18:28

68

Specialization after generalization: towards understanding test-time training in foundation models

Apr 21, 2026

22:04

69

Exploration and Exploitation Errors Are Measurable for Language Model Agents

Apr 20, 2026

23:05

70

A Mechanistic Analysis of Looped Reasoning Language Models

Apr 19, 2026

18:37

71

Sample Complexity of Autoregressive Reasoning: Chain-of-Thought vs. End-to-End

Apr 19, 2026

19:03

72

Why AI systems don’t learn and what to do about it

Apr 17, 2026

21:28

73

The Illusion of Learning from Observational Data: An Empirical Bayes Perspective

Apr 17, 2026

22:23

74

Ads in AI chatbots? An analysis of how large language models navigate conflicts of interest

Apr 17, 2026

21:44

75

Beyond Semantic Manipulation: Token-Space Attacks on Reward Models

Apr 13, 2026

17:39

76

LLM Evaluation as Tensor Completion: Low-Rank Efficiency and Uncertainty Quantification

Apr 12, 2026

18:47

77

Neural Computers

Apr 11, 2026

12:45

78

How AI Aggregation Affects Knowledge

Apr 11, 2026

22:53

79

World Action Verifier: Self-Improving World Models via Forward-Inverse Asymmetry

Apr 10, 2026

20:26

80

In-Place Test-Time Training

Apr 9, 2026

20:12

81

Test-Time Scaling Makes Overtraining Compute-Optimal

Apr 7, 2026

21:29

82

AI Agent Prevalence and Data Quality Across Multiple Online Sample Providers

Apr 7, 2026

21:43

83

POLCA: Stochastic Generative Optimization with LLM

Apr 4, 2026

19:27

84

Agentic Markets: Equilibrium Effects of Improving Consumer Search

Apr 4, 2026

21:34

85

One Model, Two Markets: Bid-Aware Generative Recommendation

Apr 1, 2026

21:27

86

How Well Do LLMs Predict Human Behavior? A Measure of their Pretrained Knowledge

Apr 1, 2026

21:38

87

Learning to Reason with Curriculum I: Provable Benefits of Autocurriculum

Apr 1, 2026

21:22

88

Agentic AI and the next intelligence explosion

Mar 30, 2026

22:43

89

Understanding Behavior Cloning with Action Quantization

Mar 29, 2026

21:24

90

HyperAgents: Open-Ended Metacognitive Self-Improvement for Any Computable Task

Mar 27, 2026

21:48

91

Harness design for long-running application development \ Anthropic

Mar 26, 2026

21:23

92

Reasonably reasoning AI agents can avoid game-theoretic failures in zero-shot, provably

Mar 24, 2026

20:23

93

How Log-Barrier Helps Exploration in Policy Optimization

Mar 22, 2026

21:05

94

The Finetuner’s Fallacy: When to Pretrain with Your Finetuning Data

Mar 22, 2026

18:24

95

TURNWISE: The Gap between Single- and Multi-turn Language Model Capabilities

Mar 22, 2026

11:15

96

Temporal Straightening for Latent Planning

Mar 20, 2026

21:19

97

Fine-Tuning Strategies for Preserving In-Context Learning in Linear Attention

Mar 19, 2026

18:53

98

LLMs Can Learn to Reason Via Off-Policy RL

Mar 19, 2026

19:48

99

Simple Recipe Works: Vision-Language-Action Models are Natural Continual Learners with Reinforcement Learning

Mar 17, 2026

23:42

100

Provable and practical in-context policy optimization for self-improvement

Mar 17, 2026

21:27

101

Matching Features, Not Tokens: Energy-Based Fine-Tuning of Language Models

Mar 16, 2026

23:25

102

Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights

Mar 14, 2026

20:25

103

AdaEvolve: Adaptive LLM Driven Zeroth-Order Optimization

Mar 14, 2026

20:13

104

∇−reasoner: LLM reasoning via test-time gradient descent in latent space

Mar 14, 2026

21:16

105

Inference for Regression with Variables Generated by AI or Machine Learning

Mar 12, 2026

21:55

106

Fast KV Compaction via Attention Matching

Mar 12, 2026

23:27

107

Position: stop anthropomorphizing intermediate tokens as reasoning/thinking traces!

Mar 11, 2026

18:42

108

Code World Models for General Game Playing

Mar 8, 2026

21:42

109

Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought

Mar 7, 2026

17:00

110

Task Descriptors Help Transformers Learn Linear Models In-Context

Mar 7, 2026

18:59

111

Equivalence of Context and Parameter Updates in Modern Transformer Blocks

Mar 7, 2026

21:03

112

Learning without training: The implicit dynamics of in-context learning

Mar 7, 2026

23:51

113

Causal Identification from Counterfactual Data: Completeness and Bounding Results

Mar 7, 2026

19:44

114

Is Cosine-Similarity of Embeddings Really About Similarity?

Mar 6, 2026

21:31

115

Diffusion LLMs are Natural Adversaries for any LLM

Mar 5, 2026

24:37

116

Are you going to finish that? A Practical Study of the Partial Token Problem

Mar 4, 2026

18:52

117

Language Models Struggle to Use Representations Learned In-Context

Mar 2, 2026

18:36

118

LLMs are Bayesian, In Expectation, Not in Realization

Mar 1, 2026

19:12

119

Learning from Trials and Errors: Reflective Test-Time Planning for Embodied LLMs

Feb 27, 2026

17:54

120

LLMs Can Learn to Reason Via Off-Policy RL

Feb 27, 2026

20:14

121

Test-Time Training with KV Binding Is Secretly Linear Attention

Feb 27, 2026

17:40

122

Unified Latents (UL): How to train your latents

Feb 26, 2026

20:17

123

Spectral Bellman Method: Unifying RL Representation and Exploration

Feb 25, 2026

20:30

124

Prescriptive Scaling Reveals the Evolution of Language Model Capabilities

Feb 24, 2026

17:32

125

Experiential Reinforcement Learning

Feb 23, 2026

23:16

126

Learning Personalized Agents from Human Feedback

Feb 21, 2026

14:48

127

Learning to summarize user information for personalized RLHF

Feb 20, 2026

18:28

128

Intrinsic Credit Assignment for Long Horizon Interaction

Feb 20, 2026

17:32

129

Learning to Continually Learn via Meta-learning Agentic Memory Designs

Feb 20, 2026

20:19

130

Why Self-Rewarding Works: Theoretical Guarantees for Iterative Alignment of Language Models

Feb 19, 2026

18:07

131

PAD: Personalized Alignment of LLMs at Decoding-Time

Feb 19, 2026

14:14

132

The Reward Model Selection Crisis in Personalized Alignment

Feb 19, 2026

16:27

133

Causal-JEPA: Learning World Models through Object-Level Latent Interventions

Feb 18, 2026

15:25

134

How Sampling Shapes LLM Alignment: From One-Shot Optima to Iterative Dynamics

Feb 17, 2026

16:06

135

Deriving neural scaling laws from the statistics of natural language

Feb 15, 2026

18:30

136

Reasoning Cache: Continual Improvement Over Long Horizons via Short-Horizon RL

Feb 15, 2026

15:20

137

Scaling In-Context Online Learning Capability of LLMs via Cross-Episode Meta-RL

Feb 14, 2026

15:23

138

Divide-and-Conquer CoT: RL for Reducing Latency via Parallel Reasoning

Feb 12, 2026

15:52

139

Owning the AI Pareto Frontier — Jeff Dean

Feb 12, 2026

15:41

140

Learning to Reason in 13 Parameters

Feb 11, 2026

18:30

141

Nearly Optimal Active Preference Learning and Its Application to LLM Alignment

Feb 8, 2026

16:37

142

Language Model Circuits Are Sparse in the Neuron Basis

Feb 8, 2026

15:30

143

Rethinking the Trust Region in LLM Reinforcement Learning

Feb 8, 2026

15:55

144

Principled Fine-tuning of LLMs from User-Edits: A Medley of Preference, Supervision, and Reward

Feb 8, 2026

13:33

145

Self-distillation enables continual learning

Feb 7, 2026

20:19

146

Maximum Likelihood Reinforcement Learning

Feb 6, 2026

15:46

147

In-Context Algorithm Emulation in Fixed-Weight Transformers

Feb 5, 2026

16:52

148

PPI-SVRG: Unifying Prediction-Powered Inference and Variance Reduction for Semi-Supervised Optimization

Feb 5, 2026

16:59

149

When Models Don’t Collapse: On the Consistency of Iterative MLE

Feb 3, 2026

16:13

150

An orthogonal learner for individualized outcomes in Markov decision processes

Feb 3, 2026

17:35

151

Shaping capabilities with token-level data filtering

Feb 1, 2026

12:24

152

Self-Improving Pretraining: using post-trained models to pretrain better models

Feb 1, 2026

15:24

153

Success Conditioning as Policy Improvement: The Optimization Problem Solved by Imitating Success

Jan 31, 2026

19:49

154

Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning

Jan 31, 2026

17:37

155

GameTalk: Training LLMs for Strategic Multi-Turn Conversation

Jan 30, 2026

15:59

156

Reinforcement Learning via Self-Distillation

Jan 30, 2026

14:15

157

Self-Supervised Contrastive Learning is Approximately Supervised Contrastive Learning

Jan 28, 2026

14:47

158

On the alignment between supervised and self-supervised contrastive learning

Jan 28, 2026

16:07

159

Rethinking the value of multi-agent work-flow: a strong single agent baseline

Jan 24, 2026

17:14

160

Greedy Sampling Is Provably Efficient for RLHF

Jan 24, 2026

13:06

161

A Generalization Theory for Zero-Shot Prediction

Jan 24, 2026

15:16

162

Learning to Discover at Test Time

Jan 23, 2026

16:24

163

How Does the Pretraining Distribution Shape In-Context Learning? Task Selection, Generalization, and Robustness

Jan 23, 2026

18:38

164

Highlighting What Matters: Promptable Embeddings for Attribute-Focused Retrieval

Jan 20, 2026

13:36

165

Activation Reward Models for Few-Shot Model Alignment

Jan 20, 2026

16:12

166

Reward is enough: LLMs are in-context reinforcement learners

Jan 19, 2026

10:50

167

Understanding the Performance Gap in Preference Learning: A Dichotomy of RLHF and DPO

Jan 19, 2026

14:26

168

The End of Reward Engineering: How LLMs Are Redefining Multi-Agent Coordination

Jan 18, 2026

17:55

169

PRL: Process Reward Learning Improves LLMs’ Reasoning Ability and Broadens the Reasoning Boundary

Jan 18, 2026

15:48

170

Coverage Improvement and Fast Convergence of On-policy Preference Learning

Jan 17, 2026

14:48

171

Stagewise Reinforcement Learning and the Geometry of the Regret Landscape

Jan 16, 2026

13:04

172

Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models

Jan 16, 2026

14:18

173

Learning Latent Action World Models In The Wild

Jan 16, 2026

14:15

174

From Unstructured Data to Demand Counterfactuals: Theory and Practice

Jan 14, 2026

13:30

175

In-context reinforcement learning through Bayesian fusion of context and value prior

Jan 14, 2026

12:09

176

Digital RedQueen: Adversarial Program Evolution in Core War with LLMs

Jan 14, 2026

13:53

177

Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings

Jan 13, 2026

11:30

178

Representation-Based Exploration for Language Models: from test-time to post-training

Jan 12, 2026

13:30

179

NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation

Jan 10, 2026

14:52

180

RelayLLM: Efficient Reasoning via Collaborative Decoding

Jan 10, 2026

13:09

181

A Unified Definition of Hallucination, Or: It’s the World Model, Stupid

Jan 8, 2026

12:25

182

Deep sequence models tend to memorize geometrically; it is unclear why.

Jan 8, 2026

13:27

183

From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence

Jan 8, 2026

14:12

184

Diffusion Language Models are Provably Optimal Parallel Samplers

Jan 7, 2026

12:00

185

Universal Reasoning Model

Jan 6, 2026

14:16

186

Recursive language models

Jan 6, 2026

15:37

187

Adapting fast and slow: transportable circuits for few shot learning

Jan 4, 2026

15:25

188

Position: Probabilistic Modelling is Sufficient for Causal Inference

Jan 3, 2026

12:27

189

End-to-End Test-Time Training for Long Context

Jan 3, 2026

13:52

190

Parallel Token Generation for Language Models

Jan 2, 2026

15:39

191

Posterior Behavioral Cloning: Pretraining BC Policies for Efficient RL Finetuning

Dec 31, 2025

15:59

192

Activation oracles: training and evaluating LLMs as general-purpose activation explainers

Dec 30, 2025

15:18

193

Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning

Dec 29, 2025

13:41

194

Joint-Embedding vs Reconstruction: Provable Benefits of Latent Space Prediction

Dec 29, 2025

14:17

195

Monitoring Monitorability / OpenAI

Dec 28, 2025

14:03

196

Detailed Balance in Large Language Model-Driven Agents

Dec 28, 2025

11:49

197

Learning to reason in LLMs by expectation maximization

Dec 28, 2025

13:53

198

Exploratory Causal Inference in SAEnce

Dec 25, 2025

15:13

199

Detailed balance in large language model-driven agents

Dec 24, 2025

11:49

200

The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding

Dec 24, 2025

16:11

201

Adaptation of Agentic AI

Dec 23, 2025

13:20

202

Posterior Behavioral Cloning: Pretraining BC Policies for Efficient RL Finetuning

Dec 22, 2025

10:30

203

Let’s (not) just put things in Context: Test-Time Training for Long-Context LLMs

Dec 21, 2025

13:45

204

TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models

Dec 20, 2025

14:30

205

What’s In My Human Feedback? Learning Interpretable Descriptions of Preference Data

Dec 19, 2025

16:14

206

Bolmo: Byteifying the Next Generation of Language Models

Dec 19, 2025

13:13

207

What happened with sparse autoencoders?

Dec 17, 2025

30:09

208

What Matters Right Now in Mechanistic Interpretability

Dec 16, 2025

32:30

209

CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning

Dec 16, 2025

14:45

210

Self-Improving AI and Human Co-Improvement for Safer Co-Superintelligence

Dec 16, 2025

13:13

211

Towards a Science of Scaling Agent Systems / Google DeepMind

Dec 15, 2025

15:46

212

Emergent hierarchical reasoning in LLMs through reinforcement learning

Dec 14, 2025

13:07

213

AI revolution finally comes to Relational foundational models for structured data

Dec 13, 2025

14:39

214

REFRAG: Rethinking RAG based Decoding

Dec 13, 2025

13:48

215

Provable Long-Range Benefits of Next-Token Prediction

Dec 12, 2025

12:03

216

Jeff Dean on TPUs, AI Research, and Funding

Dec 12, 2025

38:17

217

Latent Debate: surrogate framework for Interpreting LLM Thinking

Dec 11, 2025

15:15

218

Distribution-calibrated inference time compute for thinking LLM-as-a-judge

Dec 11, 2025

11:48

219

Principled RL for diffusion LLMs emerges from sequence level perspective

Dec 11, 2025

11:47

220

Algorithmic Thinking Theory

Dec 10, 2025

17:15

221

On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models

Dec 10, 2025

13:48

222

Natural language actor-critic: Scalable off-policy learning in language space

Dec 9, 2025

13:49

223

Beyond the Transformer: Titans, MIRAS, and the Future of Infinite Context

Dec 7, 2025

38:47

224

On the Limits of Test-Time Compute: Sequential Reward Filtering for Better Inference

Dec 7, 2025

13:45

225

The Universal Weight Subspace Hypothesis

Dec 7, 2025

15:54

226

Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Dec 7, 2025

14:39

227

Benchmarking In-context Experiential Learning Through Repeated Product Recommendations

Dec 4, 2025

15:37

228

Training LLMs for Honesty via Confessions

Dec 4, 2025

15:53

229

STOIC REASONER: Dual-Mode Transformers that Compress to Think and Decompress to Speak

Dec 4, 2025

12:11

230

E-GEO: A Testbed for Generative Engine Optimization in E-Commerce

Dec 4, 2025

32:52

231

1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities

Dec 4, 2025

15:02

232

Treatment Effect Estimation for Optimal Decision-Making

Dec 4, 2025

13:41

233

Pass@K Policy Optimization: Solving Harder Reinforcement Learning Problems

Dec 3, 2025

14:27

234

Debugging misaligned completions with sparse-autoencoder latent attribution

Dec 2, 2025

29:58

235

Building Effective AI Agents \ Anthropic

Dec 2, 2025

39:25

236

How to Correctly Report LLM-as-a-Judge Evaluations

Dec 2, 2025

11:40

237

In-Context Learning with Hypothesis-Class Guidance

Dec 2, 2025

12:40

238

Selecting Belief-State Approximations in Simulators with Latent States

Dec 1, 2025

11:04

239

Latent Collaboration in Multi-Agent Systems

Nov 29, 2025

13:15

240

CausalPFN: Amortized Causal Effect Estimation via In-Context Learning

Nov 28, 2025

27:56

241

DELTA: How Does RL Unlock and Transfer New Algorithms in LLMs?

Nov 28, 2025

10:39

242

Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing

Nov 27, 2025

15:24

243

Prompted Policy Search: Reinforcement Learning through Linguistic and Numerical Reasoning in LLMs

Nov 27, 2025

31:00

244

Ilya Sutskever – We're moving from the age of scaling to the age of research

Nov 26, 2025

39:00

245

Cognitive Foundations for Reasoning and Their Manifestation in LLMs

Nov 26, 2025

14:52

246

Natural emergent misalignment from reward hacking in production RL

Nov 25, 2025

15:31

247

Evolution Strategies at the Hyperscale

Nov 25, 2025

14:18

248

The Path Not Taken: RLVR Provably Learns Off the Principals

Nov 23, 2025

12:16

249

Back to Basics: Let Denoising Generative Models Denoise

Nov 23, 2025

15:27

250

LLM Prompt Duel Optimizer: Efficient Label-Free Prompt Optimization

Nov 22, 2025

13:20

251

Black-Box On-Policy Distillation of Large Language Models

Nov 20, 2025

14:20

252

Solving a million step LLM task with zero errors

Nov 20, 2025

14:38

253

Not All Thoughts Matter: Selective Attention for Efficient Reasoning

Nov 19, 2025

12:38

254

Sample-Efficient Parametric Learning from Natural Language

Nov 19, 2025

11:00

255

Bayesian Optimization in Language space: An Eval-Efficient AI Self-Improvement Framework

Nov 18, 2025

34:24

256

Context Engineering: Sessions, Memory

Nov 16, 2025

13:52

257

The Era of Agentic Organization: Learning to Organize with Language Models

Nov 15, 2025

10:47

258

Understanding neural networks through sparse circuits

Nov 14, 2025

12:47

259

Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning

Nov 14, 2025

10:53

260

Multi-Agent Evolve: LLM Self-Improvement Through Co-Evolution

Nov 14, 2025

9:59

261

LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics

Nov 14, 2025

12:36

262

PREFDISCO: Evaluating Proactive Personalization through Interactive Preference Discovery

Nov 12, 2025

14:42

263

Reusing pre-training data at test time is a compute multiplier

Nov 10, 2025

15:55

264

Scaling Agent Learning via Experience Synthesis

Nov 9, 2025

16:54

265

Continuous Autoregressive Language Models

Nov 8, 2025

16:03

266

Toward a Theory of Agents as Tool-Use Decision-Makers

Nov 7, 2025

19:57

267

Nested Learning: The Illusion of Deep Learning Architectures

Nov 5, 2025

13:06

268

GST-UNet: A Neural Framework for Spatiotemporal Causal Inference with Time-Varying Confounding

Nov 5, 2025

17:53

269

Beyond a million tokens: benchmarking and enhancing long-term memory in LLMs

Nov 4, 2025

15:03

270

Agentic Economic Modeling

Nov 3, 2025

14:27

271

Emergent Introspective Awareness in Large Language Models

Nov 3, 2025

15:41

272

Can large reasoning models self-train?

Nov 1, 2025

11:54

273

ALITA-G: Self-Evolving Generative Agent for Agent Generation

Nov 1, 2025

15:47

274

Self-improving LLM agents at test-time

Oct 30, 2025

19:04

275

Offline RL by Reward-Weighted Fine-Tuning for Conversation Optimization

Oct 30, 2025

14:40

276

Language models are injective and hence invertible

Oct 30, 2025

11:37

277

ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory

Oct 29, 2025

15:13

278

RLAD: Training LLMs to Discover Abstractions

Oct 29, 2025

16:18

279

How to Train Your Advisor: Steering Black-Box LLMs with ADVISOR MODELS

Oct 29, 2025

13:05

280

Self-improving LLM agents at Test-Time

Oct 27, 2025

23:01

281

KL-Regularized Reinforcement Learning is designed to Mode Collapse

Oct 27, 2025

15:30

282

How do LLMs use their depth?

Oct 27, 2025

12:10

283

Thought Communication in Multiagent Collaboration

Oct 27, 2025

16:39

284

Reasoning with Sampling: Base Models Outperform RL

Oct 26, 2025

16:03

285

Continual Learning via Sparse Memory Finetuning

Oct 26, 2025

14:18

286

Direct Preference Optimization with Unobserved Preference Heterogeneity: The Necessity of Ternary Preferences

Oct 24, 2025

12:19

287

The Coverage Principle: How Pre-Training Enables Post-Training

Oct 24, 2025

16:11

288

The Era of Real-World Human Interaction: RL from User Conversations

Oct 24, 2025

13:46

289

Agent Learning via Early Experience

Oct 24, 2025

12:36

290

Demystifying the Mechanisms Behind Emergent Exploration in Goal-conditioned RL

Oct 22, 2025

14:33

291

Rewriting History: A Recipe for Interventional Analyses to Study Data Effects on Model Behavior

Oct 22, 2025

19:04

292

A Definition of AGI

Oct 22, 2025

16:28

293

Provably Learning from Language Feedback

Oct 21, 2025

19:55

294

In-Context Learning for Pure Exploration

Oct 21, 2025

16:30

295

On the Role of Preference Variance in Preference Optimization

Oct 20, 2025

14:42

296

Training LLM Agents to Empower Humans

Oct 20, 2025

13:38

297

Richard Sutton Declares LLMs a Dead End

Oct 20, 2025

13:20

298

Demystifying Reinforcement Learning in Agentic Reasoning

Oct 19, 2025

15:21

299

Emergent coordination in multi-agent language models

Oct 19, 2025

13:57

300

Learning-to-measure: in-context active feature acquisition

Oct 19, 2025

16:02

301

Andrej Karpathy's insights: AGI, Intelligence, and Evolution

Oct 19, 2025

16:11

302

Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data

Oct 18, 2025

12:48

303

Representation-Based Exploration for Language Models: From Test-Time to Post-Training

Oct 18, 2025

17:02

304

The attacker moves second: stronger adaptive attacks bypass defenses against LLM jailbreaks and prompt injections

Oct 18, 2025

16:08

305

When can in-context learning generalize out of task distribution?

Oct 16, 2025

19:44

306

The Art of Scaling Reinforcement Learning Compute for LLMs

Oct 16, 2025

13:41

307

A small number of samples can poison LLMs of any size

Oct 16, 2025

13:58

308

Dual Goal Representations

Oct 14, 2025

17:11

309

Welcome to the Era of Experience

Oct 14, 2025

16:42

310

Value Flows: Flow-Based Distributional Reinforcement Learning

Oct 14, 2025

15:42

311

Self-Adapting Language Models

Oct 12, 2025

16:42

312

The Markovian Thinker

Oct 12, 2025

14:15

313

Moloch’s Bargain: emergent misalignment when LLMs compete for audiences

Oct 12, 2025

16:48

314

Transformer Predictor Dynamics and Task Diversity

Oct 11, 2025

16:21

315

Base models know how to reason, thinking models learn when

Oct 11, 2025

11:34

316

Spectrum tuning: Post-training for distributional coverage and in-context steerability

Oct 11, 2025

15:45

317

Understanding Prompt Tuning and In-Context Learning via Meta-Learning

Oct 11, 2025

14:28

318

MLPs Learn In-Context on Regression and Classification tasks

Oct 11, 2025

16:13

319

Is Pre-Training Truly Better than Meta-Learning?

Oct 11, 2025

20:56

320

Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models

Oct 11, 2025

17:39

321

Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs

Oct 9, 2025

15:52

322

Learning dynamics of LLM finetuning

Oct 9, 2025

12:18

323

Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF

Oct 9, 2025

17:23

324

OpenAI Agent Builder and n8n: Orchestrating Reasoning Versus Automating Process

Oct 8, 2025

14:49

325

Training Agents Inside of Scalable World Models

Oct 8, 2025

13:45

326

Small Language Models are the Future of Agentic AI

Oct 7, 2025

19:04

327

Activation Steering in Generative Settings via Contrastive Causal Mediation Analysis

Oct 6, 2025

18:17

328

Eliciting Secret Knowledge from Language Models

Oct 6, 2025

14:48

329

Temporal difference flow

Oct 6, 2025

14:54

330

Personalized reasoning: just-in-time personalization and why LLMs fail at it

Oct 5, 2025

14:10

331

Prompt Curriculum Learning for Efficient LLM Post-Training

Oct 5, 2025

13:25

332

Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning

Oct 4, 2025

18:06

333

Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward

Oct 4, 2025

14:14

334

Learning to summarize user information for personalized reinforcement learning from human feedback

Oct 4, 2025

16:00

335

Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF

Oct 3, 2025

16:11

336

LIMI: Less is More for Agency

Oct 1, 2025

14:06

337

LoRA Without Regret

Oct 1, 2025

21:56

338

Actor-Critic without Actor: Critic-Guided Denoising for RL

Sep 29, 2025

16:01

339

DELTA-Code: How Does RL Unlock and Transfer New Programming Algorithms in LLMs?

Sep 29, 2025

16:12

340

Linear Transformers Implicitly Discover Unified Numerical Algorithms

Sep 29, 2025

14:15

341

Regularizing Extrapolation in Causal Inference

Sep 27, 2025

15:20

342

DoubleGen - Debiased Generative Modeling of Counterfactuals

Sep 27, 2025

12:39

343

What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT

Sep 27, 2025

16:44

344

Compute as Teacher: Turning Inference Compute Into Reference-Free Supervision

Sep 27, 2025

16:16

345

Learning without training: The implicit dynamics of in-context learning

Sep 24, 2025

13:43

346

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Sep 24, 2025

13:22

347

Open Problems in Mechanistic Interpretability

Sep 21, 2025

18:54

348

Maestro: Joint Graph & Config Optimization for Reliable AI Agents

Sep 21, 2025

12:14

349

Thought Anchors: Which LLM Reasoning Steps Matter?

Sep 21, 2025

15:47

350

RL's Razor: Why Online RL Forgets Less

Sep 7, 2025

24:56

351

Why Language Models Hallucinate

Sep 6, 2025

17:40

352

ALFA: Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning

Sep 6, 2025

16:12

353

Sample Efficient Preference Alignment in LLMs via Active Exploration

Sep 6, 2025

15:05

354

Adventures in Demand Analysis Using AI

Sep 4, 2025

13:59

355

Memento: Fine-tuning LLM Agents without Fine-tuning LLMs

Sep 1, 2025

18:59

356

On the Theoretical Limitations of Embedding-Based Retrieval

Aug 31, 2025

17:25

357

Performance Prediction for Large Systems via Text-to-Text Regression

Aug 30, 2025

15:53

358

Demystifying the Visual Quality Paradox in Multimodal Large Language Models

Aug 30, 2025

16:47

359

Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL

Aug 30, 2025

20:15

360

Compute-Optimal Scaling for Value-Based Deep RL

Aug 25, 2025

16:02

361

LLM-based Conversational Recommendation Agents with Collaborative Verbalized Experience

Aug 23, 2025

17:05

362

Signal and Noise: Evaluating Language Model Benchmarks

Aug 23, 2025

12:01

363

Breaking Feedback Loops in Recommender Systems with Causal Inference

Aug 21, 2025

12:54

364

RAG is Dead, Context Engineering is King: Building Reliable AI Systems

Aug 20, 2025

19:55

365

A Survey of Personalization: From RAG to Agent

Aug 20, 2025

25:00

366

Facilitating the Adoption of Causal Inference Methods Through LLM-Empowered Co-Pilot

Aug 19, 2025

22:28

367

Performance Prediction for Large Systems via Text-to-Text Regression

Aug 16, 2025

19:09

368

Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning

Aug 15, 2025

27:47

369

DINOv3: Vision Models for Self-Supervised Learning

Aug 15, 2025

20:07

370

Agent Lightning: Training Any AI Agents with Reinforcement Learning

Aug 14, 2025

20:07

371

Computational-Statistical Tradeoffs at the Next-Token Prediction Barrier

Aug 14, 2025

11:34

372

From Model Weights to Agent Workflows: Charting the New Frontier of Optimization in Large Language Models

Aug 12, 2025

16:51

373

Is Chain-of-Thought Reasoning a Mirage?

Aug 12, 2025

18:37

374

Agentic Web: Weaving the Next Web with AI Agents

Aug 11, 2025

22:24

375

The Assimilation-Accommodation Gap in LLM Intelligence

Aug 10, 2025

23:12

376

The Minimalist AI Kernel: A New Frontier in Reasoning

Aug 6, 2025

19:03

377

Statistical Rigor for Interpretable AI

Aug 6, 2025

18:08

378

Full-Stack Alignment: Co-Aligning AI and Institutions with Thick Models of Value

Aug 4, 2025

22:00

379

A foundation model to predict and capture human cognition

Aug 4, 2025

18:34

380

Generative Recommendation with Semantic IDs: A Practitioner’s Handbook

Aug 4, 2025

16:39

381

Hierarchical Reasoning Model

Aug 4, 2025

12:12

382

Test-time Offline Reinforcement Learning on Goal-related Experience

Aug 4, 2025

13:39

383

Interpreting Chain of Thought: A Walkthrough and Discussion

Aug 4, 2025

14:33

384

The wall confronting large language models

Aug 4, 2025

17:33

385

COLLABLLM: LLMs From Passive to Collaborative

Jul 31, 2025

17:40

386

A decade's battle on dataset bias: are we there yet?

Jul 29, 2025

16:17

387

GEPA: Generative Feedback for AI System Optimization

Jul 29, 2025

15:09

388

From AI-Curious to AI-First: Engineering Production AI Systems

Jul 28, 2025

35:50

389

Context Engineering: Beyond Simple Prompting to LLM Architecture

Jul 28, 2025

30:09

390

Agentic Misalignment: LLMs as Insider Threats

Jul 28, 2025

18:13

391

Small Language Models: Future of Agentic AI

Jul 28, 2025

21:24

392

Learning without training: The implicit dynamics of in-context learning

Jul 28, 2025

11:22

393

Inverse Scaling in Test-Time Compute

Jul 28, 2025

15:57

394

LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra

Jul 28, 2025

16:11

395

Microsoft's Blueprint: AI, Quantum, and the Agentic Future

Jul 26, 2025

27:27

396

Zuckerberg's AI Vision Analyzed

Jul 26, 2025

25:40

397

Inside Claude: Scaling, Agency, and Interpretability

Jul 26, 2025

34:26

398

Personalized language modeling from personalized human feedback

Jul 26, 2025

16:57

399

Position: Empowering Time Series Reasoning with Multimodal LLMs

Jul 25, 2025

16:14

400

An empirical risk minimization approach for offline inverse RL and Dynamic Discrete Choice models

Jul 22, 2025

14:44

401

Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities

Jul 22, 2025

26:20

402

The Invisible Leash: Why RLVR May Not Escape Its Origin

Jul 20, 2025

16:09

403

Language Model Personalization via Reward Factorization

Jul 20, 2025

10:03

404

Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions

Jul 18, 2025

13:41

405

Do We Need to Verify Step by Step? Rethinking Process Supervision from a Theoretical Perspective

Jul 17, 2025

12:56

406

Soft Best-of-n Sampling for Model Alignment

Jul 16, 2025

14:27

407

On Temporal Credit Assignment and Data-Efficient Reinforcement Learning

Jul 15, 2025

16:56

408

Bradley–Terry and Multi-Objective Reward Modeling Are Complementary

Jul 15, 2025

16:39

409

Probing Foundation Models for World Models

Jul 15, 2025

11:47

410

GenAI-Powered Statistical Inference (with Unstructured Data)

Jul 14, 2025

20:10

411

Interpretable Reward Modeling with Active Concept Bottlenecks

Jul 14, 2025

11:38

412

PrefillOnly: An Inference Engine for Prefill-only Workloads in Large Language Model Applications

Jul 14, 2025

13:54

413

A Collectivist, Economic Perspective on AI

Jul 14, 2025

21:19

414

Textual Bayes: Quantifying Uncertainty in LLM-Based Systems

Jul 12, 2025

9:03

415

The Winner's Curse in Data-Driven Decisions

Jul 11, 2025

30:06

416

SPIRAL: Self-Play for Reasoning Through Zero-Sum Games

Jul 11, 2025

17:19

417

Beyond Statistical Learning: Exact Learning Is Essential for General Intelligence

Jul 11, 2025

21:42

418

Aligning Learning and Endogenous Decision-Making

Jul 11, 2025

16:25

419

Reliable Statistical Inference with Synthetic Data from Large Language Models

Jul 11, 2025

14:12

420

Multi-Turn Reinforcement Learning from Human Preference Feedback

Jul 10, 2025

17:06

421

Provably Learning from Language Feedback

Jul 9, 2025

17:12

422

Markets with Heterogeneous Agents: Dynamics and Survival of Bayesian vs. No-Regret Learners

Jul 5, 2025

21:22

423

Why Neural Network Can Discover Symbolic Structures with Gradient-based Training: An Algebraic and Geometric Foundation

Jul 5, 2025

13:30

424

Causal Abstraction with Lossy Representations

Jul 4, 2025

25:36

425

The Winner's Curse in Data-Driven Decisions

Jul 4, 2025

23:00

426

Embodied AI Agents: Modeling the World

Jul 4, 2025

28:52

427

Beyond Statistical Learning: Exact Learning Is Essential for General Intelligence

Jul 4, 2025

20:04

428

What Has a Foundation Model Found? Inductive Bias Reveals World Models

Jul 4, 2025

11:46

429

Language Bottleneck Models: A Framework for Interpretable Knowledge Tracing and Beyond

Jul 3, 2025

23:20

430

Learning to Explore: An In-Context Learning Approach for Pure Exploration

Jul 3, 2025

16:33

431

Human-AI Matching: The Limits of Algorithmic Search

Jun 25, 2025

14:53

432

Uncertainty Quantification Needs Reassessment for Large-language Model Agents

Jun 25, 2025

18:49

433

Bayesian Meta-Reasoning for Robust LLM Generalization

Jun 25, 2025

19:44

434

General Intelligence Requires Reward-based Pretraining

Jun 25, 2025

17:27

435

Deep Learning is Not So Mysterious or Different

Jun 25, 2025

21:44

436

AI Agents Need Authenticated Delegation

Jun 25, 2025

18:56

437

Probabilistic Modelling is Sufficient for Causal Inference

Jun 25, 2025

21:33

438

Not All Explanations for Deep Learning Phenomena Are Equally Valuable

Jun 25, 2025

18:52

439

e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs

Jun 17, 2025

13:57

440

Extrapolation by Association: Length Generalization Transfer in Transformers

Jun 17, 2025

12:04

441

Uncovering Causal Hierarchies in Language Model Capabilities

Jun 17, 2025

18:32

442

Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers

Jun 17, 2025

13:49

443

Improving Treatment Effect Estimation with LLM-Based Data Augmentation

Jun 17, 2025

15:24

444

LLM Numerical Prediction Without Auto-Regression

Jun 17, 2025

14:34

445

Why in-context learning models are good few-shot learners?

Jun 17, 2025

21:13

446

Take Caution in Using LLMs as Human Surrogates: Scylla Ex Machina

Jun 14, 2025

27:43

447

The Logic of Machines: The AI Reasoning Debate

Jun 12, 2025

31:02

448

Layer by Layer: Uncovering Hidden Representations in Language Models

Jun 12, 2025

13:20

449

Causal Attribution Analysis for Continuous Outcomes

Jun 12, 2025

18:02

450

Training a Generally Curious Agent

Jun 12, 2025

13:43

451

Estimation of Treatment Effects Under Nonstationarity via Truncated Difference-in-Q’s

Jun 12, 2025

20:43

452

Strategy Coopetition Explains the Emergence and Transience of In-Context Learning

Jun 12, 2025

18:59

453

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

Jun 11, 2025

17:24

454

Agentic Supernet for Multi-agent Architecture Search

Jun 11, 2025

18:08

455

Sample Complexity and Representation Ability of Test-time Scaling Paradigms

Jun 11, 2025

14:53

456

Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators

Jun 10, 2025

19:29

457

LLMs Get Lost In Multi-Turn Conversation

Jun 9, 2025

20:34

458

PromptPex: Automatic Test Generation for Prompts

Jun 8, 2025

11:54

459

General Agents Need World Models

Jun 8, 2025

15:25

460

The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models

Jun 7, 2025

12:43

461

Decisions With Algorithms

Jun 7, 2025

59:25

462

Adapting, fast and slow: Causal Approach to Few-Shot Sequence Learning

Jun 6, 2025

43:52

463

Conformal Arbitrage for LLM Objective Balancing

Jun 6, 2025

22:25

464

Simulation-Based Inference for Adaptive Experiments

Jun 6, 2025

48:46

465

Agents as Tool-Use Decision-Makers

Jun 6, 2025

22:25

466

Quantitative Judges for Large Language Models

Jun 6, 2025

18:12

467

Self-Challenging Language Model Agents

Jun 6, 2025

14:30

468

Learning to Explore: An In-Context Learning Approach for Pure Exploration

Jun 6, 2025

29:51

469

How Bidirectionality Helps Language Models Learn Better via Dynamic Bottleneck Estimation

Jun 6, 2025

19:27

470

A Closer Look at Bias and Chain-of-Thought Faithfulness of Large (Vision) Language Models

Jun 5, 2025

16:47

471

Simplifying Bayesian Optimization Via In-Context Direct Optimum Sampling

Jun 5, 2025

23:36

472

Bayesian Teaching Enables Probabilistic Reasoning in Large Language Models

Jun 5, 2025

15:40

473

IPO: Interpretable Prompt Optimization for Vision-Language Models

Jun 5, 2025

13:43

474

Evolutionary Prompt Optimization discovers emergent multimodal reasoning strategies

Jun 5, 2025

18:46

475

Evaluating the Unseen Capabilities: How Many Theorems Do LLMs Know?

Jun 4, 2025

14:00

476

Diffusion Guidance Is a Controllable Policy Improvement Operator

Jun 2, 2025

17:26

477

Alita: Generalist Agent With Self-Evolution

Jun 2, 2025

15:26

478

A Snapshot of Influence: A Local Data Attribution Framework for Online Reinforcement Learning

Jun 2, 2025

25:48

479

Learning Compositional Functions with Transformers from Easy-to-Hard Data

Jun 2, 2025

12:21

480

Preference Learning with Response Time

Jun 2, 2025

22:07

481

Accelerating RL for LLM Reasoning with Optimal Advantage Regression

May 31, 2025

23:25

482

Algorithms for reliable decision-making need causal reasoning

May 31, 2025

27:07

483

Belief Attribution as Mental Explanation: The Role of Accuracy, Informativity, and Causality

May 31, 2025

10:46

484

Distances for Markov chains from sample streams

May 31, 2025

19:11

485

When and Why LLMs Fail to Reason Globally

May 31, 2025

18:19

486

IDA-Bench: Evaluating LLMs on Interactive Guided Data Analysis

May 31, 2025

23:26

487

No Free Lunch: Non-Asymptotic Analysis of Prediction-Powered Inference

May 31, 2025

15:30

488

Accelerating RL for LLM Reasoning with Optimal Advantage Regression

May 31, 2025

23:25

489

Statistical Inference for Online Algorithms

May 31, 2025

14:53

490

Prismatic Synthesis for Diverse LLM Reasoning Data

May 31, 2025

19:19

491

Position: Uncertainty Quantification Needs Reassessment for Large-language Model Agents

May 31, 2025

25:17

492

The Agentic Economy

May 30, 2025

38:37

493

Statistics for Large Language Models

May 29, 2025

18:46

494

Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search

May 29, 2025

20:37

495

Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning

May 29, 2025

22:04

496

Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL

May 29, 2025

25:11

497

Value-Guided Search for Efficient Chain-of-Thought Reasoning

May 29, 2025

18:28

498

Shallow Preference Signals: Large Language model aligns even better without truncated data?

May 29, 2025

13:47

499

Gaming Tool Preferences in Agentic LLMs

May 29, 2025

19:05

500

Partner Modelling Emerges in Recurrent Agents (But Only When It Matters)

May 29, 2025

12:56

501

LLM Populations Form Social Conventions and Collective Bias

May 29, 2025

15:46

502

LLM Generated Persona is a Promise with a Catch

May 29, 2025

18:11

503

Large Language Models for Digital Twin Simulation

May 29, 2025

20:58

504

From RL Distillation to Autonomous LLM Agents

May 29, 2025

27:11

505

Prompting, Auto-Prompting, and Human-AI Communication

May 29, 2025

17:21

506

Textual Gradients for LLM Optimization

May 29, 2025

23:44

507

Large Language Models as Markov Chains

May 28, 2025

16:14

508

Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation

May 28, 2025

21:26

509

Selective induction heads: how transformers select causal structures in context

May 28, 2025

13:34

510

The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains

May 28, 2025

14:02

511

How Transformers Learn Causal Structure with Gradient Descent

May 28, 2025

14:04

512

Planning anything with rigor: general-purpose zero-shot planning with LLM-based formalized programming

May 28, 2025

19:55

513

Automated Design of Agentic Systems

May 28, 2025

13:34

514

What’s the Magic Word? A Control Theory of LLM Prompting

May 28, 2025

13:57

515

BoNBoN Alignment for Large Language Models and the Sweetness of Best-of-n Sampling

May 27, 2025

20:40

516

RL with KL penalties is better viewed as Bayesian inference

May 27, 2025

13:28

517

Asymptotics of Language Model Alignment

May 27, 2025

14:44

518

Qwen 2.5, RL, and Random Rewards

May 27, 2025

15:10

519

Theoretical guarantees on the best-of-n alignment policy

May 27, 2025

15:17

520

Score Matching Enables Causal Discovery of Nonlinear Additive Noise Models

May 27, 2025

18:42

521

Improved Techniques for Training Score-Based Generative Models

May 27, 2025

19:35

522

Your Pre-trained LLM is Secretly an Unsupervised Confidence Calibrator

May 27, 2025

13:47

523

AlphaEvolve: A coding agent for scientific and algorithmic discovery

May 27, 2025

23:52

524

Harnessing the Universal Geometry of Embeddings

May 27, 2025

23:00

525

Goal Inference using Reward-Producing Programs in a Novel Physics Environment

May 27, 2025

18:30

526

Trial-Error-Explain In-Context Learning for Personalized Text Generation

May 27, 2025

12:20

527

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

May 27, 2025

12:40

528

Test-Time Reinforcement Learning (TTRL)

May 27, 2025

17:49

529

Interpreting Emergent Planning in Model-Free Reinforcement Learning

May 26, 2025

14:56

530

Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems

May 26, 2025

13:04

531

Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment

May 26, 2025

12:34

532

Learning How Hard to Think: Input-Adaptive Allocation of LM Computation

May 26, 2025

19:31

533

Highlighting What Matters: Promptable Embeddings for Attribute-Focused Image Retrieval

May 26, 2025

15:39

534

UFT: Unifying Supervised and Reinforcement Fine-Tuning

May 26, 2025

14:20

535

Understanding High-Dimensional Bayesian Optimization

May 26, 2025

19:39

536

Inference time alignment in continuous space

May 25, 2025

15:55

537

Efficient Test-Time Scaling via Self-Calibration

May 25, 2025

24:17

538

Conformal Prediction via Bayesian Quadrature

May 25, 2025

22:58

539

Predicting from Strings: Language Model Embeddings for Bayesian Optimization

May 25, 2025

27:18

540

Self-Evolving Curriculum for LLM Reasoning

May 25, 2025

14:55

541

Online Decision-Focused Learning in Dynamic Environments

May 25, 2025

20:48

542

FisherSFT: Data-Efficient Supervised Fine-Tuning of Language Models Using Information Gain

May 25, 2025

14:11

543

Reward Shaping from Confounded Offline Data

May 25, 2025

20:33

544

Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning

May 25, 2025

17:45

545

Understanding Best-of-N Language Model Alignment

May 25, 2025

14:13

546

Maximizing Acquisition Functions for Bayesian Optimization - and its relation to Gradient Descent

May 24, 2025

19:17

547

Bayesian Prompt Ensembles: Model Uncertainty Estimation for Black-Box Large Language Models

May 24, 2025

17:15

548

Prompting Strategies for Enabling Large Language Models to Infer Causation from Correlation

May 24, 2025

12:05

549

The Parallel Knowledge Gradient Method for Batch Bayesian Optimization

May 24, 2025

14:57

550

FunBO: Discovering Acquisition Functions for Bayesian Optimization with FunSearch

May 24, 2025

16:09

551

Automated Social Science: A Structural Causal Model-Based Approach

May 24, 2025

11:19

552

Causal Interpretation of Transformer Self-Attention

May 24, 2025

14:07

553

A Causal World Model Underlying Next Token Prediction: Exploring GPT in a Controlled Environment

May 24, 2025

19:34

554

Trace is the Next AutoDiff: Generative Optimization with Rich Feedback, Execution Traces, and LLMs

May 24, 2025

23:41

555

Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation

May 24, 2025

19:03

556

Prompts from Reinforcement Learning (PRL)

May 24, 2025

19:14

557

Logits are All We Need to Adapt Closed Models

May 24, 2025

13:55

558

Large Language Models Are (Bayesian) Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning

May 23, 2025

18:07

559

Inference-Time Intervention: Eliciting Truthful Answers from a Language Model

May 23, 2025

13:35

560

From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models

May 23, 2025

32:22

561

LLM In-Context Learning as Kernel Regression

May 23, 2025

12:56

562

Personalizing LLMs via Decode-Time Human Preference Optimization

May 23, 2025

15:32

563

Almost Surely Safe LLM Inference-Time Alignment

May 23, 2025

13:38

564

Survey of In-Context Learning Interpretation and Analysis

May 23, 2025

32:43

565

From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models

May 23, 2025

32:22

566

LLM In-Context Learning as Kernel Regression

May 23, 2025

12:56

567

Where does In-context Learning Happen in Large Language Models?

May 23, 2025

12:34

568

Auto-Differentiating Any LLM Workflow: A Farewell to Manual Prompting

May 22, 2025

19:00

569

metaTextGrad: Learning to learn with language models as optimizers

May 22, 2025

18:37

570

Semantic Operators: A Declarative Model for Rich, AI-based Data Processing

May 22, 2025

16:32

571

Isolated Causal Effects of Language

May 22, 2025

18:01

572

Sleep-time Compute: Beyond Inference Scaling at Test-time

May 22, 2025

12:00

573

J1: Incentivizing Thinking in LLM-as-a-Judge

May 22, 2025

18:46

574

ShiQ: Bringing back Bellman to LLMs

May 22, 2025

18:03

575

Policy Learning with a Natural Language Action Space: A Causal Approach

May 22, 2025

15:05

576

Multi-Objective Preference Optimization: Improving Human Alignment of Generative Models

May 22, 2025

16:35

577

End-to-End Learning for Stochastic Optimization: A Bayesian Perspective

May 21, 2025

34:41

578

TEXTGRAD: Automatic Differentiation via Text

May 21, 2025

17:57

579

Steering off Course: Reliability Challenges in Steering Language Models

May 20, 2025

17:29

580

Past-Token Prediction for Long-Context Robot Policies

May 20, 2025

15:59

581

Recovering Coherent Event Probabilities from LLM Embeddings

May 20, 2025

13:55

582

Systematic Meta-Abilities Alignment in Large Reasoning Models

May 20, 2025

16:34

583

Predictability Shapes Adaptation: An Evolutionary Perspective on Modes of Learning in Transformers

May 20, 2025

21:58

584

Efficient Exploration for LLMs

May 19, 2025

13:56

585

Rankers, Judges, and Assistants: Towards Understanding the Interplay of LLMs in Information Retrieval Evaluation

May 18, 2025

25:35

586

Bayesian Concept Bottlenecks with LLM Priors

May 17, 2025

22:29

587

Transformers for In-Context Reinforcement Learning

May 17, 2025

15:28

588

Evaluating Large Language Models Across the Lifecycle

May 17, 2025

22:44

589

Active Ranking from Human Feedback with DopeWolfe

May 16, 2025

13:27

590

Optimal Designs for Preference Elicitation

May 16, 2025

13:44

591

Dual Active Learning for Reinforcement Learning from Human Feedback

May 16, 2025

18:19

592

Active Learning for Direct Preference Optimization

May 16, 2025

13:02

593

Active Preference Optimization for RLHF

May 16, 2025

12:08

594

Test-Time Alignment of Diffusion Models without reward over-optimization

May 16, 2025

28:01

595

Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback

May 16, 2025

24:09

596

GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment

May 16, 2025

9:06

597

Advantage-Weighted Regression: Simple and Scalable Off-Policy RL

May 16, 2025

18:38

598

Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective

May 16, 2025

17:54

599

Transformers can be used for in-context linear regression in the presence of endogeneity

May 15, 2025

12:02

600

Bayesian Concept Bottlenecks with LLM Priors

May 15, 2025

21:27

601

In-Context Parametric Inference: Point or Distribution Estimators?

May 15, 2025

11:52

602

Enough Coin Flips Can Make LLMs Act Bayesian

May 15, 2025

12:56

603

Bayesian Scaling Laws for In-Context Learning

May 15, 2025

17:15

604

Posterior Mean Matching Generative Modeling

May 15, 2025

18:33

605

Can Generative AI Solve Your In-Context Learning Problem? A Martingale Perspective

May 15, 2025

22:21

606

Dynamic Search for Inference-Time Alignment in Diffusion Models

May 15, 2025

14:27

607

Is In-Context Learning in Large Language Models Bayesian? A Martingale Perspective

May 12, 2025

16:06

608

Leaked Claude Sonnet 3.7 System Instruction tuning

May 12, 2025

14:03

609

Converging Predictions with Shared Information

May 11, 2025

9:56

610

Test-Time Alignment Via Hypothesis Reweighting

May 11, 2025

21:26

611

Rethinking Diverse Human Preference Learning through Principal Component Analysis

May 11, 2025

17:25

612

Active Statistical Inference

May 10, 2025

15:59

613

Data Mixture Optimization: A Multi-fidelity Multi-scale Bayesian Framework

May 10, 2025

13:21

614

AI-Powered Bayesian Inference

May 10, 2025

17:55

615

Can Unconfident LLM Annotations Be Used for Confident Conclusions?

May 9, 2025

21:09

616

Predictions as Surrogates: Revisiting Surrogate Outcomes in the Age of AI

May 9, 2025

19:42

617

Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control

May 9, 2025

15:32

618

How to Evaluate Reward Models for RLHF

May 9, 2025

14:32

619

LLMs as Judges: Survey of Evaluation Methods

May 9, 2025

26:56

620

The Alternative Annotator Test for LLM-as-a-Judge: How to Statistically Justify Replacing Human Annotators with LLMs

May 9, 2025

15:38

621

Limits to scalable evaluation at the frontier: LLM as Judge won’t beat twice the data

May 9, 2025

12:15

622

Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation

May 9, 2025

13:23

623

Accelerating Unbiased LLM Evaluation via Synthetic Feedback

May 9, 2025

20:45

624

Prediction-Powered Statistical Inference Framework

May 9, 2025

10:47

625

Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL

May 9, 2025

15:32

626

RM-R1: Reward Modeling as Reasoning

May 9, 2025

19:36

627

Reexamining the Aleatoric and Epistemic Uncertainty Dichotomy

May 8, 2025

16:31

628

Decoding Claude Code: Terminal Agent for Developers

May 7, 2025

13:59

629

Emergent Strategic AI Equilibrium from Pre-trained Reasoning

May 7, 2025

27:45

630

Benefiting from Proprietary Data with Siloed Training

May 6, 2025

18:27

631

Advantage Alignment Algorithms

May 6, 2025

16:14

632

Asymptotic Safety Guarantees Based On Scalable Oversight

May 6, 2025

19:19

633

What Makes a Reward Model a Good Teacher? An Optimization Perspective

May 6, 2025

13:48

634

Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

May 6, 2025

15:12

635

Identifiable Steering via Sparse Autoencoding of Multi-Concept Shifts

May 6, 2025

12:22

636

You Are What You Eat - AI Alignment Requires Understanding How Data Shapes Structure and Generalisation

May 6, 2025

14:51

637

Interplay of LLMs in Information Retrieval Evaluation

May 3, 2025

15:49

638

Trade-Offs Between Tasks Induced by Capacity Constraints Bound the Scope of Intelligence

May 3, 2025

19:18

639

Toward Efficient Exploration by Large Language Model Agents

May 3, 2025

18:52

640

Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT

May 2, 2025

13:19

641

Self-Consuming Generative Models with Curated Data

May 2, 2025

17:13

642

Bootstrapping Language Models with DPO Implicit Rewards

May 2, 2025

20:24

643

DeepSeek-Prover-V2: Advancing Formal Reasoning

May 1, 2025

11:29

644

THINKPRM: Data-Efficient Process Reward Models

May 1, 2025

24:58

645

Societal Frameworks and LLM Alignment

Apr 29, 2025

17:52

646

Risks from Multi-Agent Advanced AI

Apr 29, 2025

28:39

647

Causality-Aware Alignment for Large Language Model Debiasing

Apr 29, 2025

19:06

648

Reward Models Evaluate Consistency, Not Causality

Apr 28, 2025

17:05

649

Causal Rewards for Large Language Model Alignment

Apr 28, 2025

15:03

650

Sycophancy to subterfuge: Investigating reward-tampering in large language models

Apr 28, 2025

14:59

651

Bidirectional AI Alignment

Apr 28, 2025

78:30

652

Why Do Multi-Agent LLM Systems Fail?

Apr 27, 2025

20:18

653

LLMs as Greedy Agents: RL Fine-tuning for Decision-Making

Apr 27, 2025

18:19

654

LLM Feedback Loops and the Lock-in Hypothesis

Apr 27, 2025

12:42

655

Representational Alignment Drives Effective Teaching and Learning

Apr 27, 2025

13:53

656

Adaptive Parallel Reasoning with Language Models

Apr 27, 2025

16:23

657

AI: Rewiring the Flow of Ideas and Human Knowledge

Apr 27, 2025

21:53

658

Learning and Equilibrium with Ranking Feedback

Apr 27, 2025

17:49

659

Designing Human-AI Collaboration: A Sufficient-Statistic Approach

Apr 27, 2025

24:33

660

GOAT: Generative Adversarial Training for Human-AI Coordination

Apr 27, 2025

17:35

661

π0.5: Generalization in Robotic Manipulation via Diverse Data

Apr 27, 2025

11:04

662

NoWag: Unified Compression for Large Language Models

Apr 26, 2025

17:55

663

Optimal Tool Calls in Language Model Reasoning

Apr 26, 2025

24:49

664

Data Selection for Empirical Risk Minimization

Apr 26, 2025

34:10

665

LoRe: Low-Rank Reward Modeling for Personalized LLMs

Apr 26, 2025

10:43

666

ParaPO: Reducing Language Model Verbatim Reproduction

Apr 26, 2025

15:15

667

Test-Time RL: Self-Evolving LLMs via Majority Voting Rewards

Apr 25, 2025

18:17

668

Tina: Tiny LoRA Reasoning Models

Apr 25, 2025

15:37

669

Evaluating large language models in theory of mind tasks

Apr 25, 2025

14:57

670

QUEST: Quality Sampling for Machine Translation

Apr 24, 2025

9:36

671

Offline Preference Learning via Simulated Trajectory Feedback

Apr 24, 2025

17:21

672

Reasoning Elicitation in Language Models via Counterfactual Feedback

Apr 24, 2025

20:33

673

Eliciting Human Preferences with Language Models

Apr 24, 2025

11:56

674

Sub-Optimal Data for Human-in-the-Loop Reinforcement Learning

Apr 24, 2025

22:04

675

γ-Bench: Evaluating LLMs in Multi-Agent Games

Apr 24, 2025

24:06

676

DRAFT: Self-Driven LLM Tool Mastery via Documentation Refinement

Apr 24, 2025

13:05

677

Optimal Prediction Sets for Enhanced Human-AI Accuracy

Apr 24, 2025

15:08

678

Self-Correction via Reinforcement Learning for Language Models

Apr 24, 2025

12:30

679

Tractable Multi-Agent Reinforcement Learning through Behavioral Economics

Apr 24, 2025

17:55

680

Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement

Apr 24, 2025

10:33

681

Iterative Nash Policy Optimization for Language Model Alignment

Apr 24, 2025

20:14

682

SycEval: Benchmarking LLM Sycophancy in Mathematics and Medicine

Apr 23, 2025

15:52

683

Stack AI: Democratizing Enterprise AI Development

Apr 22, 2025

23:02

684

Evaluating Modern Recommender Systems: Challenges and Future Directions

Apr 22, 2025

29:43

685

AI in the Enterprise: Seven Lessons from Frontier Companies by OpenAI

Apr 22, 2025

44:55

686

Discussion: Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Apr 21, 2025

20:37

687

AI Agent Protocols and Human Preference

Apr 21, 2025

14:34

688

Cross-Environment Cooperation for Zero-Shot Multi-Agent Coordination

Apr 20, 2025

17:36

689

Sutton and Silver: The Era of Experience: Learning Beyond Human Data

Apr 19, 2025

31:48

690

Sample, Don't Search: Rethinking Test-Time Alignment for Language Models

Apr 19, 2025

15:54

691

AI Agents: Echoes of Past Technology Pivots?

Apr 19, 2025

14:54

692

Minimalist LLM Reasoning: Rejection Sampling to Reinforcement

Apr 19, 2025

13:25

693

Securing the Model Context Protocol in Enterprise Environments

Apr 19, 2025

18:46

694

Improving Multi-Turn Tool Use with Reinforcement Learning

Apr 19, 2025

14:44

695

Cultural Knowledge Conservation and Control in Large Language Models

Apr 19, 2025

12:27

696

Data Quality, Repetition, and Scaling of Language Models

Apr 18, 2025

19:00

697

Compute-Optimal Scaling Laws for Language Models Revisited

Apr 18, 2025

17:14

698

Concise Reasoning via Reinforcement Learning

Apr 18, 2025

13:41

699

Throughput Limits for LLM Inference and AI Agent Scheduling

Apr 14, 2025

32:21

700

RL Post-training Amplifies Pretraining Behaviors in Language Models

Apr 14, 2025

15:45

701

Fast Adaptation of Behavioral Foundation Models

Apr 14, 2025

22:24

702

Proprietary Reward Models: Sustaining Advantage in Agentic AI

Apr 13, 2025

24:26

703

Why Multi-Agent LLM Systems Fail: A Comprehensive Study

Apr 12, 2025

18:58

704

Play2Prompt: Zero-Shot Tool Instruction Optimization via Tool Play

Apr 12, 2025

16:46

705

Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

Apr 12, 2025

46:36

706

API and GUI Agents: Divergence, Convergence, and Hybrid Approaches

Apr 12, 2025

18:04

707

AI, Chess, and Competitive Advantage: Substitution and Complementation

Apr 12, 2025

20:42

708

Knowledge of the Firm and Replication of Technology

Apr 12, 2025

19:21

709

Firm Resources and Sustained Competitive Advantage

Apr 12, 2025

14:57

710

Evaluating Pharmaceutical Marketing to Physicians with Panel Data

Apr 12, 2025

26:10

711

Theory of the Firm in the Era of Agents

Apr 12, 2025

42:45

712

Large Language Models: An Applied Econometric Framework

Apr 12, 2025

22:53

713

Evaluating the World Model Implicit in a Generative Model

Apr 12, 2025

17:37

714

Machine Learning for Hypothesis Generation in Social Science

Apr 11, 2025

10:10

715

Active Learning for Moral Preference Elicitation: Challenges and Nuances

Apr 11, 2025

21:58

716

Gradient-Based Surveys for Nonparametric Discrete Choice Experiments

Apr 11, 2025

19:45

717

Explainable Data-driven Share-of-choice Product Line Design Optimization

Apr 11, 2025

22:17

718

The More You Ask, the Less You Get: When Additional Questions Hurt External Validity

Apr 11, 2025

16:08

719

Conjoint topics from Handbook of Marketing Analytics: Methods and Applications

Apr 11, 2025

14:44

720

Choice-Based Conjoint Analysis: Methods and Applications

Apr 11, 2025

20:40

721

Beyond Conjoint Analysis: The Future of Preference Measurement

Apr 11, 2025

34:10

722

An Optimization Framework for Adaptive Questionnaire Design

Apr 11, 2025

20:49

723

Adaptive Self-Explication of Multiattribute Preferences

Apr 11, 2025

17:54

724

Conjoint Analysis: Methods, Applications, and Recent Developments

Apr 11, 2025

18:30

725

Current Issues and a “Wish List” for Conjoint Analysis

Apr 11, 2025

22:44

726

Ellipsoidal Methods for Adaptive Choice-Based Conjoint Analysis

Apr 11, 2025

14:35

727

Adaptive Polyhedral Methods for Conjoint Analysis

Apr 11, 2025

20:04

728

MSL: Enhancing LLM Recommenders via Masked Softmax Loss

Apr 11, 2025

15:56

729

Self-Supervised Deep Reinforcement Learning for Optimal Question Ranking

Apr 11, 2025

21:05

730

Adaptive Language Elicitation for Latent Information Discovery

Apr 10, 2025

16:55

731

LLM Persona Bias: Promise and Peril in Simulation

Apr 10, 2025

17:37

732

AutoTools: Automating Tool Use for Large Language Models

Apr 10, 2025

19:52

733

Tool Learning with Large Language Models: A Comprehensive Survey

Apr 10, 2025

23:29

734

All Roads Lead to Likelihood: RL for Fine-Tuning Value

Apr 8, 2025

24:16

735

ATLAS: Tuning Agents via Critical Step Learning

Apr 8, 2025

20:00

736

Thinking Faster by Writing Less: Chain of Draft Reasoning

Apr 8, 2025

18:36

737

Meta Plan Optimization for Boosting LLM Agents

Apr 8, 2025

19:01

738

L1: Length Controlled Reasoning with Reinforcement Learning

Apr 8, 2025

16:59

739

WikiBigEdit: Benchmarking Lifelong Knowledge Editing in LLMs

Apr 8, 2025

20:12

740

PLAN-AND-ACT: LLM Agent Planning with Synthetic Data

Apr 8, 2025

14:32

741

SEARCH-R1: LLMs Learn to Reason and Search via Reinforcement Learning

Apr 8, 2025

23:54

742

The Theory of the Firm: Information, Incentives, and Organization

Apr 8, 2025

24:39

743

Four Formalizable Theories of the Firm

Apr 8, 2025

32:04

744

Efficient Tool Use with Chain-of-Abstraction Reasoning

Apr 6, 2025

21:21

745

CodeTool: Process Supervision for Enhanced LLM Tool Invocation

Apr 6, 2025

17:01

746

Evaluating LLM Agents in Multi-Turn Conversations: A Survey

Apr 6, 2025

29:26

747

Epistemic Alignment in User-LLM Knowledge Delivery

Apr 6, 2025

17:00

748

MCP is (not) all you need

Apr 6, 2025

28:54

749

AI, Human Skills, and Competitive Advantage in Chess

Apr 5, 2025

23:24

750

Inference-Time Scaling for Generalist Reward Modeling

Apr 4, 2025

21:48

751

Optimal Pure Exploration in Linear Bandits via Sampling

Apr 4, 2025

25:31

752

Presidential Address: The Economist as Designer in the Innovation Process for Socially Impactful Digital Products

Apr 4, 2025

44:37

753

Emergent Symbolic Mechanisms for Reasoning in Large Language Models

Apr 3, 2025

17:26

754

Inference-Time Alignment: Coverage, Scaling, and Optimality

Apr 3, 2025

14:40

755

Sharpe Ratio-Guided Active Learning for Preference Optimization

Apr 3, 2025

19:08

756

Active Learning for Adaptive In-Context Prompt Design

Apr 3, 2025

15:41

757

Visual Chain-of-Thought Reasoning for Vision-Language-Action Models

Apr 3, 2025

20:19

758

On the Biology of a Large Language Model

Apr 1, 2025

19:01

759

Async-TB: Asynchronous Trajectory Balance for Scalable LLM RL

Apr 1, 2025

17:56

760

Instacart's Economics Team: A Hybrid Role in Tech

Mar 31, 2025

18:37

761

Data Mixture Optimization: A Multi-fidelity Multi-scale Bayesian Framework

Mar 31, 2025

22:01

762

Why MCP won

Mar 31, 2025

17:03

763

SWEET-RL: Training LLM Agents for Collaborative Reasoning

Mar 31, 2025

24:31

764

TheoryCoder: Bilevel Planning with Synthesized World Models

Mar 30, 2025

23:18

765

Driving Forces in AI: Scaling to 2025 and Beyond (Jason Wei, OpenAI)

Mar 29, 2025

22:31

766

Expert Demonstrations for Sequential Decision Making under Heterogeneity

Mar 28, 2025

17:58

767

TextGrad: Backpropagating Language Model Feedback for Generative AI Optimization

Mar 27, 2025

26:07

768

MemReasoner: Generalizing Language Models on Reasoning-in-a-Haystack Tasks

Mar 27, 2025

17:33

769

RAFT: In-Domain Retrieval-Augmented Fine-Tuning for Language Models

Mar 27, 2025

20:36

770

Inductive Biases for Exchangeable Sequence Modeling

Mar 26, 2025

20:12

771

InverseRLignment: LLM Alignment via Inverse Reinforcement Learning

Mar 26, 2025

25:01

772

Prompt-OIRL: Offline Inverse RL for Query-Dependent Prompting

Mar 26, 2025

15:53

773

Alignment from Demonstrations for Large Language Models

Mar 25, 2025

20:51

774

Q♯: Distributional RL for Optimal LLM Post-Training

Mar 18, 2025

20:00

775

Scaling Test-Time Compute Without Verification or RL is Suboptimal

Mar 14, 2025

15:13

776

Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning

Mar 14, 2025

11:35

777

Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning

Mar 14, 2025

4:50

778

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Mar 14, 2025

1:45

779

Revisiting Superficial Alignment Hypothesis

Mar 14, 2025

4:11

780

Diagnostic Uncertainty: Teaching Language Models to Describe Open-Ended Uncertainty

Mar 14, 2025

4:16

781

Language Model Personalization via Reward Factorization

Mar 14, 2025

4:53

782

How Well do LLMs Compress Their Own Chain-of-Thought? A Token Complexity Approach

Mar 14, 2025

4:08

783

Can Large Language Models Extract Customer Needs as well as Professional Analysts?

Mar 13, 2025

4:46

784

SpurLens: Finding Spurious Correlations in Multimodal LLMs

Mar 13, 2025

4:39

785

Improving Test-Time Search with Backtracking Against In-Context Value Verifiers

Mar 13, 2025

3:59

786

Adaptive Elicitation of Latent Information Using Natural Language

Mar 13, 2025

4:20

787

Document Valuation in LLM Summaries: A Cluster Shapley Approach

Mar 13, 2025

3:48

788

s1: Simple Test-Time Scaling

Mar 13, 2025

5:17
