All Episodes

Best AI papers explained — 739 episodes

#
Title
1

EVOLM: Self-Evolving Language Models through Co-Evolved Discriminative Rubrics

2

Personalized Alignment Revisited: The Necessity and Sufficiency of User Diversity

3

OGPO: Sample Efficient Full-Finetuning of Generative Control Policies

4

Adaptive Querying with AI Persona Priors

5

Rethinking the Role of LLMs in Time Series Forecasting

6

Robust Representation Learning through Explicit Environment Modeling

7

Magentic Marketplace: An Open-Source Environment for studying Agentic Markets

8

Hyperloop Transformers

9

Scaling Self-Play with Self-Guidance

10

RL Token: Bootstrapping Online RL with Vision-Language-Action Models

11

Agentic Data Environments

12

AI organizations are more effective but less aligned than individual agents

13

Text-to-Distribution Prediction with Quantile Tokens and Neighbor Context

14

Distortion of AI alignment revisited: RLHF is a decent utilitarian aligner

15

LLMs get lost in multi-turn conversation

16

Transformers are inherently succinct

17

The Coasean Singularity? Demand, Supply, and Market Design with AI Agents

18

Demystifying the unreasonable effectiveness of online alignment methods

19

Specialization after generalization: towards understanding test-time training in foundation models

20

Exploration and Exploitation Errors Are Measurable for Language Model Agents

21

A Mechanistic Analysis of Looped Reasoning Language Models

22

Sample Complexity of Autoregressive Reasoning: Chain-of-Thought vs. End-to-End

23

Why AI systems don’t learn and what to do about it

24

The Illusion of Learning from Observational Data: An Empirical Bayes Perspective

25

Ads in AI chatbots? An analysis of how large language models navigate conflicts of interest

26

Beyond Semantic Manipulation: Token-Space Attacks on Reward Models

27

LLM Evaluation as Tensor Completion: Low-Rank Efficiency and Uncertainty Quantification

28

Neural Computers

29

How AI Aggregation Affects Knowledge

30

World Action Verifier: Self-Improving World Models via Forward-Inverse Asymmetry

31

In-Place Test-Time Training

32

Test-Time Scaling Makes Overtraining Compute-Optimal

33

AI Agent Prevalence and Data Quality Across Multiple Online Sample Providers

34

POLCA: Stochastic Generative Optimization with LLM

35

Agentic Markets: Equilibrium Effects of Improving Consumer Search

36

One Model, Two Markets: Bid-Aware Generative Recommendation

37

How Well Do LLMs Predict Human Behavior? A Measure of their Pretrained Knowledge

38

Learning to Reason with Curriculum I: Provable Benefits of Autocurriculum

39

Agentic AI and the next intelligence explosion

40

Understanding Behavior Cloning with Action Quantization

41

HyperAgents: Open-Ended Metacognitive Self-Improvement for Any Computable Task

42

Harness design for long-running application development \ Anthropic

43

Reasonably reasoning AI agents can avoid game-theoretic failures in zero-shot, provably

44

How Log-Barrier Helps Exploration in Policy Optimization

45

The Finetuner’s Fallacy: When to Pretrain with Your Finetuning Data

46

TURNWISE: The Gap between Single- and Multi-turn Language Model Capabilities

47

Temporal Straightening for Latent Planning

48

Fine-Tuning Strategies for Preserving In-Context Learning in Linear Attention

49

LLMs Can Learn to Reason Via Off-Policy RL

50

Simple Recipe Works: Vision-Language-Action Models are Natural Continual Learners with Reinforcement Learning

51

Provable and practical in-context policy optimization for self-improvement

52

Matching Features, Not Tokens: Energy-Based Fine-Tuning of Language Models

53

Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights

54

AdaEvolve: Adaptive LLM Driven Zeroth-Order Optimization

55

∇-reasoner: LLM reasoning via test-time gradient descent in latent space

56

Inference for Regression with Variables Generated by AI or Machine Learning

57

Fast KV Compaction via Attention Matching

58

Position: stop anthropomorphizing intermediate tokens as reasoning/thinking traces!

59

Code World Models for General Game Playing

60

Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought

61

Task Descriptors Help Transformers Learn Linear Models In-Context

62

Equivalence of Context and Parameter Updates in Modern Transformer Blocks

63

Learning without training: The implicit dynamics of in-context learning

64

Causal Identification from Counterfactual Data: Completeness and Bounding Results

65

Is Cosine-Similarity of Embeddings Really About Similarity?

66

Diffusion LLMs are Natural Adversaries for any LLM

67

Are you going to finish that? A Practical Study of the Partial Token Problem

68

Language Models Struggle to Use Representations Learned In-Context

69

LLMs are Bayesian, In Expectation, Not in Realization

70

Learning from Trials and Errors: Reflective Test-Time Planning for Embodied LLMs

71

LLMs Can Learn to Reason Via Off-Policy RL

72

Test-Time Training with KV Binding Is Secretly Linear Attention

73

Unified Latents (UL): How to train your latents

74

Spectral Bellman Method: Unifying RL Representation and Exploration

75

Prescriptive Scaling Reveals the Evolution of Language Model Capabilities

76

Experiential Reinforcement Learning

77

Learning Personalized Agents from Human Feedback

78

Learning to summarize user information for personalized RLHF

79

Intrinsic Credit Assignment for Long Horizon Interaction

80

Learning to Continually Learn via Meta-learning Agentic Memory Designs

81

Why Self-Rewarding Works: Theoretical Guarantees for Iterative Alignment of Language Models

82

PAD: Personalized Alignment of LLMs at Decoding-Time

83

The Reward Model Selection Crisis in Personalized Alignment

84

Causal-JEPA: Learning World Models through Object-Level Latent Interventions

85

How Sampling Shapes LLM Alignment: From One-Shot Optima to Iterative Dynamics

86

Deriving neural scaling laws from the statistics of natural language

87

Reasoning Cache: Continual Improvement Over Long Horizons via Short-Horizon RL

88

Scaling In-Context Online Learning Capability of LLMs via Cross-Episode Meta-RL

89

Divide-and-Conquer CoT: RL for Reducing Latency via Parallel Reasoning

90

Owning the AI Pareto Frontier — Jeff Dean

91

Learning to Reason in 13 Parameters

92

Nearly Optimal Active Preference Learning and Its Application to LLM Alignment

93

Language Model Circuits Are Sparse in the Neuron Basis

94

Rethinking the Trust Region in LLM Reinforcement Learning

95

Principled Fine-tuning of LLMs from User-Edits: A Medley of Preference, Supervision, and Reward

96

Self-distillation enables continual learning

97

Maximum Likelihood Reinforcement Learning

98

In-Context Algorithm Emulation in Fixed-Weight Transformers

99

PPI-SVRG: Unifying Prediction-Powered Inference and Variance Reduction for Semi-Supervised Optimization

100

When Models Don’t Collapse: On the Consistency of Iterative MLE

101

An orthogonal learner for individualized outcomes in Markov decision processes

102

Shaping capabilities with token-level data filtering

103

Self-Improving Pretraining: using post-trained models to pretrain better models

104

Success Conditioning as Policy Improvement: The Optimization Problem Solved by Imitating Success

105

Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning

106

GameTalk: Training LLMs for Strategic Multi-Turn Conversation

107

Reinforcement Learning via Self-Distillation

108

Self-Supervised Contrastive Learning is Approximately Supervised Contrastive Learning

109

On the alignment between supervised and self-supervised contrastive learning

110

Rethinking the value of multi-agent work-flow: a strong single agent baseline

111

Greedy Sampling Is Provably Efficient for RLHF

112

A Generalization Theory for Zero-Shot Prediction

113

Learning to Discover at Test Time

114

How Does the Pretraining Distribution Shape In-Context Learning? Task Selection, Generalization, and Robustness

115

Highlighting What Matters: Promptable Embeddings for Attribute-Focused Retrieval

116

Activation Reward Models for Few-Shot Model Alignment

117

Reward is enough: LLMs are in-context reinforcement learners

118

Understanding the Performance Gap in Preference Learning: A Dichotomy of RLHF and DPO

119

The End of Reward Engineering: How LLMs Are Redefining Multi-Agent Coordination

120

PRL: Process Reward Learning Improves LLMs’ Reasoning Ability and Broadens the Reasoning Boundary

121

Coverage Improvement and Fast Convergence of On-policy Preference Learning

122

Stagewise Reinforcement Learning and the Geometry of the Regret Landscape

123

Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models

124

Learning Latent Action World Models In The Wild

125

From Unstructured Data to Demand Counterfactuals: Theory and Practice

126

In-context reinforcement learning through Bayesian fusion of context and value prior

127

Digital Red Queen: Adversarial Program Evolution in Core War with LLMs

128

Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings

129

Representation-Based Exploration for Language Models: from test-time to post-training

130

NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation

131

RelayLLM: Efficient Reasoning via Collaborative Decoding

132

A Unified Definition of Hallucination, Or: It’s the World Model, Stupid

133

Deep sequence models tend to memorize geometrically; it is unclear why.

134

From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence

135

Diffusion Language Models are Provably Optimal Parallel Samplers

136

Universal Reasoning Model

137

Recursive language models

138

Adapting fast and slow: transportable circuits for few shot learning

139

Position: Probabilistic Modelling is Sufficient for Causal Inference

140

End-to-End Test-Time Training for Long Context

141

Parallel Token Generation for Language Models

142

Posterior Behavioral Cloning: Pretraining BC Policies for Efficient RL Finetuning

143

Activation oracles: training and evaluating LLMs as general-purpose activation explainers

144

Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning

145

Joint-Embedding vs Reconstruction: Provable Benefits of Latent Space Prediction

146

Monitoring Monitorability / OpenAI

147

Detailed Balance in Large Language Model-Driven Agents

148

Learning to reason in LLMs by expectation maximization

149

Exploratory Causal Inference in SAEnce

150

Detailed balance in large language model-driven agents

151

The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding

152

Adaptation of Agentic AI

153

Posterior Behavioral Cloning: Pretraining BC Policies for Efficient RL Finetuning

154

Let’s (not) just put things in Context: Test-Time Training for Long-Context LLMs

155

TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models

156

What’s In My Human Feedback? Learning Interpretable Descriptions of Preference Data

157

Bolmo: Byteifying the Next Generation of Language Models

158

What happened with sparse autoencoders?

159

What Matters Right Now in Mechanistic Interpretability

160

CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning

161

Self-Improving AI and Human Co-Improvement for Safer Co-Superintelligence

162

Towards a Science of Scaling Agent Systems / Google DeepMind

163

Emergent hierarchical reasoning in LLMs through reinforcement learning

164

AI revolution finally comes to Relational foundational models for structured data

165

REFRAG: Rethinking RAG based Decoding

166

Provable Long-Range Benefits of Next-Token Prediction

167

Jeff Dean on TPUs, AI Research, and Funding

168

Latent Debate: surrogate framework for Interpreting LLM Thinking

169

Distribution-calibrated inference time compute for thinking LLM-as-a-judge

170

Principled RL for diffusion LLMs emerges from sequence level perspective

171

Algorithmic Thinking Theory

172

On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models

173

Natural language actor-critic: Scalable off-policy learning in language space

174

Beyond the Transformer: Titans, MIRAS, and the Future of Infinite Context

175

On the Limits of Test-Time Compute: Sequential Reward Filtering for Better Inference

176

The Universal Weight Subspace Hypothesis

177

Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

178

Benchmarking In-context Experiential Learning Through Repeated Product Recommendations

179

Training LLMs for Honesty via Confessions

180

STOIC REASONER: Dual-Mode Transformers that Compress to Think and Decompress to Speak

181

E-GEO: A Testbed for Generative Engine Optimization in E-Commerce

182

1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities

183

Treatment Effect Estimation for Optimal Decision-Making

184

Pass@K Policy Optimization: Solving Harder Reinforcement Learning Problems

185

Debugging misaligned completions with sparse-autoencoder latent attribution

186

Building Effective AI Agents \ Anthropic

187

How to Correctly Report LLM-as-a-Judge Evaluations

188

In-Context Learning with Hypothesis-Class Guidance

189

Selecting Belief-State Approximations in Simulators with Latent States

190

Latent Collaboration in Multi-Agent Systems

191

CausalPFN: Amortized Causal Effect Estimation via In-Context Learning

192

DELTA: How Does RL Unlock and Transfer New Algorithms in LLMs?

193

Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing

194

Prompted Policy Search: Reinforcement Learning through Linguistic and Numerical Reasoning in LLMs

195

Ilya Sutskever – We're moving from the age of scaling to the age of research

196

Cognitive Foundations for Reasoning and Their Manifestation in LLMs

197

Natural emergent misalignment from reward hacking in production RL

198

Evolution Strategies at the Hyperscale

199

The Path Not Taken: RLVR Provably Learns Off the Principals

200

Back to Basics: Let Denoising Generative Models Denoise

201

LLM Prompt Duel Optimizer: Efficient Label-Free Prompt Optimization

202

Black-Box On-Policy Distillation of Large Language Models

203

Solving a million step LLM task with zero errors

204

Not All Thoughts Matter: Selective Attention for Efficient Reasoning

205

Sample-Efficient Parametric Learning from Natural Language

206

Bayesian Optimization in Language space: An Eval-Efficient AI Self-Improvement Framework

207

Context Engineering: Sessions, Memory

208

The Era of Agentic Organization: Learning to Organize with Language Models

209

Understanding neural networks through sparse circuits

210

Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning

211

Multi-Agent Evolve: LLM Self-Improvement Through Co-Evolution

212

LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics

213

PREFDISCO: Evaluating Proactive Personalization through Interactive Preference Discovery

214

Reusing pre-training data at test time is a compute multiplier

215

Scaling Agent Learning via Experience Synthesis

216

Continuous Autoregressive Language Models

217

Toward a Theory of Agents as Tool-Use Decision-Makers

218

Nested Learning: The Illusion of Deep Learning Architectures

219

GST-UNet: A Neural Framework for Spatiotemporal Causal Inference with Time-Varying Confounding

220

Beyond a million tokens: benchmarking and enhancing long-term memory in LLMs

221

Agentic Economic Modeling

222

Emergent Introspective Awareness in Large Language Models

223

Can Large Reasoning Models Self-Train?

224

ALITA-G: Self-Evolving Generative Agent for Agent Generation

225

Self-improving LLM agents at test-time

226

Offline RL by Reward-Weighted Fine-Tuning for Conversation Optimization

227

Language models are injective and hence invertible

228

ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory

229

RLAD: Training LLMs to Discover Abstractions

230

How to Train Your Advisor: Steering Black-Box LLMs with ADVISOR MODELS

231

Self-improving LLM agents at Test-Time

232

KL-Regularized Reinforcement Learning is designed to Mode Collapse

233

How do LLMs use their depth?

234

Thought Communication in Multiagent Collaboration

235

Reasoning with Sampling: Base Models Outperform RL

236

Continual Learning via Sparse Memory Finetuning

237

Direct Preference Optimization with Unobserved Preference Heterogeneity: The Necessity of Ternary Preferences

238

The Coverage Principle: How Pre-Training Enables Post-Training

239

The Era of Real-World Human Interaction: RL from User Conversations

240

Agent Learning via Early Experience

241

Demystifying the Mechanisms Behind Emergent Exploration in Goal-conditioned RL

242

Rewriting History: A Recipe for Interventional Analyses to Study Data Effects on Model Behavior

243

A Definition of AGI

244

Provably Learning from Language Feedback

245

In-Context Learning for Pure Exploration

246

On the Role of Preference Variance in Preference Optimization

247

Training LLM Agents to Empower Humans

248

Richard Sutton Declares LLMs a Dead End

249

Demystifying Reinforcement Learning in Agentic Reasoning

250

Emergent coordination in multi-agent language models

251

Learning-to-measure: in-context active feature acquisition

252

Andrej Karpathy's insights: AGI, Intelligence, and Evolution

253

Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data

254

Representation-Based Exploration for Language Models: From Test-Time to Post-Training

255

The attacker moves second: stronger adaptive attacks bypass defenses against LLM jailbreaks and prompt injections

256

When can in-context learning generalize out of task distribution?

257

The Art of Scaling Reinforcement Learning Compute for LLMs

258

A small number of samples can poison LLMs of any size

259

Dual Goal Representations

260

Welcome to the Era of Experience

261

Value Flows: Flow-Based Distributional Reinforcement Learning

262

Self-Adapting Language Models

263

The Markovian Thinker

264

Moloch’s Bargain: emergent misalignment when LLMs compete for audiences

265

Transformer Predictor Dynamics and Task Diversity

266

Base models know how to reason, thinking models learn when

267

Spectrum tuning: Post-training for distributional coverage and in-context steerability

268

Understanding Prompt Tuning and In-Context Learning via Meta-Learning

269

MLPs Learn In-Context on Regression and Classification tasks

270

Is Pre-Training Truly Better than Meta-Learning?

271

Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models

272

Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs

273

Learning dynamics of LLM finetuning

274

Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF

275

OpenAI Agent Builder and n8n: Orchestrating Reasoning Versus Automating Process

276

Training Agents Inside of Scalable World Models

277

Small Language Models are the Future of Agentic AI

278

Activation Steering in Generative Settings via Contrastive Causal Mediation Analysis

279

Eliciting Secret Knowledge from Language Models

280

Temporal difference flow

281

Personalized reasoning: just-in-time personalization and why LLMs fail at it

282

Prompt Curriculum Learning for Efficient LLM Post-Training

283

Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning

284

Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward

285

Learning to summarize user information for personalized reinforcement learning from human feedback

286

Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF

287

LIMI: Less is More for Agency

288

LoRA Without Regret

289

Actor-Critic without Actor: Critic-Guided Denoising for RL

290

DELTA-Code: How Does RL Unlock and Transfer New Programming Algorithms in LLMs?

291

Linear Transformers Implicitly Discover Unified Numerical Algorithms

292

Regularizing Extrapolation in Causal Inference

293

DoubleGen - Debiased Generative Modeling of Counterfactuals

294

What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT

295

Compute as Teacher: Turning Inference Compute Into Reference-Free Supervision

296

Learning without training: The implicit dynamics of in-context learning

297

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model

298

Open Problems in Mechanistic Interpretability

299

Maestro: Joint Graph & Config Optimization for Reliable AI Agents

300

Thought Anchors: Which LLM Reasoning Steps Matter?

301

RL's Razor: Why Online RL Forgets Less

302

Why Language Models Hallucinate

303

ALFA: Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning

304

Sample Efficient Preference Alignment in LLMs via Active Exploration

305

Adventures in Demand Analysis Using AI

306

Memento: Fine-tuning LLM Agents without Fine-tuning LLMs

307

On the Theoretical Limitations of Embedding-Based Retrieval

308

Performance Prediction for Large Systems via Text-to-Text Regression

309

Demystifying the Visual Quality Paradox in Multimodal Large Language Models

310

Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL

311

Compute-Optimal Scaling for Value-Based Deep RL

312

LLM-based Conversational Recommendation Agents with Collaborative Verbalized Experience

313

Signal and Noise: Evaluating Language Model Benchmarks

314

Breaking Feedback Loops in Recommender Systems with Causal Inference

315

RAG is Dead, Context Engineering is King: Building Reliable AI Systems

316

A Survey of Personalization: From RAG to Agent

317

Facilitating the Adoption of Causal Inference Methods Through LLM-Empowered Co-Pilot

318

Performance Prediction for Large Systems via Text-to-Text Regression

319

Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning

320

DINOv3: Vision Models for Self-Supervised Learning

321

Agent Lightning: Training Any AI Agents with Reinforcement Learning

322

Computational-Statistical Tradeoffs at the Next-Token Prediction Barrier

323

From Model Weights to Agent Workflows: Charting the New Frontier of Optimization in Large Language Models

324

Is Chain-of-Thought Reasoning a Mirage?

325

Agentic Web: Weaving the Next Web with AI Agents

326

The Assimilation-Accommodation Gap in LLM Intelligence

327

The Minimalist AI Kernel: A New Frontier in Reasoning

328

Statistical Rigor for Interpretable AI

329

Full-Stack Alignment: Co-Aligning AI and Institutions with Thick Models of Value

330

A foundation model to predict and capture human cognition

331

Generative Recommendation with Semantic IDs: A Practitioner’s Handbook

332

Hierarchical Reasoning Model

333

Test-time Offline Reinforcement Learning on Goal-related Experience

334

Interpreting Chain of Thought: A Walkthrough and Discussion

335

The wall confronting large language models

336

COLLABLLM: LLMs From Passive to Collaborative

337

A decade's battle on dataset bias: are we there yet?

338

GEPA: Generative Feedback for AI System Optimization

339

From AI-Curious to AI-First: Engineering Production AI Systems

340

Context Engineering: Beyond Simple Prompting to LLM Architecture

341

Agentic Misalignment: LLMs as Insider Threats

342

Small Language Models: Future of Agentic AI

343

Learning without training: The implicit dynamics of in-context learning

344

Inverse Scaling in Test-Time Compute

345

LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra

346

Microsoft's Blueprint: AI, Quantum, and the Agentic Future

347

Zuckerberg's AI Vision Analyzed

348

Inside Claude: Scaling, Agency, and Interpretability

349

Personalized language modeling from personalized human feedback

350

Position: Empowering Time Series Reasoning with Multimodal LLMs

351

An empirical risk minimization approach for offline inverse RL and Dynamic Discrete Choice models

352

Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities

353

The Invisible Leash: Why RLVR May Not Escape Its Origin

354

Language Model Personalization via Reward Factorization

355

Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions

356

Do We Need to Verify Step by Step? Rethinking Process Supervision from a Theoretical Perspective

357

Soft Best-of-n Sampling for Model Alignment

358

On Temporal Credit Assignment and Data-Efficient Reinforcement Learning

359

Bradley–Terry and Multi-Objective Reward Modeling Are Complementary

360

Probing Foundation Models for World Models

361

GenAI-Powered Statistical Inference (with Unstructured Data)

362

Interpretable Reward Modeling with Active Concept Bottlenecks

363

PrefillOnly: An Inference Engine for Prefill-only Workloads in Large Language Model Applications

364

A Collectivist, Economic Perspective on AI

365

Textual Bayes: Quantifying Uncertainty in LLM-Based Systems

366

The Winner's Curse in Data-Driven Decisions

367

SPIRAL: Self-Play for Reasoning Through Zero-Sum Games

368

Beyond Statistical Learning: Exact Learning Is Essential for General Intelligence

369

Aligning Learning and Endogenous Decision-Making

370

Reliable Statistical Inference with Synthetic Data from Large Language Models

371

Multi-Turn Reinforcement Learning from Human Preference Feedback

372

Provably Learning from Language Feedback

373

Markets with Heterogeneous Agents: Dynamics and Survival of Bayesian vs. No-Regret Learners

374

Why Neural Network Can Discover Symbolic Structures with Gradient-based Training: An Algebraic and Geometric Foundation

375

Causal Abstraction with Lossy Representations

376

The Winner's Curse in Data-Driven Decisions

377

Embodied AI Agents: Modeling the World

378

Beyond Statistical Learning: Exact Learning Is Essential for General Intelligence

379

What Has a Foundation Model Found? Inductive Bias Reveals World Models

380

Language Bottleneck Models: A Framework for Interpretable Knowledge Tracing and Beyond

381

Learning to Explore: An In-Context Learning Approach for Pure Exploration

382

Human-AI Matching: The Limits of Algorithmic Search

383

Uncertainty Quantification Needs Reassessment for Large-language Model Agents

384

Bayesian Meta-Reasoning for Robust LLM Generalization

385

General Intelligence Requires Reward-based Pretraining

386

Deep Learning is Not So Mysterious or Different

387

AI Agents Need Authenticated Delegation

388

Probabilistic Modelling is Sufficient for Causal Inference

389

Not All Explanations for Deep Learning Phenomena Are Equally Valuable

390

e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs

391

Extrapolation by Association: Length Generalization Transfer in Transformers

392

Uncovering Causal Hierarchies in Language Model Capabilities

393

Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers

394

Improving Treatment Effect Estimation with LLM-Based Data Augmentation

395

LLM Numerical Prediction Without Auto-Regression

396

Why in-context learning models are good few-shot learners?

397

Take Caution in Using LLMs as Human Surrogates: Scylla Ex Machina

398

The Logic of Machines: The AI Reasoning Debate

399

Layer by Layer: Uncovering Hidden Representations in Language Models

400

Causal Attribution Analysis for Continuous Outcomes

401

Training a Generally Curious Agent

402

Estimation of Treatment Effects Under Nonstationarity via Truncated Difference-in-Q’s

403

Strategy Coopetition Explains the Emergence and Transience of In-Context Learning

404

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

405

Agentic Supernet for Multi-agent Architecture Search

406

Sample Complexity and Representation Ability of Test-time Scaling Paradigms

407

Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators

408

LLMs Get Lost In Multi-Turn Conversation

409

PromptPex: Automatic Test Generation for Prompts

410

General Agents Need World Models

411

The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models

412

Decisions With Algorithms

413

Adapting, fast and slow: Causal Approach to Few-Shot Sequence Learning

414

Conformal Arbitrage for LLM Objective Balancing

415

Simulation-Based Inference for Adaptive Experiments

416

Agents as Tool-Use Decision-Makers

417

Quantitative Judges for Large Language Models

418

Self-Challenging Language Model Agents

419

Learning to Explore: An In-Context Learning Approach for Pure Exploration

420

How Bidirectionality Helps Language Models Learn Better via Dynamic Bottleneck Estimation

421

A Closer Look at Bias and Chain-of-Thought Faithfulness of Large (Vision) Language Models

422

Simplifying Bayesian Optimization Via In-Context Direct Optimum Sampling

423

Bayesian Teaching Enables Probabilistic Reasoning in Large Language Models

424

IPO: Interpretable Prompt Optimization for Vision-Language Models

425

Evolutionary Prompt Optimization discovers emergent multimodal reasoning strategies

426

Evaluating the Unseen Capabilities: How Many Theorems Do LLMs Know?

427

Diffusion Guidance Is a Controllable Policy Improvement Operator

428

Alita: Generalist Agent With Self-Evolution

429

A Snapshot of Influence: A Local Data Attribution Framework for Online Reinforcement Learning

430

Learning Compositional Functions with Transformers from Easy-to-Hard Data

431

Preference Learning with Response Time

432

Accelerating RL for LLM Reasoning with Optimal Advantage Regression

433

Algorithms for reliable decision-making need causal reasoning

434

Belief Attribution as Mental Explanation: The Role of Accuracy, Informativity, and Causality

435

Distances for Markov chains from sample streams

436

When and Why LLMs Fail to Reason Globally

437

IDA-Bench: Evaluating LLMs on Interactive Guided Data Analysis

438

No Free Lunch: Non-Asymptotic Analysis of Prediction-Powered Inference

439

Accelerating RL for LLM Reasoning with Optimal Advantage Regression

440

Statistical Inference for Online Algorithms

441

Prismatic Synthesis for Diverse LLM Reasoning Data

442

Position: Uncertainty Quantification Needs Reassessment for Large-language Model Agents

443

The Agentic Economy

444

Statistics for Large Language Models

445

Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search

446

Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning

447

Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL

448

Value-Guided Search for Efficient Chain-of-Thought Reasoning

449

Shallow Preference Signals: Large Language model aligns even better without truncated data?

450

Gaming Tool Preferences in Agentic LLMs

451

Partner Modelling Emerges in Recurrent Agents (But Only When It Matters)

452

LLM Populations Form Social Conventions and Collective Bias

453

LLM Generated Persona is a Promise with a Catch

454

Large Language Models for Digital Twin Simulation

455

From RL Distillation to Autonomous LLM Agents

456

Prompting, Auto-Prompting, and Human-AI Communication

457

Textual Gradients for LLM Optimization

458

Large Language Models as Markov Chains

459

Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation

460

Selective induction heads: how transformers select causal structures in context

461

The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains

462

How Transformers Learn Causal Structure with Gradient Descent

463

Planning anything with rigor: general-purpose zero-shot planning with LLM-based formalized programming

464

Automated Design of Agentic Systems

465

What’s the Magic Word? A Control Theory of LLM Prompting

466

BoNBoN Alignment for Large Language Models and the Sweetness of Best-of-n Sampling

467

RL with KL penalties is better viewed as Bayesian inference

468

Asymptotics of Language Model Alignment

469

Qwen 2.5, RL, and Random Rewards

470

Theoretical guarantees on the best-of-n alignment policy

471

Score Matching Enables Causal Discovery of Nonlinear Additive Noise Models

472

Improved Techniques for Training Score-Based Generative Models

473

Your Pre-trained LLM is Secretly an Unsupervised Confidence Calibrator

474

AlphaEvolve: A coding agent for scientific and algorithmic discovery

475

Harnessing the Universal Geometry of Embeddings

476

Goal Inference using Reward-Producing Programs in a Novel Physics Environment

477

Trial-Error-Explain In-Context Learning for Personalized Text Generation

478

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

479

Test-Time Reinforcement Learning (TTRL)

480

Interpreting Emergent Planning in Model-Free Reinforcement Learning

481

Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems

482

Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment

483

Learning How Hard to Think: Input-Adaptive Allocation of LM Computation

484

Highlighting What Matters: Promptable Embeddings for Attribute-Focused Image Retrieval

485

UFT: Unifying Supervised and Reinforcement Fine-Tuning

486

Understanding High-Dimensional Bayesian Optimization

487

Inference time alignment in continuous space

488

Efficient Test-Time Scaling via Self-Calibration

489

Conformal Prediction via Bayesian Quadrature

490

Predicting from Strings: Language Model Embeddings for Bayesian Optimization

491

Self-Evolving Curriculum for LLM Reasoning

492

Online Decision-Focused Learning in Dynamic Environments

493

FisherSFT: Data-Efficient Supervised Fine-Tuning of Language Models Using Information Gain

494

Reward Shaping from Confounded Offline Data

495

Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning

496

Understanding Best-of-N Language Model Alignment

497

Maximizing Acquisition Functions for Bayesian Optimization - and its relation to Gradient Descent

498

Bayesian Prompt Ensembles: Model Uncertainty Estimation for Black-Box Large Language Models

499

Prompting Strategies for Enabling Large Language Models to Infer Causation from Correlation

500

The Parallel Knowledge Gradient Method for Batch Bayesian Optimization

501

FunBO: Discovering Acquisition Functions for Bayesian Optimization with FunSearch

502

Automated Social Science: A Structural Causal Model-Based Approach

503

Causal Interpretation of Transformer Self-Attention

504

A Causal World Model Underlying Next Token Prediction: Exploring GPT in a Controlled Environment

505

Trace is the Next AutoDiff: Generative Optimization with Rich Feedback, Execution Traces, and LLMs

506

Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation

507

Prompts from Reinforcement Learning (PRL)

508

Logits are All We Need to Adapt Closed Models

509

Large Language Models Are (Bayesian) Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning

510

Inference-Time Intervention: Eliciting Truthful Answers from a Language Model

511

From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models

512

LLM In-Context Learning as Kernel Regression

513

Personalizing LLMs via Decode-Time Human Preference Optimization

514

Almost Surely Safe LLM Inference-Time Alignment

515

Survey of In-Context Learning Interpretation and Analysis

516

From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models

517

LLM In-Context Learning as Kernel Regression

518

Where does In-context Learning Happen in Large Language Models?

519

Auto-Differentiating Any LLM Workflow: A Farewell to Manual Prompting

520

metaTextGrad: Learning to learn with language models as optimizers

521

Semantic Operators: A Declarative Model for Rich, AI-based Data Processing

522

Isolated Causal Effects of Language

523

Sleep-time Compute: Beyond Inference Scaling at Test-time

524

J1: Incentivizing Thinking in LLM-as-a-Judge

525

ShiQ: Bringing back Bellman to LLMs

526

Policy Learning with a Natural Language Action Space: A Causal Approach

527

Multi-Objective Preference Optimization: Improving Human Alignment of Generative Models

528

End-to-End Learning for Stochastic Optimization: A Bayesian Perspective

529

TEXTGRAD: Automatic Differentiation via Text

530

Steering off Course: Reliability Challenges in Steering Language Models

531

Past-Token Prediction for Long-Context Robot Policies

532

Recovering Coherent Event Probabilities from LLM Embeddings

533

Systematic Meta-Abilities Alignment in Large Reasoning Models

534

Predictability Shapes Adaptation: An Evolutionary Perspective on Modes of Learning in Transformers

535

Efficient Exploration for LLMs

536

Rankers, Judges, and Assistants: Towards Understanding the Interplay of LLMs in Information Retrieval Evaluation

537

Bayesian Concept Bottlenecks with LLM Priors

538

Transformers for In-Context Reinforcement Learning

539

Evaluating Large Language Models Across the Lifecycle

540

Active Ranking from Human Feedback with DopeWolfe

541

Optimal Designs for Preference Elicitation

542

Dual Active Learning for Reinforcement Learning from Human Feedback

543

Active Learning for Direct Preference Optimization

544

Active Preference Optimization for RLHF

545

Test-Time Alignment of Diffusion Models without reward over-optimization

546

Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback

547

GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment

548

Advantage-Weighted Regression: Simple and Scalable Off-Policy RL

549

Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective

550

Transformers can be used for in-context linear regression in the presence of endogeneity

551

Bayesian Concept Bottlenecks with LLM Priors

552

In-Context Parametric Inference: Point or Distribution Estimators?

553

Enough Coin Flips Can Make LLMs Act Bayesian

554

Bayesian Scaling Laws for In-Context Learning

555

Posterior Mean Matching Generative Modeling

556

Can Generative AI Solve Your In-Context Learning Problem? A Martingale Perspective

557

Dynamic Search for Inference-Time Alignment in Diffusion Models

558

Is In-Context Learning in Large Language Models Bayesian? A Martingale Perspective

559

Leaked Claude Sonnet 3.7 System Instruction tuning

560

Converging Predictions with Shared Information

561

Test-Time Alignment Via Hypothesis Reweighting

562

Rethinking Diverse Human Preference Learning through Principal Component Analysis

563

Active Statistical Inference

564

Data Mixture Optimization: A Multi-fidelity Multi-scale Bayesian Framework

565

AI-Powered Bayesian Inference

566

Can Unconfident LLM Annotations Be Used for Confident Conclusions?

567

Predictions as Surrogates: Revisiting Surrogate Outcomes in the Age of AI

568

Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control

569

How to Evaluate Reward Models for RLHF

570

LLMs as Judges: Survey of Evaluation Methods

571

The Alternative Annotator Test for LLM-as-a-Judge: How to Statistically Justify Replacing Human Annotators with LLMs

572

Limits to scalable evaluation at the frontier: LLM as Judge won’t beat twice the data

573

Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation

574

Accelerating Unbiased LLM Evaluation via Synthetic Feedback

575

Prediction-Powered Statistical Inference Framework

576

Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL

577

RM-R1: Reward Modeling as Reasoning

578

Reexamining the Aleatoric and Epistemic Uncertainty Dichotomy

579

Decoding Claude Code: Terminal Agent for Developers

580

Emergent Strategic AI Equilibrium from Pre-trained Reasoning

581

Benefiting from Proprietary Data with Siloed Training

582

Advantage Alignment Algorithms

583

Asymptotic Safety Guarantees Based On Scalable Oversight

584

What Makes a Reward Model a Good Teacher? An Optimization Perspective

585

Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

586

Identifiable Steering via Sparse Autoencoding of Multi-Concept Shifts

587

You Are What You Eat - AI Alignment Requires Understanding How Data Shapes Structure and Generalisation

588

Interplay of LLMs in Information Retrieval Evaluation

589

Trade-Offs Between Tasks Induced by Capacity Constraints Bound the Scope of Intelligence

590

Toward Efficient Exploration by Large Language Model Agents

591

Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT

592

Self-Consuming Generative Models with Curated Data

593

Bootstrapping Language Models with DPO Implicit Rewards

594

DeepSeek-Prover-V2: Advancing Formal Reasoning

595

THINKPRM: Data-Efficient Process Reward Models

596

Societal Frameworks and LLM Alignment

597

Risks from Multi-Agent Advanced AI

598

Causality-Aware Alignment for Large Language Model Debiasing

599

Reward Models Evaluate Consistency, Not Causality

600

Causal Rewards for Large Language Model Alignment

601

Sycophancy to subterfuge: Investigating reward-tampering in large language models

602

Bidirectional AI Alignment

603

Why Do Multi-Agent LLM Systems Fail?

604

LLMs as Greedy Agents: RL Fine-tuning for Decision-Making

605

LLM Feedback Loops and the Lock-in Hypothesis

606

Representational Alignment Drives Effective Teaching and Learning

607

Adaptive Parallel Reasoning with Language Models

608

AI: Rewiring the Flow of Ideas and Human Knowledge

609

Learning and Equilibrium with Ranking Feedback

610

Designing Human-AI Collaboration: A Sufficient-Statistic Approach

611

GOAT: Generative Adversarial Training for Human-AI Coordination

612

π0.5: Generalization in Robotic Manipulation via Diverse Data

613

NoWag: Unified Compression for Large Language Models

614

Optimal Tool Calls in Language Model Reasoning

615

Data Selection for Empirical Risk Minimization

616

LoRe: Low-Rank Reward Modeling for Personalized LLMs

617

ParaPO: Reducing Language Model Verbatim Reproduction

618

Test-Time RL: Self-Evolving LLMs via Majority Voting Rewards

619

Tina: Tiny LoRA Reasoning Models

620

Evaluating large language models in theory of mind tasks

621

QUEST: Quality Sampling for Machine Translation

622

Offline Preference Learning via Simulated Trajectory Feedback

623

Reasoning Elicitation in Language Models via Counterfactual Feedback

624

Eliciting Human Preferences with Language Models

625

Sub-Optimal Data for Human-in-the-Loop Reinforcement Learning

626

γ-Bench: Evaluating LLMs in Multi-Agent Games

627

DRAFT: Self-Driven LLM Tool Mastery via Documentation Refinement

628

Optimal Prediction Sets for Enhanced Human-AI Accuracy

629

Self-Correction via Reinforcement Learning for Language Models

630

Tractable Multi-Agent Reinforcement Learning through Behavioral Economics

631

Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement

632

Iterative Nash Policy Optimization for Language Model Alignment

633

SycEval: Benchmarking LLM Sycophancy in Mathematics and Medicine

634

Stack AI: Democratizing Enterprise AI Development

635

Evaluating Modern Recommender Systems: Challenges and Future Directions

636

AI in the Enterprise: Seven Lessons from Frontier Companies by OpenAI

637

Discussion: Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

638

AI Agent Protocols and Human Preference

639

Cross-Environment Cooperation for Zero-Shot Multi-Agent Coordination

640

Sutton and Silver: The Era of Experience: Learning Beyond Human Data

641

Sample, Don't Search: Rethinking Test-Time Alignment for Language Models

642

AI Agents: Echoes of Past Technology Pivots?

643

Minimalist LLM Reasoning: Rejection Sampling to Reinforcement

644

Securing the Model Context Protocol in Enterprise Environments

645

Improving Multi-Turn Tool Use with Reinforcement Learning

646

Cultural Knowledge Conservation and Control in Large Language Models

647

Data Quality, Repetition, and Scaling of Language Models

648

Compute-Optimal Scaling Laws for Language Models Revisited

649

Concise Reasoning via Reinforcement Learning

650

Throughput Limits for LLM Inference and AI Agent Scheduling

651

RL Post-training Amplifies Pretraining Behaviors in Language Models

652

Fast Adaptation of Behavioral Foundation Models

653

Proprietary Reward Models: Sustaining Advantage in Agentic AI

654

Why Multi-Agent LLM Systems Fail: A Comprehensive Study

655

Play2Prompt: Zero-Shot Tool Instruction Optimization via Tool Play

656

Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

657

API and GUI Agents: Divergence, Convergence, and Hybrid Approaches

658

AI, Chess, and Competitive Advantage: Substitution and Complementation

659

Knowledge of the Firm and Replication of Technology

660

Firm Resources and Sustained Competitive Advantage

661

Evaluating Pharmaceutical Marketing to Physicians with Panel Data

662

Theory of the firm in the era of Agents

663

Large Language Models: An Applied Econometric Framework

664

Evaluating the World Model Implicit in a Generative Model

665

Machine Learning for Hypothesis Generation in Social Science

666

Active Learning for Moral Preference Elicitation: Challenges and Nuances

667

Gradient-Based Surveys for Nonparametric Discrete Choice Experiments

668

Explainable Data-driven Share-of-choice Product Line Design Optimization

669

The More You Ask, the Less You Get: When Additional Questions Hurt External Validity

670

Conjoint topics from Handbook of Marketing Analytics: Methods and Applications

671

Choice-Based Conjoint Analysis: Methods and Applications

672

Beyond Conjoint Analysis: The Future of Preference Measurement

673

An Optimization Framework for Adaptive Questionnaire Design

674

Adaptive Self-Explication of Multiattribute Preferences

675

Conjoint Analysis: Methods, Applications, and Recent Developments

676

Current Issues and a “Wish List” for Conjoint Analysis

677

Ellipsoidal Methods for Adaptive Choice-Based Conjoint Analysis

678

Adaptive Polyhedral Methods for Conjoint Analysis

679

MSL: Enhancing LLM Recommenders via Masked Softmax Loss

680

Self-Supervised Deep Reinforcement Learning for Optimal Question Ranking

681

Adaptive Language Elicitation for Latent Information Discovery

682

LLM Persona Bias: Promise and Peril in Simulation

683

AutoTools: Automating Tool Use for Large Language Models

684

Tool Learning with Large Language Models: A Comprehensive Survey

685

All Roads Lead to Likelihood: RL for Fine-Tuning Value

686

ATLAS: Tuning Agents via Critical Step Learning

687

Thinking Faster by Writing Less: Chain of Draft Reasoning

688

Meta Plan Optimization for Boosting LLM Agents

689

L1: Length Controlled Reasoning with Reinforcement Learning

690

WikiBigEdit: Benchmarking Lifelong Knowledge Editing in LLMs

691

PLAN-AND-ACT: LLM Agent Planning with Synthetic Data

692

SEARCH-R1: LLMs Learn to Reason and Search via Reinforcement Learning

693

The Theory of the Firm: Information, Incentives, and Organization

694

Four Formalizable Theories of the Firm

695

Efficient Tool Use with Chain-of-Abstraction Reasoning

696

CodeTool: Process Supervision for Enhanced LLM Tool Invocation

697

Evaluating LLM Agents in Multi-Turn Conversations: A Survey

698

Epistemic Alignment in User-LLM Knowledge Delivery

699

MCP is (not) all you need

700

AI, Human Skills, and Competitive Advantage in Chess

701

Inference-Time Scaling for Generalist Reward Modeling

702

Optimal Pure Exploration in Linear Bandits via Sampling

703

Presidential Address: The Economist as Designer in the Innovation Process for Socially Impactful Digital Products

704

Emergent Symbolic Mechanisms for Reasoning in Large Language Models

705

Inference-Time Alignment: Coverage, Scaling, and Optimality

706

Sharpe Ratio-Guided Active Learning for Preference Optimization

707

Active Learning for Adaptive In-Context Prompt Design

708

Visual Chain-of-Thought Reasoning for Vision-Language-Action Models

709

On the Biology of a Large Language Model

710

Async-TB: Asynchronous Trajectory Balance for Scalable LLM RL

711

Instacart's Economics Team: A Hybrid Role in Tech

712

Data Mixture Optimization: A Multi-fidelity Multi-scale Bayesian Framework

713

Why MCP won

714

SWEET-RL: Training LLM Agents for Collaborative Reasoning

715

TheoryCoder: Bilevel Planning with Synthesized World Models

716

Driving Forces in AI: Scaling to 2025 and Beyond (Jason Wei, OpenAI)

717

Expert Demonstrations for Sequential Decision Making under Heterogeneity

718

TextGrad: Backpropagating Language Model Feedback for Generative AI Optimization

719

MemReasoner: Generalizing Language Models on Reasoning-in-a-Haystack Tasks

720

RAFT: In-Domain Retrieval-Augmented Fine-Tuning for Language Models

721

Inductive Biases for Exchangeable Sequence Modeling

722

InverseRLignment: LLM Alignment via Inverse Reinforcement Learning

723

Prompt-OIRL: Offline Inverse RL for Query-Dependent Prompting

724

Alignment from Demonstrations for Large Language Models

725

Q♯: Distributional RL for Optimal LLM Post-Training

726

Scaling Test-Time Compute Without Verification or RL is Suboptimal

727

Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning

728

Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning

729

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

730

Revisiting Superficial Alignment Hypothesis

731

Diagnostic uncertainty: teaching language models to describe open-ended uncertainty

732

Language Model Personalization via Reward Factorization

733

How Well do LLMs Compress Their Own Chain-of-Thought? A Token Complexity Approach

734

Can Large Language Models Extract Customer Needs as well as Professional Analysts?

735

SpurLens: finding spurious correlations in multimodal LLMs

736

Improving test-time search with backtracking against in-context value verifiers

737

Adaptive elicitation of latent information using natural language

738

Document Valuation in LLM Summaries: A Cluster Shapley Approach

739

s1: Simple test-time scaling