PodParley - Discover, Search, and Explore Podcasts

1

VideoChat3: Fully Open Video MLLM for Efficient and Generalist Video Understanding

Jul 18, 2026

24:41

2

SEED: Self-Evolving On-Policy Distillation for Agentic Reinforcement Learning

Jul 18, 2026

19:16

3

LongStraw: Long-Context RL Beyond 2M Tokens under a Fixed GPU Budget

Jul 18, 2026

19:36

4

SearchOS-V1: Towards Robust Open-Domain Information-Seeking Agent Collaboration

Jul 18, 2026

15:57

5

BadWAM: When World-Action Models Dream Right but Act Wrong

Jul 18, 2026

21:03

6

KeyFrame-Compass: Towards Comprehensive Evaluation of Keyframe-Conditioned Video Generation

Jul 18, 2026

22:20

7

MultiRef-Compass: Towards Comprehensive Evaluation of Multi-Reference-to-Audio-Video Generation

Jul 18, 2026

20:51

8

Concurrent Image Understanding and Generation: Self-Correcting Coupled Markov Jump Processes

Jul 18, 2026

21:40

9

From Pixels to States: Rethinking Interactive World Models as Game Engines

Jul 18, 2026

17:56

10

UniVR: Thinking in Visual Space for Unified Visual Reasoning

Jul 18, 2026

18:17

11

Harness Handbook: Making Evolving Agent Harnesses Readable, Navigable, and Editable

Jul 17, 2026

19:12

12

Boogu-Image-0.1: Boosting Open-Source Unified Multimodal Understanding and Generation

Jul 17, 2026

19:28

13

Ring-Zero: Scaling Zero RL to a Trillion Parameters for Emergent Reasoning

Jul 17, 2026

20:49

14

KnowAct-GUIClaw: Know Deeply, Act Perfectly, Personal GUI Assistant with Self-Evolving Memory and Skill

Jul 17, 2026

22:16

15

OvisOCR2 Technical Report

Jul 17, 2026

20:46

16

PolicyShiftGuard: Benchmarking and Improving Policy-Adaptive Image Guardrails

Jul 17, 2026

19:42

17

MetaView: Monocular Novel View Synthesis with Scale-Aware Implicit Geometry Priors

Jul 17, 2026

18:32

18

GigaWorld-Policy-0.5: A Faster and Stronger WAM Empowered by AutoResearch

Jul 17, 2026

21:13

19

SynthDocBench: Controlled Benchmark for Long-Context Visual Document Understanding

Jul 16, 2026

20:59

20

Read It Back: Pretrained MLLMs Are Zero-Shot Reward Models for Text-to-Image Generation

Jul 16, 2026

19:17

21

Search Beyond What Can Be Taught: Evolving the Knowledge Boundary in Agentic Visual Generation

Jul 16, 2026

19:24

22

Blind-Spots-Bench: Evaluating Blind Spots in Multimodal Models

Jul 16, 2026

19:39

23

Weak-to-Strong Generalization via Direct On-Policy Distillation

Jul 15, 2026

19:42

24

ABot-AgentOS: A General Robotic Agent OS with Lifelong Multi-modal Memory

Jul 15, 2026

22:18

25

4D Human-Scene Reconstruction from Low-Overlap Captures

Jul 15, 2026

19:56

26

LightMem-Ego: Your AI Memory for Everyday Life

Jul 15, 2026

18:37

27

Long-Horizon-Terminal-Bench: Testing the Limits of Agents on Long-Horizon Terminal Tasks with Dense Reward-Based Grading

Jul 14, 2026

20:24

28

Scalable Visual Pretraining for Language Intelligence

Jul 14, 2026

18:18

29

Video Generation Models are General-Purpose Vision Learners

Jul 14, 2026

20:24

30

Vidu S1: A Real-Time Interactive Video Generation Model

Jul 11, 2026

22:47

31

Why Can't I Open My Drawer? Mitigating Object-Driven Shortcuts in Zero-Shot Compositional Action Recognition

Jul 11, 2026

24:53

32

UniClawBench: A Universal Benchmark for Proactive Agents on Real-World Tasks

Jul 11, 2026

26:22

33

Ideas Have Genomes: Benchmarking Scientific Lineage Reasoning and Lineage-Grounded Idea Generation

Jul 11, 2026

25:43

34

Accurate, Interdisciplinary and Transparent Structure-property Understanding with Deep Native Structural Reasoning

Jul 10, 2026

26:31

35

Dual Latent Memory in Vision-Language-Action Models for Robotic Manipulation

Jul 10, 2026

25:00

36

Scaling Mixture-of-Experts Video Pretraining for Embodied Intelligence

Jul 10, 2026

21:54

37

Infinite Worlds with Versatile Interactions

Jul 10, 2026

19:38

38

RynnWorld-4D: 4D Embodied World Models for Robotic Manipulation

Jul 9, 2026

25:16

39

RynnWorld-Teleop: An Action-Conditioned World Model for Digital Teleoperation

Jul 9, 2026

21:41

40

Hierarchical Sparse Attention Done Right: Toward Infinite Context Modeling

Jul 9, 2026

20:57

41

Vision as Unified Multimodal Generation

Jul 9, 2026

26:18

42

Gemma 4 Technical Report

Jul 9, 2026

23:51

43

OmniOpt: Taxonomy, Geometry, and Benchmarking of Modern Optimizers

Jul 8, 2026

20:44

44

UI-MOPD: Multi-Platform On-Policy Distillation for Continual GUI Agent Learning

Jul 8, 2026

21:41

45

ResearchStudio-Reel: Automate the Last Mile of Research from Paper to Poster, Video, and Blog

Jul 8, 2026

23:37

46

PixWorld: Unifying 3D Scene Generation and Reconstruction in Pixel Space

Jul 8, 2026

23:41

47

ResearchStudio-Idea: An Evidence-Grounded Research-Ideation Skill Suite from ML Conference Outcomes

Jul 8, 2026

27:01

48

MANCE: Manifold Aware Concept Erasure

Jul 8, 2026

22:20

49

GigaWorld-1: A Roadmap to Build World Models for Robot Policy Evaluation

Jul 8, 2026

25:49

50

Wan-Streamer v0.2: Higher Resolution, Same Latency

Jul 8, 2026

21:41

51

EVA-Client: A Unified Data Collection, Inference, and Deployment Framework for Embodied Policies on Real Robots

Jul 8, 2026

22:30

52

The Mirage of Optimizing Training Policies: Monotonic Inference Policies as the Real Objective for LLM Reinforcement Learning

Jul 7, 2026

21:25

53

Embodied.cpp: A Portable Inference Runtime of Embodied AI Models on Heterogeneous Robots

Jul 7, 2026

22:30

54

OrbitQuant: Data-Agnostic Quantization for Image and Video Diffusion Transformers

Jul 7, 2026

23:07

55

VLA-Corrector: Lightweight Detect-and-Correct Inference for Adaptive Action Horizon

Jul 7, 2026

20:29

56

DataComp-VLM: Improved Open Datasets for Vision-Language Models

Jul 7, 2026

22:35

57

Program-as-Weights: A Programming Paradigm for Fuzzy Functions

Jul 4, 2026

24:26

58

AgenticSTS: A Bounded-Memory Testbed for Long-Horizon LLM Agents

Jul 4, 2026

23:30

59

EvoPolicyGym: Evaluating Autonomous Policy Evolution in Interactive Environments

Jul 4, 2026

23:06

60

Morphing into Hybrid Attention Models

Jul 4, 2026

24:47

61

AgenticDataBench: A Comprehensive Benchmark for Data Agents

Jul 4, 2026

21:38

62

Multi-Resolution Flow Matching: Training-Free Diffusion Acceleration via Staged Sampling

Jul 4, 2026

22:29

63

PerceptionRubrics: Calibrating Multimodal Evaluation to Human Perception

Jul 3, 2026

21:58

64

Orca: The World is in Your Mind

Jul 2, 2026

24:33

65

Dockerless: Environment-Free Program Verifier for Coding Agents

Jul 2, 2026

24:12

66

DOPD: Dual On-policy Distillation

Jul 2, 2026

25:26

67

Does VLA Even Know the Basics? Measuring Commonsense and World Knowledge Retention in Vision-Language-Action Models

Jul 2, 2026

21:01

68

Scenes as Objects, Not Primitives: Instance-Structured 3D Tokenization from Unposed Views

Jul 2, 2026

23:55

69

GEAR: Guided End-to-End AutoRegression for Image Synthesis

Jul 2, 2026

26:30

70

Multi-Block Diffusion Language Models

Jul 2, 2026

23:52

71

Agentic Abstention: Do Agents Know When to Stop Instead of Act?

Jul 1, 2026

24:13

72

LiveEdit: Towards Real-Time Diffusion-Based Streaming Video Editing

Jul 1, 2026

23:18

73

Scaling the Horizon, Not the Parameters: Reaching Trillion-Parameter Performance with a 35B Agent

Jul 1, 2026

26:23

74

TUA-Bench: A Benchmark for General-Purpose Terminal-Use Agents

Jul 1, 2026

22:43

75

Beyond IID: How General Are Tabular Foundation Models, Really?

Jul 1, 2026

22:43

76

Trimming the Long-Tail of Visual World Modeling Evaluation

Jul 1, 2026

25:54

77

AsyncOPD: How Stale Can On-Policy Distillation Be?

Jul 1, 2026

22:38

78

Video-MME-Logical: A Controlled Diagnostic Benchmark for Video Temporal-Logical Reasoning

Jul 1, 2026

20:40

79

Bridging VideoQA and Video-Guided Agentic Tasks via Generalized Keyframe Extraction

Jul 1, 2026

22:50

80

PhysisForcing: Physics Reinforced World Simulator for Robotic Manipulation

Jun 30, 2026

21:18

81

Translation as a Bridging Action: Transferring Manipulation Skills from Humans to Robots

Jun 30, 2026

23:56

82

Qwen-Image-2.0-RL Technical Report

Jun 30, 2026

27:10

83

DanceOPD: On-Policy Generative Field Distillation

Jun 28, 2026

25:46

84

In-Context World Modeling for Robotic Control

Jun 28, 2026

23:23

85

Qwen-Image-Agent: Bridging the Context Gap in Real-World Image Generation

Jun 28, 2026

21:26

86

OPID: On-Policy Skill Distillation for Agentic Reinforcement Learning

Jun 28, 2026

23:52

87

The Verification Horizon: No Silver Bullet for Coding Agent Rewards

Jun 28, 2026

22:55

88

ViQ: Text-Aligned Visual Quantized Representations at Any Resolution

Jun 28, 2026

25:52

89

JetSpec: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting

Jun 28, 2026

20:40

90

GUI vs. CLI: Execution Bottlenecks in Screen-Only and Skill-Mediated Computer-Use Agents

Jun 28, 2026

23:49

91

Fast LeWorldModel

Jun 28, 2026

24:41

92

EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments

Jun 13, 2026

24:06

93

MiniMax Sparse Attention

Jun 13, 2026

26:02

94

SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning

Jun 13, 2026

22:54

95

InterleaveThinker: Reinforcing Agentic Interleaved Generation

Jun 13, 2026

21:11

96

FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents

Jun 13, 2026

23:14

97

Robust-U1: Can MLLMs Self-Recover Corrupted Visual Content for Robust Understanding?

Jun 13, 2026

20:28

98

MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling

Jun 13, 2026

23:50

99

WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces

Jun 13, 2026

20:33

100

LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories

Jun 13, 2026

23:15

101

HYDRA-X: Native Unified Multimodal Models with Holistic Visual Tokenizers

Jun 13, 2026

20:48

102

ABot-Earth 0.5: Generative 3D Earth Model

Jun 11, 2026

22:47

103

Kwai Keye-VL-2.0 Technical Report

Jun 11, 2026

25:32

104

Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution

Jun 11, 2026

22:49

105

Evolving Agents in the Dark: Retrospective Harness Optimization via Self-Preference

Jun 11, 2026

21:09

106

SearchSwarm: Towards Delegation Intelligence in Agentic LLMs for Long-Horizon Deep Research

Jun 11, 2026

23:57

107

Beyond Uniform Token-Level Trust Region in LLM Reinforcement Learning

Jun 11, 2026

26:29

108

Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models

Jun 11, 2026

21:40

109

SCAIL-2: Unifying Controlled Character Animation with End-to-end In-Context Conditioning

Jun 11, 2026

22:01

110

Lip Forcing: Few-Step Autoregressive Diffusion for Real-time Lip Synchronization

Jun 11, 2026

24:28

111

Agents' Last Exam

Jun 10, 2026

25:00

112

SWE-Explore: Benchmarking How Coding Agents Explore Repositories

Jun 10, 2026

23:07

113

On the Geometry of On-Policy Distillation

Jun 10, 2026

26:53

114

LatentSkill: From In-Context Textual Skills to In-Weight Latent Skills for LLM Agents

Jun 10, 2026

22:02

115

Latent Spatial Memory for Video World Models

Jun 10, 2026

25:04

116

FlashMemory-DeepSeek-V4: Lightning Index Ultra-Long Context via Lookahead Sparse Attention

Jun 10, 2026

21:47

117

CoVEBench: Can Video Editing Models Handle Complex Instructions?

Jun 10, 2026

22:19

118

SpatialWorld: Benchmarking Interactive Spatial Reasoning of Multimodal Agents in Real-World Tasks

Jun 10, 2026

24:19

119

Human Psychometric Questionnaires Mischaracterize LLM Behavior

Jun 10, 2026

25:08

120

Echo-Memory: A Controlled Study of Memory in Action World Models

Jun 10, 2026

21:20

121

From Activation to Causality: Discovery of Causal Visual Representations in the Human Brain

Jun 4, 2026

23:20

122

Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking

Jun 4, 2026

25:40

123

Trust Region On-Policy Distillation

Jun 4, 2026

24:05

124

KVarN: Variance-Normalized KV-Cache Quantization Mitigates Error Accumulation in Reasoning Tasks

Jun 4, 2026

22:40

125

COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation

Jun 2, 2026

21:18

126

Representation Forcing for Bottleneck-Free Unified Multimodal Models

Jun 2, 2026

24:26

127

Mellum2 Technical Report

Jun 2, 2026

21:42

128

Function2Scene: 3D Indoor Scene Layout from Functional Specifications

Jun 2, 2026

21:58

129

GGT-100K: Generative Ground Truth for Generalizable Real-World Image Restoration

Jun 2, 2026

23:17

130

Towards Streaming Synchronized Spatial Audio Generation via Autoregressive Diffusion Transformer

Jun 2, 2026

26:16

131

TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation

May 23, 2026

22:55

132

Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality?

May 23, 2026

23:53

133

DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards

May 23, 2026

21:26

134

π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows

May 23, 2026

22:32

135

Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps

May 23, 2026

19:18

136

ACC: Compiling Agent Trajectories for Long-Context Training

May 23, 2026

24:45

137

PhysX-Omni: Unified Simulation-Ready Physical 3D Generation for Rigid, Deformable, and Articulated Objects

May 23, 2026

23:15

138

LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning

May 23, 2026

22:03

139

Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning

May 23, 2026

22:25

140

WorldKV: Efficient World Memory with World Retrieval and Compression

May 23, 2026

22:31

141

Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining

May 22, 2026

19:31

142

Mega-ASR: Towards In-the-wild^2 Speech Recognition via Scaling up Real-world Acoustic Simulation

May 22, 2026

23:01

143

Enhancing Train-Free Infinite-Frame Generation for Consistent Long Videos

May 22, 2026

25:01

144

IndusAgent: Reinforcing Open-Vocabulary Industrial Anomaly Detection with Agentic Tools

May 22, 2026

24:16

145

When Vision Speaks for Sound

May 21, 2026

23:01

146

Active Learners as Efficient PRP Rerankers

May 21, 2026

23:39

147

Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information

May 21, 2026

22:57

148

AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration

May 21, 2026

23:39

149

OpenComputer: Verifiable Software Worlds for Computer-Use Agents

May 21, 2026

24:42

150

GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment

May 21, 2026

24:36

151

Process Rewards with Learned Reliability

May 21, 2026

23:21

152

EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL

May 21, 2026

27:22

153

CogOmniControl: Reasoning-Driven Controllable Video Generation via Creative Intent Cognition

May 21, 2026

23:22

154

Harnessing LLM Agents with Skill Programs

May 21, 2026

22:00

155

Code as Agent Harness

May 20, 2026

25:23

156

SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution

May 20, 2026

22:51

157

LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation

May 20, 2026

22:25

158

Lance: Unified Multimodal Modeling by Multi-Task Synergy

May 20, 2026

23:11

159

AI for Auto-Research: Roadmap & User Guide

May 20, 2026

22:22

160

CHI-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows?

May 20, 2026

23:10

161

KVPO: ODE-Native GRPO for Autoregressive Video Alignment via KV Semantic Exploration

May 20, 2026

23:40

162

CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence

May 19, 2026

23:10

163

PhysBrain 1.0 Technical Report

May 19, 2026

25:24

164

MMSkills: Towards Multimodal Skills for General Visual Agents

May 19, 2026

22:40

165

DexJoCo: A Benchmark and Toolkit for Task-Oriented Dexterous Manipulation on MuJoCo

May 19, 2026

23:12

166

Distilling Long-CoT Reasoning through Collaborative Step-wise Multi-Teacher Decoding

May 19, 2026

21:12

167

InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation

May 19, 2026

24:22

168

Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization

May 19, 2026

21:04

169

Nudging Beyond the Comfort Zone: Efficient Strategy-Guided Exploration for RLVR

May 19, 2026

21:07

170

Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling

May 16, 2026

22:50

171

Causal Forcing++: Scalable Few-Step Autoregressive Diffusion Distillation for Real-Time Interactive Video Generation

May 16, 2026

23:30

172

Self-Distilled Agentic Reinforcement Learning

May 16, 2026

24:51

173

MemLens: Benchmarking Multimodal Long-Term Memory in Large Vision-Language Models

May 16, 2026

26:57

174

SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer

May 16, 2026

22:12

175

MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory

May 16, 2026

22:50

176

Darwin Family: MRI-Trust-Weighted Evolutionary Merging for Training-Free Scaling of Language-Model Reasoning

May 16, 2026

23:22

177

Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems

May 16, 2026

21:52

178

STALE: Can LLM Agents Know When Their Memories Are No Longer Valid?

May 16, 2026

23:31

179

WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation

May 16, 2026

24:33

180

MinT: Managed Infrastructure for Training and Serving Millions of LLMs

May 15, 2026

23:52

181

MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image

May 15, 2026

24:13

182

AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation

May 15, 2026

23:30

183

Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context

May 15, 2026

23:05

184

EVA-Bench: A New End-to-end Framework for Evaluating Voice Agents

May 15, 2026

25:19

185

Predicting Decisions of AI Agents from Limited Interaction through Text-Tabular Modeling

May 15, 2026

24:59

186

Qwen-Image-VAE-2.0 Technical Report

May 15, 2026

24:22

187

TrackCraft3R: Repurposing Video Diffusion Transformers for Dense 3D Tracking

May 15, 2026

23:26

188

Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling

May 15, 2026

23:38

189

Many-Shot CoT-ICL: Making In-Context Learning Truly Learn

May 15, 2026

24:00

190

MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents

May 14, 2026

24:27

191

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture

May 14, 2026

25:32

192

δ-mem: Efficient Online Memory for Large Language Models

May 14, 2026

24:31

193

RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards

May 14, 2026

22:37

194

Do Enterprise Systems Need Learned World Models? The Importance of Context to Infer Dynamics

May 14, 2026

22:41

195

World Action Models: The Next Frontier in Embodied AI

May 14, 2026

24:43

196

Beyond the Last Layer: Multi-Layer Representation Fusion for Visual Tokenization

May 14, 2026

25:20

197

Efficient Pre-Training with Token Superposition

May 14, 2026

24:35

198

AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward

May 14, 2026

23:59

199

MCP-Cosmos: World Model-Augmented Agents for Complex Task Execution in MCP Environments

May 14, 2026

21:44

200

Qwen-Image-2.0 Technical Report

May 13, 2026

23:03

201

Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

May 13, 2026

23:59

202

CollabVR: Collaborative Video Reasoning with Vision-Language and Video Generation Models

May 13, 2026

25:26

203

TMAS: Scaling Test-Time Compute via Multi-Agent Synergy

May 13, 2026

23:17

204

PaperFit: Vision-in-the-Loop Typesetting Optimization for Scientific Documents

May 13, 2026

22:55

205

Model Merging Scaling Laws in Large Language Models

May 13, 2026

21:44

206

SEIF: Self-Evolving Reinforcement Learning for Instruction Following

May 13, 2026

21:27

207

WorldReasonBench: Human-Aligned Stress Testing of Video Generators as Future World-State Predictors

May 13, 2026

22:26

208

Memory-Efficient Looped Transformer: Decoupling Compute from Memory in Looped Language Models

May 13, 2026

22:29

209

Mean Mode Screaming: Mean–Variance Split Residuals for 1000-Layer Diffusion Transformers

May 12, 2026

21:53

210

Flow-OPD: On-Policy Distillation for Flow Matching Models

May 12, 2026

26:15

211

HyperEyes: Dual-Grained Efficiency-Aware Reinforcement Learning for Parallel Multimodal Search Agents

May 12, 2026

25:21

212

Anisotropic Modality Align

May 12, 2026

22:54

213

Beyond Retrieval: A Multitask Benchmark and Model for Code Search

May 12, 2026

21:14

214

MiA-Signature: Approximating Global Activation for Long-Context Understanding

May 9, 2026

11:50

215

When to Trust Imagination: Adaptive Action Execution for World Action Models

May 9, 2026

12:11

216

Continuous-Time Distribution Matching for Few-Step Diffusion Distillation

May 9, 2026

15:16

217

Stream-R1: Reliability-Perplexity Aware Reward Distillation for Streaming Video Generation

May 8, 2026

22:23

218

Stream-T1: Test-Time Scaling for Streaming Video Generation

May 8, 2026

21:59

219

RLDX-1 Technical Report

May 8, 2026

23:03

220

OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents

May 8, 2026

25:07

221

HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation

May 8, 2026

23:18

222

PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World

May 8, 2026

21:38

223

ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration

May 7, 2026

24:16

224

OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories

May 7, 2026

21:54

225

Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL

May 7, 2026

21:27

226

MolmoAct2: Action Reasoning Models for Real-world Deployment

May 6, 2026

25:04

227

From Context to Skills: Can Language Models Learn from Context Skillfully?

May 6, 2026

19:07

228

UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors

May 5, 2026

22:14

229

Web2BigTable: A Bi-Level Multi-Agent LLM System for Internet-Scale Information Search and Extraction

May 5, 2026

23:47

230

Heterogeneous Scientific Foundation Model Collaboration

May 2, 2026

25:02

231

Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling

May 2, 2026

20:21

232

Co-Evolving Policy Distillation

May 2, 2026

22:49

233

ExoActor: Exocentric Video Generation as Generalizable Interactive Humanoid Control

May 2, 2026

27:23

234

Efficient Training on Multiple Consumer GPUs with RoundPipe

May 2, 2026

23:18

235

GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents

May 1, 2026

25:34

236

Large Language Models Explore by Latent Distilling

May 1, 2026

22:37

237

RADIO-ViPE: Online Tightly Coupled Multi-Modal Fusion for Open-Vocabulary Semantic SLAM in Dynamic Environments

May 1, 2026

24:14

238

ClawGym: A Scalable Framework for Building Effective Claw Agents

May 1, 2026

26:01

239

Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models

May 1, 2026

22:25

240

Recursive Multi-Agent Systems

Apr 30, 2026

25:02

241

Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora

Apr 30, 2026

24:15

242

DV-World: Benchmarking Data Visualization Agents in Real-World Scenarios

Apr 30, 2026

23:08

243

AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery

Apr 30, 2026

21:40

244

Meta-CoT: Enhancing Granularity and Generalization in Image Editing

Apr 30, 2026

22:01

245

Refinement via Regeneration: Enlarging Modification Space Boosts Image Refinement in Unified Multimodal Models

Apr 30, 2026

23:20

246

World-R1: Reinforcing 3D Constraints for Text-to-Video Generation

Apr 29, 2026

21:09

247

From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company

Apr 29, 2026

25:57

248

ReVSI: Rebuilding Visual Spatial Intelligence Evaluation for Accurate Assessment of VLM 3D Reasoning

Apr 29, 2026

22:53

249

Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation

Apr 29, 2026

22:29

250

Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms

Apr 29, 2026

29:53

251

ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents

Apr 29, 2026

22:23

252

SketchVLM: Vision language models can annotate images to explain thoughts and guide users

Apr 29, 2026

21:34

253

Video Analysis and Generation via a Semantic Progress Function

Apr 28, 2026

20:59

254

DiffNR: Diffusion-Enhanced Neural Representation Optimization for Sparse-View 3D Tomographic Reconstruction

Apr 28, 2026

25:02

255

LLM Safety From Within: Detecting Harmful Content with Internal Representations

Apr 28, 2026

23:23

256

LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics

Apr 25, 2026

26:25

257

WorldMark: A Unified Benchmark Suite for Interactive Video World Models

Apr 25, 2026

25:39

258

UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Learning and World Modeling

Apr 25, 2026

26:31

259

LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model

Apr 24, 2026

25:15

260

Near-Future Policy Optimization

Apr 24, 2026

22:20

261

DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data

Apr 24, 2026

24:27

262

OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis

Apr 24, 2026

26:56

263

DeVI: Physics-based Dexterous Human-Object Interaction via Synthetic Video Imitation

Apr 24, 2026

23:40

264

Tstars-Tryon 1.0: Robust and Realistic Virtual Try-On for Diverse Fashion Items

Apr 23, 2026

23:57

265

CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation

Apr 23, 2026

21:16

266

AgentSPEX: An Agent SPecification and EXecution Language

Apr 23, 2026

22:39

267

AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model

Apr 23, 2026

24:31

268

TEMPO: Scaling Test-time Training for Large Reasoning Models

Apr 23, 2026

23:30

269

Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation

Apr 22, 2026

20:55

270

OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation

Apr 22, 2026

26:31

271

Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence

Apr 22, 2026

24:00

272

OpenGame: Open Agentic Coding for Games

Apr 22, 2026

25:31

273

MultiWorld: Scalable Multi-Agent Multi-View Video World Models

Apr 22, 2026

21:41

274

EasyVideoR1: Easier RL for Video Understanding

Apr 22, 2026

27:05

275

Elucidating the SNR-t Bias of Diffusion Probabilistic Models

Apr 21, 2026

22:07

276

Maximal Brain Damage Without Data or Optimization: Disrupting Neural Networks via Sign-Bit Flips

Apr 21, 2026

22:28

277

PersonaVLM: Long-Term Personalized Multimodal LLMs

Apr 21, 2026

24:57

278

Qwen3.5-Omni Technical Report

Apr 21, 2026

24:57

279

HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds

Apr 18, 2026

24:06

280

RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework

Apr 18, 2026

22:11

281

DR$^{3}$-Eval: Towards Realistic and Reproducible Deep Research Evaluation

Apr 18, 2026

24:20

282

Seedance 2.0: Advancing Video Generation for World Complexity

Apr 17, 2026

27:40

283

GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents

Apr 17, 2026

26:00

284

RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time

Apr 17, 2026

24:13

285

SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments

Apr 17, 2026

24:11

286

OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language Environment Simulation

Apr 17, 2026

27:30

287

Memory Transfer Learning: How Memories are Transferred Across Domains in Coding Agents

Apr 17, 2026

26:27

288

From $P(y|x)$ to $P(y)$: Investigating Reinforcement Learning in Pre-train Space

Apr 17, 2026

23:31

289

Exploration and Exploitation Errors Are Measurable for Language Model Agents

Apr 17, 2026

22:21

290

ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents

Apr 16, 2026

23:59

291

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

Apr 16, 2026

25:07

292

Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization

Apr 16, 2026

21:10

293

SPPO: Sequence-Level PPO for Long-Horizon Reasoning Tasks

Apr 16, 2026

22:33

294

Toward Autonomous Long-Horizon Engineering for ML Research

Apr 16, 2026

24:05

295

BERT-as-a-Judge: A Robust Alternative to Lexical Methods for Efficient Reference-Based LLM Evaluation

Apr 16, 2026

21:58

296

QuanBench+: A Unified Multi-Framework Benchmark for LLM-Based Quantum Code Generation

Apr 15, 2026

24:54

297

The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping

Apr 15, 2026

21:31

298

OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation

Apr 15, 2026

21:58

299

Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation

Apr 15, 2026

21:34

300

Strips as Tokens: Artist Mesh Generation with Native UV Segmentation

Apr 15, 2026

21:33

301

Uni-ViGU: Towards Unified Video Generation and Understanding via A Diffusion-Based Video Generator

Apr 15, 2026

22:44

302

Pseudo-Unification: Entropy Probing Reveals Divergent Information Patterns in Unified Multimodal Models

Apr 15, 2026

23:05

303

CocoaBench: Evaluating Unified Digital Agents in the Wild

Apr 15, 2026

22:55

304

CodeTracer: Towards Traceable Agent States

Apr 15, 2026

23:36

305

WildDet3D: Scaling Promptable 3D Detection in the Wild

Apr 14, 2026

25:01

306

FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios

Apr 14, 2026

21:57

307

EXAONE 4.5 Technical Report

Apr 14, 2026

23:16

308

RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details

Apr 14, 2026

22:22

309

Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory

Apr 14, 2026

23:56

310

Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability

Apr 11, 2026

24:44

311

SkillClaw: Let Skills Evolve Collectively with Agentic Evolver

Apr 11, 2026

22:33

312

RAGEN-2: Reasoning Collapse in Agentic RL

Apr 10, 2026

25:36

313

MARS: Enabling Autoregressive Models Multi-Token Generation

Apr 10, 2026

23:23

314

Combee: Scaling Prompt Learning for Self-Improving Language Model Agents

Apr 10, 2026

21:56

315

Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

Apr 9, 2026

24:54

316

Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents

Apr 9, 2026

22:39

317

Learning to Retrieve from Agent Trajectories

Apr 9, 2026

22:27

318

ACES: Who Tests the Tests? Leave-One-Out AUC Consistency for Code Generation

Apr 9, 2026

24:22

319

GBQA: A Game Benchmark for Evaluating LLMs as Quality Assurance Engineers

Apr 9, 2026

23:50

320

Beyond Accuracy: Unveiling Inefficiency Patterns in Tool-Integrated Reasoning

Apr 9, 2026

21:19

321

ThinkTwice: Jointly Optimizing Large Language Models for Reasoning and Self-Refinement

Apr 9, 2026

22:23

322

Vanast: Virtual Try-On with Human Image Animation via Synthetic Triplet Supervision

Apr 9, 2026

24:53

323

MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU

Apr 9, 2026

25:12

324

Watch Before You Answer: Learning from Visually Grounded Post-Training

Apr 9, 2026

20:34

325

OpenWorldLib: A Unified Codebase and Definition of Advanced World Models

Apr 8, 2026

23:58

326

MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale

Apr 8, 2026

23:52

327

LIBERO-Para: A Diagnostic Benchmark and Metrics for Paraphrase Robustness in VLA Models

Apr 8, 2026

22:52

328

TriAttention: Efficient Long Reasoning with Trigonometric KV Compression

Apr 8, 2026

21:22

329

Adam's Law: Textual Frequency Law on Large Language Models

Apr 8, 2026

22:27

330

AURA: Always-On Understanding and Real-Time Assistance via Video Streams

Apr 8, 2026

23:35

331

ClawArena: Benchmarking AI Agents in Evolving Information Environments

Apr 8, 2026

21:28

332

SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing

Apr 8, 2026

22:34

333

LightThinker++: From Reasoning Compression to Memory Management

Apr 8, 2026

20:05

334

Self-Distilled RLVR

Apr 7, 2026

21:55

335

A Simple Baseline for Streaming Video Understanding

Apr 7, 2026

21:50

336

Token Warping Helps MLLMs Look from Nearby Viewpoints

Apr 7, 2026

20:05

337

Agentic-MME: What Agentic Capability Really Brings to Multimodal Intelligence?

Apr 7, 2026

22:43

338

DataFlex: A Unified Framework for Data-Centric Dynamic Training of Large Language Models

Apr 4, 2026

27:27

339

The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook

Apr 4, 2026

22:54

340

Generative World Renderer

Apr 4, 2026

22:41

341

SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization

Apr 4, 2026

19:28

342

Steerable Visual Representations

Apr 4, 2026

21:42

343

EgoSim: Egocentric World Simulator for Embodied Interaction Generation

Apr 4, 2026

24:47

344

CORAL: Towards Autonomous Multi-Agent Evolution for Open-Ended Discovery

Apr 4, 2026

24:57

345

ClawKeeper: Comprehensive Safety Protection for OpenClaw Agents Through Skills, Plugins, and Watchers

Apr 3, 2026

25:10

346

Terminal Agents Suffice for Enterprise Automation

Apr 3, 2026

24:09

347

MiroEval: Benchmarking Multimodal Deep Research Agents in Process and Outcome

Apr 3, 2026

24:48

348

ViGoR-Bench: How Far Are Visual Generative Models From Zero-Shot Visual Reasoners?

Apr 3, 2026

24:35

349

Vision2Web: A Hierarchical Benchmark for Visual Website Development with Agent Verification

Apr 3, 2026

23:27

350

QuitoBench: A High-Quality Open Time Series Forecasting Benchmark

Apr 3, 2026

24:46

351

Reasoning Shift: How Context Silently Shortens LLM Reasoning

Apr 3, 2026

23:27

352

FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization

Apr 2, 2026

23:55

353

CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence

Apr 2, 2026

28:20

354

LongCat-Next: Lexicalizing Modalities as Discrete Tokens

Apr 2, 2026

23:25

355

Lingshu-Cell: A generative cellular world model for transcriptome modeling toward virtual cells

Apr 2, 2026

23:44

356

GEMS: Agent-Native Multimodal Generation with Memory and Skills

Apr 2, 2026

22:21

357

Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development

Apr 2, 2026

23:58

358

VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward

Apr 2, 2026

22:52

359

Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis

Apr 2, 2026

22:56

360

CutClaw: Agentic Hours-Long Video Editing via Music Synchronization

Apr 2, 2026

22:22

361

daVinci-LLM: Towards the Science of Pretraining

Apr 2, 2026

25:37

362

TAPS: Task Aware Proposal Distributions for Speculative Sampling

Apr 1, 2026

22:18

363

Towards a Medical AI Scientist

Apr 1, 2026

24:49

364

Gen-Searcher: Reinforcing Agentic Search for Image Generation

Apr 1, 2026

26:21

365

Emergent Social Intelligence Risks in Generative Multi-Agent Systems

Apr 1, 2026

22:19

366

EpochX: Building the Infrastructure for an Emergent Agent Civilization

Apr 1, 2026

24:01

367

On Token's Dilemma: Dynamic MoE with Drift-Aware Token Assignment for Continual Learning of Large Vision Language Models

Apr 1, 2026

22:13

368

GEditBench v2: A Human-Aligned Benchmark for General Image Editing

Apr 1, 2026

21:38

369

Make Geometry Matter for Spatial Reasoning

Apr 1, 2026

25:26

370

PRBench: End-to-end Paper Reproduction in Physics Research

Apr 1, 2026

24:01

371

PixelSmile: Toward Fine-Grained Facial Expression Editing

Mar 28, 2026

27:52

372

Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

Mar 28, 2026

25:01

373

Calibri: Enhancing Diffusion Transformers via Parameter-Efficient Calibration

Mar 28, 2026

22:30

374

RealRestorer: Towards Generalizable Real-World Image Restoration with Large-Scale Image Editing Models

Mar 28, 2026

21:32

375

MACRO: Advancing Multi-Reference Image Generation with Structured Long-Context Data

Mar 28, 2026

22:23

376

Voxtral TTS

Mar 28, 2026

29:09

377

Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?

Mar 27, 2026

24:50

378

MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding

Mar 26, 2026

24:06

379

WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG

Mar 26, 2026

22:14

380

From Static Templates to Dynamic Runtime Graphs: A Survey of Workflow Optimization for LLM Agents

Mar 26, 2026

27:57

381

SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning

Mar 26, 2026

21:53

382

PEARL: Personalized Streaming Video Understanding Model

Mar 26, 2026

22:20

383

DA-Flow: Degradation-Aware Optical Flow Estimation with Diffusion Models

Mar 26, 2026

20:42

384

SIMART: Decomposing Monolithic Meshes into Sim-ready Articulated Assets via MLLM

Mar 26, 2026

23:43

385

UniGRPO: Unified Policy Optimization for Reasoning-Driven Visual Generation

Mar 26, 2026

20:15

386

RealMaster: Lifting Rendered Scenes into Photorealistic Video

Mar 26, 2026

22:32

387

Omni-WorldBench: Towards a Comprehensive Interaction-Centric Evaluation for World Models

Mar 25, 2026

23:04

388

Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model

Mar 25, 2026

22:55

389

LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning

Mar 25, 2026

22:54

390

Look Where It Matters: High-Resolution Crops Retrieval for Efficient VLMs

Mar 25, 2026

26:12

391

OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis

Mar 25, 2026

24:15

392

VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding

Mar 25, 2026

22:39

393

SpatialBoost: Enhancing Visual Representation through Language-Guided Reasoning

Mar 25, 2026

22:51

394

F4Splat: Feed-Forward Predictive Densification for Feed-Forward 3D Gaussian Splatting

Mar 25, 2026

22:49

395

mSFT: Addressing Dataset Mixtures Overfitting Heterogeneously in Multi-task SFT

Mar 25, 2026

24:27

396

HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning

Mar 24, 2026

24:36

397

Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models

Mar 24, 2026

24:19

398

TerraScope: Pixel-Grounded Visual Reasoning for Earth Observation

Mar 24, 2026

25:57

399

ProactiveBench: Benchmarking Proactiveness in Multimodal Large Language Models

Mar 24, 2026

20:44

400

FlowScene: Style-Consistent Indoor Scene Generation with Multimodal Graph Rectified Flow

Mar 24, 2026

26:02

401

The Y-Combinator for LLMs: Solving Long-Context Rot with λ-Calculus

Mar 24, 2026

26:08

402

LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation

Mar 24, 2026

22:17

403

Hyperagents

Mar 24, 2026

24:00

404

Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding

Mar 21, 2026

23:56

405

SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing

Mar 21, 2026

27:08

406

FASTER: Rethinking Real-Time Flow VLAs

Mar 21, 2026

22:35

407

3DreamBooth: High-Fidelity 3D Subject-Driven Video Generation Model

Mar 21, 2026

21:19

408

Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer

Mar 21, 2026

23:48

409

MonoArt: Progressive Structural Reasoning for Monocular Articulated 3D Reconstruction

Mar 21, 2026

22:52

410

Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation

Mar 21, 2026

22:57

411

Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens

Mar 21, 2026

21:58

412

LVOmniBench: Pioneering Long Audio-Video Understanding Evaluation for Omnimodal LLMs

Mar 21, 2026

21:40

413

Memento-Skills: Let Agents Design Agents

Mar 21, 2026

24:38

414

MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild

Mar 20, 2026

23:38

415

Video-CoE: Reinforcing Video Event Prediction via Chain of Events

Mar 20, 2026

23:32

416

MosaicMem: Hybrid Spatial Memory for Controllable Video World Models

Mar 20, 2026

23:18

417

Alignment Makes Language Models Normative, Not Descriptive

Mar 20, 2026

21:29

418

Complementary Reinforcement Learning

Mar 20, 2026

24:00

419

When AI Navigates the Fog of War

Mar 20, 2026

23:56

420

MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification

Mar 19, 2026

25:05

421

InCoder-32B: Code Foundation Model for Industrial Scenarios

Mar 19, 2026

23:16

422

Qianfan-OCR: A Unified End-to-End Model for Document Intelligence

Mar 19, 2026

21:30

423

Thinking in Uncertainty: Mitigating Hallucinations in MLRMs with Latent Entropy-Aware Decoding

Mar 19, 2026

20:17

424

Kinema4D: Kinematic 4D World Modeling for Spatiotemporal Embodied Simulation

Mar 19, 2026

23:18

425

Demystifing Video Reasoning

Mar 19, 2026

20:20

426

WorldCam: Interactive Autoregressive 3D Gaming Worlds with Camera Pose as a Unifying Geometric Representation

Mar 19, 2026

21:39

427

TRUST-SQL: Tool-Integrated Multi-Turn Reinforcement Learning for Text-to-SQL over Unknown Schemas

Mar 19, 2026

23:48

428

Online Experiential Learning for Language Models

Mar 19, 2026

25:39

429

FinToolBench: Evaluating LLM Agents for Real-World Financial Tool Use

Mar 19, 2026

24:55

430

AI Can Learn Scientific Taste

Mar 18, 2026

22:31

431

OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data

Mar 18, 2026

23:37

432

EnterpriseOps-Gym: Environments and Evaluations for Stateful Agentic Planning and Tool Use in Enterprise Settings

Mar 18, 2026

23:54

433

Grounding World Simulation Models in a Real-World Metropolis

Mar 18, 2026

23:52

434

HSImul3R: Physics-in-the-Loop Reconstruction of Simulation-Ready Human-Scene Interactions

Mar 18, 2026

22:57

435

Attention Residuals

Mar 18, 2026

23:06

436

Mixture-of-Depths Attention

Mar 18, 2026

22:53

437

Effective Distillation to Hybrid xLSTM Architectures

Mar 18, 2026

25:29

438

Anatomy of a Lie: A Multi-Stage Diagnostic Framework for Tracing Hallucinations in Vision-Language Models

Mar 18, 2026

23:15

439

ViFeEdit: A Video-Free Tuner of Your Video Diffusion Transformer

Mar 18, 2026

23:16

440

LMEB: Long-horizon Memory Embedding Benchmark

Mar 17, 2026

21:47

441

Can Vision-Language Models Solve the Shell Game?

Mar 17, 2026

22:31

442

Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation

Mar 17, 2026

20:54

443

daVinci-Env: Open SWE Environment Synthesis at Scale

Mar 17, 2026

20:36

444

Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections

Mar 14, 2026

25:28

445

OpenClaw-RL: Train Any Agent Simply by Talking

Mar 13, 2026

25:45

446

Flash-KMeans: Fast and Memory-Efficient Exact K-Means

Mar 13, 2026

23:12

447

MA-EgoQA: Question Answering over Egocentric Videos from Multiple Embodied Agents

Mar 13, 2026

25:45

448

LLM2Vec-Gen: Generative Embeddings from Large Language Models

Mar 13, 2026

24:22

449

Urban Socio-Semantic Segmentation with Vision-Language Reasoning

Jan 17, 2026

21:48

450

STEP3-VL-10B Technical Report

Jan 17, 2026

26:27

451

Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs

Jan 17, 2026

20:32

452

Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning

Jan 17, 2026

25:14

453

Controlled Self-Evolution for Algorithmic Code Optimization

Jan 16, 2026

23:40

454

DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation

Jan 16, 2026

18:31

455

MAXS: Meta-Adaptive Exploration with LLM Agents

Jan 16, 2026

21:42

456

Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning

Jan 16, 2026

20:35

457

Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning

Jan 16, 2026

24:42

458

SkinFlow: Efficient Information Transmission for Open Dermatological Diagnosis via Dynamic Visual Encoding and Staged RL

Jan 16, 2026

26:56

459

OpenDecoder: Open Large Language Model Decoding to Incorporate Document Quality in RAG

Jan 16, 2026

25:28

460

OpenVoxel: Training-Free Grouping and Captioning Voxels for Open-Vocabulary 3D Scene Understanding

Jan 16, 2026

23:20

461

MemGovern: Enhancing Code Agents through Learning from Governed Human Experiences

Jan 15, 2026

23:25

462

Solar Open Technical Report

Jan 15, 2026

21:06

463

KnowMe-Bench: Benchmarking Person Understanding for Lifelong Digital Companions

Jan 15, 2026

22:06

464

User-Oriented Multi-Turn Dialogue Generation with Tool Use at scale

Jan 15, 2026

23:26

465

ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands

Jan 15, 2026

22:20

466

ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking

Jan 15, 2026

26:01

467

MemoBrain: Executive Memory as an Agentic Brain for Reasoning

Jan 15, 2026

22:27

468

Motion Attribution for Video Generation

Jan 15, 2026

20:06

469

3AM: Segment Anything with Geometric Consistency in Videos

Jan 15, 2026

22:38

470

BabyVision: Visual Reasoning Beyond Language

Jan 14, 2026

22:04

471

PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning

Jan 14, 2026

23:17

472

MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head

Jan 14, 2026

22:26

473

X-Coder: Advancing Competitive Programming with Fully Synthetic Tasks, Solutions, and Tests

Jan 14, 2026

22:30

474

GlimpRouter: Efficient Collaborative Inference by Glimpsing One Token of Thoughts

Jan 14, 2026

20:07

475

Lost in the Noise: How Reasoning Models Fail with Contextual Distractors

Jan 14, 2026

23:35

476

OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent

Jan 14, 2026

24:29

477

Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization

Jan 13, 2026

26:55

478

MMFormalizer: Multimodal Autoformalization in the Wild

Jan 13, 2026

21:58

479

CaricatureGS: Exaggerating 3D Gaussian Splatting Faces With Gaussian Curvature

Jan 13, 2026

21:52

480

The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning

Jan 13, 2026

22:57

481

Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards

Jan 13, 2026

25:39

482

EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis

Jan 13, 2026

23:24

483

Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking

Jan 13, 2026

22:01

484

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

Jan 10, 2026

25:04

485

Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers

Jan 10, 2026

25:06

486

RL-AWB: Deep Reinforcement Learning for Auto White Balance Correction in Low-Light Night-time Scenes

Jan 10, 2026

23:09

487

Token-Level LLM Collaboration via FusionRoute

Jan 10, 2026

25:46

488

Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting

Jan 9, 2026

23:27

489

Evolving Programmatic Skill Networks

Jan 9, 2026

25:35

490

Atlas: Orchestrating Heterogeneous Models and Tools for Multi-Domain Complex Reasoning

Jan 9, 2026

27:28

491

Benchmark^2: Systematic Evaluation of LLM Benchmarks

Jan 9, 2026

22:14

492

InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields

Jan 8, 2026

22:07

493

LTX-2: Efficient Joint Audio-Visual Foundation Model

Jan 8, 2026

22:27

494

MOSS Transcribe Diarize: Accurate Transcription with Speaker Diarization

Jan 8, 2026

26:58

495

SciEvalKit: An Open-source Evaluation Toolkit for Scientific General Intelligence

Jan 8, 2026

28:03

496

NitroGen: An Open Foundation Model for Generalist Gaming Agents

Jan 8, 2026

22:31

497

Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits

Jan 7, 2026

22:58

498

NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation

Jan 7, 2026

26:55

499

DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer

Jan 7, 2026

23:47

500

VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation

Jan 7, 2026

22:33

501

GARDO: Reinforcing Diffusion Models without Reward Hacking

Jan 7, 2026

24:15

502

InfiniteVGGT: Visual Geometry Grounded Transformer for Endless Streams

Jan 7, 2026

25:57

503

VINO: A Unified Visual Generator with Interleaved OmniModal Context

Jan 7, 2026

23:53

504

Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization

Jan 6, 2026

23:23

505

NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos

Jan 6, 2026

22:46

506

Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation

Jan 6, 2026

22:58

507

Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation

Jan 6, 2026

26:51

508

SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning

Jan 6, 2026

27:56

509

Deep Delta Learning

Jan 6, 2026

20:34

510

AdaGaR: Adaptive Gabor Representation for Dynamic Scene Reconstruction

Jan 6, 2026

23:09

511

Nested Learning: The Illusion of Deep Learning Architectures

Jan 6, 2026

23:45

512

Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling

Jan 3, 2026

22:36

513

Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space

Jan 3, 2026

25:22

514

mHC: Manifold-Constrained Hyper-Connections

Jan 2, 2026

20:57

515

Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models

Jan 2, 2026

28:35

516

Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem

Jan 2, 2026

25:58

517

GaMO: Geometry-aware Multi-view Diffusion Outpainting for Sparse-View 3D Reconstruction

Jan 2, 2026

22:28

518

Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss

Dec 31, 2025

24:49

519

LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation

Dec 31, 2025

23:16

520

Yume-1.5: A Text-Controlled Interactive World Generation Model

Dec 31, 2025

25:01

521

SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents

Dec 31, 2025

24:01

522

Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation

Dec 31, 2025

25:32

523

Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion

Dec 31, 2025

25:06

524

Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone

Dec 31, 2025

23:48

525

SpotEdit: Selective Region Editing in Diffusion Transformers

Dec 31, 2025

22:44

526

GRAN-TED: Generating Robust, Aligned, and Nuanced Text Embedding for Diffusion Models

Dec 31, 2025

22:03

527

InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion

Dec 30, 2025

23:11

528

Mindscape-Aware Retrieval Augmented Generation for Improved Long Context Understanding

Dec 30, 2025

21:17

529

MAI-UI Technical Report: Real-World Centric Foundation GUI Agents

Dec 30, 2025

24:59

530

Latent Implicit Visual Reasoning

Dec 27, 2025

25:49

531

Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning

Dec 27, 2025

26:01

532

TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times

Dec 26, 2025

21:22

533

Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models

Dec 26, 2025

22:56

534

DreaMontage: Arbitrary Frame-Guided One-Shot Video Generation

Dec 26, 2025

21:35

535

T2AV-Compass: Towards Unified Evaluation for Text-to-Audio-Video Generation

Dec 26, 2025

20:49

536

SemanticGen: Video Generation in Semantic Space

Dec 25, 2025

22:13

537

Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies

Dec 25, 2025

27:28

538

LongVideoAgent: Multi-Agent Reasoning with Long Videos

Dec 25, 2025

22:12

539

SpatialTree: How Spatial Abilities Branch Out in MLLMs

Dec 25, 2025

22:10

540

DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI

Dec 24, 2025

24:05

541

The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding

Dec 24, 2025

26:13

542

Region-Constraint In-Context Generation for Instructional Video Editing

Dec 24, 2025

21:07

543

QuCo-RAG: Quantifying Uncertainty from the Pre-training Corpus for Dynamic Retrieval-Augmented Generation

Dec 24, 2025

24:14

544

Infinite-Homography as Robust Conditioning for Camera-Controlled Video Generation

Dec 24, 2025

24:41

545

Can LLMs Estimate Student Struggles? Human-AI Difficulty Alignment with Proficiency Simulation for Item Difficulty Prediction

Dec 24, 2025

25:57

546

Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows

Dec 23, 2025

24:01

547

PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence

Dec 23, 2025

25:33

548

When Reasoning Meets Its Laws

Dec 23, 2025

21:45

549

Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience

Dec 23, 2025

25:34

550

4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation

Dec 23, 2025

26:30

551

Both Semantics and Reconstruction Matter: Making Representation Encoders Ready for Text-to-Image Generation and Editing

Dec 23, 2025

23:40

552

Are We on the Right Way to Assessing LLM-as-a-Judge?

Dec 23, 2025

23:16

553

Kling-Omni Technical Report

Dec 20, 2025

24:17

554

Adaptation of Agentic AI

Dec 20, 2025

26:20

555

LLaDA2.0: Scaling Up Diffusion Language Models to 100B

Dec 20, 2025

26:32

556

Next-Embedding Prediction Makes Strong Vision Learners

Dec 20, 2025

22:00

557

StereoPilot: Learning Unified and Efficient Stereo Conversion via Generative Priors

Dec 20, 2025

24:04

558

Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model

Dec 20, 2025

22:14

559

Depth Any Panoramas: A Foundation Model for Panoramic Depth Estimation

Dec 20, 2025

21:29

560

Generative Refocusing: Flexible Defocus Control from a Single Image

Dec 20, 2025

25:27

561

DeContext as Defense: Safe Image Editing in Diffusion Transformers

Dec 20, 2025

23:34

562

Step-GUI Technical Report

Dec 19, 2025

26:21

563

DEER: Draft with Diffusion, Verify with Autoregressive Models

Dec 19, 2025

25:44

564

Fast and Accurate Causal Parallel Decoding using Jacobi Forcing

Dec 19, 2025

21:47

565

HyperVL: An Efficient and Dynamic Multimodal Large Language Model for Edge Devices

Dec 19, 2025

22:07

566

Puzzle Curriculum GRPO for Vision-Centric Reasoning

Dec 19, 2025

25:36

567

MMGR: Multi-Modal Generative Reasoning

Dec 18, 2025

24:35

568

Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans?

Dec 18, 2025

24:20

569

WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling

Dec 18, 2025

21:32

570

Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling

Dec 18, 2025

22:31

571

RoboTracer: Mastering Spatial Trace with Reasoning in Vision-Language Models for Robotics

Dec 18, 2025

19:54

572

OpenDataArena: A Fair and Open Arena for Benchmarking Post-Training Dataset Value

Dec 18, 2025

29:25

573

ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding

Dec 17, 2025

26:30

574

Towards Scalable Pre-training of Visual Tokenizers for Generation

Dec 17, 2025

21:55

575

Memory in the Age of AI Agents

Dec 17, 2025

23:55

576

QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management

Dec 17, 2025

24:20

577

LongVie 2: Multimodal Controllable Ultra-Long Video World Model

Dec 17, 2025

26:24

578

Finch: Benchmarking Finance & Accounting across Spreadsheet-Centric Enterprise Workflows

Dec 17, 2025

30:53

579

NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents

Dec 17, 2025

24:52

580

Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics

Dec 17, 2025

22:42

581

KlingAvatar 2.0 Technical Report

Dec 17, 2025

24:12

582

MentraSuite: Post-Training Large Language Models for Mental Health Reasoning and Assessment

Dec 17, 2025

23:46

583

EgoX: Egocentric Video Generation from a Single Exocentric Video

Dec 16, 2025

21:54

584

DentalGPT: Incentivizing Multimodal Complex Reasoning in Dentistry

Dec 16, 2025

18:26

585

SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder

Dec 16, 2025

22:17

586

V-RGBX: Video Editing with Accurate Controls over Intrinsic Properties

Dec 16, 2025

23:26

587

T-pro 2.0: An Efficient Russian Hybrid-Reasoning Model and Playground

Dec 13, 2025

22:52

588

Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving

Dec 13, 2025

23:06

589

Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation

Dec 13, 2025

28:53

590

OPV: Outcome-based Process Verifier for Efficient Long Chain-of-Thought Verification

Dec 13, 2025

24:56

591

Achieving Olympia-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning

Dec 13, 2025

23:41

592

StereoWorld: Geometry-Aware Monocular-to-Stereo Video Generation

Dec 12, 2025

21:49

593

BrainExplore: Large-Scale Discovery of Interpretable Visual Representations in the Human Brain

Dec 12, 2025

22:44

594

OmniPSD: Layered PSD Generation with Diffusion Transformer

Dec 12, 2025

26:08

595

Composing Concepts from Images and Videos via Concept-prompt Binding

Dec 12, 2025

23:04

596

Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance

Dec 11, 2025

23:28

597

Visionary: The World Model Carrier Built on WebGPU-Powered Gaussian Splatting Platform

Dec 11, 2025

25:38

598

Preserving Source Video Realism: High-Fidelity Face Swapping for Cinematic Quality

Dec 11, 2025

23:04

599

OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory

Dec 11, 2025

22:37

600

Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning

Dec 10, 2025

24:05

601

Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs

Dec 10, 2025

21:49

602

Unified Video Editing with Temporal Reasoner

Dec 10, 2025

20:34

603

Voxify3D: Pixel Art Meets Volumetric Rendering

Dec 10, 2025

22:17

604

Scaling Zero-Shot Reference-to-Video Generation

Dec 10, 2025

23:52

605

DoVer: Intervention-Driven Auto Debugging for LLM Multi-Agent Systems

Dec 10, 2025

24:53

606

TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows

Dec 9, 2025

22:53

607

EditThinker: Unlocking Iterative Reasoning for Any Image Editor

Dec 9, 2025

26:31

608

From Imitation to Discrimination: Toward A Generalized Curriculum Advantage Mechanism Enhancing Cross-Domain Reasoning Tasks

Dec 9, 2025

23:39

609

EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified Architecture

Dec 9, 2025

24:33

610

DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle

Dec 6, 2025

27:25

611

Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length

Dec 6, 2025

25:15

612

Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction

Dec 6, 2025

24:40

613

ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning

Dec 6, 2025

23:40

614

Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation

Dec 6, 2025

21:09

615

Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion

Dec 6, 2025

25:40

616

PaperDebugger: A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing

Dec 6, 2025

23:24

617

Qwen3-VL Technical Report

Dec 5, 2025

27:05

618

Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach

Dec 5, 2025

22:30

619

PretrainZero: Reinforcement Active Pretraining

Dec 5, 2025

22:48

620

ViDiC: Video Difference Captioning

Dec 5, 2025

23:33

621

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

Dec 4, 2025

22:11

622

ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration

Dec 4, 2025

21:23

623

MultiShotMaster: A Controllable Multi-Shot Video Generation Framework

Dec 4, 2025

27:19

624

MG-Nav: Dual-Scale Visual Navigation via Sparse Spatial Memory

Dec 4, 2025

23:42

625

Skywork-R1V4: Toward Agentic Multimodal Intelligence through Interleaved Thinking with Images and DeepResearch

Dec 4, 2025

25:38

626

DualCamCtrl: Dual-Branch Diffusion Model for Geometry-Aware Camera-Controlled Video Generation

Dec 4, 2025

20:31

627

Guided Self-Evolving LLMs with Minimal Human Supervision

Dec 4, 2025

25:25

628

SimScale: Learning to Drive via Real-World Simulation at Scale

Dec 4, 2025

22:38

629

InnoGym: Benchmarking the Innovation Potential of AI Agents

Dec 4, 2025

23:37

630

LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling

Dec 3, 2025

23:00

631

Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights

Dec 3, 2025

24:10

632

Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Dec 3, 2025

19:41

633

How Far Are We from Genuinely Useful Deep Research Agents?

Dec 3, 2025

24:37

634

What about gravity in video generation? Post-Training Newton's Laws with Verifiable Rewards

Dec 3, 2025

22:15

635

Infinity-RoPE: Action-Controllable Infinite Video Generation Emerges From Autoregressive Self-Rollout

Dec 3, 2025

19:09

636

The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment

Dec 3, 2025

24:04

637

TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models

Dec 3, 2025

24:58

638

LFM2 Technical Report

Dec 3, 2025

22:37

639

Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer

Dec 2, 2025

24:43

640

REASONEDIT: Towards Reasoning-Enhanced Image Editing Models

Dec 2, 2025

21:40

641

Vision Bridge Transformer at Scale

Dec 2, 2025

21:48

642

DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning

Dec 2, 2025

21:10

643

Architecture Decoupling Is Not All You Need For Unified Multimodal Model

Dec 2, 2025

20:55

644

Multimodal Evaluation of Russian-language Architectures

Nov 28, 2025

24:48

645

Latent Collaboration in Multi-Agent Systems

Nov 28, 2025

26:12

646

Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation

Nov 28, 2025

18:26

647

GigaEvo: An Open Source Optimization Framework Powered By LLMs And Evolution Algorithms

Nov 27, 2025

23:40

648

MedSAM3: Delving into Segment Anything with Medical Concepts

Nov 27, 2025

24:21

649

Agent0-VL: Exploring Self-Evolving Agent for Tool-Integrated Vision-Language Reasoning

Nov 27, 2025

22:42

650

SteadyDancer: Harmonized and Coherent Human Image Animation with First-Frame Preservation

Nov 27, 2025

19:11

651

iMontage: Unified, Versatile, Highly Dynamic Many-to-many Image Generation

Nov 27, 2025

25:05

652

Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward

Nov 27, 2025

25:10

653

GigaWorld-0: World Models as Data Engine to Empower Embodied AI

Nov 27, 2025

22:27

654

SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space

Nov 27, 2025

21:32

655

Soft Adaptive Policy Optimization

Nov 27, 2025

24:08

656

General Agentic Memory Via Deep Research

Nov 26, 2025

25:36

657

AutoEnv: Automated Environments for Measuring Cross-Environment Agent Learning

Nov 26, 2025

23:00

658

Computer-Use Agents as Judges for Generative User Interface

Nov 26, 2025

25:50

659

DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation

Nov 26, 2025

25:10

660

DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research

Nov 26, 2025

20:31

661

UltraFlux: Data-Model Co-Design for High-quality Native 4K Text-to-Image Generation across Diverse Aspect Ratios

Nov 26, 2025

22:20

662

In-Video Instructions: Visual Signals as Generative Control

Nov 26, 2025

22:26

663

OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

Nov 25, 2025

21:24

664

Unveiling Intrinsic Dimension of Texts: from Academic Abstract to Creative Story

Nov 25, 2025

22:13

665

GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization

Nov 25, 2025

21:40

666

SAM 3: Segment Anything with Concepts

Nov 25, 2025

23:53

667

Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks

Nov 21, 2025

25:59

668

Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

Nov 21, 2025

24:57

669

What Does It Take to Be a Good AI Research Agent? Studying the Role of Ideation Diversity

Nov 21, 2025

22:40

670

VisPlay: Self-Evolving Vision-Language Models from Images

Nov 21, 2025

22:28

671

Instruction-Guided Lesion Segmentation for Chest X-rays with Automatically Generated Large-Scale Dataset

Nov 21, 2025

19:22

672

VIDEOP2R: Video Understanding from Perception to Reasoning

Nov 20, 2025

25:08

673

Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models

Nov 20, 2025

24:58

674

AraLingBench: A Human-Annotated Benchmark for Evaluating Arabic Linguistic Capabilities of Large Language Models

Nov 20, 2025

23:48

675

A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space

Nov 20, 2025

23:48

676

Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark

Nov 20, 2025

22:39

677

MVI-Bench: A Comprehensive Benchmark for Evaluating Robustness to Misleading Visual Inputs in LVLMs

Nov 20, 2025

24:27

678

REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding

Nov 20, 2025

26:47

679

Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data

Nov 19, 2025

24:24

680

P1: Mastering Physics Olympiads with Reinforcement Learning

Nov 19, 2025

22:16

681

MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling

Nov 19, 2025

27:44

682

Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance

Nov 19, 2025

23:57

683

Part-X-MLLM: Part-aware 3D Multimodal Large Language Model

Nov 19, 2025

25:57

684

MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation

Nov 19, 2025

20:43

685

GroupRank: A Groupwise Reranking Paradigm Driven by Reinforcement Learning

Nov 19, 2025

23:49

686

TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models

Nov 19, 2025

23:11

687

PhysX-Anything: Simulation-Ready Physical 3D Assets from Single Image

Nov 19, 2025

25:25

688

GGBench: A Geometric Generative Reasoning Benchmark for Unified Multimodal Models

Nov 18, 2025

21:58

689

DoPE: Denoising Rotary Position Embedding

Nov 18, 2025

19:24

690

WEAVE: Unleashing and Benchmarking the In-context Interleaved Comprehension and Generation

Nov 18, 2025

25:25

691

UI2Code^N: A Visual Language Model for Test-Time Scalable Interactive UI-to-Code Generation

Nov 18, 2025

25:46

692

AIonopedia: an LLM agent orchestrating multimodal learning for ionic liquid discovery

Nov 18, 2025

28:52

693

LiteAttention: A Temporal Sparse Attention for Diffusion Transformers

Nov 18, 2025

21:16

694

Virtual Width Networks

Nov 18, 2025

22:36

695

One Small Step in Latent, One Giant Leap for Pixels: Fast Latent Upscale Adapter for Your Diffusion Models

Nov 15, 2025

22:00

696

PAN: A World Model for General, Interactable, and Long-Horizon World Simulation

Nov 15, 2025

25:56

697

UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist

Nov 15, 2025

27:35

698

Too Good to be Bad: On the Failure of LLMs to Role-Play Villains

Nov 11, 2025

25:22

699

DeepEyesV2: Toward Agentic Multimodal Model

Nov 11, 2025

26:21

700

Visual Spatial Tuning

Nov 11, 2025

24:14

701

VeriCoT: Neuro-symbolic Chain-of-Thought Validation via Logical Consistency Checks

Nov 11, 2025

22:55

702

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Nov 8, 2025

24:22

703

V-Thinker: Interactive Thinking with Images

Nov 8, 2025

21:01

704

Scaling Agent Learning via Experience Synthesis

Nov 8, 2025

23:00

705

Diffusion Language Models are Super Data Learners

Nov 7, 2025

22:27

706

LEGO-Eval: Towards Fine-Grained Evaluation on Synthesizing 3D Embodied Environments with Tool Augmentation

Nov 7, 2025

25:51

707

UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions

Nov 7, 2025

23:19

708

Don't Blind Your VLA: Aligning Visual Representations for OOD Generalization

Nov 6, 2025

28:20

709

VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation

Nov 6, 2025

21:54

710

When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought

Nov 6, 2025

24:13

711

Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation

Nov 5, 2025

24:08

712

Generalizing Test-time Compute-optimal Scaling as an Optimizable Graph

Nov 5, 2025

22:50

713

The Underappreciated Power of Vision Models for Graph Structural Understanding

Nov 5, 2025

25:56

714

UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback

Nov 5, 2025

23:59

715

ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation

Nov 5, 2025

25:34

716

PHUMA: Physically-Grounded Humanoid Locomotion Dataset

Nov 5, 2025

21:26

717

UniREditBench: A Unified Reasoning-based Image Editing Benchmark

Nov 5, 2025

23:01

718

World Simulation with Video Foundation Models for Physical AI

Nov 5, 2025

29:01

719

ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning

Nov 4, 2025

22:48

720

INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats

Nov 4, 2025

20:54

721

Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning

Nov 4, 2025

25:27

722

The End of Manual Decoding: Towards Truly End-to-End Language Models

Nov 1, 2025

22:34

723

Kimi Linear: An Expressive, Efficient Attention Architecture

Nov 1, 2025

23:11

724

Surfer 2: The Next Generation of Cross-Platform Computer Use Agents

Nov 1, 2025

23:45

725

Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark

Nov 1, 2025

25:05

726

The Quest for Generalizable Motion Generation: Data, Model, and Evaluation

Nov 1, 2025

22:48

727

Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations

Oct 29, 2025

23:09

728

Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning

Oct 24, 2025

23:06

729

BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping

Oct 24, 2025

21:40

730

LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts

Oct 24, 2025

21:23

731

Language Models are Injective and Hence Invertible

Oct 24, 2025

24:03

732

GigaBrain-0: A World Model-Powered Vision-Language-Action Model

Oct 24, 2025

29:04

733

LightMem: Lightweight and Efficient Memory-Augmented Generation

Oct 23, 2025

26:02

734

Efficient Long-context Language Model Training by Core Attention Disaggregation

Oct 23, 2025

23:41

735

World-in-World: World Models in a Closed-Loop World

Oct 23, 2025

24:28

736

UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation

Oct 23, 2025

24:10

737

Chem-R: Learning to Reason as a Chemist

Oct 23, 2025

20:46

738

MoGA: Mixture-of-Groups Attention for End-to-End Long Video Generation

Oct 23, 2025

22:44

739

Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs

Oct 23, 2025

23:33

740

Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model

Oct 23, 2025

22:30

741

IF-VidCap: Can Video Caption Models Follow Instructions?

Oct 23, 2025

24:26

742

DeepAnalyze: Agentic Large Language Models for Autonomous Data Science

Oct 22, 2025

20:11

743

PICABench: How Far Are We from Physically Realistic Image Editing?

Oct 22, 2025

22:19

744

Glyph: Scaling Context Windows via Visual-Text Compression

Oct 22, 2025

25:22

745

FineVision: Open Data Is All You Need

Oct 22, 2025

27:02

746

TrajSelector: Harnessing Latent Representations for Efficient and Effective Best-of-N in Large Reasoning Model

Oct 22, 2025

23:31

747

Towards Mixed-Modal Retrieval for Universal Retrieval-Augmented Generation

Oct 22, 2025

24:13

748

When to Ensemble: Identifying Token-Level Points for Stable and Fast LLM Ensembling

Oct 22, 2025

22:36

749

A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning

Oct 21, 2025

20:07

750

OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM

Oct 21, 2025

25:05

751

NANO3D: A Training-Free Approach for Efficient 3D Editing Without Masks

Oct 21, 2025

22:45

752

Emergent Misalignment via In-Context Learning: Narrow in-context examples can produce broadly misaligned LLMs

Oct 21, 2025

20:28

753

Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset

Oct 21, 2025

20:07

754

Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery

Oct 21, 2025

25:28

755

Latent Diffusion Model without Variational Autoencoder

Oct 21, 2025

25:07

756

When Models Lie, We Learn: Multilingual Span-Level Hallucination Detection with PsiloQA

Oct 18, 2025

23:54

757

Agentic Entropy-Balanced Policy Optimization

Oct 18, 2025

23:52

758

WithAnyone: Towards Controllable and ID Consistent Image Generation

Oct 18, 2025

23:15

759

AI for Service: Proactive Assistance with AI Glasses

Oct 18, 2025

24:04

760

From Pixels to Words -- Towards Native Vision-Language Primitives at Scale

Oct 18, 2025

20:41

761

ImagerySearch: Adaptive Test-Time Search for Video Generation Beyond Semantic Dependency Constraints

Oct 18, 2025

21:12

762

Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents

Oct 18, 2025

25:04

763

LaSeR: Reinforcement Learning with Last-Token Self-Rewarding

Oct 18, 2025

19:44

764

TokDrift: When LLM Speaks in Subwords but Code Speaks in Grammar

Oct 18, 2025

20:48

765

BitNet Distillation

Oct 18, 2025

22:18

766

Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model

Oct 16, 2025

22:32

767

Advancing End-to-End Pixel Space Generative Modeling via Self-supervised Pre-training

Oct 16, 2025

23:30

768

DITING: A Multi-Agent Evaluation Framework for Benchmarking Web Novel Translation

Oct 16, 2025

21:58

769

Scaling Language-Centric Omnimodal Representation Learning

Oct 16, 2025

28:03

770

Robot Learning: A Tutorial

Oct 16, 2025

23:41

771

Detect Anything via Next Point Prediction

Oct 16, 2025

23:00

772

A Survey of Vibe Coding with Large Language Models

Oct 16, 2025

22:36

773

FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution

Oct 16, 2025

23:41

774

Dr.LLM: Dynamic Layer Routing in LLMs

Oct 16, 2025

23:50

775

Temporal Alignment Guidance: On-Manifold Sampling in Diffusion Models

Oct 16, 2025

20:48

776

QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs

Oct 15, 2025

24:17

777

Diffusion Transformers with Representation Autoencoders

Oct 15, 2025

24:28

778

OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs

Oct 15, 2025

26:46

779

Latent Refinement Decoding: Enhancing Diffusion-Based Language Models by Refining Belief States

Oct 15, 2025

25:11

780

Spotlight on Token Perception for Multimodal Reinforcement Learning

Oct 15, 2025

23:52

781

RLFR: Extending Reinforcement Learning for LLMs with Flow Environment

Oct 15, 2025

24:01

782

DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training

Oct 15, 2025

22:13

783

AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration

Oct 15, 2025

24:24

784

InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models

Oct 15, 2025

25:21

785

BrowserAgent: Building Web Agents with Human-Inspired Web Browsing Actions

Oct 15, 2025

21:38

786

D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI

Oct 14, 2025

23:49

787

Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation

Oct 14, 2025

23:18

788

TAG: Tangential Amplifying Guidance for Hallucination-Resistant Diffusion Sampling

Oct 14, 2025

21:39

789

AutoPR: Let's Automate Your Academic Promotion!

Oct 14, 2025

22:35

790

Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs

Oct 14, 2025

25:37

791

BEAR: Benchmarking and Enhancing Multimodal Language Models for Atomic Embodied Capabilities

Oct 14, 2025

26:40

792

StreamingVLM: Real-Time Understanding for Infinite Video Streams

Oct 14, 2025

21:22

793

Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels

Oct 14, 2025

24:41

794

BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution

Oct 14, 2025

22:59

795

R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?

Oct 14, 2025

24:42

796

Agent Learning via Early Experience

Oct 11, 2025

22:45

797

MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization

Oct 11, 2025

21:36

798

MemMamba: Rethinking Memory Patterns in State Space Model

Oct 11, 2025

24:07

799

UniVideo: Unified Understanding, Generation, and Editing for Videos

Oct 11, 2025

26:06

800

From What to Why: A Multi-Agent System for Evidence-based Chemical Reaction Condition Reasoning

Oct 11, 2025

26:23

801

When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs

Oct 11, 2025

21:23

802

Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning

Oct 11, 2025

24:32

803

VideoCanvas: Unified Video Completion from Arbitrary Spatiotemporal Patches via In-Context Conditioning

Oct 11, 2025

24:37

804

The Alignment Waltz: Jointly Training Agents to Collaborate for Safety

Oct 11, 2025

24:20

805

Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense

Oct 11, 2025

24:18

806

Cache-to-Cache: Direct Semantic Communication Between Large Language Models

Oct 10, 2025

24:59

807

Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer

Oct 10, 2025

26:47

808

Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding

Oct 10, 2025

21:15

809

SHANKS: Simultaneous Hearing and Thinking for Spoken Language Models

Oct 10, 2025

24:34

810

MATRIX: Mask Track Alignment for Interaction-aware Video Generation

Oct 10, 2025

23:04

811

RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training

Oct 10, 2025

20:18

812

Vibe Checker: Aligning Code Evaluation with Human Preference

Oct 10, 2025

23:39

813

Less is More: Recursive Reasoning with Tiny Networks

Oct 9, 2025

21:41

814

TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning

Oct 9, 2025

27:21

815

Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval and Synthesis for SLMs

Oct 9, 2025

23:46

816

In-the-Flow Agentic System Optimization for Effective Planning and Tool Use

Oct 9, 2025

27:45

817

Fast-dLLM v2: Efficient Block-Diffusion LLM

Oct 9, 2025

23:35

818

CoDA: Coding LM via Diffusion Adaptation

Oct 9, 2025

22:12

819

Drax: Speech Recognition with Discrete Flow Matching

Oct 9, 2025

24:54

820

Paper2Video: Automatic Video Generation from Scientific Papers

Oct 8, 2025

21:54

821

MITS: Enhanced Tree Search Reasoning for LLMs via Pointwise Mutual Information

Oct 8, 2025

25:31

822

Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models

Oct 8, 2025

26:25

823

VChain: Chain-of-Visual-Thought for Reasoning in Video Generation

Oct 8, 2025

22:20

824

Imperceptible Jailbreaking against Large Language Models

Oct 8, 2025

20:49

825

Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models

Oct 8, 2025

26:37

826

Hybrid Architectures for Language Models: Systematic Analysis and Design Insights

Oct 8, 2025

23:07

827

Optimal Scaling Needs Optimal Norm

Oct 8, 2025

22:52

828

Apriel-1.5-15b-Thinker

Oct 7, 2025

25:25

829

Large Reasoning Models Learn Better Alignment from Flawed Thinking

Oct 7, 2025

22:02

830

Efficient Multi-modal Large Language Models via Progressive Consistency Distillation

Oct 7, 2025

21:42

831

LongCodeZip: Compress Long Context for Code Language Models

Oct 4, 2025

29:31

832

Self-Forcing++: Towards Minute-Scale High-Quality Video Generation

Oct 4, 2025

23:03

833

ExGRPO: Learning to Reason from Experience

Oct 4, 2025

21:57

834

StealthAttack: Robust 3D Gaussian Splatting Poisoning via Density-Guided Illusions

Oct 4, 2025

26:07

835

Interactive Training: Feedback-Driven Neural Network Optimization

Oct 4, 2025

20:55

836

ModernVBERT: Towards Smaller Visual Document Retrievers

Oct 4, 2025

23:21

837

StockBench: Can LLM Agents Trade Stocks Profitably In Real-world Markets?

Oct 4, 2025

30:49

838

DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search

Oct 3, 2025

24:12

839

GEM: A Gym for Agentic LLMs

Oct 3, 2025

25:53

840

VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators

Oct 3, 2025

26:50

841

Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation

Oct 3, 2025

24:50

842

PIPer: On-Device Environment Setup via Online Reinforcement Learning

Oct 3, 2025

20:19

843

SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights

Oct 3, 2025

24:05

844

ACON: Optimizing Context Compression for Long-horizon LLM Agents

Oct 3, 2025

25:16

845

MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use

Oct 2, 2025

25:16

846

The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain

Oct 2, 2025

23:40

847

Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play

Oct 2, 2025

29:09

848

Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning

Oct 2, 2025

20:04

849

TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning

Oct 2, 2025

24:43

850

Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training

Oct 2, 2025

27:25

851

OceanGym: A Benchmark Environment for Underwater Embodied Agents

Oct 2, 2025

22:22

852

More Thought, Less Accuracy? On the Dual Nature of Reasoning in Vision-Language Models

Oct 2, 2025

24:35

853

Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners

Oct 2, 2025

23:19

854

DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder

Oct 2, 2025

25:33

855

SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention

Oct 1, 2025

24:35

856

StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs

Oct 1, 2025

21:00

857

Multiplayer Nash Preference Optimization

Oct 1, 2025

26:13

858

RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark

Oct 1, 2025

25:04

859

Beyond the Exploration-Exploitation Trade-off: A Hidden State Approach for LLM Reasoning in RLVR

Oct 1, 2025

24:39

860

OpenGPT-4o-Image: A Comprehensive Dataset for Advanced Image Generation and Editing

Oct 1, 2025

19:35

861

SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer

Oct 1, 2025

26:23

862

Democratizing AI scientists using ToolUniverse

Oct 1, 2025

26:05

863

Visual Jigsaw Post-Training Improves MLLMs

Oct 1, 2025

23:16

864

When Does Reasoning Matter? A Controlled Study of Reasoning's Contribution to Model Performance

Oct 1, 2025

24:47

865

LongLive: Real-time Interactive Long Video Generation

Sep 30, 2025

24:54

866

Quantile Advantage Estimation for Entropy-Safe Reasoning

Sep 30, 2025

23:16

867

EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning

Sep 30, 2025

27:27

868

MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

Sep 30, 2025

25:00

869

ReviewScore: Misinformed Peer Review Detection with Large Language Models

Sep 30, 2025

21:58

870

Variational Reasoning for Language Models

Sep 30, 2025

22:33

871

Language Models Can Learn from Verbal Feedback Without Scalar Rewards

Sep 30, 2025

23:30

872

MesaTask: Towards Task-Driven Tabletop Scene Generation via 3D Spatial Reasoning

Sep 30, 2025

25:33

873

CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning

Sep 30, 2025

23:54

874

No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping

Sep 30, 2025

27:53

875

VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models

Sep 27, 2025

22:17

876

SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines

Sep 27, 2025

23:35

877

MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources

Sep 27, 2025

28:47

878

Tree Search for LLM Agent Reinforcement Learning

Sep 27, 2025

24:50

879

Seedream 4.0: Toward Next-generation Multimodal Image Generation

Sep 27, 2025

21:30

880

Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets

Sep 27, 2025

25:09

881

AutoIntent: AutoML for Text Classification

Sep 27, 2025

22:28

882

Video models are zero-shot learners and reasoners

Sep 26, 2025

24:55

883

SIM-CoT: Supervised Implicit Chain-of-Thought

Sep 26, 2025

24:06

884

Baseer: A Vision-Language Model for Arabic Document-to-Markdown OCR

Sep 25, 2025

20:21

885

Reinforcement Learning on Pre-Training Data

Sep 25, 2025

20:55

886

Do You Need Proprioceptive States in Visuomotor Policies?

Sep 25, 2025

26:01

887

MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe

Sep 25, 2025

25:04

888

LIMI: Less is More for Agency

Sep 24, 2025

21:27

889

Qwen3-Omni Technical Report

Sep 24, 2025

15:18

890

OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models

Sep 24, 2025

22:54

891

OnePiece: Bringing Context Engineering and Reasoning to Industrial Cascade Ranking System

Sep 24, 2025

23:43

892

TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs

Sep 24, 2025

26:52

893

RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation

Sep 23, 2025

28:25

894

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer

Sep 23, 2025

25:40

895

Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification

Sep 23, 2025

22:42

896

ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data

Sep 20, 2025

23:23

897

FlowRL: Matching Reward Distributions for LLM Reasoning

Sep 20, 2025

19:46

898

Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Deliberation

Sep 20, 2025

21:17

899

Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation

Sep 20, 2025

22:31

900

FinSearchComp: Towards a Realistic, Expert-Level Evaluation of Financial Search and Reasoning

Sep 20, 2025

26:05

901

Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation

Sep 20, 2025

21:13

902

Hala Technical Report: Building Arabic-Centric Instruction & Translation Models at Scale

Sep 19, 2025

21:36

903

SAIL-VL2 Technical Report

Sep 19, 2025

24:30

904

PANORAMA: The Rise of Omnidirectional Vision in the Embodied AI Era

Sep 19, 2025

20:16

905

WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research

Sep 18, 2025

19:54

906

Scaling Agents via Continual Pre-training

Sep 18, 2025

23:43

907

WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning

Sep 18, 2025

22:05

908

Towards General Agentic Intelligence via Environment Scaling

Sep 18, 2025

23:04

909

WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents

Sep 18, 2025

20:37

910

ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization

Sep 18, 2025

19:21

911

Single-stream Policy Optimization

Sep 18, 2025

21:45

912

OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling

Sep 17, 2025

22:32

913

UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning

Sep 17, 2025

20:22

914

InternScenes: A Large-scale Simulatable Indoor Scene Dataset with Realistic Layouts

Sep 17, 2025

21:12

915

IntrEx: A Dataset for Modeling Engagement in Educational Conversations

Sep 16, 2025

24:12

916

The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs

Sep 16, 2025

24:03

917

VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model

Sep 13, 2025

21:33

918

HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning

Sep 13, 2025

24:06

919

SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning

Sep 13, 2025

24:58

920

EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs

Sep 13, 2025

19:21

921

Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents

Sep 13, 2025

20:50

922

Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis

Sep 13, 2025

25:00

923

FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark

Sep 13, 2025

26:55

924

Can Understanding and Generation Truly Benefit Together -- or Just Coexist?

Sep 13, 2025

24:24

925

MachineLearningLM: Scaling Many-shot In-context Learning via Continued Pretraining

Sep 13, 2025

22:59

926

A Survey of Reinforcement Learning for Large Reasoning Models

Sep 12, 2025

20:52

927

RewardDance: Reward Scaling in Visual Generation

Sep 12, 2025

21:27

928

3D and 4D World Modeling: A Survey

Sep 12, 2025

20:25

929

AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning

Sep 12, 2025

24:44

930

Parallel-R1: Towards Parallel Thinking via Reinforcement Learning

Sep 11, 2025

23:21

931

Visual Representation Alignment for Multimodal Large Language Models

Sep 11, 2025

26:13

932

Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search

Sep 11, 2025

21:52

933

Reconstruction Alignment Improves Unified Multimodal Models

Sep 11, 2025

24:13

934

UMO: Scaling Multi-Identity Consistency for Image Customization via Matching Reward

Sep 11, 2025

23:02

935

Reverse-Engineered Reasoning for Open-Ended Generation

Sep 10, 2025

12:24

936

Does DINOv3 Set a New Medical Vision Standard?

Sep 10, 2025

10:46

937

Symbolic Graphics Programming with Large Language Models

Sep 9, 2025

13:57

938

Set Block Decoding is a Language Model Inference Accelerator

Sep 9, 2025

16:21

939

Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth

Sep 6, 2025

22:57

940

From Editor to Dense Geometry Estimator

Sep 6, 2025

18:45

941

Towards a Unified View of Large Language Model Post-Training

Sep 6, 2025

23:07

942

DeepResearch Arena: The First Exam of LLMs' Research Abilities via Seminar-Grounded Tasks

Sep 6, 2025

20:11

943

Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?

Sep 6, 2025

23:03

944

Open Data Synthesis For Deep Research

Sep 5, 2025

23:03

945

Robix: A Unified Model for Robot Interaction, Reasoning and Planning

Sep 5, 2025

21:57

946

UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning

Sep 4, 2025

24:16

947

LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model

Sep 4, 2025

23:48

948

ELV-Halluc: Benchmarking Semantic Aggregation Hallucinations in Long Video Understanding

Sep 4, 2025

22:32

949

POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Document Conversion

Sep 4, 2025

20:16

950

Baichuan-M2: Scaling Medical Capability with Large Verifier System

Sep 4, 2025

23:34

951

Kwai Keye-VL 1.5 Technical Report

Sep 4, 2025

18:19

952

Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic

Sep 4, 2025

24:18

953

PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning

Sep 3, 2025

21:59

954

R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning

Sep 2, 2025

19:58

955

A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers

Sep 2, 2025

23:14

956

TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling

Aug 28, 2025

21:42

957

VibeVoice Technical Report

Aug 28, 2025

21:19

958

CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics

Aug 28, 2025

20:03

959

VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space

Aug 28, 2025

20:50

960

OmniHuman-1.5: Instilling an Active Mind in Avatars via Cognitive Simulation

Aug 28, 2025

22:38

961

Spacer: Towards Engineered Scientific Inspiration

Aug 28, 2025

22:27

962

UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning

Aug 28, 2025

19:39

963

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Aug 27, 2025

23:14

964

Visual-CoG: Stage-Aware Reinforcement Learning with Chain of Guidance for Text-to-Image Generation

Aug 27, 2025

18:59

965

MV-RAG: Retrieval Augmented Multiview Diffusion

Aug 27, 2025

20:32

966

Memento: Fine-tuning LLM Agents without Fine-tuning LLMs

Aug 26, 2025

22:33

967

Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR

Aug 26, 2025

21:37

968

ODYSSEY: Open-World Quadrupeds Exploration and Manipulation for Long-Horizon Tasks

Aug 26, 2025

21:27

969

Intern-S1: A Scientific Multimodal Foundation Model

Aug 23, 2025

19:26

970

Mobile-Agent-v3: Foundamental Agents for GUI Automation

Aug 23, 2025

25:02

971

Deep Think with Confidence

Aug 23, 2025

20:40

972

LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries

Aug 23, 2025

23:48

973

DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization

Aug 22, 2025

22:59

974

From Scores to Skills: A Cognitive Diagnosis Framework for Evaluating Financial Large Language Models

Aug 22, 2025

23:15

975

FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction

Aug 22, 2025

22:01

976

MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds

Aug 22, 2025

22:06

977

Tinker: Diffusion's Gift to 3D--Multi-View Consistent Editing From Sparse Inputs without Per-Scene Optimization

Aug 22, 2025

20:53

978

Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL

Aug 21, 2025

22:08

979

LongSplat: Robust Unposed 3D Gaussian Splatting for Casual Long Videos

Aug 21, 2025

21:20

980

Prompt Orchestration Markup Language

Aug 21, 2025

23:11

981

Ovis2.5 Technical Report

Aug 20, 2025

23:09

982

ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning

Aug 20, 2025

21:38

983

4DNeX: Feed-Forward 4D Generative Modeling Made Easy

Aug 20, 2025

23:37

984

Next Visual Granularity Generation

Aug 20, 2025

22:45

985

Speed Always Wins: A Survey on Efficient Architectures for Large Language Models

Aug 20, 2025

20:55

986

When Punctuation Matters: A Large-Scale Comparison of Prompt Robustness Methods for LLMs

Aug 20, 2025

22:33

987

Has GPT-5 Achieved Spatial Intelligence? An Empirical Study

Aug 20, 2025

19:57

988

HeroBench: A Benchmark for Long-Horizon Planning and Structured Reasoning in Virtual Worlds

Aug 20, 2025

23:19

989

SSRL: Self-Search Reinforcement Learning

Aug 19, 2025

21:51

990

DINOv3

Aug 19, 2025

26:33

991

Thyme: Think Beyond Images

Aug 19, 2025

23:24

992

BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining

Aug 19, 2025

24:09

993

XQuant: Breaking the Memory Wall for LLM Inference with KV Cache Rematerialization

Aug 19, 2025

23:07

994

We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning

Aug 16, 2025

21:07

995

NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale

Aug 16, 2025

23:06

996

PRELUDE: A Benchmark Designed to Require Global Comprehension and Reasoning over Long Contexts

Aug 16, 2025

25:09

997

ToonComposer: Streamlining Cartoon Production with Generative Post-Keyframing

Aug 16, 2025

21:43

998

Story2Board: A Training-Free Approach for Expressive Storyboard Generation

Aug 15, 2025

21:29

999

Mol-R1: Towards Explicit Long-CoT Reasoning in Molecule Discovery

Aug 15, 2025

23:06

1000

Stand-In: A Lightweight and Plug-and-Play Identity Control for Video Generation

Aug 15, 2025

21:04
