All Episodes

Unzip — 72 episodes

#   Title
1   SWE-WebDevBench: Evaluating Coding Agent Application Platforms as Virtual Software Agencies
2   Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL
3   HiL-Bench (Human-in-Loop Benchmark): Do Agents Know When to Ask for Help?
4   From Skill Text to Skill Structure: The Scheduling-Structural-Logical Representation for Agent Skills
5   Co-Evolving Policy Distillation
6   LLM as Clinical Graph Structure Refiner: Enhancing Representation Learning in EEG Seizure Diagnosis
7   FlashRT: Towards Computationally and Memory Efficient Red-Teaming for Prompt Injection and Knowledge Corruption
8   Operating-Layer Controls for Onchain Language-Model Agents Under Real Capital
9   Co-Director: Agentic Generative Video Storytelling
10  PageGuide: Browser extension to assist users in navigating a webpage and locating information
11  DiffNR: Diffusion-Enhanced Neural Representation Optimization for Sparse-View 3D Tomographic Reconstruction
12  Temporal Taskification in Streaming Continual Learning: A Source of Evaluation Instability
13  Coevolving Representations in Joint Image-Feature Diffusion
14  Seeing Fast and Slow: Learning the Flow of Time in Videos
15  Expert Upcycling: Shifting the Compute-Efficient Frontier of Mixture-of-Experts
16  Mind's Eye: A Benchmark of Visual Abstraction, Transformation and Composition for Multimodal LLMs
17  MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval
18  Maximal Brain Damage Without Data or Optimization: Disrupting Neural Networks via Sign-Bit Flips
19  Boosting Visual Instruction Tuning with Self-Supervised Guidance
20  Three-Phase Transformer
21  MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation
22  VLMs Need Words: Vision Language Models Ignore Visual Detail In Favor of Semantic Anchors
23  Therefore I am. I Think
24  Video Models Reason Early: Exploiting Plan Commitment for Maze Solving
25  When Users Change Their Mind: Evaluating Interruptible Agents in Long-Horizon Web Navigation
26  BizGenEval: A Systematic Benchmark for Commercial Visual Content Generation
27  A Comparative Study in Surgical AI: Datasets, Foundation Models, and Barriers to Med-AGI
28  ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling
29  Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes
30  AVControl: Efficient Framework for Training Audio-Visual Controls
31  Reaching Beyond the Mode: RL for Distributional Reasoning in Language Models
32  EVA: Efficient Reinforcement Learning for End-to-End Video Agent
33  Abstraction as a Memory-Efficient Inductive Bias for Continual Learning
34  Repurposing Geometric Foundation Models for Multi-view Diffusion
35  Measuring Faithfulness Depends on How You Measure: Classifier Sensitivity in LLM Chain-of-Thought Evaluation
36  Mending the Holes: Mitigating Reward Hacking in Reinforcement Learning for Multilingual Translation
37  Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens
38  Cognitive Mismatch in Multimodal Large Language Models for Discrete Symbol Understanding
39  GMT: Goal-Conditioned Multimodal Transformer for 6-DOF Object Trajectory Synthesis in 3D Scenes
40  ECG-Reasoning-Benchmark: A Benchmark for Evaluating Clinical Reasoning Capabilities in ECG Interpretation
41  POLCA: Stochastic Generative Optimization with LLM
42  VoXtream2: Full-stream TTS with dynamic speaking rate control
43  Autonomous Agents Coordinating Distributed Discovery Through Emergent Artifact Exchange
44  SciMDR: Benchmarking and Advancing Scientific Multimodal Document Reasoning
45  COMIC: Agentic Sketch Comedy Generation
46  Do What I Say: A Spoken Prompt Dataset for Instruction-Following
47  Variational Flow Maps: Make Some Noise for One-Step Conditional Generation
48  Censored LLMs as a Natural Testbed for Secret Knowledge Elicitation
49  SkillNet: Create, Evaluate, and Connect AI Skills
50  SageBwd: A Trainable Low-bit Attention
51  MASQuant: Modality-Aware Smoothing Quantization for Multimodal Large Language Models
52  MUSE: A Run-Centric Platform for Multimodal Unified Safety Evaluation of Large Language Models
53  ParEVO: Synthesizing Code for Irregular Data: High-Performance Parallelism through Agentic Evolution
54  PhotoBench: Beyond Visual Matching Towards Personalized Intent-Driven Photo Retrieval
55  Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets
56  SeeThrough3D: Occlusion Aware 3D Control in Text-to-Image Generation
57  Overconfident Errors Need Stronger Correction: Asymmetric Confidence Penalties for Reinforcement Learning
58  MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios
59  ISO-Bench: Can Coding Agents Optimize Real-World Inference Workloads?
60  Benchmark Test-Time Scaling of General LLM Agents
61  Anatomy of Agentic Memory: Taxonomy and Empirical Analysis of Evaluation and System Limitations
62  ReIn: Conversational Error Recovery with Reasoning Inception
63  "What Are You Doing?": Effects of Intermediate Feedback from Agentic LLM In-Car Assistants During Multi-Step Processing
64  "What Are You Doing?": Effects of Intermediate Feedback from Agentic LLM In-Car Assistants During Multi-Step Processing
65  "What Are You Doing?": Effects of Intermediate Feedback from Agentic LLM In-Car Assistants During Multi-Step Processing
66  BiManiBench: A Hierarchical Benchmark for Evaluating Bimanual Coordination of Multimodal Large Language Models
67  SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks
68  VisPhyWorld: Probing Physical Reasoning via Code-Driven Video Reconstruction
69  SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise
70  Thinking with Drafting: Optical Decompression via Logical Reconstruction
71  Thinking with Drafting: Optical Decompression via Logical Reconstruction
72  CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty