Unzip Podcast - All Episodes

79

SWE-WebDevBench: Evaluating Coding Agent Application Platforms as Virtual Software Agencies

## Episode Summary

In this episode, we cover:

- **SWE-WebDevBench: Evaluating Coding Agent Application Platforms as Virtual Software Agencies** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.04637)
- **CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.02910)
- **MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.27393)
- **When to Think, When to Speak: Learning Disclosure Policies for LLM Reasoning** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.03314)
- **ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.00380)

---

*Sponsored by LimitLess AI*

May 7, 2026

78

Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL

## Episode Summary

In this episode, we cover:

- **Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.28123)
- **Audio-Visual Intelligence in Large Foundation Models** (arXiv) - [Read more](http://arxiv.org/abs/2605.04045v1)
- **X2SAM: Any Segmentation in Images and Videos** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.00891)
- **Skills-Coach: A Self-Evolving Skill Optimizer via Training-Free GRPO** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.27488)
- **Reinforcement Learning for LLM-based Multi-Agent Systems through Orchestration Traces** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.02801)

---

*Sponsored by LimitLess AI*

May 6, 2026

77

HiL-Bench (Human-in-Loop Benchmark): Do Agents Know When to Ask for Help?

## Episode Summary

In this episode, we cover:

- **HiL-Bench (Human-in-Loop Benchmark): Do Agents Know When to Ask for Help?** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.09408)
- **Counting as a minimal probe of language model reliability** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.02028)
- **Linear-Time Global Visual Modeling without Explicit Attention** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.01711)
- **Assessing Pancreatic Ductal Adenocarcinoma Vascular Invasion: the PDACVI Benchmark** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.27582)
- **Prior-Aligned Data Cleaning for Tabular Foundation Models** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.25154)

---

*Sponsored by LimitLess AI*

May 5, 2026

76

From Skill Text to Skill Structure: The Scheduling-Structural-Logical Representation for Agent Skills

## Episode Summary

In this episode, we cover:

- **From Skill Text to Skill Structure: The Scheduling-Structural-Logical Representation for Agent Skills** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.24026)
- **Web2BigTable: A Bi-Level Multi-Agent LLM System for Internet-Scale Information Search and Extraction** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.27221)
- **When LLMs Stop Following Steps: A Diagnostic Study of Procedural Execution in Language Models** (arXiv) - [Read more](http://arxiv.org/abs/2605.00817v1)
- **Stable-GFlowNet: Toward Diverse and Robust LLM Red-Teaming via Contrastive Trajectory Balance** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2605.00553)
- **Let ViT Speak: Generative Language-Image Pre-training** (arXiv) - [Read more](http://arxiv.org/abs/2605.00809v1)

---

*Sponsored by LimitLess AI*

May 4, 2026

75

Co-Evolving Policy Distillation

## Episode Summary

In this episode, we cover:

- **Co-Evolving Policy Distillation** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.27083)
- **Instruction-Guided Poetry Generation in Arabic and Its Dialects** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.27766)
- **Efficient Training on Multiple Consumer GPUs with RoundPipe** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.27085)
- **InteractWeb-Bench: Can Multimodal Agent Escape Blind Execution in Interactive Website Generation?** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.27419)
- **Safety Drift After Fine-Tuning: Evidence from High-Stakes Domains** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.24902)

---

*Sponsored by LimitLess AI*

May 3, 2026

74

LLM as Clinical Graph Structure Refiner: Enhancing Representation Learning in EEG Seizure Diagnosis

## Episode Summary

In this episode, we cover:

- **LLM as Clinical Graph Structure Refiner: Enhancing Representation Learning in EEG Seizure Diagnosis** (arXiv) - [Read more](http://arxiv.org/abs/2604.28178v1)
- **Step-level Optimization for Efficient Computer-use Agents** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.27151)
- **AEGIS: A Holistic Benchmark for Evaluating Forensic Analysis of AI-Generated Academic Images** (arXiv) - [Read more](http://arxiv.org/abs/2604.28177v1)
- **Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.24954)
- **Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.28139)

---

*Sponsored by LimitLess AI*

May 2, 2026

73

FlashRT: Towards Computationally and Memory Efficient Red-Teaming for Prompt Injection and Knowledge Corruption

## Episode Summary

In this episode, we cover:

- **FlashRT: Towards Computationally and Memory Efficient Red-Teaming for Prompt Injection and Knowledge Corruption** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.28157)
- **Length Value Model: Scalable Value Pretraining for Token-Level Length Modeling** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.27039)
- **Leveraging Verifier-Based Reinforcement Learning in Image Editing** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.27505)
- **Compliance versus Sensibility: On the Reasoning Controllability in Large Language Models** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.27251)
- **Exploration Hacking: Can LLMs Learn to Resist RL Training?** (arXiv) - [Read more](http://arxiv.org/abs/2604.28182v1)

---

*Sponsored by LimitLess AI*

May 1, 2026

72

Operating-Layer Controls for Onchain Language-Model Agents Under Real Capital

## Episode Summary

In this episode, we cover:

- **Operating-Layer Controls for Onchain Language-Model Agents Under Real Capital** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.26091)
- **GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.26752)
- **Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.26951)
- **RADIO-ViPE: Online Tightly Coupled Multi-Modal Fusion for Open-Vocabulary Semantic SLAM in Dynamic Environments** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.26067)
- **Select to Think: Unlocking SLM Potential with Local Sufficiency** (arXiv) - [Read more](http://arxiv.org/abs/2604.26940v1)

---

*Sponsored by LimitLess AI*

Apr 30, 2026

71

Co-Director: Agentic Generative Video Storytelling

## Episode Summary

In this episode, we cover:

- **Co-Director: Agentic Generative Video Storytelling** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.24842)
- **Recursive Multi-Agent Systems** (arXiv) - [Read more](http://arxiv.org/abs/2604.25917v1)
- **AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.25256)
- **Refinement via Regeneration: Enlarging Modification Space Boosts Image Refinement in Unified Multimodal Models** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.25636)
- **Carbon-Taxed Transformers: A Green Compression Pipeline for Overgrown Language Models** (arXiv) - [Read more](http://arxiv.org/abs/2604.25903v1)

---

*Sponsored by LimitLess AI*

Apr 29, 2026

70

PageGuide: Browser extension to assist users in navigating a webpage and locating information

## Episode Summary

In this episode, we cover:

- **PageGuide: Browser extension to assist users in navigating a webpage and locating information** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.23772)
- **Discovering Agentic Safety Specifications from 1-Bit Danger Signals** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.23210)
- **UniGeo: Unifying Geometric Guidance for Camera-Controllable Image Editing via Video Models** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.17565)
- **The Chameleon's Limit: Investigating Persona Collapse and Homogenization in Large Language Models** (arXiv) - [Read more](http://arxiv.org/abs/2604.24698v1)
- **Sapiens2** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.21681)

---

*Sponsored by LimitLess AI*

Apr 28, 2026

69

DiffNR: Diffusion-Enhanced Neural Representation Optimization for Sparse-View 3D Tomographic Reconstruction

## Episode Summary

In this episode, we cover:

- **DiffNR: Diffusion-Enhanced Neural Representation Optimization for Sparse-View 3D Tomographic Reconstruction** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.21518)
- **LLM Safety From Within: Detecting Harmful Content with Internal Representations** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.18519)
- **Contexts are Never Long Enough: Structured Reasoning for Scalable Question Answering over Long Document Sets** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.22294)
- **Learning Evidence Highlighting for Frozen LLMs** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.22565)
- **How Do AI Agents Spend Your Money? Analyzing and Predicting Token Consumption in Agentic Coding Tasks** (arXiv) - [Read more](http://arxiv.org/abs/2604.22750v1)

---

*Sponsored by LimitLess AI*

Apr 27, 2026

68

Temporal Taskification in Streaming Continual Learning: A Source of Evaluation Instability

## Episode Summary

In this episode, we cover:

- **Temporal Taskification in Streaming Continual Learning: A Source of Evaluation Instability** (arXiv) - [Read more](http://arxiv.org/abs/2604.21930v1)
- **WebGen-R1: Incentivizing Large Language Models to Generate Functional and Aesthetic Websites with Reinforcement Learning** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.20398)
- **Hybrid Policy Distillation for LLMs** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.20244)
- **PersonalAI: A Systematic Comparison of Knowledge Graph Storage and Retrieval Approaches for Personalized LLM agents** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2506.17001)
- **When Prompts Override Vision: Prompt-Induced Hallucinations in LVLMs** (arXiv) - [Read more](http://arxiv.org/abs/2604.21911v1)

---

*Sponsored by LimitLess AI*

Apr 26, 2026

67

Coevolving Representations in Joint Image-Feature Diffusion

## Episode Summary

In this episode, we cover:

- **Coevolving Representations in Joint Image-Feature Diffusion** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.17492)
- **Trust but Verify: Introducing DAVinCI -- A Framework for Dual Attribution and Verification in Claim Inference for Language Models** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.21193)
- **3D-VCD: Hallucination Mitigation in 3D-LLM Embodied Agents through Visual Contrastive Decoding** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.08645)
- **Nemobot Games: Crafting Strategic AI Gaming Agents for Interactive Learning with Large Language Models** (arXiv) - [Read more](http://arxiv.org/abs/2604.21896v1)
- **Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.20987)

---

*Sponsored by LimitLess AI*

Apr 25, 2026

66

Seeing Fast and Slow: Learning the Flow of Time in Videos

## Episode Summary

In this episode, we cover:

- **Seeing Fast and Slow: Learning the Flow of Time in Videos** (arXiv) - [Read more](http://arxiv.org/abs/2604.21931v1)
- **LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.17295)
- **Encoder-Free Human Motion Understanding via Structured Motion Descriptions** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.21668)
- **MathDuels: Evaluating LLMs as Problem Posers and Solvers** (arXiv) - [Read more](http://arxiv.org/abs/2604.21916v1)
- **From Research Question to Scientific Workflow: Leveraging Agentic AI for Science Automation** (arXiv) - [Read more](http://arxiv.org/abs/2604.21910v1)

---

*Sponsored by LimitLess AI*

Apr 24, 2026

65

Expert Upcycling: Shifting the Compute-Efficient Frontier of Mixture-of-Experts

## Episode Summary

In this episode, we cover:

- **Expert Upcycling: Shifting the Compute-Efficient Frontier of Mixture-of-Experts** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.19835)
- **Benign Fine-Tuning Breaks Safety Alignment in Audio LLMs** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.16659)
- **DeVI: Physics-based Dexterous Human-Object Interaction via Synthetic Video Imitation** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.20841)
- **Abstain-R1: Calibrated Abstention and Post-Refusal Clarification via Verifiable RL** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.17073)
- **COMPASS: COntinual Multilingual PEFT with Adaptive Semantic Sampling** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.20720)

---

*Sponsored by LimitLess AI*

Apr 23, 2026

64

Mind's Eye: A Benchmark of Visual Abstraction, Transformation and Composition for Multimodal LLMs

## Episode Summary

In this episode, we cover:

- **Mind's Eye: A Benchmark of Visual Abstraction, Transformation and Composition for Multimodal LLMs** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.16054)
- **What Makes an LLM a Good Optimizer? A Trajectory Analysis of LLM-Guided Evolutionary Search** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.19440)
- **RDP LoRA: Geometry-Driven Identification for Parameter-Efficient Adaptation in Large Language Models** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.19321)
- **Epistemic orientation in parliamentary discourse is associated with deliberative democracy** (arXiv) - [Read more](http://arxiv.org/abs/2604.19699v1)
- **Mitigating Multimodal Hallucination via Phase-wise Self-reward** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.17982)

---

*Sponsored by LimitLess AI*

Apr 22, 2026

63

MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval

## Episode Summary

In this episode, we cover:

- **MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval** (arXiv) - [Read more](http://arxiv.org/abs/2604.18584v1)
- **MARCO: Navigating the Unseen Space of Semantic Correspondence** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.18267)
- **Agentic Forecasting using Sequential Bayesian Updating of Linguistic Beliefs** (arXiv) - [Read more](http://arxiv.org/abs/2604.18576v1)
- **On the Robustness of LLM-Based Dense Retrievers: A Systematic Analysis of Generalizability and Stability** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.16576)
- **When Can LLMs Learn to Reason with Weak Supervision?** (arXiv) - [Read more](http://arxiv.org/abs/2604.18574v1)

---

*Sponsored by LimitLess AI*

Apr 21, 2026

62

Maximal Brain Damage Without Data or Optimization: Disrupting Neural Networks via Sign-Bit Flips

## Episode Summary

In this episode, we cover:

- **Maximal Brain Damage Without Data or Optimization: Disrupting Neural Networks via Sign-Bit Flips** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2502.07408)
- **The Amazing Agent Race: Strong Tool Users, Weak Navigators** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.10261)
- **Evaluating the Progression of Large Language Model Capabilities for Small-Molecule Drug Design** (arXiv) - [Read more](http://arxiv.org/abs/2604.16279v1)
- **Where Do Vision-Language Models Fail? World Scale Analysis for Image Geolocalization** (arXiv) - [Read more](http://arxiv.org/abs/2604.16248v1)
- **Elucidating the SNR-t Bias of Diffusion Probabilistic Models** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.16044)

---

*Sponsored by LimitLess AI*

Apr 20, 2026

61

Boosting Visual Instruction Tuning with Self-Supervised Guidance

## Episode Summary

In this episode, we cover:

- **Boosting Visual Instruction Tuning with Self-Supervised Guidance** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.12966)
- **Think in Latent Thoughts: A New Paradigm for Gloss-Free Sign Language Translation** (arXiv) - [Read more](http://arxiv.org/abs/2604.15301v1)
- **SuperLocalMemory V3.3: The Living Brain -- Biologically-Inspired Forgetting, Cognitive Quantization, and Multi-Channel Retrieval for Zero-LLM Agent Memory Systems** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.04514)
- **Generalization in LLM Problem Solving: The Case of the Shortest Path** (arXiv) - [Read more](http://arxiv.org/abs/2604.15306v1)
- **Diagnosing LLM Judge Reliability: Conformal Prediction Sets and Transitivity Violations** (arXiv) - [Read more](http://arxiv.org/abs/2604.15302v1)

---

*Sponsored by LimitLess AI*

Apr 19, 2026

60

Three-Phase Transformer

## Episode Summary

In this episode, we cover:

- **Three-Phase Transformer** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.14430)
- **Beyond Prompts: Unconditional 3D Inversion for Out-of-Distribution Shapes** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.14914)
- **TRACER: Trace-Based Adaptive Cost-Efficient Routing for LLM Classification** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.14531)
- **Reinforcement Learning via Value Gradient Flow** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.14265)
- **Switch-KD: Visual-Switch Knowledge Distillation for Vision-Language Models** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.14629)

---

*Sponsored by LimitLess AI*

Apr 18, 2026

59

MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation

## Episode Summary

In this episode, we cover:

- **MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation** (arXiv) - [Read more](http://arxiv.org/abs/2604.15309v1)
- **RadAgent: A tool-using AI agent for stepwise interpretation of chest computed tomography** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.15231)
- **R3D: Revisiting 3D Policy Learning** (arXiv) - [Read more](http://arxiv.org/abs/2604.15281v1)
- **RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework** (arXiv) - [Read more](http://arxiv.org/abs/2604.15308v1)
- **Towards Autonomous Mechanistic Reasoning in Virtual Cells** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.11661)

---

*Sponsored by LimitLess AI*

Apr 17, 2026

58

VLMs Need Words: Vision Language Models Ignore Visual Detail In Favor of Semantic Anchors

## Episode Summary

In this episode, we cover:

- **VLMs Need Words: Vision Language Models Ignore Visual Detail In Favor of Semantic Anchors** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.02486)
- **Xpertbench: Expert Level Tasks with Rubrics-Based Evaluation** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.02368)
- **AgentSocialBench: Evaluating Privacy Risks in Human-Centered Agentic Social Networks** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.01487)
- **AgentHazard: A Benchmark for Evaluating Harmful Behavior in Computer-Use Agents** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.02947)
- **Agentic-MME: What Agentic Capability Really Brings to Multimodal Intelligence?** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.03016)

---

*Sponsored by LimitLess AI*

Apr 6, 2026

57

Therefore I am. I Think

## Episode Summary

In this episode, we cover:

- **Therefore I am. I Think** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.01202)
- **Working Notes on Late Interaction Dynamics: Analyzing Targeted Behaviors of Late Interaction Models** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.26259)
- **LOME: Learning Human-Object Manipulation with Action-Conditioned Egocentric World Model** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.27449)
- **ActionParty: Multi-Subject Action Binding in Generative Video Games** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.02330)
- **An Empirical Recipe for Universal Phone Recognition** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.29042)

---

*Sponsored by LimitLess AI*

Apr 5, 2026

56

Video Models Reason Early: Exploiting Plan Commitment for Maze Solving

## Episode Summary

In this episode, we cover:

- **Video Models Reason Early: Exploiting Plan Commitment for Maze Solving** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.30043)
- **AIBench: Evaluating Visual-Logical Consistency in Academic Illustration Generation** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.28068)
- **Signals: Trajectory Sampling and Triage for Agentic Interactions** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.00356)
- **ASI-Evolve: AI Accelerates AI** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.29640)
- **Steerable Visual Representations** (arXiv) - [Read more](http://arxiv.org/abs/2604.02327v1)

---

*Sponsored by LimitLess AI*

Apr 4, 2026

55

When Users Change Their Mind: Evaluating Interruptible Agents in Long-Horizon Web Navigation

## Episode Summary

In this episode, we cover:

- **When Users Change Their Mind: Evaluating Interruptible Agents in Long-Horizon Web Navigation** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.00892)
- **ClawKeeper: Comprehensive Safety Protection for OpenClaw Agents Through Skills, Plugins, and Watchers** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.24414)
- **Vision2Web: A Hierarchical Benchmark for Visual Website Development with Agent Verification** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.26648)
- **YC-Bench: Benchmarking AI Agents for Long-Term Planning and Consistent Execution** (arXiv) - [Read more](http://arxiv.org/abs/2604.01212v1)
- **AgentWatcher: A Rule-based Prompt Injection Monitor** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2604.01194)

---

*Sponsored by LimitLess AI*

Apr 2, 2026

54

BizGenEval: A Systematic Benchmark for Commercial Visual Content Generation

## Episode Summary

In this episode, we cover:

- **BizGenEval: A Systematic Benchmark for Commercial Visual Content Generation** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.25732)
- **The Model Says Walk: How Surface Heuristics Override Implicit Constraints in LLM Reasoning** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.29025)
- **Distilling Conversations: Abstract Compression of Conversational Audio Context for LLM-based ASR** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.26246)
- **Colon-Bench: An Agentic Workflow for Scalable Dense Lesion Annotation in Full-Procedure Colonoscopy Videos** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.25645)
- **CREval: An Automated Interpretable Evaluation for Creative Image Manipulation under Complex Instructions** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.26174)

---

*Sponsored by LimitLess AI*

Apr 1, 2026

53

A Comparative Study in Surgical AI: Datasets, Foundation Models, and Barriers to Med-AGI

## Episode Summary

In this episode, we cover:

- **A Comparative Study in Surgical AI: Datasets, Foundation Models, and Barriers to Med-AGI** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.27341)
- **On-the-fly Repulsion in the Contextual Space for Rich Diversity in Diffusion Transformers** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.28762)
- **STRIDE: When to Speak Meets Sequence Denoising for Streaming Video Understanding** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.27593)
- **AdaptToken: Entropy-based Adaptive Token Selection for MLLM Long Video Understanding** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.28696)
- **EpochX: Building the Infrastructure for an Emergent Agent Civilization** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.27304)

---

*Sponsored by LimitLess AI*

Mar 31, 2026

52

ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling

## Episode Summary

In this episode, we cover:

- **ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.25746)
- **LongTail Driving Scenarios with Reasoning Traces: The KITScenes LongTail Dataset** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.23607)
- **MedOpenClaw: Auditable Medical Imaging Agents Reasoning over Uncurated Full Studies** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.24649)
- **Composer 2 Technical Report** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.24477)
- **PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.25730)

---

*Sponsored by LimitLess AI*

Mar 30, 2026

51

Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes

## Episode Summary

In this episode, we cover:

- **Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.25562)
- **VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.24575)
- **S2D2: Fast Decoding for Diffusion LLMs via Training-Free Self-Speculation** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.25702)
- **AVO: Agentic Variation Operators for Autonomous Evolutionary Search** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.24517)
- **Electrostatic Photoluminescence Tuning in All-Solid-State Perovskite Transistors** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.25718)

---

*Sponsored by LimitLess AI*

Mar 29, 2026

50

AVControl: Efficient Framework for Training Audio-Visual Controls

## Episode Summary

In this episode, we cover:

- **AVControl: Efficient Framework for Training Audio-Visual Controls** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.24793)
- **PMT: Plain Mask Transformer for Image and Video Segmentation with Frozen Vision Encoders** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.25398)
- **IQuest-Coder-V1 Technical Report** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.16733)
- **WAFT-Stereo: Warping-Alone Field Transforms for Stereo Matching** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.24836)
- **Nudging Hidden States: Training-Free Model Steering for Chain-of-Thought Reasoning in Large Audio-Language Models** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.14636)

---

*Sponsored by LimitLess AI*

Mar 28, 2026

49

Reaching Beyond the Mode: RL for Distributional Reasoning in Language Models

## Episode Summary

In this episode, we cover:

- **Reaching Beyond the Mode: RL for Distributional Reasoning in Language Models** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.24844)
- **Calibri: Enhancing Diffusion Transformers via Parameter-Efficient Calibration** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.24800)
- **Can MLLMs Read Students' Minds? Unpacking Multimodal Error Analysis in Handwritten Math** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.24961)
- **SlotVTG: Object-Centric Adapter for Generalizable Video Temporal Grounding** (arXiv) - [Read more](http://arxiv.org/abs/2603.25733v1)
- **MemMA: Coordinating the Memory Cycle through Multi-Agent Reasoning and In-Situ Self-Evolution** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.18718)

---

*Sponsored by LimitLess AI*

Mar 27, 2026

48

EVA: Efficient Reinforcement Learning for End-to-End Video Agent

## Episode Summary

In this episode, we cover:

- **EVA: Efficient Reinforcement Learning for End-to-End Video Agent** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.22918)
- **6Bit-Diffusion: Inference-Time Mixed-Precision Quantization for Video Diffusion Models** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.18742)
- **PLDR-LLMs Reason At Self-Organized Criticality** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.23539)
- **Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.24472)
- **StreamingClaw Technical Report** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.22120)

---

*Sponsored by LimitLess AI*

Mar 26, 2026

47

Abstraction as a Memory-Efficient Inductive Bias for Continual Learning

## Episode Summary

In this episode, we cover:

- **Abstraction as a Memory-Efficient Inductive Bias for Continual Learning** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.17198)
- **AgentRVOS: Reasoning over Object Tracks for Zero-Shot Referring Video Object Segmentation** (arXiv) - [Read more](http://arxiv.org/abs/2603.23489v1)
- **UniFunc3D: Unified Active Spatial-Temporal Grounding for 3D Functionality Segmentation** (arXiv) - [Read more](http://arxiv.org/abs/2603.23478v1)
- **ReqFusion: A Multi-Provider Framework for Automated PEGS Analysis Across Software Domains** (arXiv) - [Read more](http://arxiv.org/abs/2603.23482v1)
- **CanViT: Toward Active-Vision Foundation Models** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.22570)

---

*Sponsored by LimitLess AI*

Mar 25, 2026

46

Repurposing Geometric Foundation Models for Multi-view Diffusion

## Episode Summary

In this episode, we cover:

- **Repurposing Geometric Foundation Models for Multi-view Diffusion** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.22275)
- **BubbleRAG: Evidence-Driven Retrieval-Augmented Generation for Black-Box Knowledge Graphs** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.20309)
- **FluidWorld: Reaction-Diffusion Dynamics as a Predictive Substrate for World Models** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.21315)
- **SEM: Sparse Embedding Modulation for Post-Hoc Debiasing of Vision-Language Models** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.19028)
- **VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding** (arXiv) - [Read more](http://arxiv.org/abs/2603.22285v1)

---

*Sponsored by LimitLess AI*

Mar 24, 2026

45

Measuring Faithfulness Depends on How You Measure: Classifier Sensitivity in LLM Chain-of-Thought Evaluation

## Episode Summary In this episode, we cover: - **Measuring Faithfulness Depends on How You Measure: Classifier Sensitivity in LLM Chain-of-Thought Evaluation** (arXiv) - [Read more](http://arxiv.org/abs/2603.20172v1) - **Reasoning as Compression: Unifying Budget Forcing via the Conditional Information Bottleneck** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.08462) - **Breaking the Capability Ceiling of LLM Post-Training by Reintroducing Markov States** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.19987) - **CoVR-R: Reason-Aware Composed Video Retrieval** (arXiv) - [Read more](http://arxiv.org/abs/2603.20190v1) - **CurveStream: Boosting Streaming Video Understanding in MLLMs via Curvature-Aware Hierarchical Visual Memory Management** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.19571) --- *Sponsored by LimitLess AI*

Mar 23, 2026

44

Mending the Holes: Mitigating Reward Hacking in Reinforcement Learning for Multilingual Translation

## Episode Summary In this episode, we cover: - **Mending the Holes: Mitigating Reward Hacking in Reinforcement Learning for Multilingual Translation** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.13045) - **NavTrust: Benchmarking Trustworthiness for Embodied Navigation** (arXiv) - [Read more](http://arxiv.org/abs/2603.19229v1) - **MOSS-TTS Technical Report** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.18090) - **SimulU: Training-free Policy for Long-form Simultaneous Speech-to-Speech Translation** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.16924) - **ReactMotion: Generating Reactive Listener Motions from Speaker Utterance** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.15083) --- *Sponsored by LimitLess AI*

Mar 22, 2026

43

Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens

## Episode Summary In this episode, we cover: - **Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens** (arXiv) - [Read more](http://arxiv.org/abs/2603.19232v1) - **Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding** (arXiv) - [Read more](http://arxiv.org/abs/2603.19235v1) - **DreamPartGen: Semantically Grounded Part-Level 3D Generation via Collaborative Latent Denoising** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.19216) - **Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation** (arXiv) - [Read more](http://arxiv.org/abs/2603.19220v1) - **FinTradeBench: A Financial Reasoning Benchmark for LLMs** (arXiv) - [Read more](http://arxiv.org/abs/2603.19225v1) --- *Sponsored by LimitLess AI*

Mar 21, 2026

42

Cognitive Mismatch in Multimodal Large Language Models for Discrete Symbol Understanding

## Episode Summary In this episode, we cover: - **Cognitive Mismatch in Multimodal Large Language Models for Discrete Symbol Understanding** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.18472) - **PARSA-Bench: A Comprehensive Persian Audio-Language Model Benchmark** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.14456) - **Tinted Frames: Question Framing Blinds Vision-Language Models** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.19203) - **What Really Controls Temporal Reasoning in Large Language Models: Tokenisation or Representation of Time?** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.19017) - **AndroTMem: From Interaction Trajectories to Anchored Memory in Long-Horizon GUI Agents** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.18429) --- *Sponsored by LimitLess AI*

Mar 20, 2026

41

GMT: Goal-Conditioned Multimodal Transformer for 6-DOF Object Trajectory Synthesis in 3D Scenes

## Episode Summary In this episode, we cover: - **GMT: Goal-Conditioned Multimodal Transformer for 6-DOF Object Trajectory Synthesis in 3D Scenes** (arXiv) - [Read more](http://arxiv.org/abs/2603.17993v1) - **BenchPreS: A Benchmark for Context-Aware Personalized Preference Selectivity of Persistent-Memory LLMs** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.16557) - **Fanar-Sadiq: A Multi-Agent Architecture for Grounded Islamic QA** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.08501) - **FINER: MLLMs Hallucinate under Fine-grained Negative Queries** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.17662) - **Temporal Gains, Spatial Costs: Revisiting Video Fine-Tuning in Multimodal Large Language Models** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.17541) --- *Sponsored by LimitLess AI*

Mar 19, 2026

40

ECG-Reasoning-Benchmark: A Benchmark for Evaluating Clinical Reasoning Capabilities in ECG Interpretation

## Episode Summary In this episode, we cover: - **ECG-Reasoning-Benchmark: A Benchmark for Evaluating Clinical Reasoning Capabilities in ECG Interpretation** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.14326) - **ARISE: Agent Reasoning with Intrinsic Skill Evolution in Hierarchical Reinforcement Learning** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.16060) - **Demystifing Video Reasoning** (arXiv) - [Read more](http://arxiv.org/abs/2603.16870v1) - **Sparking Scientific Creativity via LLM-Driven Interdisciplinary Inspiration** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.12226) - **SuperLocalMemory V3: Information-Geometric Foundations for Zero-LLM Enterprise Agent Memory** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.14588) --- *Sponsored by LimitLess AI*

Mar 18, 2026

39

POLCA: Stochastic Generative Optimization with LLM

## Episode Summary In this episode, we cover: - **POLCA: Stochastic Generative Optimization with LLM** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.14769) - **Towards Generalizable Robotic Manipulation in Dynamic Environments** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.15620) - **HorizonMath: Measuring AI Progress Toward Mathematical Discovery with Automatic Verification** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.15617) - **Spectrum Matching: a Unified Perspective for Superior Diffusability in Latent Diffusion** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.14645) - **OxyGen: Unified KV Cache Management for Vision-Language-Action Models under Multi-Task Parallelism** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.14371) --- *Sponsored by LimitLess AI*

Mar 17, 2026

38

VoXtream2: Full-stream TTS with dynamic speaking rate control

## Episode Summary In this episode, we cover: - **VoXtream2: Full-stream TTS with dynamic speaking rate control** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.13518) - **VisionCoach: Reinforcing Grounded Video Reasoning via Visual-Perception Prompting** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.14659) - **Make it SING: Analyzing Semantic Invariants in Classifiers** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.14610) - **Autonomous Agents Coordinating Distributed Discovery Through Emergent Artifact Exchange** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.14312) - **Anatomy of a Lie: A Multi-Stage Diagnostic Framework for Tracing Hallucinations in Vision-Language Models** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.15557) --- *Sponsored by LimitLess AI*

Mar 15, 2026

37

Autonomous Agents Coordinating Distributed Discovery Through Emergent Artifact Exchange

## Episode Summary In this episode, we cover: - **Autonomous Agents Coordinating Distributed Discovery Through Emergent Artifact Exchange** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.14312) - **Garments2Look: A Multi-Reference Dataset for High-Fidelity Outfit-Level Virtual Try-On with Clothing and Accessories** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.14153) - **Anatomy of a Lie: A Multi-Stage Diagnostic Framework for Tracing Hallucinations in Vision-Language Models** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.15557) - **VoXtream2: Full-stream TTS with dynamic speaking rate control** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.13518) - **The PokeAgent Challenge: Competitive and Long-Context Learning at Scale** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.15563) --- *Sponsored by LimitLess AI*

Mar 14, 2026

36

SciMDR: Benchmarking and Advancing Scientific Multimodal Document Reasoning

## Episode Summary In this episode, we cover: - **SciMDR: Benchmarking and Advancing Scientific Multimodal Document Reasoning** (arXiv) - [Read more](http://arxiv.org/abs/2603.12249v1) - **GRADE: Benchmarking Discipline-Informed Reasoning in Image Editing** (arXiv) - [Read more](http://arxiv.org/abs/2603.12264v1) - **CREATE: Testing LLMs for Associative Creativity** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.09970) - **XSkill: Continual Learning from Experience and Skills in Multimodal Agents** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.12056) - **Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training** (arXiv) - [Read more](http://arxiv.org/abs/2603.12246v1) --- *Sponsored by LimitLess AI*

Mar 13, 2026

35

COMIC: Agentic Sketch Comedy Generation

## Episode Summary In this episode, we cover: - **COMIC: Agentic Sketch Comedy Generation** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.11048) - **StyleVLA: Driving Style-Aware Vision Language Action Model for Autonomous Driving** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.09482) - **In-Context Reinforcement Learning for Tool Use in Large Language Models** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.08068) - **Hindsight Credit Assignment for Long-Horizon LLM Agents** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.08754) - **Meissa: Multi-modal Medical Agentic Intelligence** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.09018) --- *Sponsored by LimitLess AI*

Mar 12, 2026

34

Do What I Say: A Spoken Prompt Dataset for Instruction-Following

## Episode Summary In this episode, we cover: - **Do What I Say: A Spoken Prompt Dataset for Instruction-Following** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.09881) - **Thinking to Recall: How Reasoning Unlocks Parametric Knowledge in LLMs** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.09906) - **Test-Driven AI Agent Definition (TDAD): Compiling Tool-Using Agents from Behavioral Specifications** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.08806) - **Micro-Diffusion Compression -- Binary Tree Tweedie Denoising for Online Probability Estimation** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.08771) - **PathMem: Toward Cognition-Aligned Memory Transformation for Pathology MLLMs** (arXiv) - [Read more](http://arxiv.org/abs/2603.09943v1) --- *Sponsored by LimitLess AI*

Mar 11, 2026

33

Variational Flow Maps: Make Some Noise for One-Step Conditional Generation

## Episode Summary In this episode, we cover: - **Variational Flow Maps: Make Some Noise for One-Step Conditional Generation** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.07276) - **Training-free Latent Inter-Frame Pruning with Attention Recovery** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.05811) - **Evaluating Financial Intelligence in Large Language Models: Benchmarking SuperInvesting AI with LLM Engines** (arXiv) - [Read more](http://arxiv.org/abs/2603.08704v1) - **ByteFlow: Language Modeling through Adaptive Byte Compression without a Tokenizer** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.03583) - **Agentic Planning with Reasoning for Image Styling via Offline RL** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.07148) --- *Sponsored by LimitLess AI*

Mar 10, 2026

32

Censored LLMs as a Natural Testbed for Secret Knowledge Elicitation

## Episode Summary In this episode, we cover: - **Censored LLMs as a Natural Testbed for Secret Knowledge Elicitation** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.05494) - **Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion** (arXiv) - [Read more](http://arxiv.org/abs/2603.06577v1) - **τ-Knowledge: Evaluating Conversational Agents over Unstructured Knowledge** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.04370) - **Mario: Multimodal Graph Reasoning with Large Language Models** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.05181) - **IF-RewardBench: Benchmarking Judge Models for Instruction-Following Evaluation** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.04738) --- *Sponsored by LimitLess AI*

Mar 9, 2026

31

SkillNet: Create, Evaluate, and Connect AI Skills

## Episode Summary In this episode, we cover: - **SkillNet: Create, Evaluate, and Connect AI Skills** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.04448) - **POET-X: Memory-efficient LLM Training by Scaling Orthogonal Transformation** (arXiv) - [Read more](http://arxiv.org/abs/2603.05500v1) - **AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2602.23166) - **Large Multimodal Models as General In-Context Classifiers** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2602.23229) - **Mozi: Governed Autonomy for Drug Discovery LLM Agents** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.03655) --- *Sponsored by LimitLess AI*

Mar 8, 2026

30

SageBwd: A Trainable Low-bit Attention

## Episode Summary In this episode, we cover: - **SageBwd: A Trainable Low-bit Attention** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.02170) - **Lightweight Visual Reasoning for Socially-Aware Robots** (Hugging Face Daily) - [Read more](https://huggingface.co/papers/2603.03942) - **Reasoning Theater: Disentangling Model Beliefs from Chain-of-Thought** (arXiv) - [Read more](http://arxiv.org/abs/2603.05488v1) - **Accelerating Text-to-Video Generation with Calibrated Sparse Attention** (arXiv) - [Read more](http://arxiv.org/abs/2603.05503v1) - **Leveraging LLM Parametric Knowledge for Fact Checking without Retrieval** (arXiv) - [Read more](http://arxiv.org/abs/2603.05471v1) --- *Sponsored by LimitLess AI*

Mar 7, 2026