PODCAST · technology

Deep Dive in Research

Discussion about interesting research papers

  1. 18

    The Optimal Architecture for Small Language Models

    This article details a systematic study of optimal architectures for small language models with approximately 70 million parameters. Researchers discovered that model performance follows a binary tier system determined by a specific hidden dimension threshold or a "Goldilocks" depth of 32 layers. While most traditional architectures performed similarly at this scale, diffusion models like the new Dhara-70M emerged as superior for high-speed throughput and factual accuracy. The study also highlights that converting existing models to diffusion architectures is ten times more efficient than training them from scratch. Ultimately, the findings suggest that model shape and inference style are more critical than specific family designs for small-scale efficiency.

  2. 17

    OpenEvolve Hindi Overview

    A brief overview of the OpenEvolve evolutionary coding agent in Hindi.

  3. 16

    Ellora: Standardized Recipes for LoRA and LLM Enhancement

    The text presents Ellora, a collection of standardized, production-ready methodologies, referred to as recipes, for enhancing Large Language Models (LLMs) through Low-Rank Adaptation (LoRA). This approach is justified by the fact that LoRA achieves performance comparable to full fine-tuning while drastically reducing computational costs and training up to 10,000x fewer parameters. Ellora’s recipes often utilize self-supervised methods like the Magpie approach for data generation and confirm that combining parameter-efficient techniques with reinforcement learning yields significant speed and memory savings. The six structured recipes address diverse operational needs, including recovering model accuracy after quantization, extending context windows up to 2 million tokens, and teaching secure code generation. Specifically, one recipe demonstrates a 97% vulnerability reduction through automated security analysis and Group Relative Policy Optimization (GRPO). Ultimately, Ellora provides concrete, reproducible templates for practitioners to maximize model capabilities efficiently without requiring new, complex training frameworks.

  4. 15

    The 1 Billion Token Challenge: Finding the Perfect Pre-training Mix

    Today's podcast is based on an article from Hugging Face detailing an extensive research project that addresses the high cost and scale of training modern large language models. The authors, through over 50 systematic experiments, sought to find an optimal data mixing strategy that would allow a GPT-2 model to achieve comparable performance to models trained on ten times the data. Their central finding is that a static dataset mix of 50% finePDFs, 30% DCLM-baseline, and 20% FineWeb-Edu significantly outperforms more complex curriculum learning approaches, which often led to catastrophic forgetting or overfitting. This optimal 50-30-20 mixture successfully trained a GPT-2-70M model that achieved over 90% of the original GPT-2's benchmark performance while using substantially fewer resources. The key takeaway is that dataset quality and intelligent composition are more critical than sheer quantity for training efficient language models. Read the full article at https://huggingface.co/blog/codelion/optimal-dataset-mixing

  5. 14

    Unsupervised Model Improvement Through Internal Coherence Maximization

    The article presents a novel method for improving large language models (LLMs) called Internal Coherence Maximization (ICM) combined with Direct Preference Optimization (DPO), which operates without any human supervision. This unsupervised approach demonstrates superior performance in mathematical reasoning tasks compared to traditional human-supervised methods like Group Relative Policy Optimization (GRPO). Key contributions include a complete implementation of ICM with diverse solution generation and a pipeline to convert ICM results into preference pairs for DPO training. The research also shows successful cross-model capability transfer, where knowledge from a stronger model (Qwen3) improves a weaker one (Gemma3), offering a scalable and cost-effective alternative to current LLM alignment paradigms. The authors emphasize that pretrained models already possess rich understanding, and ICM+DPO offers a way to elicit and refine this internal coherence, leading to better performance without the bottleneck of human annotation. Read the full article at https://huggingface.co/blog/codelion/internal-coherence-maximization

  6. 13

    EDINET-Bench: LLMs on Japanese Financial Tasks

    The article introduces EDINET-Bench, a novel open-source Japanese financial benchmark designed to evaluate Large Language Models (LLMs) on complex financial tasks. This benchmark addresses the scarcity of challenging Japanese financial datasets for LLM evaluation, crucial for tasks like accounting fraud detection, earnings forecasting, and industry prediction. The EDINET-Bench dataset is automatically compiled from ten years of Japanese annual reports available through the Electronic Disclosure for Investors’ NETwork (EDINET). Initial evaluations indicate that even state-of-the-art LLMs perform only marginally better than logistic regression in some complex financial tasks, highlighting the need for domain-specific adaptation and further research. The project makes its dataset, benchmark construction code, and evaluation code publicly available to foster advancements in LLM applications within the financial sector.

  7. 12

    AutoThink: Efficient LLM Reasoning with Adaptive Budgeting

    The article introduces AutoThink, an innovative approach designed to enhance the inference efficiency and accuracy of reasoning Large Language Models (LLMs). AutoThink addresses the challenge of LLMs generating excessive or insufficient reasoning tokens, which leads to computational inefficiency and suboptimal performance. This system comprises two main components: a query complexity classifier that dynamically allocates the optimal number of reasoning tokens, and a dataset of control vectors derived from "pivotal tokens" to guide the LLM's reasoning path. Experimental results demonstrate that AutoThink significantly reduces output tokens while substantially improving accuracy on complex reasoning tasks, suggesting a more strategic approach to LLM resource allocation rather than simply increasing computation.

  8. 11

    System Prompt Learning for LLM Problem-Solving Strategies

    The article introduces System Prompt Learning (SPL), an innovative approach enabling Large Language Models (LLMs) to learn and refine problem-solving strategies through practical experience. This method addresses the current disparity where most developers lack the sophisticated system prompts that make advanced AI assistants so capable. SPL represents a "third paradigm" of LLM learning, augmenting traditional pretraining and finetuning by allowing models to classify problems, apply relevant strategies, and continuously improve these strategies over time. The system maintains a dynamic database of human-readable strategies, demonstrating significant performance improvements across various benchmarks and offering benefits like cumulative learning, transparency, and adaptability. Implemented as an open-source plugin in optillm, SPL offers a practical way to integrate this adaptive intelligence into LLM applications.

  9. 10

    OpenEvolve: Open Source AlphaEvolve Implementation

    This article introduces OpenEvolve, an open-source implementation of Google DeepMind's AlphaEvolve, a system that leverages Large Language Models (LLMs) in an evolutionary framework to generate and optimize code. OpenEvolve allows users to evolve entire codebases by iteratively creating modifications using LLMs, evaluating them with automated metrics, and selecting promising solutions through an evolutionary process. The article details OpenEvolve's architecture, highlighting its key components like the Prompt Sampler and LLM Ensemble, and provides examples demonstrating its ability to achieve results comparable to AlphaEvolve in complex problems such as circle packing and function minimization, showcasing the evolution from simpler algorithms to more sophisticated solutions. It also discusses the importance of LLM performance and diversity for successful evolution and provides guidance on how to install and use the software for developing and improving algorithms.

  10. 9

    PTS: Pivotal Token Search

    This paper introduces Pivotal Token Search (PTS), a novel method for improving the performance of large language models by focusing on critical decision points in their output sequences. Unlike traditional methods that treat all generated tokens equally, PTS identifies "pivotal tokens" that significantly influence the probability of a successful generation. By using a binary search algorithm to pinpoint these key tokens, PTS generates preference pairs specifically centered on these critical decisions, leading to a more efficient learning signal during training. The release includes an open-source implementation, datasets of pivotal tokens and preference pairs, and fine-tuned models demonstrating the technique's effectiveness. This approach has potential applications in improving reasoning abilities, agent trajectories, and model interpretability.

  11. 8

    CameraBench: Understanding Video Motion

    This episode introduces CameraBench, a large-scale dataset and benchmark designed to improve camera motion understanding in videos. It details a taxonomy of camera motion primitives developed with cinematographers, highlighting how motions can relate to scene content like tracking subjects. The authors describe a rigorous annotation framework and human study demonstrating how domain expertise and training enhance annotation accuracy. Using CameraBench, they evaluate both Structure-from-Motion (SfM) and Video-Language Models (VLMs), finding that SfM struggles with semantic primitives while VLMs struggle with precise geometric motions. Finally, they show that fine-tuning a generative VLM on CameraBench significantly improves performance on tasks like motion-augmented captioning and video question answering.

  12. 7

    Step1X-Edit: General Image Editing Framework

    This episode introduces Step1X-Edit, an open-source image editing model designed to close the performance gap with proprietary models like GPT-4o. The developers created a large-scale, high-quality dataset and a new benchmark (GEdit-Bench) reflecting real-world editing instructions to train and evaluate the model. Step1X-Edit integrates a Multimodal Large Language Model (MLLM) with a diffusion-based image decoder to perform diverse edits based on natural language instructions. Experimental results indicate that Step1X-Edit outperforms existing open-source models and achieves performance comparable to leading closed-source systems.

  13. 6

    VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models

    Visual reasoning is a core component of human intelligence and a critical capability for advanced multimodal models. Yet current reasoning evaluations of multimodal large language models (MLLMs) often rely on text descriptions and allow language-based reasoning shortcuts, failing to measure genuine vision-centric reasoning. To address this, we introduce VisuLogic: a benchmark of 1,000 human-verified problems across six categories (e.g., quantitative shifts, spatial relations, attribute comparisons). These various types of questions can be evaluated to assess the visual reasoning capabilities of MLLMs from multiple perspectives. We evaluate leading MLLMs on this benchmark and analyze their results to identify common failure modes. Most models score below 30% accuracy, only slightly above the 25% random baseline and far below the 51.4% achieved by humans, revealing significant gaps in visual reasoning. Furthermore, we provide a supplementary training dataset and a reinforcement-learning baseline to support further progress. Code, data, and baselines are available at https://visulogic-benchmark.github.io/VisuLogic.

  14. 5

    Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

    Reinforcement Learning with Verifiable Rewards (RLVR) has recently demonstrated notable success in enhancing the reasoning capabilities of LLMs, particularly in mathematics and programming tasks. It is widely believed that RLVR enables LLMs to continuously self-improve, thus acquiring novel reasoning abilities that exceed corresponding base models' capacity. In this study, however, we critically re-examine this assumption by measuring the pass@k metric with large values of k to explore the reasoning capability boundary of the models across a wide range of model families and benchmarks. Surprisingly, RL does not, in fact, elicit fundamentally new reasoning patterns. While RL-trained models outperform their base models at smaller values of k (e.g., k=1), base models can achieve a comparable or even higher pass@k score compared to their RL counterparts at large k values. The reasoning paths generated by RL-trained models are already included in the base models' sampling distribution, suggesting that most reasoning abilities manifested in RL-trained models are already obtained by base models. Further analysis shows that RL training boosts the performance by biasing the model's output distribution toward paths that are more likely to yield rewards, therefore sampling correct responses more efficiently. But this also results in a narrower reasoning capability boundary compared to base models. Similar results are observed in visual reasoning tasks trained with RLVR. Moreover, we find that distillation can genuinely introduce new knowledge into the model, different from RLVR. These findings underscore a critical limitation of RLVR in advancing LLM reasoning abilities, which requires us to fundamentally rethink the impact of RL training in reasoning LLMs and the need for a better paradigm. Project Page: https://limit-of-RLVR.github.io

  15. 4

    Learning to Reason under Off-Policy Guidance

    Recent advances in large reasoning models (LRMs) demonstrate that sophisticated behaviors such as multi-step reasoning and self-reflection can emerge via reinforcement learning (RL) with simple rule-based rewards. However, existing zero-RL approaches are inherently "on-policy", limiting learning to a model's own outputs and failing to acquire reasoning abilities beyond its initial capabilities. We introduce LUFFY (Learning to reason Under oFF-policY guidance), a framework that augments zero-RL with off-policy reasoning traces. LUFFY dynamically balances imitation and exploration by combining off-policy demonstrations with on-policy rollouts during training. Notably, we propose policy shaping via regularized importance sampling to avoid superficial and rigid imitation during mixed-policy training. Remarkably, LUFFY achieves an average gain of over +7.0 points across six math benchmarks and an advantage of over +6.2 points in out-of-distribution tasks. It also substantially surpasses imitation-based supervised fine-tuning (SFT), particularly in generalization. Analysis shows LUFFY not only imitates effectively but also explores beyond demonstrations, offering a scalable path to train generalizable reasoning models with off-policy guidance.

  16. 3

    AI's Potential to Transform the World

    This episode explores a hopeful vision of the future with powerful AI, focusing on how AI could revolutionize five key areas: biology and health, neuroscience and mind, economic development and poverty, peace and governance, and work and meaning. Join us as we examine the potential of AI to solve humanity’s biggest challenges and unlock a future of abundance and well-being for everyone.

  17. 2

    On the Nature of Time

    This text explores the nature of time from a computational perspective. It argues that time is not a fundamental coordinate but rather a consequence of the universe's computational processes. The author proposes that time is "the progressive doing of computation by the universe," and that our perception of time arises from our own computational limitations as observers. The text further suggests that the universe's computational irreducibility, the idea that there is no shortcut to understanding a system's evolution, contributes to the robustness of time as a unidirectional flow. The author also examines the concepts of multiple threads of time, the ruliad (the totality of all possible computational processes), and the role of computational boundedness in shaping our perception of time and physical laws.

  18. 1

    MovieGen: A Detailed Review of Meta's Text-to-Video Generation System

    This research paper describes the development and capabilities of "Movie Gen," a new suite of generative AI models that produce high-quality, realistic videos and audio. The paper highlights key advancements in text-to-video and video-to-audio synthesis, video editing, and video personalization. The authors detail their models' architecture, training procedures, and evaluation metrics, demonstrating superior performance compared to existing commercial and open-source solutions. This research aims to advance the field of media generation and enable new creative possibilities.
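
The episode "Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?" in the list above compares models using pass@k at large values of k. As a rough illustration of the metric only, and not the paper's own code, the sketch below uses a commonly used unbiased estimator, 1 - C(n-c, k) / C(n, k), where n samples are drawn for a problem and c of them are correct. The function name `pass_at_k` and the sample counts are assumptions chosen for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples is correct,
    given that c of the n samples drawn for a problem were correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples are wrong, every size-k subset
    # must contain at least one correct sample.
    if n - c < k:
        return 1.0
    # 1 minus the fraction of size-k subsets that are entirely wrong.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical counts: 4 correct answers out of 256 samples.
    for k in (1, 16, 256):
        print(k, pass_at_k(256, 4, k))
```

With c held fixed, the estimate can only grow as k grows, which is why a model that rarely solves a problem at k=1 can still score well at large k.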

ABOUT THIS SHOW

Discussion about interesting research papers

HOSTED BY

NotebookLM

Frequently Asked Questions

How many episodes does Deep Dive in Research have?

Deep Dive in Research currently has 18 episodes available on PodParley. New episodes are automatically indexed when they're published to the podcast feed.

What is Deep Dive in Research about?

Discussion about interesting research papers

How often does Deep Dive in Research release new episodes?

Deep Dive in Research has 18 episodes. Check the episode list to see recent publication dates and frequency.

Where can I listen to Deep Dive in Research?

You can listen to Deep Dive in Research on PodParley by clicking any episode. We provide an embedded audio player for direct listening, and you can also subscribe via your preferred podcast app using the RSS feed.

Who hosts Deep Dive in Research?

Deep Dive in Research is created and hosted by NotebookLM.