EPISODE · Jan 28, 2025 · 36 MIN
Episode 60: DeepSeek Models Explained Part I
from Machine Learning Made Simple · host Saugata Chatterjee
What if AI could match enterprise-grade performance at a fraction of the cost? In this episode, we dive deep into DeepSeek, the groundbreaking open-source models challenging tech giants with 95% lower costs. From innovative training optimizations to revolutionary data curation, discover how a resource-constrained startup is redefining what's possible in AI.

🎯 Episode Highlights:
- Beyond cost-cutting: How DeepSeek matches top-tier AI performance
- Game-changing memory optimization and pipeline parallelization
- Inside the technology: Zero-redundancy training and dependency parsing
- The future of efficient, accessible AI development

Whether you're an ML engineer or AI enthusiast, learn how clever optimization is democratizing advanced AI capabilities. No GPU farm needed!

References for main topic:
- [2401.02954] DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
- DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
- [2405.04434] DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
- [2412.19437] DeepSeek-V3 Technical Report
- https://arxiv.org/abs/2501.12948
- https://www.deepspeed.ai/2021/03/07/zero3-offload.html
- [1910.02054] ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
- [2205.05198] Reducing Activation Recomputation in Large Transformer Models
- [2406.03488] Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training