EPISODE · Dec 26, 2024 · 25 MIN
Episode 170: DeepSeek V3 China’s open source 671B parameter LLM
from A Cast of Pods · host Jose Acierto
The document details DeepSeek-V3, a 671B-parameter Mixture-of-Experts large language model. It covers the model's architecture, including Multi-Head Latent Attention and an auxiliary-loss-free load-balancing strategy for DeepSeekMoE. It describes the training process: pre-training on 14.8 trillion tokens, followed by post-training with supervised fine-tuning and reinforcement learning. Extensive evaluations show DeepSeek-V3 surpassing many open-source models and achieving results comparable to leading closed-source models. Finally, the document explores infrastructure optimizations, including an FP8 mixed-precision framework, and suggests improvements for future AI hardware design. Download here: DeepSeek-V3 documentation (GitHub); DeepSeek-V3 download (GitHub).