EPISODE · Apr 26, 2026 · 20 MIN

EP39 · DeepSeek V4 / Comprehension Debt / Agent Product Design · 04.26 Morning Briefing

from BestBlogs

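Today's deep dive: "The DeepSeek V4 report is remarkably detailed! The 484-day road to a new generation, fully disclosed"

Today's deep dive ①: QbitAI (量子位) walks through the DeepSeek V4 technical report item by item. V4-Pro has 1.6T parameters with 49B activated, V4-Flash has 284B parameters with 13B activated, and a 1M-token context is standard. Per-token FLOPs are cut to 27% of V3.2's, and the KV cache shrinks to 10% of its size.

The architecture changes in three places. mHC (manifold-constrained hyper-connections) constrains the residual stream to the Birkhoff polytope of doubly stochastic matrices, so the spectral norm has a hard upper bound and cannot blow up. CSA / HCA hybrid sparse attention layers are stacked alternately: CSA compresses first and then selects the top-k, while HCA applies a hard 128x compression to carry the global signal. Muon takes over as the main optimizer for the vast majority of parameters, with AdamW kept only for the embeddings and RMSNorm.

Post-training replaces the traditional mixed RL with OPD (On-Policy Distillation) plus three tiers of reasoning effort. Codeforces ...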
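The claim about doubly stochastic matrices and a bounded spectral norm can be checked numerically. The sketch below is only an illustration of that mathematical fact, not DeepSeek's implementation: the function names are hypothetical, the 4x4 size is arbitrary, and Sinkhorn-Knopp row/column normalization is assumed here as one standard way to reach an (approximately) doubly stochastic matrix. The episode description does not say how V4 enforces the constraint.

```python
import math
import random


def sinkhorn_knopp(matrix, iterations=200):
    """Alternately normalize rows and columns of a positive square matrix."""
    n = len(matrix)
    m = [row[:] for row in matrix]
    for _ in range(iterations):
        # Scale every row so that it sums to 1.
        m = [[v / sum(row) for v in row] for row in m]
        # Scale every column so that it sums to 1.
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m


def matmul(a, b):
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def spectral_norm(matrix, iterations=500):
    """Estimate the largest singular value by power iteration on A^T A."""
    n = len(matrix)
    rng = random.Random(1)
    v = [rng.random() + 0.1 for _ in range(n)]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]  # A v
        u = [sum(matrix[i][j] * w[i] for i in range(n)) for j in range(n)]  # A^T w
        length = math.sqrt(sum(x * x for x in u))
        v = [x / length for x in u]
    w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    return math.sqrt(sum(x * x for x in w))


def stack(matrix, depth):
    """Multiply `depth` copies of a matrix, mimicking repeated mixing across layers."""
    out = matrix
    for _ in range(depth - 1):
        out = matmul(out, matrix)
    return out


# Hypothetical 4x4 mixing matrix with strictly positive entries.
rng = random.Random(0)
raw = [[rng.random() + 0.1 for _ in range(4)] for _ in range(4)]
ds = sinkhorn_knopp(raw)

print("unconstrained, 1 layer  :", spectral_norm(raw))
print("unconstrained, 10 layers:", spectral_norm(stack(raw, 10)))
print("doubly stochastic, 1 layer  :", spectral_norm(ds))
print("doubly stochastic, 10 layers:", spectral_norm(stack(ds, 10)))
```

Because a product of doubly stochastic matrices is again doubly stochastic, the stacked norm stays at 1 however many layers are multiplied, while the unconstrained product grows geometrically with depth.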

Frequently Asked Questions

How long is this episode of BestBlogs?

This episode is 20 minutes long.

When was this BestBlogs episode published?

This episode was published on April 26, 2026.
