EPISODE · Jun 10, 2025 · 18 MIN
An Analysis of Xiaohongshu's dots.llm1 MoE Model
from Rapid Synthesis: Delivered under 30 mins..ish, or it's on me! · host Benjamin Alloul 🗪 🅽🅾🆃🅴🅱🅾🅾🅺🅻🅼
This episode examines Xiaohongshu's dots.llm1, a new open-source large language model built on a Mixture of Experts (MoE) architecture with 142 billion total parameters, of which 14 billion are active during inference. A key feature is its extensive pretraining on 11.2 trillion high-quality, non-synthetic tokens, alongside a 32K-token context window. Released under the permissive MIT license, the model includes intermediate training checkpoints to support research. The episode discusses the advantages and challenges of the MoE architecture compared with dense models and notes dots.llm1's strong performance, especially on Chinese-language tasks, which positions it competitively within the evolving global landscape of open-source AI, particularly among Chinese technology firms.