EPISODE · Apr 13, 2026 · 1H 18M

LLM Architecture in 2026: What You Need to Know with Sebastian Raschka

from Vanishing Gradients · host Hugo Bowne-Anderson and Sebastian Raschka, PhD

If you take a model release as an anchor point, let’s say Nemotron 3 or Qwen 3.5, you can go in both directions: You can either plug them into an agent and play around with that, or you can look, okay, what does the model look like under the hood? What are the ingredients? What type of attention mechanism do they use? What are currently research techniques that could make that even better in the next generation of models? What can we swap out, basically? And I’m interested in both of these!

Sebastian Raschka, Independent AI Researcher and author of Build a Large Language Model from Scratch, joins Hugo to talk about what’s changed in AI architecture, from post-training to hybrid models, and why understanding what’s under the hood matters more than ever for developers building in the agentic era. Sebastian’s upcoming book, Build a Reasoning Model from Scratch, is currently available for pre-order on Amazon and in early access on Manning!

Vanishing Gradients is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

We Discuss:

* Ed Tech for Agents: should we design educational content specifically for agentic systems, or is there a better approach?
* Inference Scaling is the new frontier, driving “gold-level” performance during generation via parallel sampling and internal meta-judges;
* Hybrid Architectures from Qwen 3.5 and Nemotron 3 scale almost linearly, making long-context agentic workflows significantly more affordable and performant;
* Multi-head Latent Attention (MLA), developed by DeepSeek, wins the KV cache war by drastically reducing memory overhead without performance hits;
* Agent Harnesses need to be continuously simplified as frontier models are post-trained on agent trajectories. Teams that don’t strip back their scaffolding risk the harness getting in the way of a more capable model.
* “AI Psychosis”: the cognitive load of supervising self-supervising agents, and why we’re all conducting an orchestra we were never trained to conduct;
* Sebastian’s AI Stack: a surprisingly simple setup (Mac mini, Codex, Ollama) with a ~20-item QA checklist, delegating the boring work to preserve energy for creative development;
* Fine-tuning is now an economic decision, optimizing costs and latency for high-volume tasks where long system prompts outweigh a one-time training run;
* Process Reward Models (PRMs) are the next frontier, verifying intermediate reasoning steps to solve “hallucination in the middle” for complex math and code tasks;
* “Implementation Does Not Lie”: Sebastian’s layer-by-layer verification philosophy, comparing from-scratch builds against HuggingFace references to catch details invisible in papers;
* Architecture Details dictate inference stack choices; nuances like RMSNorm stability or RoPE flavors are critical for optimal performance and troubleshooting;
* The Distillation Loop drives open-weight parity, enabling specialized, “frontier-class” models by “pre-digesting” frontier outputs without multi-million-dollar training risks.

You can also find the full episode on Spotify, Apple Podcasts, and YouTube.

You can also interact directly with the transcript here in NotebookLM: If you do so, let us know anything you find in the comments!

Our flagship course Building AI Applications just wrapped its final cohort but we’re cooking up something new. If you want to be first to hear about it (and help shape what we build), drop your thoughts here.

Links and Resources

* Build a Reasoning Model (From Scratch): Sebastian’s new book, currently available for pre-order on Amazon and in early access on Manning. You’ll learn how reasoning LLMs actually work by starting with a pre-trained base LLM and adding reasoning capabilities step by step in code. A hands-on follow-up to Build a Large Language Model from Scratch.
* LLM Architecture Gallery: Sebastian’s collection of architecture figures and fact sheets from his blog posts, updated with each major model release. A go-to visual reference for comparing what’s changed under the hood across model generations.
* Sebastian Raschka on LinkedIn
* Sebastian’s website
* Ahead of AI (Sebastian’s Substack)
* Build a Large Language Model from Scratch
* PinchBench: OpenClaw Benchmark Leaderboard
* DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
* Gated Delta Networks: Improving Mamba2 with Delta Rule (ICLR 2025)
* DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
* Hugging Face Model Hub
* Upcoming Events on Luma
* Vanishing Gradients on YouTube

A Bit More on Agent Harnesses

* Components of A Coding Agent by Sebastian
* How To Build An Agent that Builds its own Harness by Hugo and Ivan Leo (DeepMind, ex-Manus)
* Build Your Own Deep Research Agent with Hugo & Ivan Leo (Google DeepMind, ex-Manus): In this livestream, you’ll learn how to build a production-grade agent harness from scratch in pure Python;
* AI Agent Harness, 3 Principles for Context Engineering, and the Bitter Lesson Revisited with Lance Martin (Anthropic), Duncan Gilchrist (Delphina), and Hugo
* The Post-Coding Era: What Happens When AI Writes the System? with Nicholas Moy (Google DeepMind), Duncan Gilchrist (Delphina), and Hugo
* What is an Agent Harness? from What 300+ Engineers from Netflix, Amazon, and Instacart Asked About AI Engineering.

How You Can Support Vanishing Gradients

Vanishing Gradients is a podcast, workshop series, blog, and newsletter focused on what you can build with AI right now. Over 70 episodes with expert practitioners from Google DeepMind, Netflix, Stanford, and elsewhere. Hundreds of hours of free, hands-on workshops. All independent, all free.

If you want to help keep it going:

* Become a paid subscriber, from $8/month
* Share this with a builder who’d find it useful
* Subscribe to our YouTube channel.

Thanks for reading Vanishing Gradients! This post is public so feel free to share it.

Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe

