All Episodes
Models & Agents — 41 episodes
Ep 41: OpenAI rolls out GPT-5.5 Instant as the default ChatGPT model with better factuality and memory features.
Ep 40: OpenAI gives 8,000 developers a month of 10x Codex rate limits after the GPT-5.5 party sold out.
Ep 39: Mistral AI launches a 128B model with remote agents and strong coding performance.
Ep 38: Anthropic gives defenders early access to Mythos Preview to patch AI cyber vulnerabilities before wider release.
Ep 37: DeepSeek's first native multimodal model drops in the LocalLLaMA community, finally giving the open-source whale vision capabilities.
Ep 36: Anthropic’s Claude Opus 4.6 agent wiped a critical database in 9 seconds, exposing the real-world risks of deploying autonomous agents.
Ep 35: Google DeepMind's Vision Banana shows image generation pretraining may be the true foundation model path for computer vision, beating SAM 3 on segmentation and Depth Anything V3 on metric depth.
Ep 34: Qwen3.6-27B paired with llama.cpp speculative decoding delivers 10x token speedups in real coding sessions, hitting 136 t/s on consumer hardware.
Ep 33: MetaComp just released the world's first dedicated AI agent governance framework built specifically for regulated financial services.
Ep 32: Qwen3.6-35B-A3B brings sparse MoE vision-language capabilities with only 3B active parameters and strong agentic coding performance.
Ep 31: Google DeepMind's Gemini Robotics-ER 1.6 upgrade delivers enhanced embodied reasoning and instrument reading for real-world robot control.
Ep 30: Aaron Levie declares the enterprise AI shift from chatbots to agents is now underway, moving beyond the "Chat Era."
Ep 29: Knowledge distillation now compresses full ensembles into single deployable models while preserving their collective intelligence.
Ep 28: Meta’s Muse Spark and a production-grade compiler-as-a-service approach for agents headline a day heavy on practical agent infrastructure.
Ep 27: Gemma 4 delivers massive gains across European languages while a 25.6M Rust model achieves 50× faster inference via hybrid attention.
Ep 26: AutoAgent autonomously optimizes its own harness using the same model to reach #1 on Terminal-Bench and financial modeling in under 24 hours.
Ep 25: Google drops Gemma 4, claiming the strongest small multimodal open model yet with dramatic gains across every benchmark compared to Gemma 3.
Ep 24: Models & Agents, April 01, 2026
Ep 23: Alibaba Qwen just dropped Qwen3.5-Omni, a native end-to-end multimodal model built for text, audio, video, and realtime interaction.
Ep 22: Naver's Seoul World Model grounds video generation in real Street View geometry from over a million images and generalizes to other cities without fine-tuning.
Ep 21: New arXiv papers expose critical flaws in how we evaluate depression-detection models, LLM pruning, and verbalized confidence.
Ep 20: Fair zero-determinant strategies break in the periodic prisoner's dilemma, unlike the classic repeated version.
Ep 19: TrustFlow introduces topic-aware vector reputation for multi-agent systems, replacing scalar scores with queryable multi-dimensional vectors.
Ep 18: LlamaIndex drops LiteParse, a spatial PDF parser built specifically for agentic RAG workflows.
Ep 17: Picsart launches an AI agent marketplace, starting with four agents and adding more weekly for creators.
Ep 16: RL agents scaled to 1,024 layers unlock emergent parkour skills from basic failures.
Ep 15: Google DeepMind's Aletheia agent autonomously advances from IMO math to professional research discoveries.
Ep 14: Perplexity launches "Personal Computer," a $200/month AI agent that automates emails, presentations, and app control 24/7.
Ep 13: Nvidia plans a $26B investment in open-weight AI models to counter Chinese dominance and lock in developers.
Ep 12: Google unveils Gemini Embedding 2, a multimodal model embedding text, images, video, audio, and docs for advanced RAG systems.
Ep 11: Meta acquires Moltbook, a Reddit-like platform for AI agents to interact and collaborate.
Ep 10: Claude Opus 4.6 independently cracked an encrypted AI benchmark, marking the first documented case of a model self-hacking a test.
Ep 9: Meta's new research trains multimodal AI on unlabeled video, challenging assumptions about text-heavy scaling.
Ep 8: Anthropic's Claude AI discovered over 100 Firefox vulnerabilities that human testing missed for decades.
Ep 7: Liquid AI launches the LFM2-24B-A2B model and the LocalCowork app for fully local, privacy-first agent workflows.
Ep 6: YuanLab AI launches Yuan 3.0 Ultra, a 1T-parameter multimodal MoE model cutting parameters by 33% while boosting efficiency by 49%.
Ep 5: FireRedTeam releases FireRed-OCR-2B, a 2B-parameter model tackling structural hallucinations in document parsing for tables and LaTeX.
Ep 4: Alibaba open-sources CoPaw, a workstation for scaling multi-channel AI agent workflows.
Ep 3: Perplexity open-sources embedding models that match Google and Alibaba performance at a fraction of the memory cost.
Ep 2: Sakana AI launches Doc-to-LoRA and Text-to-LoRA hypernetworks for zero-shot LLM adaptation to long contexts via natural language.
Ep 1: Anthropic acquires Vercept to enhance Claude's screen reading, while Google launches Nano Banana 2 for faster, cheaper image generation.