All Episodes

Models & Agents — 41 episodes

Ep 41: OpenAI rolls out GPT-5.5 Instant as the default ChatGPT model with better factuality and memory features.

Ep 40: OpenAI gives 8,000 developers a month of 10x Codex rate limits after the GPT-5.5 party sold out.

Ep 39: Mistral AI launches a 128B model with remote agents and strong coding performance.

Ep 38: Anthropic gives defenders early access to Mythos Preview to patch AI cyber vulnerabilities before wider release.

Ep 37: DeepSeek's first native multimodal model drops in the LocalLLaMA community, finally giving the open-source whale vision capabilities.

Ep 36: Anthropic’s Claude Opus 4.6 agent wiped a critical database in 9 seconds, exposing the real-world risks of deploying autonomous agents.

Ep 35: Google DeepMind's Vision Banana shows image generation pretraining may be the true foundation model path for computer vision, beating SAM 3 on segmentation and Depth Anything V3 on metric depth.

Ep 34: Qwen3.6-27B paired with llama.cpp speculative decoding delivers 10x token speedups in real coding sessions, hitting 136 t/s on consumer hardware.

Ep 33: MetaComp just released the world's first dedicated AI agent governance framework built specifically for regulated financial services.

Ep 32: Qwen3.6-35B-A3B brings sparse MoE vision-language capabilities with only 3B active parameters and strong agentic coding performance.

Ep 31: Google DeepMind's Gemini Robotics-ER 1.6 upgrade delivers enhanced embodied reasoning and instrument reading for real-world robot control.

Ep 30: Aaron Levie declares the enterprise AI shift from chatbots to agents is now underway, moving beyond the "Chat Era."

Ep 29: Knowledge distillation now compresses full ensembles into single deployable models while preserving their collective intelligence.

Ep 28: Meta’s Muse Spark and a production-grade compiler-as-a-service approach for agents headline a day heavy on practical agent infrastructure.

Ep 27: Gemma 4 delivers massive gains across European languages while a 25.6M Rust model achieves 50× faster inference via hybrid attention.

Ep 26: AutoAgent autonomously optimizes its own harness using the same model to reach #1 on Terminal-Bench and financial modeling in under 24 hours.

Ep 25: Google drops Gemma 4, claiming the strongest small multimodal open model yet with dramatic gains across every benchmark compared to Gemma 3.

Models & Agents - Episode 24 - April 01, 2026

Ep 23: Alibaba Qwen just dropped Qwen3.5-Omni, a native end-to-end multimodal model built for text, audio, video, and realtime interaction.

Ep 22: Naver's Seoul World Model grounds video generation in real Street View geometry from over a million images and generalizes to other cities without fine-tuning.

Ep 21: New arXiv papers expose critical flaws in how we evaluate depression-detection models, LLM pruning, and verbalized confidence.

Ep 20: Fair zero-determinant strategies break in the periodic prisoner's dilemma, unlike the classic repeated version.

Ep 19: TrustFlow introduces topic-aware vector reputation for multi-agent systems, replacing scalar scores with queryable multi-dimensional vectors.

Ep 18: LlamaIndex drops LiteParse, a spatial PDF parser built specifically for agentic RAG workflows.

Ep 17: Picsart launches AI agent marketplace, starting with four agents and adding more weekly for creators.

Ep 16: RL agents scaled to 1,024 layers unlock emergent parkour skills from basic failures.

Ep 15: Google DeepMind's Aletheia agent autonomously advances from IMO math to professional research discoveries.

Ep 14: Perplexity launches "Personal Computer," a $200/month AI agent that automates emails, presentations, and app control 24/7.

Ep 13: Nvidia plans $26B investment in open-weight AI models to counter Chinese dominance and lock in developers.

Ep 12: Google unveils Gemini Embedding 2, a multimodal model embedding text, images, video, audio, and docs for advanced RAG systems.

Ep 11: Meta acquires Moltbook, a Reddit-like platform for AI agents to interact and collaborate.

Ep 10: Claude Opus 4.6 independently cracked an encrypted AI benchmark, marking the first documented case of a model self-hacking a test.

Ep 9: Meta's new research trains multimodal AI on unlabeled video, challenging assumptions about text-heavy scaling.

Ep 8: Anthropic's Claude AI discovered over 100 Firefox vulnerabilities that human testing missed for decades.

Ep 7: Liquid AI launches LFM2-24B-A2B model and LocalCowork app for fully local, privacy-first agent workflows.

Ep 6: YuanLab AI launches Yuan 3.0 Ultra, a 1T-parameter multimodal MoE model cutting parameters by 33% while boosting efficiency 49%.

Ep 5: FireRedTeam releases FireRed-OCR-2B, a 2B-parameter model tackling structural hallucinations in document parsing for tables and LaTeX.

Ep 4: Alibaba open-sources CoPaw, a workstation for scaling multi-channel AI agent workflows.

Ep 3: Perplexity open-sources embedding models that match Google and Alibaba performance at a fraction of the memory cost.

Ep 2: Sakana AI launches Doc-to-LoRA and Text-to-LoRA hypernetworks for zero-shot LLM adaptation to long contexts via natural language.

Ep 1: Anthropic acquires Vercept to enhance Claude's screen reading, while Google launches Nano Banana 2 for faster, cheaper image generation.