All Episodes

Learning GenAI via SOTA Papers — 183 episodes

#
Title
1

EP183: AI coding agents cheat with keywords

2

EP182: AI logic is its weakest link

3

EP181: Small models beating GPT-5 with logic

4

EP180: How AI agents rewrite their code

5

EP179: AIBuildAI Builds New AI Models From Scratch

6

EP178: AI agents reaching silent latent consensus

7

EP177: CAPO math stops overconfident AI lies

8

EP176: Trigonometry fixes the AI memory bottleneck

9

EP175: How AI models teach themselves reasoning

10

EP174: 1-bit Bonsai brings powerful AI offline

11

EP173: AI models diagnosing diseases from blank scans

12

EP172: How HyperAgents rewrite their own code

13

EP171: Helium makes AI agent workflows 40x faster

14

EP170: Qwen3.5 Multimodal Agent

15

EP169: Cybersecurity Risks of Autonomous AI Agents

16

EP168: Turning AI Agents into Mathematical Functions

17

EP167: Why AI models ignore visual evidence

18

EP166: The Auton solution to the integration paradox

19

EP165: Translating hidden AI logic into English

20

EP164: [LACONIC] Teaching AI to stop overthinking

21

EP163: Why AI Models Only Remember Five Percent

22

EP162: AI agents beat humans with malicious skills

23

EP161: Small AI Judges Beat Massive Coding Giants

24

EP160: [AgentSys] Securing AI agents with hierarchical memory

25

EP159: Brute force scale dominates the AI frontier

26

EP158: The hidden blind spots of AI logic

27

EP157: [AgentHeLLM] Protecting drivers from hijacked vehicle AI

28

EP156: [Uncertainty Quantification] How AI Agents Know They Are Guessing

29

EP155: [Agentic Proposing] Small models beat giants with logic bricks

30

EP154: [FS-Researcher] Giving AI agents a file system

31

EP153: [SERA] Training AI coding agents on untested code

32

EP152: DeepVerifier forces AI to check its work

33

EP151: [MagicGUI-RMS] AI agents that think before they click

34

EP150: The Leap to Autonomous Agentic Reasoning

35

EP149: [IDRBench] Interactive AI beats lone wolf models

36

EP148: How AI masters math through self-correction

37

EP147: [DeepSynth-Eval] AI fails at deep research synthesis

38

EP146: How InfiAgent solves the AI memory bottleneck

39

EP145: [LongDA] Why smart AI fails at messy data

40

EP144: [Evo-Memory] Building AI agents with self-evolving memory

41

EP143: Your AI will blackmail you to survive

42

EP142: [DR-Arena] A ruthless arena for deep research agents

43

EP141: [AIRS-Bench] AI agents beat human research benchmarks

44

EP140: [LeWorldModel] AI learns physics on one GPU

45

EP139: Mamba-3 Fixes the Transformer Memory Bottleneck

46

EP138: [Mamba-2] Transformers and SSMs Are the Same Engine

47

EP137: Attention Residuals Solve the LLM Depth Bottleneck

48

EP136: Modular skills for autonomous AI agents

49

EP135: [SoK] Curing AI Amnesia with Agentic Skills

50

EP134: Autonomous AI squads building software

51

EP133: RelayLLM Slashes AI Costs With Collaborative Decoding

52

EP132: How Autonomous LLM Agents Actually Work

53

EP131: MUSE creates self evolving AI agents

54

EP130: [GAP] Graph-based planning for faster AI agents

55

EP129: Why AI agents fail half the time

56

EP128: MCP-Zero lets AI find its own tools

57

EP127: Why tool use makes AI less intelligent

58

EP126: OrcaLoca locates bugs in massive codebases

59

EP125: Why AI Needs an Agent Computer Interface

60

EP124: FRIDAY the AI that runs your computer

61

EP123: MemGPT Turns LLMs into Operating Systems

62

EP122: The Four Pillars of LLM Autonomous Agents

63

EP121: How ToolLLaMA mastered 16000 real world APIs

64

EP120: How Reflexion agents learn through verbal feedback

65

EP119: HuggingGPT Turns LLMs Into AI Managers

66

EP118: The AI Memory Wall Crisis

67

EP117: AI agents learn through textual reflection

68

EP116: Why AI struggles with empathy and interruptions

69

EP115: Dr.LLM brings dynamic depth to AI

70

EP114: FlashAttention-4 Solves Blackwell Hardware Bottlenecks

71

EP113: How FlashAttention-3 Doubles H100 Speed

72

EP112: GPT 5.4 Outperforms Human Professionals

73

EP111: Claude Opus 4.6 Runs Businesses and Catches Manipulation

74

EP110: Single agents beat expensive multi agent teams

75

EP109: The Rise of Agentic Reasoning

76

EP108: GPT-5 Can Lie and Play Dumb

77

EP107: DeepMind's SIMA 2 Masters Unseen Video Games

78

EP106: Fixing AI Agents With Symbolic Guardrails

79

EP105: iStar Autonomous Agents Grading Their Own Homework

80

EP104: WebExplorer Beats Giants at Web Research

81

EP103: Why AI Agents Think Themselves To Death

82

EP102: Gemini 2.5 Thinks Before It Speaks

83

EP101: Kimi k1.5 Breaks the AI Data Wall

84

EP100: Meta's Llama 4 Herd Ends Monolithic Models

85

EP099: Is AI Thinking Just Expensive Noise

86

EP098: OpenAI o3 Hacked Its Own Grading System

87

EP097: DeepSeek R1 Taught Itself to Reason

88

EP096: Gemini 1.5 Pro's 10 Million Token Window

89

EP095: Microsoft Phi-4 Beats Giants With Synthetic Data

90

EP094: DeepSeek-V3 Rivals GPT-4 for $6 Million

91

EP093: How OpenAI o1 Cracked the Strawberry Cipher

92

EP092: BitNet b1.58 Replaces Multiplication With Addition

93

EP091: Qwen 2.5 Beats Llama With Synthetic Data

94

EP090: Pixtral 12B Beats Llama With Better Eyesight

95

EP089: Qwen2-VL Gives AI Native Eyesight

96

EP088: Qwen2 Beats Llama-3 Through Data Quality

97

EP087: Meta's Chameleon Unifies Text and Images

98

EP086: DeepSeek-V2 Breaks The Impossible Triangle

99

EP085: Aya 23 Breaks The Curse Of Multilinguality

100

EP084: Microsoft Phi-3 Fits Supercomputing in Your Pocket

101

EP083: How Meta Engineered the Llama 3 Herd

102

EP082: Command R Plus The Verifiable Enterprise Agent

103

EP081: Replacing MLPs With Interpretable KANs

104

EP080: Jamba Hybrid Solves Transformer Memory Limits

105

EP079: DBRX Beats GPT-3.5

106

EP078: Claude 3 Knew It Was Being Tested

107

EP077: Google Squeezes Gemini Into Your Laptop

108

EP076: OLMo Cracks Open the AI Black Box

109

EP075: Microsoft Phi Beats Giants With Synthetic Textbooks

110

EP074: How Gemini Beat Human Experts

111

EP073: Mixtral 8x7B Sparse Experts Beat Giants

112

EP072: Mamba Solves The Transformer's Fatal Flaw

113

EP071: How Zephyr-7B Beat Llama-70B

114

EP070: Mistral 7B Beats Llama 2 13B

115

EP069: Alibaba's Qwen Specialized Models Beat Generalists

116

EP068: vLLM Fixes the KV Cache Bottleneck

117

EP067: FlashAttention-2 Unlocks Massive Context Windows

118

EP066: Llama 2 Ghost Attention And Safety Secrets

119

EP065: Teaching Small AI To Think Like Giants

120

EP064: Synthetic Textbooks Break AI Scaling Laws

121

EP063: RWKV Smashes the Transformer Memory Ceiling

122

EP062: VOYAGER AI Masters Minecraft by Writing Code

123

EP061: Fine-Tuning LLaMA 65B on One GPU

124

EP060: Direct Preference Optimization Replaces RLHF

125

EP059: Tree of Thoughts Unlocks System 2 Thinking

126

EP058: Inside the Autonomous AI Town of Smallville

127

EP057: Blind GPT-4 Taught LLaVA To See

128

EP056: Pythia Turns AI Alchemy Into Chemistry

129

EP055: Can GPT-4 Fairly Judge Other AI

130

EP054: Alpaca - Stanford Built a $600 GPT Clone

131

EP053: Sparks of AGI in Early GPT-4

132

EP052: GPT-4 Bar Exam and Visual Reasoning

133

EP051: ControlNet Solves Spatial Control With Zero Convolutions

134

EP050: How Meta's LLaMA Beat GPT-3

135

EP049: Toolformer Teaches Itself to Use APIs

136

EP048: BLIP-2 Teaches Frozen Models to See

137

EP047: Bootstrapping AI With Self-Generated Instructions

138

EP046: Training AI With A Constitution

139

EP045: BLOOM The Open Source Rival To GPT-3

140

EP044: How ReAct Synergizes Reasoning and Acting

141

EP043: Weak Supervision Made OpenAI Whisper Robust

142

EP042: Running 175B Models on Consumer Hardware

143

EP041: FlashAttention Smashes the AI Memory Wall

144

EP040: Meta's Open Source GPT-3 Replica

145

EP039: Flamingo Unlocks Few-Shot Visual Reasoning

146

EP038: PaLM's 540 Billion Parameters Unlock Reasoning

147

EP037: DeepMind Chinchilla Ends The Parameter Wars

148

EP036: How 40 People Taught GPT-3 Manners

149

EP035: How Google LaMDA Learned To Use Tools

150

EP034: Chain of Thought Prompting Unlocks Reasoning

151

EP033: Democratizing Image Generation with Latent Diffusion

152

EP032: WebGPT Fights Hallucinations With Web Search

153

EP031: DeepMind RETRO Swaps Memorization For Retrieval

154

EP030: DeepMind's Gopher Exposes Limits of Scale

155

EP029: Instruction Tuning Unlocked Zero-Shot Learning

156

EP028: Train Short for Infinite Context

157

EP027: From Creative Writer to Logic Engine

158

EP026: LoRA Fine-Tunes Massive Models Without Supercomputers

159

EP025: RoPE Solves Sequence by Rotating Vectors

160

EP024: OpenAI CLIP Bridges Language and Vision

161

EP023: Scaling Switch Transformers to Trillion Parameters

162

EP022: DALL-E Treats Images Like Language

163

EP021: Vision Transformers Beat CNNs at Scale

164

EP020: Big Bird Scales Transformers With Sparse Attention

165

EP019: Facebook's Linformer Solves the Attention Bottleneck

166

EP018: Turning Digital Static Into Images With Diffusion

167

EP017: RAG Gives AI a Library Card

168

EP016: GPT-3 Learns From Examples Without Retraining

169

EP015: Longformer Smashes the 512 Token Barrier

170

EP014: ELECTRA Beats GPT On One GPU

171

EP013: Reformer Cracked the Transformer Memory Wall

172

EP012: Google T5 Turns Every Task Into Text

173

EP011: ZeRO Solved the Trillion Parameter Memory Wall

174

EP010: ALBERT Outperforms BERT With Parameter Sharing

175

EP009: Slicing the AI Brain with Megatron-LM

176

EP008: RoBERTa Proves BERT Was Just Undertrained

177

EP007: How GPT-2 Hallucinated Ovid's Unicorn

178

EP006: Transformer-XL Cures AI Amnesia

179

EP005: How BERT Mastered Language by Hiding Words

180

EP004: How 7000 Unpublished Books Birthed GPT

181

EP003: How ELMo Made Word Vectors Dynamic

182

EP002: ULMFiT Was the ImageNet Moment for Text

183

EP001: How Transformers Smashed the Sequential Bottleneck