EPISODE · Jun 17, 2026 · 1H 46M
Radically Better Reasoning: Elicit's Andreas Stuhlmüller & Jungwon Byun on World Models for Research
from "The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis · host Erik Torenberg, Nathan Labenz
Andreas Stuhlmüller and Jungwon Byun return to discuss how Elicit is building trusted reasoning workflows for scientific research as frontier models grow more powerful but less transparent. They explain process supervision, domain-specific reasoning primitives, and world models that make evidence, causality, and counterfactuals more inspectable. The conversation also covers life sciences use cases, evaluating conflicting evidence, automated software engineering at Elicit, token costs, Gemini, and why legible reasoning may still beat neuralese.

Mercury: Command is Mercury’s new conversational interface, giving you natural-language access to your finances and helping you take actions within your existing permissions and approval policies. Visit https://mercury.com to learn more and apply online in minutes.

LINKS:
Elicit Research Platform
Andreas Stuhlmüller Personal Site
Jungwon Byun X Profile
Ought Research Organization
Elicit Founders Previous Episode
GPT-4 Technical Report
Monitoring Reasoning Models Paper
Ought ICE GitHub Repository
Hard-to-Verify Tasks Essay
Karpathy LLM Wiki Gist
Obsidian Knowledge Base App
Mixpanel Analytics Platform
Amplitude Analytics Platform
Anthropic Tracing Thoughts Research
Claude AI Chat Assistant
METR Long Tasks Measurement
Pi Agent Scaffold Repository
Personal AI Infrastructure Repository
Elicit Claude Opus Evaluation
Elicit API Documentation
METR Developer Productivity Study
Elicit Planning Is Unsolved
Rich Sutton Bitter Lesson
Meta Llama AI Models
Recursive San Francisco Event
zero.xyz Agent Tool Access
Anthropic Dynamic Workflows Coverage

Sponsor: Claude: Claude by Anthropic is an AI collaborator that understands your workflow and helps you tackle research, writing, coding, and organization with deep context. Get started with Claude and explore Claude Pro at https://claude.ai/tcr