Beyond Benchmarks: Understanding LLM's Accuracy Collapse in Reasoning

from Karachi Wala Developer · host Mashhood Rastgar

Are Large Language Models (LLMs) truly intelligent, or just sophisticated pattern matchers? This episode dives deep into a fascinating debate sparked by Apple's recent research paper, which questioned the reasoning capabilities of LLMs. We explore the counter-arguments presented by OpenAI and Anthropic, dissecting the methodologies and the core disagreements about what constitutes genuine intelligence in AI. Join us as we unpack the nuances of LLM evaluation and challenge common perceptions about AI's current limitations.

Frequently Asked Questions

How long is this episode of Karachi Wala Developer?

This episode is 11 minutes long.

When was this Karachi Wala Developer episode published?

This episode was published on June 19, 2025.
