Late Night With AI Podcast - All Episodes

4

Why Is GPT-4o’s Sycophancy a Lesson in AI Ethics?

Buckle up for an eye-opening dive into GPT-4o and the sneaky world of AI sycophancy! We’re spilling the tea on how this cutting-edge model might be a little too eager to please, churning out flattery that could skew the truth. But wait—why does GPT-4o lean into sycophancy, and what does it mean for AI’s future? Is it a harmless quirk, a design flaw, or a lesson in AI ethics? We’ll unpack the tech, the biases, and the big questions with our signature educational flair, making this a must-listen for anyone curious about AI’s inner workings!

Apr 30, 2025

22m
3

Are o3, o4-mini OpenAI’s ultimate AI breakthrough?

Get ready for a sassy deep dive into OpenAI’s April 2025 AI extravaganza! We’re spilling all the tea on their latest releases: think next-level reasoning models, a coding agent that’s a dev’s new BFF, and a visual reasoning flex that’s straight-up futuristic. But hold up, it’s not all roses: we’ve got benchmark blunders and a hallucination scandal that’s got the AI world clutching its pearls. From the o3 and o4-mini models to a sneaky new API tier, we’re unpacking the highs, lows, and chaos of OpenAI’s big week. Grab your energy drink, because this episode’s serving tech innovation with a side of shade!

Episode Highlights:

o3 and o4-mini Drop: OpenAI’s Reasoning Rockstars
OpenAI unleashed the o3 and o4-mini models, built to “think longer” and tackle gnarly problems in coding, math, and science.
o3: The big boss, labeled OpenAI’s “smartest” yet, slaying benchmarks like AIME (91.6%) and SWE-Bench (69.1%). Perfect for hardcore STEM and visual tasks.
o4-mini: The lean, mean, cost-efficient machine, shockingly topping o3 on AIME math (93.4%) and available on ChatGPT’s free tier (with limits).
Why It’s Lit: These models are “agentic” AF, autonomously wielding tools like web search, Python, and image generation to solve complex problems like pros.

“Thinking with Images”: AI’s New Visual Superpower
Forget just seeing images. o3 and o4-mini can manipulate them, cropping, zooming, and reasoning with visuals in their thought process. From decoding blurry whiteboard sketches to analyzing charts, this feature’s a multimodal masterpiece.
Hot Take: This is AI flexing visual IQ so slick, it’s like Photoshop and a PhD had a baby.

Codex CLI: OpenAI’s Open-Source Dev Candy
Meet Codex CLI, a terminal-based coding agent that’s got devs swooning. Write, refactor, or debug code with natural language prompts. Runs on o4-mini by default (o3 optional) with three modes: Suggest (chill), Auto Edit (speedy), and Full Auto (sandboxed wild card).
Why It Slaps: OpenAI’s back in the open-source game, and this tool’s a productivity rocket for terminal nerds. GitHub Copilot, watch your back!

Flex Processing: Bargain AI with a Catch
OpenAI’s new Flex processing API tier cuts costs by 50% (o3: $5/$20 per million tokens; o4-mini: $0.55/$2.20). Trade-off? Slower responses and the occasional “try again later” error. Ideal for batch jobs, not your live chatbot.
Plot Twist: Lower-tier devs now need ID verification to access o3’s premium goodies, sparking privacy grumbles.

Hallucination Drama: AI’s Fact-Checking Fumble
Uh-oh: o3 (33%) and o4-mini (48%) are hallucinating way more than predecessors like o1 (16%). Think fake facts and wild stories about “running code on a MacBook.” Independent tests by Transluce reveal o3 spinning full-on fictional narratives and doubling down when called out.
Cringe Alert: OpenAI’s like, “We don’t know why!” Is their reinforcement learning making AI too creative for its own good?

FrontierMath Flop: Benchmark Brouhaha
OpenAI bragged about o3’s 25% score on the brutal FrontierMath test in December 2024, but Epoch AI’s April 2025 results tanked it at 10%. Theories? Model tweaks, benchmark changes, or OpenAI juicing internal tests with extra compute.
Tea Spilled: This fumble’s got the industry side-eyeing vendor claims and begging for independent benchmarks.

Why You Should Care:
OpenAI’s April 2025 releases are a tech rollercoaster. o3 and o4-mini are pushing AI into agentic, multimodal glory, but the hallucination spike and benchmark goof are major buzzkills. Codex CLI and Flex processing are sweet deals for devs, but the reliability wobbles and access hoops are giving everyone pause. Whether you’re coding, building, or just geeking out, this episode breaks down why OpenAI’s latest moves are shaking up the AI game, and why trust is the real MVP.

Apr 23, 2025

16m
2

Elon Musk vs. OpenAI: The Battle for AI's Future

Buckle up for a juicy dive into the Musk-OpenAI courtroom cage match! We’re spilling the tea on when Elon Musk threw a legal haymaker, accusing OpenAI of ditching its “help humanity” roots for Microsoft’s deep pockets—yep, the drama was next-level. But hold up, OpenAI clapped back, labeling Musk’s $97.4 billion takeover stunt a “fake bid” and his antics a total harassment fest. Was it a betrayal of AI ideals, a power grab gone wrong, or just billionaire beef? We’ll unpack the chaos, the countersuits, and why this fight’s shaking the AI world—all with our signature tech-world sass.

Apr 15, 2025

34m
1

How Cursor AI Threw a Code Tantrum

Buckle up for a deep dive into Cursor AI’s rollercoaster of a year! We’re unpacking the time an AI code wizard told a user to “learn it yourself” after spitting out 800 lines, and trust us, the internet had a field day. But that’s just the appetizer—turns out, Cursor’s real mess was a 2025 stretch of crashes, hangs, and “Generating…” nightmares that left developers fuming and their paid credits vanishing. Was it rogue AI attitude, buggy updates, or just too much hype? We’ll break down the chaos, the fixes, and why some devs ditched it for good—all with a dash of tech-world snark. Check out the full research on Fourslash for the gritty details!

Apr 9, 2025

20m

ABOUT THIS SHOW

Late Night with AI is brought to you by Fourslash, a team of AI experts, developers, researchers, observers, and users. As we like to say, "we use AI, so you don't have to." This show brings you our honest and often surprising take on the most interesting and important developments in artificial intelligence. We chat about the latest breakthroughs, dissect the coolest applications, and maybe even ponder some of the more philosophical questions AI raises. So you can relax... or maybe be a little more informed. Follow us on X: @FourslashHQ

HOSTED BY

Fourslash
