OpenAI's GPT-4.1 and separating the API from ChatGPT episode artwork

EPISODE · Apr 14, 2025 · 7 MIN

OpenAI's GPT-4.1 and separating the API from ChatGPT

from Interconnects · host Nathan Lambert

https://www.interconnects.ai/p/openais-gpt-41-and-separating-theRecently I gave another talk on RLVR experiments and I posted some thoughts on OLMoTrace — Ai2’s recent tool to let you look at the training data of OLMo 2.OpenAI has been making many small updates toward their vision of ChatGPT as a monolithic app separate from their API business. Last week OpenAI improved the ChatGPT memory feature — making it so the app can reference the text of previous chats in addition to basic facts about the user. Today, OpenAI announced a new suite of API-only models, GPT 4.1, which is very directly in competition with Google’s Gemini models.Individually, none of OpenAI’s recent releases are particularly frontier-shifting — comparable performance per dollar models exist — but together they paint a picture of where OpenAI’s incentives are heading. This is the same company that recently teased that it has hit 1 billion weekly active users. This is the company that needs to treat ChatGPT and the models that power it very differently from any other AI product on the market. The other leading AI products are all for coding or information, where personality, vibes, and entertainment are not placed on as high a premium.A prime example of this shift is that GPT-4.5 is being deprecated from the API (with its extreme pricing), but is going to remain in ChatGPT — where Sam Atlman has repeatedly said he’s blown away by how much users love it. I use it all the time, it’s an interesting and consistent model.Among their major model releases, such as o3, o4, or the forthcoming open model release, it can be hard to reinforce the high-level view and see where OpenAI is going.A quick summary of the model performance comes from this chart that OpenAI released in the live stream (and blog post):Chart crimes aside (using MMLU as y-axis in 2025, no measure of latency, no axis labels), the story from OpenAI is the simple takeaway — better models at faster inference speeds, which are proportional to cost. Here’s a price comparison of the new OpenAI models (Gemini Pricing, OpenAI pricing):* GPT-4.1: Input/Output: $2.00 / $8.00 | Cached Input: $0.50* GPT-4.1 Mini: Input/Output: $0.40 / $1.60 | Cached Input: $0.10* GPT-4.1 Nano: Input/Output: $0.10 / $0.40 | Cached Input: $0.025And their old models:* GPT-4o: Input/Output: $2.5 / $10.00 | Cached Input: $1.25* GPT-4o Mini: Input/Output: $0.15 / $0.60 | Cached Input: $0.075To Google’s Gemini models:* Gemini 2.5 Pro* (≤200K tokens): Input/Output: $1.25 / $10.00 | Cached: Not available* Gemini 2.5 Pro* (>200K tokens): Input/Output: $2.50 / $15.00 | Cached: Not available* Gemini 2.0 Flash: Input/Output: $0.10 / $0.40 | Cached Input: $0.025 (text/image/video), $0.175 (audio)* Gemini 2.0 Flash-Lite: Input/Output: $0.075 / $0.30 | Cached: Not available*As a reasoning model, Gemini 2.5 Pro will use many more tokens, which are also charged to the user.The academic evaluations are strong, but that isn’t the full picture for these small models that need to do repetitive, niche tasks. These models are clearly competition with Gemini Flash and Flash-Lite (Gemini 2.5 Flash coming soon following the fantastic release of Gemini 2.5 Pro — expectations are high). GPT-4o-mini has largely been accepted as laggard and hard to use relative to Flash.To win in the API business, OpenAI needs to crack this frontier from Gemini:There are many examples in the OpenAI communications that paint a familiar story with these releases — broad improvements — with few details as to why. These models are almost assuredly distilled from GPT-4.5 for personality and reasoning models like o3 for coding and mathematics. For example, there are very big improvements in code evaluations, where some of their early models were “off the map” and effectively at 0.Evaluations like coding and mathematics still fall clearly short of the likes of Gemini 2.5 (thinking model) or Claude 3.7 (optional thinking model). This shouldn’t be surprising, but is worth reminding ourselves of. While we are early in a paradigm of models shifting to include reasoning, the notion of a single best model is messier. These reasoning models use far more tokens to achieve this greatly improved performance. Performance is king, but tie goes to the cheaper model.I do not want to go into detail about OpenAI’s entire suite of models and naming right now because it does not make sense at all. Over time, the specific models are going to be of less relevance in ChatGPT (the main thing), and different models will power ChatGPT than those used in the API. We’ve already seen this with o3 powering only Deep Research for now, and OpenAI only recently walked back the line that “these models won’t be available directly.”Back to the ChatGPT side of things. For most users, the capabilities we are discussing above are effectively meaningless. For them, the dreaded slider of model effort makes much more sense:The new memory feature from last week got mixed reviews, but the old (simple) memory has been something I really enjoy about using ChatGPT. I don’t have to remind it that my puppy is a X week old miniature schnauzer or the context of my work. This’ll continue to get better over time.This feels extremely similar to as when I didn’t really notice when ChatGPT first added the search option, but now it feels like an essential part of my use (something that Claude still hasn’t felt like it does well on). Claude was my daily driver for personality, but with great search and a rapidly improving personality, ChatGPT was indispensable. Still, Gemini 2.5 Pro is a better model, but not in a better interface.I strongly expect that the memory feature will evolve into something I love about ChatGPT. It’ll be much easier to ask ChatGPT to remind you of that thing you found a couple months ago than it would be to try and parse your Google search history.Some were skeptical of these new memories from crossing personal and work uses, but I think with search, this is easy, rather than algorithmic feeds that try to balance all your interests in one. The funnel is per use, and interactions are more narrow and seem easier technically to get right.A final related point — people have long balked at the prices of chat interfaces relative to the API, but the reality that is fast approaching is that the personal experiences only exist in the app, and these are what people love. With the API, you could build a competition that accumulates its own interactions, but as OpenAI has a huge product head start, this will be an uphill battle.All of this reinforces what we know — products are the key to developments in AI right now. Memory and better separation of the ChatGPT lineage from the API helps OpenAI pave that path forward (and maybe do advertising, especially with memory), but we have a long way until it is fully realized. This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.interconnects.ai/subscribe

NOW PLAYING

OpenAI's GPT-4.1 and separating the API from ChatGPT

0:00 7:21

No transcript for this episode yet

We transcribe on demand. Request one and we'll notify you when it's ready — usually under 10 minutes.

Hardware-Conscious Data Processing (ST 2023) - tele-TASK Prof. Dr. Tilmann Rabl Hardware development continuously advances, with different technologies improving at different pace. While the amount of transistors in a CPU package are growing, the single core performance is stagnating due to physical limitations. These trends require changes in data processing to keep database management systems efficient. In this lecture, we will take a look at current computer architectures and accelerator technologies and how they can be used for efficient data processing. We will cover CPU and memory architecture; the storage hierarchy; modern memory technolgoies, such as NVM and NVMe; fast interconnects, such as Infiniband, RDMA, and NVLink; and accelerators, such as GPUs and FPGAs. The course has a significant practical part, where the students learn to implement data structures and algorithms tailored to hardware concious data processing. Musical Tourism Synapset Synapset is a blitz collective formed in Barcelona, over a week in the beginning of April 2010 by Synapskollaps and reSet Sakrecoer. This album is based on experimenting with the risk of taking opportunities in life and reproduce them with machines. It questions the space existing between people and how music interconnects them. This album was written, recorded, mixed and mastered in 7 days.It's core formation is Synapskollaps and reSet Sakrecoer, with special appearance by Dr.Tikov and MC Charlot. Recorded In The FragleRock Studio v2.59, Barcelona. Cover photo by Patsy Boop, Edit by the Sakrecoer Design Robot. Mastered By Dr. Tikov9 tracks of pure kick and base!"Including amazing holiday pictures, healthy Sub-Vibes and pure feelings." - Basspistol.com"Congratulation on the release" - Goodkarma.ru Audistorium Stygian Catalyst Audistorium is a multi-genre spanning dark anthology audio drama created by Landon 'Lemon' Whisnant. From dread horror to absurdist comedy, Audistorium weaves a web of its own that interconnects It's stories in its own macabre, sometimes goofy way.Produced by Stygian Catalyst and co-creator of the Questionable Guide to Life Podcast.At the caring chiding of those close to us, we have decided to open up a way for people to contribute to the shows production, for the price of a simple cup of coffee, you can support Audistorium by clicking here for our Ko-Fi page.For contact, email us at [email protected],We can be found @AudistoriumPod on TwitterYou can find Landon <a href="https://open.acast.com/shows/653838418299010011ba94bc/episodes/@https://twitter.com/Lemjam The Undisputed Truth. Lily Stinson The undisputed truth…is within you.We’ll be diving into resonance beyond words. The truth we’re all searching for——LOVE. Simple. Direct. Digestible truth❤️ I’m not here to dull myself down and neither are you! A peak into limitless creation—- hosted by Lily (love)! I will reflect the truth within you——what interconnects and intertwines us all. Love. The simple truth humanity has forgotten about—-the cure of it all. The lion sleeps no more.

Frequently Asked Questions

How long is this episode of Interconnects?

This episode is 7 minutes long.

When was this Interconnects episode published?

This episode was published on April 14, 2025.

What is this episode about?

https://www.interconnects.ai/p/openais-gpt-41-and-separating-theRecently I gave another talk on RLVR experiments and I posted some thoughts on OLMoTrace — Ai2’s recent tool to let you look at the training data of OLMo 2.OpenAI has been making many...

Can I download this Interconnects episode?

Yes, you can download this episode by clicking the download button on the episode player, or subscribe to the podcast in your preferred podcast app for automatic downloads.
URL copied to clipboard!