OpenAI and Codex with Thibault Sottiaux and Ed Bayes

TRANSCRIPT · AUTO-GENERATED

AI coding agents are rapidly reshaping how software is built, reviewed, and maintained. As large language model capabilities continue to increase, the bottleneck in software development is shifting away from code generation toward planning, review, deployment, and coordination. This shift is driving a new class of agentic systems that operate inside constrained environments, reason over long time horizons, and integrate across tools like IDEs, version control systems, and issue trackers. OpenAI is at the forefront of AI research and product development.

In 2025, the company released Codex, an agentic coding system designed to work safely inside sandboxed environments while collaborating across the modern software development stack. Thibault Sottiaux is the Codex engineering lead, and Ed Bayes is the Codex product designer. In this episode, they join Kevin Ball to discuss how Codex is built, the co-evolution of models and harnesses, multi-agent futures, Codex's open-source CLI, model specialization, latency and performance considerations, and much more. Kevin Ball, or K-Ball, is the vice president of engineering at Mento and an independent coach for engineers and engineering leaders.

He co-founded and served as CTO for two companies, founded the San Diego JavaScript meetup, and organizes the AI in Action discussion group through Latent Space. Check out the show notes to follow K-Ball on Twitter or LinkedIn, or visit his website, kball.llc. Hey guys, welcome to the show. Hey, thanks for having us.

Yeah, I'm excited about this one. You guys are doing some really interesting stuff, and I want to dig in. But let's start with you a little bit. Can you each give a little bit of your backgrounds and then how you got involved with Codex and what you do there?

Yeah, I'm a product designer on Codex. I've been at OpenAI for just over a year, and before that, I worked on robotics and generally at the intersection of design and research. And yeah, I've been on Codex for about six months, and with each release the product has just gotten more and more [unclear], so I'm excited to chat about how we use it today. And I'm Tibo.

I joined about the same time as you, actually. I've been thinking about AI and intelligent systems for as long as I can remember. It's one of the first programs I tried to write as a kid. And then over time, it got more and more fascinating. I feel like today it's come to life, right?

Where I finally have the thing that I was trying to build when I was seven, where I'm actually able to type in my terminal and get an intelligent response back and have this little assistant in my computer. So it's actually wild to think about that, that it's come true. But yeah, I joined about a year and a half ago, when none of this was possible.

We didn't really have reliable agents doing work over many, many hours and long periods of time. And so I've been tinkering with that at OpenAI since I felt that models were actually capable of it. Late last year, I kind of became obsessed with this idea that model capabilities would continue to evolve, and that it was really about getting the right infrastructure and product around them so that we could continue to benefit and have that step change in utility that you can get from the models compared to just being able to chat with them.

It kind of felt like chat was a bit saturated and that we were able to express a lot more things. That evolved over time. There was a lot of prototyping earlier this year, and it really came together as a team. And now we're pushing on Codex with quite a few people over here. And it's more exciting than ever, I would say.

Yeah, I definitely have felt that kind of acceleration across the board in the last couple of years that it's just wild to experience in our industry. I'd love to actually dig into a few of those different pieces. And one of the distinctions you made there is around kind of the models, the model capabilities and their advancements, and then the infrastructure and the harness and all these different pieces around it. So I'm curious from your perspective, how you and the team think about what is the relationship between those two, how do they connect and feed back into each other?

Yeah, that's a good question. I mean, on the research side and the infrastructure side, I defer to Tibo, but I think one of the really interesting developments that's happened over the past, say, six or seven months is this kind of co-evolution of the model and the harness. And I think it's really come together in our products, in that if you use our models in our harness, it's kind of different than if you use them elsewhere.

And I think that's really exciting. As a product person, as a designer, I like the idea of not just building a model that you can use via an API and that kind of shows up elsewhere, but really co-evolving these two together, and all the incredible things that that can lead to. Yeah, definitely that element of co-evolution. And that co-evolution is happening at many levels.

There is co-evolution of the harness and the model, and co-evolution of the products, which need to develop at a really rapid pace right now. It definitely doesn't feel like we have yet figured out the ultimate form factor of how you interface with an ever more intelligent system that is doing all these things on your behalf. But if you think about the harness, it's really just your body, right? You have your brain, you have your body, and that's how you end up acting upon the world around you.

And then there's a little bit more to that as well, which is how you act, but safely. So one of the things that we do out of the box is run it inside a sandbox. The network access is restricted. The file system access is restricted.

And this is really important because it allows the model to experiment and touch its environment, but without potential negative consequences. And this is an important topic, where we view coding agents very much under the lens of alignment and safety. And so there's this aspect as well of where the harness stops and where it starts to be the world. But we're definitely seeing that when we think about the two together, we get much better results.
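To make the "restricted file system, restricted network" idea concrete, here is a minimal sketch of a policy check. It is not how Codex implements its sandbox (real sandboxes enforce this at the operating system or virtual machine level); the `SandboxPolicy` class, its field names, and the `/workspace` root are all hypothetical, chosen only for illustration.

```python
import posixpath
from dataclasses import dataclass


@dataclass(frozen=True)
class SandboxPolicy:
    """Hypothetical policy: writes only under allowed roots, no network by default."""

    writable_roots: tuple = ("/workspace",)
    network_allowed: bool = False

    def can_write(self, path: str) -> bool:
        # Normalize first so "/workspace/../etc" cannot escape the allowed root.
        normalized = posixpath.normpath(path)
        if not posixpath.isabs(normalized):
            return False
        for root in self.writable_roots:
            prefix = root.rstrip("/") + "/"
            if normalized == root or normalized.startswith(prefix):
                return True
        return False

    def can_connect(self, host: str) -> bool:
        # Safe by default: every outbound connection is refused unless enabled.
        return self.network_allowed


policy = SandboxPolicy()
inside = policy.can_write("/workspace/src/main.py")
escape = policy.can_write("/workspace/../etc/passwd")
online = policy.can_connect("example.com")
```

The point of the sketch is the default: the agent can freely experiment inside its working directory, while anything outside it, including the network, has to be opted into explicitly.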

And I think this will continue to be true. And then there's this separate aspect of, what is the right interface to this agent? And that's, you know, where product really comes in, and delight. And I think that, yeah, this will definitely need to continue to evolve as we have agents that are never interrupted and just run forever.

But that's going to be a whole other game at that point. Sandboxing is an interesting one to maybe just drill down into for a minute there, because I think it's one of the things that stands out to me. I use all the agents, at least as an aspect of research. Some of them I use every day. I was using Codex to solve a problem for me earlier today.

And some of them I just try and then I say, you know what, you're not ready, or I'm not using you. But one of the things that stands out about Codex is the strong sandboxing model. Everything is sandboxed to begin with. And that is both good and sometimes frustrating and can cause some awkward user experiences.

So I'm kind of curious how you think about that balance and how you see this sort of safety question evolving across the ecosystem here. Yeah, it's a really good question. I mean, I think from a product perspective, from a user experience perspective, as you say, that's where some of these tensions surface: you know, you're always being asked to approve certain commands. Ultimately, these agents are extremely powerful.

So we have great sandboxing, great safety features. And that's a core part of the product as well in terms of why you might use Codex over others. So within those constraints, I still think there are some interesting things that you can do around user experience to make it a little easier and put some control in your hands. So users can change their sandbox permissions.

They can change the kind of mode. If you use our product in the IDE extension, you can basically choose between agent mode, where it will go off and make changes in your working directory, or this read-only mode, which is a little bit more restrictive and will ask you for permissions in many areas. But one thing we recently released, which I think is great, is that as you go along and approve certain commands, we give you fine-grained control over exactly which commands you are approving and how they will be saved into your config. I think we're exploring that as well, whether it's between you and your team.
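One way to picture fine-grained command approval is as a set of approved command prefixes that gets saved and consulted before the agent runs anything. The sketch below is an assumption about how such a scheme could work, not the actual Codex config format; `ApprovalStore`, `approve`, and `is_approved` are hypothetical names.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class ApprovalStore:
    """Hypothetical store of approved command prefixes, e.g. ["git", "status"]."""

    approved_prefixes: list = field(default_factory=list)

    def approve(self, command: str, prefix_len: int) -> None:
        # The user picks how much of the command to trust: approving only
        # ["git", "status"] does not also approve "git push".
        if prefix_len < 1:
            raise ValueError("an empty prefix would approve every command")
        self.approved_prefixes.append(shlex.split(command)[:prefix_len])

    def is_approved(self, command: str) -> bool:
        argv = shlex.split(command)
        return any(argv[: len(p)] == p for p in self.approved_prefixes)


store = ApprovalStore()
store.approve("git status --short", prefix_len=2)

runs_without_prompt = store.is_approved("git status")
still_needs_prompt = store.is_approved("git push origin main")
```

Persisting `approved_prefixes` to a per-user or per-team file would be the natural next step, which is the sharing question raised above.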

So I think it's ultimately about giving users control, but still maintaining that really high threshold. Yeah. And if we take a step back to where we started with Codex, it was Codex on web, sometimes referred to as Codex Cloud. We started with the idea that all of this should happen in a safe environment.

So it's like a completely isolated virtual machine with its own sandbox. We use Kata Containers under the hood. And then from there, we decided to actually bring that to your machine, through the Codex CLI and the Codex VS Code extension. And then we definitely kept true to that principle that it should be safe by default.

It doesn't matter how convenient it is to run outside of a sandbox. Ultimately, you are giving control to a very capable and intelligent entity to do whatever, if you were not using a sandbox, to do whatever it would want to do to your machine, using your own credentials and having any consequences that this can carry. And by default, we prefer to be safe. Obviously, there are use cases where you don't want to use a sandbox and we do caution against that.

But it's also something that we do support if you do know what you're doing. Yeah. Well, and I will say Codex has never tried to delete my database, which is not true of every coding agent I've tried. Sometimes, you know, it can be that the agent maybe does something inadvertently and that has negative consequences for you as a user.

It could also be that it's been instructed to. There are obviously prompt injections and other risks to think about. But ultimately, if you do give an agent control over something that's quite sensitive, where it could either delete it or take any other nefarious action, that is something worth thinking about as a user. And we do really feel the responsibility that we have there to make sure that there are no unintended negative consequences.

The thing that you mentioned in terms of the different ways of running Codex brings me back to another thing I'd love to hear from you guys, which is how do you use Codex internally? Are you running it all through Codex web or cloud, whichever one you're calling it now? Or do some of you use the IDE? Are you in the CLI, like I am?

How does that play out internally? That's a very good question. It's a bit of a meme, which is everything is Codex, right? We have a bunch of Codex models, we have Codex web, and then we have the CLI product.

And we kind of think about it as the same coding agent that shows up in different spaces. But internally, it's been really cool to see how it's evolved over time. So, as Tibo said, we initially shipped the web product earlier this year and got great excitement internally for this. So a lot of teams use it. I think the really cool thing about it as well is you can actually connect GitHub in your team settings, and you can go in and not touch a line of code, and you can ask for something.

It can do pretty amazing things. So that's super empowering for perhaps a UX copy team, or maybe go-to-market, who want to change some string about pricing. They can do it themselves. So, I think that's one of the first use cases that we saw.

And then I think the CLI is really popular. We have a bunch of incredible developers across the company, and developers often live in the command line. So, I think that's become really popular. But also, personally, I use it in the IDE extension a lot. I prefer the GUI, I prefer being able to click around.

Also, that's just my kind of go-to development environment. But some other really cool things as well that I think we've seen recently: we've shipped a Linear integration, we've shipped a Slack integration. So, what you will often see as well now in threads is you might be chatting back and forth, maybe about a piece of customer feedback or some new feature that people are discussing, and someone can just hop in and kind of add Codex. And basically, that will kick off a task in the background.

It will route it through either Slack or Linear, and it will just ping you with a task that you can click and open in the web. So, that's cool. It's kind of seeing it surface within threads, and you can assign issues in Linear as well, which is super fun. So, I'd say, yeah, it's kind of one of those everywhere things.

But I feel like the CLI is pretty popular. Yeah, there are a lot of different use cases among technical staff. But we also have a lot of, like, ambient intelligence, where it's sort of all around you, including code review, where every single PR that is written at OpenAI is always reviewed by Codex. And it sort of acts as a safety net, where it's hard to think about a world where we wouldn't have that safety net anymore, given how many critical flaws it catches every day.

And like how much time it saves, it's really able to go much more in depth than the time that we had, like when we're reviewing each other's code, especially now that like generating code is so cheap. But the cool thing as well is it's not just about technical staff, like more and more people across the company are using this tool to do a lot more than just writing code. Yeah, I think one really cool trend that I've seen over the past few months is within the design team. Right, we have a few of these Slack groups, like these work in progress groups where people post work.

And I've kind of seen this basically slow, well, not that slow, change over the past few months from static images from Figma to these interactive prototypes, even sometimes links that you can click into and use yourself, which is cool. And I've DMed a few people who post them and was like, "I didn't know you could code." And they're like, "I couldn't until I tried Codex." So there's this range, right?

And so it's for professional software developers, who obviously have a very high bar of code review standards to go through. It's also for these throwaway prototypes that designers can play with, so you can test responsiveness and all of these edge cases that you can't in, like, a static prototype. And it basically collapses the boundary between these roles, which has been artificial over the past 50 years or so, or in the recent history of technology, because of these disciplines and, often, even organizational boundaries of, you know, this staff can access this technology.

And it's a great equaliser. So it might be worth us going through this. You mentioned a few different points in the software development lifecycle where Codex is sort of taking its place now, or speeding things up, or simplifying, or collapsing boundaries. Have you thought rigorously across that whole lifecycle? Like if we look at our industry, we're all trying to figure out how do we adapt?

I think the process of developing software has probably changed more in the last year and a half than in my 20-year career before that. It is wild. So how are you adapting across all of those different points using Codex? It maybe goes back to that co-evolution, where you can do a lot of first-principles thinking and try to understand how exactly you should structure the teams and the work to best benefit from this as it's going.

Or you can just stay very flexible and learn every day as you're going evolving the ways that you work as a team, as an individual and an organization together with a coding agent. And that's definitely a lot of what we're seeing where, for example, small teams that have a lot of energy and ambition are able to achieve so much more and are highly effective because they can iterate and learn much faster. We've seen this with Sora. We've seen this recently with Atlas as well.

Where entire parts of the code base were able to be spun up just based on an idea and a few individuals that were really steering a whole series of Codex agents. And then also, it's clear that bottlenecks are moving around. So code generation is almost, maybe, solved right now. And the bottleneck is moving to code review, moving to deployment, also moving to planning and bringing in a lot of these ideas and the user feedback.

And we're thinking about how to solve those bottlenecks. As a product team, we're definitely not just focused on code generation. This is why we started to invest very early on in code review, because we identified that this was going to be a bottleneck. So there's a lot more to the story, or to the picture.

Some of the bottlenecks we anticipated beforehand; some of them were like, ah, we hadn't really thought about this, and now this is breaking because everything else has gotten so productive. Yeah, I think the thing that has really surprised me since joining OpenAI is just how small some of these teams are that build these products that reach billions of people. You know, I remember chatting to a designer who was on, I think it was deep research, one of these products.

And it's like, you know, one, one designer, a few engineers, a few researchers and it's kind of purposefully small by default. And I think internally, the way that we're able to do that is that we're kind of evolving as co-workers with models as well, right? You know, we're building models and we're able to access them immediately and really integrate them into people's workloads. So I think that's very cool to watch.

And also, yeah, the way that we're building products is we're building for professional software developers, and that means thinking through the entire lifecycle of product development, which, as Tibo says, is not just writing code. It's the planning process at the beginning, right? So it's using tools like Linear or Slack and meeting people where they work, where they speak, where they plan work, and integrating coding agents there. It's about the code review point as well, which Tibo has already spoken about. So I think an interesting thing to look at in the future is thinking through what the full software development lifecycle is and where you can support it beyond just code generation.

And there are parts there that are easier to crack, going back to the safety and the sandboxing part of the conversation. Clearly code generation is there, and it's easier for it to happen in a sandbox. If you're thinking about what happens next, around deployment and being on call for a service, now you enter a whole other realm for this agent. If we want intelligence to be driving this, if we want agents to be driving this, they need to act in a way that also carries a lot of risk. And like, how do you do this?

How do you achieve this? This is like still very much, I think like an open question of like how to achieve this safely. This kind of goes to another question I have. So as we talk about applying this in a wide range of things, one of the things that I've definitely observed in my work and working with a bunch of different things is that different models seem to be better at different things.

When GPT-5 came out, well, we do a lot of work in Go, and it is phenomenal at working with Go. It is phenomenal. It blew away every other model we were using. It's sometimes less good at working with HTML and CSS.

And we still sometimes go to other models, maybe even non-OpenAI models, for some of that work. How do you think about the sort of multi-model aspect of this, and to what extent are you aiming for a model that can do everything, and you go to the right things? Are you imagining a multi-model future? Like, how do you see that ecosystem playing out?

We're definitely aiming for, like, a holy grail: one model that is spectacularly good at everything. And then you don't need to think again about which model to choose. In practice, what we do think it is going to evolve into is more like a multi-agent type of world, where you don't necessarily have to be the one deciding, hey, what is the right underlying setup, which precise model, which configuration, which tools, in order to achieve that job. You know, maybe you will get help there as well.

And we're realizing that, as much as humans also collaborate in order to achieve useful things in the world, maybe it will also be the same for agents, where they have to collaborate together and use the specific strengths that they have. There's a whole series of issues there of, like, as a model, how do you disclose your strengths? Is it something that the model even knows? Is it intrinsic to the model, knowledge that the model possesses?

Or is it something that needs to be discovered by you as a human, or by other models, in order to be able to understand, hey, this is actually the strength of this particular setup versus this other setup, which achieves maybe similar results at lower cost, or maybe this one achieves better results but at higher latency. And so there are all these trade-offs, where I think it's going to be this beautiful world of collaboration between agents, but hopefully also much simplified for you as a user. Yeah, I think that the kind of meme in the design world is that all designers are redesigning the composer, right? And trying to work with this tension of how much you expose the capabilities, right?

You see it with models and these different amazing things that they can do, like image generation, for example. You know, a model like 4o is natively multimodal. So you can just ask it and it will do it, right? But how do you expose that in the UI? The same with the model picker, right?

That's a meme as well, one we kind of go back and forth on, and not just us, everyone. Do you list out a thousand different options, where you yourself have tested them so you know exactly which one is right for your use case, or do you simplify it? As Tibo said as well, I think obviously we're aiming for this kind of ideal, the single model.

But yeah, how we get there is to be determined. Now, if I open up, I have Codex CLI running here and I do slash model, I see a list of five. So you're clearly not falling into the show-everything camp. One difference I'm going to ask about here: I see, for example, it's defaulting to GPT-5.1 Codex Max.

There's also 5.1 Codex, and there's also 5.1. If we were to peel back the covers, how would you describe the difference between any 5-generation model and the Codex version of it? So where we started, when we got significant traction with Codex and Codex CLI, was roughly three months ago, when GPT-5 came out. Just saying that, I have to do a double take that that was only three months ago.

Three and a half months ago, GPT-5. And then we had been training another model on the side, which was even more effective specifically within the Codex harness. So this is how you think about it. You have GPT-5 and then you have GPT-5 Codex, and GPT-5 Codex is a version that will be more at ease within the harness that Codex provides and be able to achieve better results.

So this is always the model that we recommend. You have the same for 5.1 and 5.1 Codex. And then with 5.1 Codex Max, we were able to have a few research breakthroughs that we packed into that model, which made it even more effective and able to work for longer, and we published benchmarks there showing better results across the different tiers. So it's able to achieve stronger results while also using fewer tokens and being cheaper on average, which allows us to just pack a lot more into the same subscriptions. Whether you have a Plus or a Pro subscription, you just get more out of it. And at the end of the day, it's really about how much economic value you are able to achieve, right?

In a unit of time or a unit of cost. This is really what we're striving to provide, and we've restricted the model picker to the few models which we think work very well in Codex. And then there's a default as well, which is the one we recommend for folks: if you don't really want to think about it, just use the default and you'll be well off. And what goes into making the model? You said it works better with the Codex harness, and I will say that within the Codex CLI I always use the recommended one and it seems to just work. That's great. When I'm using, for example, Cursor, I will also use GPT-5.1 or whatever, and actually in that context I've found that just the bare model, 5.1, often works better for me than the Codex model.

What is it you're doing that's connecting it to that harness? It's actually really about thinking about that co-evolution of the harness and the model, and thinking about it as one entity and one agent. Fundamentally, what we're building as the Codex team is an agent, and then we figure out where to put it to work. An agent isn't just a model itself. The agent is the model together with the set of tools and the way that it's going to handle its context and be able to think and reason through which actions it should take. And it's pretty clear that if you co-evolve and co-train these three things, you can achieve better results, which is what we're achieving. I think one cool thing as well is that Codex, the CLI product, is completely open source.

So to your question of what's going on under the hood, that's the great thing. We have a really vibrant open source community who contribute a lot of great ideas and issues, and you can just go and look at the system prompt. It was also funny when we released the new model: there was this tweet which was like a system prompt leak, and it's like, yeah, it's in the open source repo. So in terms of capabilities or tools, you can see a lot of that, which I think is super exciting. There's a lot of effort and research that goes into what the optimal tools are in order to get the results that you want, and oftentimes we're actually quite struck by how simple the harness is and how simple the set of tools is. That simplicity is something that we strive for.

Being able to have the harness scale with the continued capability jumps that we expect to see over the coming months and years is something that, if you don't optimize for it, eventually comes back to bite you, because you have hyper-optimized something in the short term that doesn't scale with continued capability improvements. And then, by being so close together on Codex, we run it as one unit. We have product, we have engineering, we have research. We all sit together, ideating a lot, using some techniques from research in the harness and using parts of the harness in training. And so there's just this synergy there, and the sharing of ideas, and always zooming in on what will make the agent perform better as one unit. It's not about optimizing the model in isolation, and it's not about optimizing the harness in isolation. It's finding the combination that works best together.

And that's what the Codex series of models offers as well: that guarantee that we have considered how well it actually works in Codex, and that it's the best that we can do. If it's not too much secret sauce, how do you consider that? Is that related to the reinforcement training that you're doing? Is it a different initial data set? What is actually causing it to behave differently there? It's really about thinking about the model as not just needing to be intelligent but needing to be an efficient agent. If you think about what an agent is, it's going to be a model that gathers its own context and acts in its environment in order to achieve a goal.

And so if you set out to train a model to be extremely good at that, to be an extremely good coding agent, you're making different trade-offs. You find that you're able to make different trade-offs at the research level, you know, be it in post-training or the RL or the specifics of the training, which we're not going to go into. But the trade-offs are there. And so you're able to achieve efficiency gains and move up the performance curve. Now, I mentioned some of the models I see. There's one other model that I see in this list, which is GPT-5.2, which I think was not there when I looked at this a week ago.

So what's that about? It was just released yesterday. It's been very successful, more so than maybe we anticipated. Some of the team was actually up all night shuffling compute around and making sure that, you know, it kept working and we were achieving the latency targets that we have.

For agents, the latency of the model and the reasoning, and exactly where the compute is, are more important than ever before. This is because you have that latency element between, you know, the GPU that we run somewhere and your computer, where the tool calls are run. So you always have this back and forth. And then obviously, if the model is able to perform and we're able to sample more tokens per second, that's going to translate into a shorter amount of time to get that result. 5.2 is a particularly exciting model launch, I would say.
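A rough back-of-the-envelope model shows why both factors matter. Assuming each agent turn costs one network round trip plus the time to sample that turn's tokens (a simplification; the function name and every number below are illustrative, not measurements from Codex), the total time for a task is roughly turns × (round-trip time + tokens per turn ÷ tokens per second + tool time):

```python
def task_seconds(turns, rtt_s, tokens_per_turn, tokens_per_s, tool_s=0.0):
    """Estimate wall-clock time for an agent task under a simple per-turn model.

    Each turn pays one round trip between the GPU and the machine running the
    tool calls, the time to sample the turn's tokens, and the tool's own runtime.
    """
    per_turn = rtt_s + tokens_per_turn / tokens_per_s + tool_s
    return turns * per_turn


# Illustrative numbers only: 100 turns of 200 sampled tokens each.
near_datacenter = task_seconds(turns=100, rtt_s=0.05, tokens_per_turn=200, tokens_per_s=100)
far_from_datacenter = task_seconds(turns=100, rtt_s=0.30, tokens_per_turn=200, tokens_per_s=100)
faster_sampling = task_seconds(turns=100, rtt_s=0.05, tokens_per_turn=200, tokens_per_s=200)

print(f"near: {near_datacenter:.0f}s, far: {far_from_datacenter:.0f}s, "
      f"faster sampling: {faster_sampling:.0f}s")
```

Because the round trip is paid on every turn, an agent that makes hundreds of tool calls multiplies whatever network distance sits between the model and the tools, while faster sampling shrinks the per-turn cost directly.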

It's a significantly higher jump than what one might expect compared to 5.1. GDPval, you know, captures this fairly well. I think a lot of benchmarks these days are saturated, but a good way to think about it is the economic value that you're able to create in the world, and on GDPval, you know, I think we see more than a 20% jump there. So I definitely recommend trying it out in Codex.

It's quite exciting. Depending on when this podcast goes out you know you might have something even more exciting to try out but we'll see about that. I think that is interesting and thinking about that like interaction between local and data centers. So like how are you all I mean some of that I'm sure is proprietary but like how are you thinking about locality in this are you pushing compute what does that look like for somebody at the scale you're at?

The closer the compute is to your laptop, if that's where you run Codex CLI, the better, because you reduce that latency. Another way of doing it is to bring the computer environment closer to the compute, right? So to bring, for example, virtual machines and have those effectively be as close to the GPUs as possible. That's the approach that we take with Codex web. But if you're running locally within your VS Code extension and the agent runner is effectively running on your machine, then you do want that GPU to be as close to you as possible. So there's an element of where in the world that is running for you, and sometimes you might be worse off if you're, you know, somewhere in the middle of nowhere on an island.

We're not running a data center there, so you're going to feel that actual latency.

Let's actually dive in a little bit to the guts of the agent, because I think one of the things that most of the software development world is trying to figure out right now is how to build effective agents, and I think coding agents are really at the front here. They're pushing the edge of what that looks like. So can we break down, first, the very high-level pieces that you think of that go into this agent: the software layer, not the model?

Yeah, the higher-level pieces. We touched a lot on it already. So you have the model and the inference; that's going to be the intelligence that's driving the rest of the software stack. And so you have this interesting combination of a piece that is non-deterministic and a piece that is deterministic. At least for now, a lot of the harness is considered to be deterministic, and it's quite simple if you look at it under the hood. It's all open source for Codex. There isn't that much magic: it's a for loop, and then a bunch of tool calls, and then tools that have been designed to work well for coding. And it's a pattern that you can apply essentially to any other discipline and any other agent: control going from the model back to its environment, executing an action, then taking what's been observed in the environment and pushing that back to the model in order to decide the next action, and then doing that over and over and over again, maybe hundreds of times, until the point where the model believes that the desired outcome has been achieved and decides to stop.

So at the very beginning you have a prompt, or an intent from a user. Then you give control to the model, which decides on the next tool call, and it goes on and on and on, and at some point it has achieved, or is unable to achieve, the goal and decides to yield back control, and that's when the agent has finished its job. It's delightfully simple: it's just a couple of tools, a for loop, and then a model that's given control or not. But the really exciting thing is not that. I think it's really the products around it that allow you to control, steer, and supervise those agents, as well as thinking about the agent being its own little system that will continue to evolve, become increasingly more complicated, and be able to perform increasingly complex work. Maybe it's not a single agent that's going to be at work; maybe it's going to be multiple agents. But I think a really exciting thing is how you interface with this ever more complex system that is doing work on your behalf.
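The loop described above (prompt in, model picks a tool call, the observation goes back to the model, repeat until the model yields control) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Codex's actual harness, which is open source and written in Rust. The `Action` type, the `scripted_model` stand-in, and the two toy tools are hypothetical names invented for this example; a real harness would call a model API and real tools.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Action:
    tool: Optional[str]  # None means the model yields control back to the user
    argument: str = ""


def scripted_model(history: List[str]) -> Action:
    """Hypothetical stand-in for the model: a fixed, deterministic policy."""
    if not any(h.startswith("observation: listed") for h in history):
        return Action("list_files", ".")
    if not any(h.startswith("observation: read") for h in history):
        return Action("read_file", "main.py")
    return Action(None, "done: inspected main.py")


# Toy tools (assumptions for illustration); they return canned observations.
TOOLS: Dict[str, Callable[[str], str]] = {
    "list_files": lambda arg: f"listed {arg}: main.py",
    "read_file": lambda arg: f"read {arg}: print('hello')",
}


def run_agent(prompt: str,
              model: Callable[[List[str]], Action],
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 100) -> List[str]:
    """The for loop: model decides, tool executes, observation is fed back."""
    history = [f"user: {prompt}"]
    for _ in range(max_steps):
        action = model(history)
        if action.tool is None:  # the model decided to stop
            history.append(f"final: {action.argument}")
            return history
        observation = tools[action.tool](action.argument)
        history.append(f"observation: {observation}")
    history.append("final: step budget exhausted")
    return history


transcript = run_agent("What does main.py do?", scripted_model, TOOLS)
for line in transcript:
    print(line)
```

The `max_steps` cap is a design choice added for this sketch so that the loop always terminates even if the model never yields; the conversation itself only says the loop may run hundreds of times before the model decides to stop.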
Yeah, totally, and I think there are some parallels. If you look at ChatGPT when it was released, it's very, very simple: you go online, there's a text input, you type some intent, a message, and the model responds back. But as Tibo says, in this world where we have an agent loop and an agent is carrying out work for you, maybe it's delegating to other agents, it's collaborating with other agents, it's speaking to external agents even, I think the user experience changes. It goes from this back and forth to a little bit more like how we interact with other humans in the world today. If I ask Tibo to get me a glass of water, it's going to take a little bit of time; he's got to do a bunch of stuff. If I'm collaborating with a colleague, I might ask them to do some significant task, say build some new infrastructure project. It'll take time; they'll have to go out and coordinate. So I think we're moving to longer and longer tasks with more and more complexity, and models are increasing in capabilities. And I think the interesting question from a product perspective is then how you design those interactions in a way that maintains simplicity, fits into everyday workflows, and also exposes these incredible capabilities of the model.

So one of the things that's interesting to explore in this domain of user experience of agents, especially as this goes on: when I wrote code in the olden days, it was doing multiple things for me. It was creating this runnable artifact that somebody can interact with. Okay, that's great, that's useful. It was updating my mental model of the system that I'm working with. And it was also doing some kind of problem solving and updating my mental model of the user's problem, or at least how to map that user problem into my system. And so
there's this cascade of mental models that I'm updating, as well as the final artifact that's being generated. So now, as we get into this world where we're delegating more and more of the work of generating the artifact, there is still this very real need for us to update our mental models across the board. So how do you think about or see that working in the agentic world? How does the product facilitate it? What does that look like?

Yeah, I think that's a super important point, and in my mind it ties a lot into what we've seen people use coding agents for primarily over the last, say, six to three months, which has been a lot about solving problems and writing code on their behalf. I think there's a much deeper role that agents have to play in the future, which is to know: hey, what do you care about? How can it help you understand the state of the world around you efficiently? Maybe it should send you something every day: here's how the codebase changed, here's what users are thinking about the product, here's how to really explore this topic a little bit more. And so you go much further than just code generation. You're helping with planning, you're helping with ideating, you're helping with understanding user feedback, you're bringing a lot more context into play than just code itself. And in a way, if you were just to focus on code generation, you would miss out a lot on the opportunity here. We're thinking about this broader set of things that we can help people with. I think it's going to be ever more important; maybe code generation will actually be a very small part of what agents end up doing for you, and we're definitely thinking about this in the product.

Yeah. I mean, a very small example on the team: when a new starter comes on board, it often takes a long time to get used to a codebase. You have to really get to understand it. But
yeah, as well as writing code, I've seen new engineers on the team just speaking to Codex and really deeply understanding the codebase, going back and forth, and that means they don't need to tap on their colleagues' shoulders as much anymore. If they do, it's for some really high-value touchpoint. But yeah, I've seen people use it for all sorts of things: writing notes, code understanding. So it's really beyond just code generation.

Yeah, and there's this awesome thing about giving Codex to someone who just started on the team and being like, hey, explore the codebase with the help of Codex. And then we barely write documentation on how things work, because that's just in the code itself. What we tend to document more is why things exist. There's going to be an evolution there as well: how do we maintain the knowledge base, and how much of it is redundant when you have this intelligence, a little buddy that you can just send off to explain something for you? You tend to find that maybe it also shifts what you want to write down.

Yeah, absolutely. Well, and there's kind of an interesting thing there. One of the techniques that we found works really well for us with agents is actually documentation that is maybe transient in some form. So it's like: here's this problem that I'm solving; gather all the relevant pieces and documentation, link to the relevant files, so that I have one short piece of context. Okay, now use that to get me to a solution on this particular thing. And so it's much more temporary documentation than permanent documentation, but giving the agent this map that it can work with.

Yeah, is this more like a design doc?

Well, it depends. So we can dive into this in a couple of different ways. I'll use a very quick example of one of my common practices, and I've done this with Codex or other agents. So I
have a problem to solve and I know roughly the area of the codebase involved, but maybe I don't know it that well. So I'll say to Codex, for example: hey, I'm going to be wanting to muck around with this subsystem; please do an analysis of how that system works today and write me a document that includes file names and symbols and all these other things. Conceptually, what I'm doing is creating a map of the territory. It's essentially a context condensation: it doesn't need to read all those files all the time, but it needs to know roughly where everything is, so when it needs something it can pull it. Okay, now I have that subsystem map, and I say: okay, I'm looking for a solution that looks something like this; can you map out three different variations of that and have an argument about which one's better? Whatever, make some characters do this, kind of map out the solution space. Now I have these two very rich documents and I can say: okay, based on these, look at this, look at this, pick which is the best solution, and write me an implementation plan. Okay, pretty good; break it down into a set of task lists; go. So I'm in some ways manually managing the process, but kind of guiding it towards all the relevant parts of the codebase, with me in the loop often. So it'd be like: you missed something over here, you've got to go look at that again, or something along those lines.

The workflow that you're describing is extremely powerful, and it's all based on files and on you deciding that, hey, this workflow is actually very useful for me. And you discovered it by, you know, maybe talking to other people, and there is this sort of sharing element right now, recipes for how to work with agents. It's not necessarily that the product is prescriptive about it. It's delightfully open-ended right now, actually, where you can ask it to do anything for you, and there's a creative aspect to what you actually ask it,
how it can help you. And maybe it's through this complex workflow of planning, then ideating on different options, and then going and performing some implementation of it. We like that flexibility a lot, and we try to be very mindful when we introduce more opinionated frameworks into the product that could restrict that flexibility. That's definitely not something that we want.

Another interesting angle as well is just seeing how some of the more non-technical people across OpenAI, or people in disciplines other than traditional software engineering, use it. There's one person I know who basically uses it for everything; he uses it to write documents. On the design team, I know a lot of the designers and the product managers might do some coding, but they'll do a lot of other things: ideation, as you say, and planning. There are some very cool things as well on the data science and go-to-market side, a lot of just data analysis, crunching through numbers, looking through CSVs. And I think from a product perspective, we've been deliberately opinionated about keeping things simple, just like ChatGPT. It's this general-purpose interface that you can go to and ask to do anything. With ChatGPT that might be generating images or answering a question from the internet. And I think for us, the amazing thing about a coding agent is that it's extremely general purpose, so you really want to keep it as simple as possible and then let the user get creative with it.

So on that, and looking forward a little bit, one question I'd love to ask is: are there any plans for enabling something like a Codex SDK, or hooks, or some other way to do that? Because, for example, as I mentioned, I have this workflow, and I'm figuring out, oh, I want to steer it in this
way to fit my workflow. It would be great if, for example, every time there was a context pull or something like that, it reinforced this, or there were other ways to nudge and tightly control the context that's going in, to fit esoteric workflows that you don't want to have in the core agent.

Yeah, what you're getting at is really what we're seeing with the power users, including within OpenAI. Some of our most prolific users maintain their own fork of Codex. That's one of the awesome parts of it just being code that's open source as well: if you want to change it, you can just fork the code. If you happen to be advanced, that shouldn't be too daunting either; Codex can help you change it in a way that's productive for you as well. It is written in Rust. Sometimes we get some comments on that, but we want it to be very robust and performant as well. It's quite delightful when you just type codex and it opens instantly. That's what we get from putting a lot of effort into that. Hooks are something that we're debating; we'll get there eventually. What we're super excited about right now is building the right set of primitives for the agent to be able to perform increasingly complex work. So you can think about what will happen if you're able to run an agent for an entire day, or maybe an entire week, and steer it as it goes. Is that a different thing? Does that require different product thinking? And then we touched upon multi-agent as well, and this is something that we think is extremely exciting and is going to emerge in 2026 for sure as something that is not at a prototype stage like you're seeing across the industry right now, where maybe folks are excited about their little sub-agent. It's going to be these
really robust networks of agents collaborating together in order to achieve something for you. That's the kind of stuff that we're really excited about right now. Hooks, maybe at some point.

Yeah, nothing massive to add, just to say we have a Codex SDK, so it's possible now. We have a documentation page up and you can start to play with it. But yeah, to Tibo's point, I think there's a tension between catering for very specific workflows and thinking about what these primitives are, these building blocks, so that you can build on top of them.

What would you say the missing primitives are right now?

We have a long list of GitHub issues. Some of them are top voted; those are actually the ones that we tend to prioritize. One of the things that's really been requested is sub-agents, so we're actively working on how to think about multi-agent networks. And then a lot of it is still product overhang, I think, where it's not about the agent itself but about how we can make the product more delightful, more interesting, and better suited for managing, steering, and supervising agents at scale. That's what's keeping us very busy right now.

Yeah, again, without going too deep into the roadmap, I think one interesting provocation to think about is: as using agents, or a multi-agent world, becomes incredibly complex, how do you stay on top of that? How do you keep track of what different agents are doing, what actions they're taking, whether you need to give any permissions and the like along the way, and any artifacts that they've created, whether that's code or something else? Keeping track of that and staying on top of it is, I think, for me as a designer on the team, a really interesting interaction design problem. We're moving
from this world where you watch a rollout of, like, a minute to, as I say, this 10-hour job. How do you stay on top of that? How do you keep it delightful? How do you meet users where they are, so you're not context switching all the time between all these different things? As well as the core primitives, those are the kinds of problems that we've found on the product side.

One other thing you talked about there was all of the non-technical use cases, and I think one of the most amazing things I've seen as coding agents, and LLMs just as coding assistants, have grown is the extent to which subject matter experts are now able to build at least their own prototypes, and often their own applications, to help them in their workflows. Are there any aspects, from either a product or technology standpoint, that you're thinking about particularly for those non-technical users looking forward?

Yeah, we're thinking about it, especially since it's been sort of a natural thing that's been happening, where we see increasing numbers of non-technical people, inside OpenAI and outside of OpenAI, use Codex in their terminal and get it to do cool things for them. It definitely got us thinking about how we can do this better. And there's also this sort of pull for generality, as ultimately the very best coding agent is a general agent that's able to reason across much more than just code. The coding agent and the models are extremely good at instruction following, and people find it very useful for data analysis, for editing spreadsheets, for doing market research and these things. It's definitely something that we want to lean into and cater to at some point. At the same time, right now we're also laser focused on making Codex the very best tool for professional software engineering, and you
know, there's a tension of, hey, if you want to be really good at this, should we also think a lot about these other things? But ultimately we see it combine very well, and it's very satisfying to see Codex used for more and more things, in addition to just being an extremely good tool for coding.

Yeah, and from a product perspective, there are some things, when we're building an agent, for example a coding agent on the web, like setting up an environment, that you just can't get around, which are pretty technical, and if you're a software developer, you need to go through these. But I think from a core product experience perspective there's also more that we can do, and this is what I'm focused on, which is just: what's that first experience? Is it delightful? Is it simple? Can you, as a non-programmer, rock up and just get involved? And how can that be an on-ramp for you to learn more about coding and get deeper into it yourself? So on the design team, for example, we had this off-site, and we had a few people on the team who code going around and basically onboarding everyone into Codex, into the CLI product, into the extension, depending on where they worked. And to be honest, for some of the non-coders it was a bit intimidating to get into the terminal. They were installing npm and these things that might be a little new for them. But once they got onboarded, and once they saw the work that the model was doing and started to learn a bit about it, some of the people who just dipped their toes, now I see them coding more and more. So I think it's also a really cool opportunity to expand the aperture of what a software developer is and
create a really great on-ramp for people to learn more and go and dig deeper themselves.

Awesome. Well, we are getting close to the end of our time. Is there anything we have not talked about yet today that you think would be important to leave folks with?

One thing we haven't talked about is the mindset that's important to continue to adopt. I feel like it's an amazing time to have problems, as solving them has never been easier. And then there's also this aspect of, hey, this really helps with answering questions, and there's this curiosity that gets super rewarded right now. And it's definitely about being able to try things and get interested in changing your approach to how you're going about your day and how you think about solving the problems that you have. Maybe you had useful ways of doing that that were effective two years ago and you've stuck to them. I definitely feel like it's the right time to question everything and try new things. First of all, I find it super exciting. And having always had many, many ideas and unsolved problems, I'm finding that the number of problems that are unsolved reduces with time. I hope someday agents will also be able to be creative and come up with super interesting problems that I should be thinking about, because we're not there yet. But what a time to just try these things and get a ton of new things done.

Yeah, plus one. I think also, what a time to be a creative, as a designer on the team. When I'm speaking to young designers, or when I occasionally teach or mentor young folks, the main thing I say is just get involved and give things a try, because there's never been a time where curiosity has been better rewarded by
really just getting your hands dirty, pushing yourself out of your comfort zone, and very quickly realizing that you're able to achieve way more than you might have thought before, even on a week-by-week basis. You look over the past few weeks, with all of these model releases, and it's just crazy, the acceleration that's happening. So yeah, just stay curious and get involved.

Yeah, it's not long ago, even six months ago, that you would show static Figmas or slides and just be like, hey, this is an idea of mine. And now it's fully functional little products, and they're like, whoa, this is better than what we have shipped in production; we'd better get this out soon. And it was that step change in what you're able to achieve solo as a designer. I don't know if even referring to you as a designer does it justice anymore. There's this blurring, and it's quite delightful.

Yeah, there's never been a better time, I think, to be a software engineer or a designer.

Frequently Asked Questions

How long is this episode of Podcast Archives - Software Engineering Daily?

This episode is 50 minutes long.

When was this Podcast Archives - Software Engineering Daily episode published?

This episode was published on January 29, 2026.
