Reinventing the Python Notebook with Akshay Agrawal

TRANSCRIPT · AUTO-GENERATED

Interactive notebooks were popularized by the Jupyter project and have since become a core tool for data science, research, and data exploration. However, traditional imperative notebooks often break down as projects grow more complex. Hidden state, non-reproducible execution, poor version control ergonomics, and difficulty reusing notebook code in real software systems make it hard to move from exploration to production. At the same time, sharing results often requires collaborators to recreate entire environments, limiting interactivity and slowing feedback.

Marimo is an open-source, next-generation Python notebook designed to address these problems directly. Akshay Agrawal is the creator of Marimo, and he previously worked at Google Brain. He joins the show with Kevin Ball to discuss the limitations of traditional notebooks, the design of reactive notebooks in Python, how Marimo bridges research and production, and where notebooks fit in an increasingly agentic AI-assisted development world. Kevin Ball, or KBall, is the vice president of engineering at Mento and an independent coach for engineers and engineering leaders.

He co-founded and served as CTO for two companies, founded the San Diego JavaScript Meetup, and organizes the AI in Action Discussion Group through Latent Space. Check out the show notes to follow KBall on Twitter or LinkedIn, or visit his website, kball.llc.

Akshay, welcome to the show.

Thanks, Kevin.

It's great to be here.

Yeah, I'm excited to get to talk to you. Let's start out with a little bit about you. So can you give a quick background of who you are and how you got to our topic today, where you got to with Marimo?

Sure, happy to. So my name's Akshay. I've got a background in computer systems, but also machine learning research. I spent a little bit of time at Google Brain.

This was a while ago, back when Google Brain existed, before it was taken over by DeepMind. So this was 2017 to 18, I worked on the TensorFlow team. And then after that, I went back to Stanford to do a PhD in machine learning research. And through that experience and my time at Google, I realized what I really enjoy doing is building open-source developer tools for people who work with data.

So after my PhD in 2022, I started working on Marimo. And Marimo is, it feels like a next-generation open-source Python notebook, and it is a notebook. But as we'll talk about in this episode, it's different than traditional notebooks in many ways. It's reproducible.

It's stored as pure Python, so you can version it with Git, execute it as a Python script, and share it as an interactive web app, too, if you want to.

So let's maybe look into that, right?

I feel like notebooks have been a part of the Python ecosystem for a while. And when they came out, it was this big breakthrough of, like, oh, my gosh, I can do this kind of interactive exploration of data and share it and do all these different pieces. What's wrong with the state of notebooks? Why Marimo?

Yeah, it's a great question. And traditional notebooks, like Jupyter notebooks and things like them, I think have been extremely useful in, like, research and education. And I used Jupyter a lot during my own PhD. So my thesis was on vector embeddings.

And I would make a lot of low-dimensional plots of high-dimensional data after doing some dimensionality reduction. So lots of scatter charts and stuff like that. And it was really, really useful to do that in, like, a Jupyter notebook because I needed to run code and then see what the results of my algorithm were. And, like, there was, like, a back-and-forth between you as the algorithm developer and your data, right?

And so that's great. There's nothing wrong with that. The issues that I ran into and that others have run into with, like, these traditional notebooks are a few. There's a few issues.

And I think it comes down to, like, maybe two or three. So one, like, the one that sort of really trips up many people, myself included, is this idea of hidden state. So in a Jupyter notebook, I'm using that shorthand, but just, like, the default experience of a Jupyter notebook using the IPython kernel. It's an imperative paradigm, right?

Like, you run a cell, it mutates memory. And then you run another cell, and then it mutates memory. And what you're really doing, though, is oftentimes once you get past the just exploratory phase, you know, you're kind of writing a program, like, even in a notebook, right? But because it's imperative and because Jupyter doesn't know how your cells are related, you may run one cell, then forget to run other cells that depended on that cell.

And all of a sudden, the code on your page doesn't match the variables in memory. And this can lead to, like, a lot. At least for myself, like, you know, I would do things, like, I would delete a cell, and then I would forget that some variable defined in that cell was still in memory, and, like, other code would refer to that variable. And then four hours later, I would realize, like, I just had, like, a bunch of inconsistent state, and I would restart my notebook, restart my analysis.
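A minimal Python sketch of the hidden-state problem described above. It models a notebook kernel as one shared namespace that every cell mutates; this is an illustration only, not IPython itself, and the cell contents and variable names (`threshold`, `kept`) are made up for the example.

```python
# Toy model of an imperative notebook kernel: a single shared namespace.
kernel_memory = {}


def run_cell(source):
    """Execute one cell's source code against the shared kernel namespace."""
    exec(source, kernel_memory)


# Cell 1 defines a variable; cell 2 reads it.
run_cell("threshold = 10")
run_cell("kept = [v for v in [5, 15, 25] if v > threshold]")

# The user now deletes cell 1 from the page. Memory is untouched, so
# re-running cell 2 still works even though no visible code defines
# `threshold` any more.
run_cell("kept = [v for v in [5, 15, 25] if v > threshold]")
hidden_state_result = kernel_memory["kept"]

# After a kernel restart (a fresh namespace), the same cell fails.
kernel_memory = {}
try:
    run_cell("kept = [v for v in [5, 15, 25] if v > threshold]")
    restart_error = None
except NameError as err:
    restart_error = err

print(hidden_state_result)  # [15, 25]
print(type(restart_error).__name__)  # NameError
```

The page and the memory only agree again after the restart, which is the four-hours-later surprise described in the conversation.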

And so this variety of hidden state is, like, one thing that I think is really challenging that I wanted to solve with Marimo. And then other things include, like, the ergonomics of using notebooks, like, as part of, like, modern software projects. The default file format for Jupyter notebooks is great for communication, because it, like, stores not just the code, but also your plots and things like that in a single JSON file. But it makes it difficult for any kind of software engineering-like task, right, because you don't want to version base64-encoded data with Git.

Like, it's just not going to work, right? And then also, you might want to reuse some code you wrote in a notebook in, like, another notebook or a Python module, right? But you can't really. Those aren't Python files.

So then you have the issue where, like, and I've been guilty of this, you duplicate your Jupyter notebook, like, 40 times, and then it's just a total mess. So that was another thing I wanted to solve. And then finally, the third thing has to do with, like, shareability and, like, sort of interactivity. So on the one hand, Jupyter notebooks are really great because you have plots, you have code, you have visuals.

This is a document you can share out. Like, if you share out some analysis or some, like, research investigation with a collaborator, they need to also have Python installed and Jupyter installed in order to, like, interrogate your results. Like, they might want to say, like, if I change some parameters, how might this plot change? And that feels, like, kind of unfortunate.

I mean, as a matter of practice, like, my PhD advisor couldn't do that, right? He would have questions, and I would have to go back and change things. And so with Marimo, we wanted to make it so that any notebook can also really easily double as, like, an interactive web app where, like, you can promote any variable or anything to, like, this UI component, like a slider or something, so that people could, for themselves, just interact with the document and see how results would change.

So if I'm hearing you properly, I'm going to play back.

So two of these things sounded like essentially saying traditional notebooks or Jupyter notebooks or whatever were, in a lot of ways, just kind of a fancy REPL. They're a REPL environment that has a UI baked in and is able to be a little bit more approachable and interweave plots and descriptive text and things like that. But they have all the drawbacks of a REPL in the sense that it's not really meant to be something that's packaged and duplicated. It's you exploring, as you said, having a dialogue with your data.

And what I'm hearing is you want to keep that vibe of dialogue with your data, but kind of bring this up into first-class software development. Like, hey, this is reproducible. It's data-driven. It's state.

It, like, keeps track of all the things. It's not me hacking away in a REPL somewhere.

Yeah, that's exactly right. And, like, I think one of my teammates on Marimo, Trevor, we were just talking about this the other day.

And the model of just a REPL works well as long as it's just you sort of working on that. But as soon as you bring in a second person, and, like, even the second person can be you a day from now, like, yeah, you kind of want some guarantees, some reproducibility, some, yeah, that's exactly right.

All right, so let's maybe talk, then, about how you solve some of these problems.

And I'm particularly interested in kind of understanding what is the execution model that you're adopting that is not just this sort of imperative REPL.

Definitely. So in designing Marimo, we took inspiration from other projects as well. And the two biggest ones are these other really cool notebook systems for other languages.

So Pluto.jl for Julia, which in turn is inspired by Observable for the JavaScript ecosystem. And what those two notebooks are and what Marimo is, at the core, they're what are called reactive notebooks. So reactivity is sort of the alternative to this imperative style of notebooks. And so what this means is, say in a Marimo notebook, if you run a cell, say that cell defines some variable X.

When you run that cell, Marimo then, by default, will automatically run all other cells that read the variable X. And so it reacts to your code execution of one cell to keep the outputs of the rest of the notebook in sync with the action you just took. The way that this works is that there's no runtime tracing involved. And instead, Marimo basically just statically reads the code of every cell that you have and determines which variables it defines and which variables it references.

And from there, it just builds this dependency graph. And it's kind of like Excel in some sense, right? It's not magic. And we have, like, configuration.

And so, like, automatic execution may not even be desirable for all notebooks, especially if, like, some downstream cells are going to take a long time to run. So you can make the executor, like, call it lazy, so that you run a cell and then Marimo will just mark the affected cells as stale, but it won't run them automatically. But it'll give you one button that you can push to bring all your code and outputs back into sync. So that's the core thing.
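The static analysis described above can be sketched with Python's built-in `ast` module. This is a simplified illustration under stated assumptions, not Marimo's actual implementation: it treats every assigned name as a definition and every loaded name as a reference, and it ignores function scopes, imports, and comprehension variables. The cell ids and cell sources are hypothetical.

```python
import ast


def defs_and_refs(source):
    """Return (names the cell assigns, names the cell reads but does not assign)."""
    defs, refs = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defs.add(node.id)
            else:
                refs.add(node.id)
    return defs, refs - defs


def affected_cells(cells, changed):
    """Cells that transitively read a variable defined by the `changed` cell."""
    analysis = {cid: defs_and_refs(src) for cid, src in cells.items()}
    affected, frontier = [], [changed]
    while frontier:
        current_defs = analysis[frontier.pop(0)][0]
        for cid, (_, refs) in analysis.items():
            if cid != changed and cid not in affected and refs & current_defs:
                affected.append(cid)
                frontier.append(cid)
    return affected


# Hypothetical notebook: cell id -> cell source.
cells = {
    "a": "x = 2",
    "b": "y = x * 10",
    "c": "z = y + 1",
    "d": "w = 7",
}

stale = affected_cells(cells, "a")
print(stale)  # ['b', 'c']
```

In an eager mode the cells in `stale` would be re-run automatically; in the lazy mode mentioned above they would only be marked as stale until the user asks to bring the notebook back into sync. Cell `d` is never touched because it reads nothing that `a` defines.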

And so that can minimize hidden state. And also, as you play with it, you also find that it actually enables you to do data exploration a lot faster because, like, you just change the variable value, hit enter, and you see everything change, and et cetera.

Well, this is a paradigm that I think web user interface frameworks have very much gone towards. It's this kind of reactive model.

I think Vue did it first, and you see Svelte and React, and all these folks, like, taking this very data-driven reactive model. And it does allow very fast and easy keeping things in state. So looking at that dependency graph, then, how much overhead does that end up creating? Or, like, are there any things in Python that are hard to statically analyze?

But I'm not super deep on Python in particular. I know in some languages, it can actually be hard to trace the dependencies.

So on the question of overhead, it's, like, totally negligible. Python has a built-in ast module that you can use to do static and semantic analysis of code.

And, like most heavily used libraries in Python, that's implemented in C, so it's really fast. So the overhead is negligible, especially because it's static analysis, so it's, like, we parse and analyze once, and then every other execution, there's no overhead. So, yeah, you don't notice that. And in terms of what is hard, like, I guess the scope, like, is the static analysis itself, is the semantic analysis that difficult?

And I think what we chose to do is, like, there's two approaches you could take to making a reactive notebook in Python, right? What we did is, our data flow graph is based only on variable definitions and references, right? So, like I mentioned, the cell defines a variable X, you run that cell, we run all other cells that read the variable X. That's easy to implement faithfully for the Python language.

You could take another approach, which would not be based on definitions and references, but just on memory access. And, for example, you could say, if my cell mutates or touches some variable, this cell appends to some list, I'm going to run all other cells that read that list. That's extremely hard to do reliably, and so we explicitly don't try to do that at all. Because it's impossible to do with static analysis in Python, so then you're going to have to start tracing user code, and you're invariably going to miss, like, it's impossible to implement that 100% correctly, and then you get into an uncanny valley where the user runs a cell and they won't know what else is going to run.

So, we're just really upfront, like, hey, we only track variable definitions and references.

So, if you mutate things, go for it, but be aware that we are not, that's an escape hatch, right?

Exactly.

If you're mutating things, make sure you do an assignment afterwards.

Exactly. That's exactly right. And, you know, the benefit of this is that, like, the rule set is exceedingly clear to the user, right? Like, they can understand it.

It's, like, a sentence long. And also, it encourages the users to write, like, functional code, right? Which, especially for, like, data-driven stuff, machine learning, it's kind of what you want to do anyway. So, it's good.
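A small sketch of why mutation is invisible to a definitions-and-references rule, again using the built-in `ast` module. The helper `assigned_names` and the variable names are hypothetical, and the check is deliberately simplified (it only looks at names in assignment position); it illustrates the rule as stated in the conversation rather than Marimo's own analysis.

```python
import ast


def assigned_names(source):
    """Names bound by assignment in a cell (simplified: Name nodes in Store context)."""
    return {
        node.id
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store)
    }


# An in-place mutation only *reads* the name `scores`, so a static
# definitions-and-references analysis sees no new definition here.
mutating_cell = "scores.append(42)"

# A functional style binds a new name, which the analysis can track.
functional_cell = "scores_with_new = scores + [42]"

print(assigned_names(mutating_cell))    # set()
print(assigned_names(functional_cell))  # {'scores_with_new'}
```

Cells that read `scores_with_new` have a dependency the graph can see, whereas nothing downstream can be tied to the `append` call. That is the sense in which the one-sentence rule nudges people toward functional code.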

Like, I think, like, one of our users described it as gentle parenting or something. You know, data scientists, machine learning engineers, et cetera, right? Just generally good code.

There's a lot of value in that.

Actually, I was, a slight aside, but I was hearing somebody describe it. They said, if LLMs generate no other value, they have dramatically upgraded the quality of code coming out of graduate schools. That's fair, actually. Yeah.

I like it.

Okay. So, let's keep going down this road. So, now you have, instead of a mutable REPL, you have a well-defined dependency graph.

And you mentioned the next thing was around sort of reproducibility and not checking in these massive binaries or things like that. So, how does Marimo handle treating this stuff as code?

Yeah. So, that's a great question.

So, the data flow graph gives you some amount of reproducibility insofar as, like, you can't, like, run a cell and then forget to run some other cell. Like, Marimo will just run it for you or mark it as stale, and, like, really loudly, if you've done something sort of. I guess, yeah, it just won't let you step out of its reactive execution model. So, that handles, like, reproducibility and execution in some sense.

And then, I'll get to the file format. I guess it is related, actually. But Marimo does have sort of an optional built-in package management system that's, like, powered by the uv package manager. So, if you opt into it, when you import a package, Marimo will detect that imported module, resolve it to a package, and prompt you if you want to install it.

And if you do, it'll add it in a comment block at the top of the Python file. Python has a standard called PEP 723 for this. So, basically, we'll document all the packages your notebook has used. And then, the next time you run that notebook, we'll use the uv package manager to create an isolated virtual environment, install just those packages, and then you're off to the races in this sort of reproducible package environment.

So, we handle that as well. But the reason that was an easy feature for us to add is that we actually decided to store our notebooks as Python files instead of as these JSON files that sort of Jupyter has historically used. And the way that this works is each cell is represented as, like, a function. There's, like, some decorator to, like, demarcate.

Like, this is, like, a cell that's going to be going into the notebook. And you can think of each cell as a function mapping the variable references it uses to the definitions it creates. And at the bottom of the notebook, there's a, in Python, you can do an if __name__ == "__main__" guard, which is, like, when you run the script, that's what's going to run.

And so, there's an if __name__ == "__main__" guard that will then say app.run(). Like, it will run the cells of the Marimo notebook in a topologically sorted order. So, basically, all that to say, you can go to the command line and say python my_notebook.py, and it'll run it as a script. You can even parameterize the CLI args.
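The idea of cells as decorated functions that are run in dependency order can be sketched with a toy, standard-library-only stand-in. `ToyApp` below is hypothetical and is not Marimo's implementation or file format: here a cell's parameters are the variables it reads and its returned dict holds the variables it defines, and cells are run as soon as everything they read is available, which yields a valid topological order.

```python
import inspect


class ToyApp:
    """Toy stand-in for a notebook app object (illustration only)."""

    def __init__(self):
        self.cells = []

    def cell(self, func):
        # The decorator only registers the function as a cell.
        self.cells.append(func)
        return func

    def run(self):
        values = {}
        pending = list(self.cells)
        while pending:
            # A cell is ready once every variable it reads has been defined.
            ready = [
                c for c in pending
                if set(inspect.signature(c).parameters) <= values.keys()
            ]
            if not ready:
                raise RuntimeError("unresolved or cyclic variable references")
            for c in ready:
                args = {name: values[name] for name in inspect.signature(c).parameters}
                values.update(c(**args))
                pending.remove(c)
        return values


app = ToyApp()


@app.cell
def report(total):  # appears first in the file but depends on `total`
    return {"message": f"total is {total}"}


@app.cell
def compute(numbers):
    return {"total": sum(numbers)}


@app.cell
def load():
    return {"numbers": [1, 2, 3]}


if __name__ == "__main__":
    print(app.run()["message"])  # total is 6
```

The order of cells in the file does not matter: `load` runs first, then `compute`, then `report`, because the data flow decides the order rather than the position on the page.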

You're already, though, getting to a place where now this stuff is pluggable, because it's just Python code. You can import the functions to wherever. You can do what have you. So, before we dive down that road, which I am interested to go down the implications there, what, if anything, is lost by storing it as Python rather than this sort of proprietary format?

Yeah, so, there's definitely something lost. So, the main thing that's lost is, by default, when you're using Jupyter, not only is your code, but, like, your plots, for example, are stored as basically base64-encoded data in the file. So that you can just, like, put it on GitHub, and, like, you can immediately see a record of your analysis. So, I actually think the IPython notebook file is a really valuable artifact.

I just don't think it should be the artifact that your development is centered around. And so, what we do to sort of bridge the gap is that there's a configuration setting that you can turn on, which will basically automatically snapshot your notebook as an IPython notebook alongside the Python file. So, there's a little __marimo__ directory, and you'll see, like, my_notebook.ipynb in there alongside your Python file so that you can try and get the best of both worlds. There's other things that we ended up having to sort of implement our own versions of because, for example, one thing that's nice about a Jupyter notebook file format or, like, just something that stores the outputs is that when you load up the notebook, like, you can see the previous run's execution without running the whole thing, if that makes sense, right?

Whereas, we start from just a Python file, and that file doesn't have those outputs, so we implemented our own sort of, like, session cache, which is stored in some sort of directory that is hidden from the user, to replicate some of these nice features that the IPython notebook format did provide.

That makes sense. Well, and since you have the dependency graph, you have all the variables already, like, labeled.

You know what you need to save.

Exactly. Yeah.

Okay, so I want to come back to the UI, because that was the thing you talked about as well, and I think that's interesting. But this has brought me into this question or discussion topic around how notebooks fit into the broader software development life cycle. Because I think one of the things I have seen in places where IPython notebooks tended to be used before is they were heavily used by, for example, data scientists or a data science team. They were used for data exploration, and then if you wanted to take something that was there and package it for reuse or embed it in a product or whatever, it was, like, a whole effort: porting, new code, all these different pieces. But to me, it sounds like this Marimo file is literally just Python. Once you have something that works, you could use it.

Yeah, yeah, that's correct. So you can, and we do. Many of our own internal utilities that we write, for our internal tools, they happen to be in Marimo notebooks that are reusable as Python files. You can even say, like, from my_notebook import my_function, from my_notebook import my_class. Like, that syntax kind of just works. There's, like, some details. The function needs to be pure so that it serializes correctly, which, by the way, you end up writing better code, right? So, yeah, you totally can. So it really does blur the boundaries of what you can use a notebook for, which I think is really exciting, because, like, we see all kinds of use cases, from the traditional data science and research use cases to, like, back-end engineers emailing us, telling us, like, yeah, we're doing our data pipelines with Marimo notebooks, just because we can.

And so I want to hear more about that, because, yeah, my experience has all been notebooks in, sort of, often research, ML, data science communities, and then that's its own thing. So how are you seeing the integration happening? When do you choose, if
you have one of these back-end engineers, when are they using a notebook? Why would they choose to do that over something else? And, like, what is the process there?

Yeah, so I think there's at least two reasons. One that we'll touch on with, like, the interactive components that you alluded to earlier. But even without that, I think often when making simple data pipelines, you know, not super complicated ones but simple ones, it can be helpful to prototype a data pipeline in a notebook. Because, similar to what I was mentioning about having a back and forth with your data, right, you write some code, you see if the data is put into the shape you want it to be in. Sometimes it's easier to do a visual inspection, so it can be nice to do in a notebook. A notebook's also a good choice because with data pipelines, the job runs, and it can be nice to just have a report alongside it to see, like, what was the shape of the data that day, et cetera, right? So it's nice to prototype as a notebook. And now with Marimo, not only is it nice to prototype as a notebook, it's also really easy to just run it as a cron job or as a script. Like, you don't need to reach for other tools that orchestrate .ipynb files. You just say python my_job.py, whatever, right? It's just a Python script. So that's one area where we do see sort of natural usage.

So this already is getting me to a place that I'm curious about, right? So often, if I'm doing a big data analysis job, I will want to do something interactive on a subset of my data, and then I'm going to run something async when I do the full data, because it's going to be big, slow, expensive, what have you. But maybe I want that same visualization, right? I want that report right in there. So, like, is there an easy way within Marimo, and maybe I'm just missing something obvious, to be like, okay, right now we're using this local subset of data; for this one, you're going to remote call, fetch from here, you're going to do
what have you? Can you plug into those? Like, I want to run my big data off in the cloud somewhere, async, fast or slow, but get it back into my notebook.

Yeah, you totally can. So it's not necessarily productized, but we have a number of primitives. So Marimo is a notebook, but it's also a library that you typically only use in the Marimo notebook. So you can import marimo as mo into your Marimo notebook, and you get some primitives. One of those primitives is: am I running in an interactive session, or am I running as a script? And so you can just use mo.running_in_notebook() to parameterize where the data is being fetched from. And I think that would be what you're asking, essentially, or you could use that to do what you're asking.

I'm also wondering about, yeah, can I, well, maybe this comes back to the publishing side of things as well, so maybe we'll come back to this. But I'm thinking, like, okay, I've run this on my temporary thing. I still want my interactive view even though I'm running this asynchronously. Can I run it async and still publish the web version of Marimo so I can see the report at the end, or something like that?

Oh, yes. Yeah, you can do that too. From the CLI, we have, like, a marimo export, and then you choose your file format, such as HTML. So marimo export html. That will run it as a script, but also generate the HTML report at the end.

Got it. Oh, that's super cool, right? So what I'm hearing, then, just thinking about life cycles here, right, it's like, okay, I'm tinkering with it, I'm exploring it, I'm running it locally as a notebook on a subset of data. I have this flag. I say, okay, I think this is ready. I put in my flag saying, when you run as a script, run against this source instead of that source. And then I run it, generating an HTML report that looks the same as my notebook and lets me go and look at it.

Yeah, yeah, yeah.

That's great. That's really cool. Digging into that shareable web side of it, so what is interactive? What does that web generation look
like? Can I still go and tinker with cells or change things? Like, what is the output starting to look like there?

Yeah, so interactivity. Marimo starts, let's say, in an interactive edit session. So you're working on your notebook, you're in the browser or VS Code, wherever you're using your notebook. We do have a VS Code extension as well. So I guess, just taking one step back, REPLs are, like, everyone thinks of a REPL as interactive, right? Because it is, right? Like, you run a cell, and then you see what happens, you run something else. In, like, notebooks traditionally, when you want to see what happens when you change the value of some variable, you have x equals 5, then you hit backspace and you change the value of x to 6, then you hit, like, shift-enter, and then you shift-enter a bunch more times, right? Then you see what the new thing looks like. And then you go back to there, and then you hit backspace, x equals 7, and then you do it again and again. And that's the kind of thing where, like, any normal person would think that there should be some UI element, right, to control the value of that variable.

So in Marimo, you can import marimo as mo into your notebook, and then the mo.ui module gives you access to a bunch of different UI elements, ranging from the very simple, like sliders and text inputs and dropdowns, to more complicated ones, like interactive, selectable scatter charts and things like this. And the way that works in Marimo is that you can assign a UI element to a variable, so, like, x equals mo.ui.slider. Then when you output that variable in the notebook, so if you make x the last expression of a cell, Marimo will display the slider. Then if you scrub the slider, what Marimo will then do is automatically run all other cells that refer to the variable x. So it hooks into the reactive execution system. And then every UI element has a value attribute that gives you the value that was assigned, or, like, associated with it on the front end, gives it
back to you in Python. And so, just like that, with no callbacks required, now you have, like, user interfaces and interactivity in your notebook. And you can use that to, say, speed up data exploration. But as you can imagine, you can also use that to make, like, really simple interactive data apps or web apps, whatever you want to call them, or, like, any kind of internal tool. And so that's another big use case of Marimo. For different folks, different sort of, I guess, pathways. Some people will just use the UI elements to speed up exploration, but others will just, like, you know, I was just on the phone with, like, I guess I have to speak about them anonymously right now. They're a big Marimo user. They're, like, a very well-known sports team, and they're using Marimo for a bunch of analytics and stuff. And they make a bunch of Marimo apps where, like, you type in a player's name and you see, like, a big table of a bunch of stats for the player, et cetera. And it's just, like, fancy interactive web apps that they deploy on an internal site for the rest of their team to use.

And the way that works in Marimo is that, for any notebook, from our CLI, you can type marimo run my_notebook.py, and it'll serve it as a read-only web app. Code cells are hidden, and now some non-technical stakeholder can, like, interact with your data. And so you were mentioning back-end engineers, and, like, surprisingly to me, we talked to this one company. We have a case study published on our blog about them. The name is Taxwire. They're like, yeah, all our back-end engineers, we all use Marimo on a weekly basis, because we make all these internal web apps about this tax software we're building and its use cases, and we embed it inside our internal Next.js app, and it just makes our lives easier. I'm like, okay, cool, that's awesome. And, like, honestly, that's not something I anticipated as a use case when I first started making Marimo right after my PhD, but it was really cool to hear people being empowered by
it.

That's awesome. So I think what I'm hearing, too, with this is, like, you can decide which variables you are exposing to the UI versus which are loaded from somewhere, versus how you're managing all of that. So if you had this example of loading up a whole bunch of data, I mean, I guess your sports example is, it's got a whole bunch of data in the back end, but it lets you configure which player you're looking at, maybe which slice of data you want to analyze.

Exactly. Yeah, that's exactly it. And I guess the value of the notebook here, in this case, may not necessarily be obvious. It's, like, this progression from, like, I just have some data here and I don't really know what it is, and I'm just kind of playing with it, and I'm like, oh, actually, there's something useful here. Okay, now I want to expose this to other people. And you can just, like, stay in the same tool and just do it incrementally. Whereas if you've never opened a notebook to look at your data in the first place, it might be hard to then, you know, think about, okay, what is the React app I'm going to write to, like, empower the rest of my team? Like, I don't even know what's in the data, so I don't even know what to make. And so I think that's what the value of the notebook is: just making it really easy for you to first get a feel for your data, and then from there, making it really easy to make, like, a tool that's good enough for you and your colleagues.

Yeah, that is super valuable. I mean, I have an example where I've just whipped up some internal scripts to analyze things for me, but it outputs text, because I think in text. If I had used a notebook, suddenly I could share it much more easily with different folks. Another question that I have in this is, I think one of the things that's going on in the software development world right now is things are changing incredibly rapidly, right? We've got LLMs, copilots, agentic tools, all these different things. Is there a similar
transformation going on in the data and data programming worlds, or how are you seeing notebooks fitting in with this new industrialized coding era we're getting into?

Yeah, I think there is, in a few different ways. So one example: one of our users is a very large public company, and they told us they have hundreds of Marimo apps deployed internally. They said what made it really easy for them to adopt Marimo, and the reason they adopted it so quickly, was that it turns out Claude is really good at writing them, because it's a pure Python file format. There are all these things you can work around, but, well, I guess Jupyter notebooks aren't interactive by default anyway, so you can't make web apps with them. But with Marimo being a pure Python notebook, Claude can write it easily. You can also run the notebook as a script to check if it's doing the right thing. We also have a linter CLI tool called marimo check, which will report any errors it finds with the syntax of how the notebook is stored. And honestly, I use Claude also to make really quick internal things. Where I would have made a Marimo notebook by hand, now I have Claude help me make a Marimo notebook of it, and it does a good job, especially compared to earlier in the project lifecycle. Marimo is now, I think, popular enough that it's in distribution, so it works pretty well. I guess that is more similar to how Claude is just speeding up software development in general, right? In terms of data specifically, or ML research specifically, I think people are still figuring that out. I was just talking to someone on my team, and he told me that he's seen a bunch of VC things come out early this year saying 2026 is going to be the year that AI and agents revolutionize how we work with data. I don't really understand that myself, because I haven't seen one of those yet, but I guess people are talking about it.
I'm just not too familiar. I think Databricks says that 80% of something, something databases are created with agents, but I think that's Neon or something.

And so, I mean, it's sexy right now to say, okay, this is going to be revolutionized with AI. I do feel like software development is the one place I'm actually seeing that play out. So yeah, it's kind of interesting to see how that happens.

I was just going to say, I think the basic things many people have been trying are maybe just being quietly integrated into a bunch of products. But text-to-SQL type of things, schema work, it seems natural that in some form that will benefit from code generation and agentic workflows. But I guess we'll see.

Yeah, we'll see if these predictions end up being true. On the forward-looking prediction side, what do you see as the frontier in terms of use of notebooks, data access, data exploration, that sort of world? And what kinds of stuff are you working on internally to address it?

The frontier, that's a good question. The frontier is always hard to define, because what is the frontier? So I can talk about what we're working on, and then back out and see if any of that is going to get us to the frontier. So let's see. A good amount of our work is that we're close to parity with the Jupyter and Colab ecosystems, but we're not at full parity. So we do honestly have a good amount of parity work that we're doing, like getting Marimo working seamlessly inside of JupyterHub, which is a multi-user hosted Jupyter deployment that many universities use. We also have a free hosted notebook that's similar to Google Colab. It's a tiny bit cheeky, it's called molab, "mo" for Marimo.

Nice.

And so we're working on that quite a bit this year. I guess one thing in terms of frontier: we have a speculative project, which I don't even know exactly what it is, and we're starting to figure out what it is,
but it's driving Marimo headlessly, potentially with agents. There was this paper that came out recently that someone on my team sent me, and so far I've only skimmed the headline and intro, but I think the claim of the paper was that Python REPLs, or REPLs in general, can be very valuable ways to dynamically create context for LLMs or agents. And in that paper I think they use a Jupyter kernel as the thing that the LLM has access to. One of our engineers did a lot of work to modularize Marimo towards the end of last year, so we're getting closer to a place where you can use the kernel headlessly, without our UI. And one project that he's really interested in exploring is, well, what if you gave an agent access to that kernel? Could that somehow speed up, whether it's data exploration workflows or research workflows? We're not exactly sure, but it seems like that could be a really valuable primitive, like a sandbox in some sense, for agents to have.

So I'll give you one example, and this is not necessarily related to headless, but to enabling agents to work more effectively with data. In Marimo we have a built-in AI assistant system, and one thing that you can do when you write prompts for generating some code is tag a variable, say a data frame. When you do that, we inspect the data frame, see its schema, get sample values, et cetera, and dynamically generate context to give to the LLM. And now you have code that's specialized to the data at hand, which is more useful than, say, if you're just using Claude or Cursor, because they won't know what's in the data frame unless you explicitly tell them. That's just one way that you can empower agents with runtime information, and I guess that's one thing that we want to explore more this quarter.

Yeah, no, it's super
interesting to think about that, because there's a couple of different pieces that stand out to me. One is that the fact that you have the dependency graph already mapped becomes quite interesting in terms of showing just the relevant context to the agent, right? So you might say, hey, you're changing something down in this one cell, but you don't actually care about the rest. Instead of forcing the agent to absorb the context of all the different steps, you can say, register which step in the graph you're interested in, we'll do all the computation, here you go, here's the output. I do feel like there is some sort of interesting opportunity here in terms of just showing it the right things at the right time. Do you have a concept of lints or correctness checking on cells? So you've got this dependency graph you're going through, and maybe at a step three or four levels down it's like, oh, this is outside the bounds of what could be valid, so something must be broken upstream.

Yeah, we do. We have a checker, or a linter, that will check your entire program for semantic correctness as well as syntactic correctness. There's a couple of rules that Marimo enforces to make sure that your graph remains a DAG, basically. One of them is that you can't have cycles across cells, which I think is sensible, although I recently learned that Excel has a feature that you can turn on in settings to enable cyclic calculations, and then you choose the number of iterations you want it to go until convergence. I got a spreadsheet that was all reference errors, and I'm like, why did you send me a spreadsheet with all reference errors? "No, no, no, you have to enable fixed-point iterations." Anyway, sorry, that's a digression. We don't allow that.

There's a lot of value in keeping this a DAG, I'll say.

Yeah. So Marimo has to be a DAG, and we enforce that. That's one rule that we check. The other is that you can't redefine the same variable across multiple cells. And the reason
is that we actually allow you to reorder cells for presentation purposes, like column view and stuff. Those are the two main semantic things that we check for.

I was also wondering, though, in terms of data range validations, right? So for example, you have a dependency on a set of different computations, and you might know something about the shape of the data and say, okay, this data needs to actually be in this shape; if not, flag something, or it'll keep going, or what have you.

Oh, that's super interesting, and we haven't explored that.

The reason I think about that is, with agentic coding, which I've been diving into, the more you can programmatically, deterministically limit the possibilities and then give that feedback to the agent, the more it's able to independently iterate.

Yeah, that's very interesting. There's a lot of things we could play with.

But it is interesting. So you mentioned that in Marimo you have these things as functions with decorators around them, and you're already building in some amount of static analysis, some amount of linting, and things like that. What hooks do you expose to your end users in order to plug into that?

So right now, to be honest, we don't have the biggest extension API surface area in terms of extension points, not in the file format. But we have standardized on this protocol called anywidget for building third-party interactive widgets. The developer of anywidget, Trevor Manz, actually works at Marimo now. So that's one way that you can plug into Marimo. You can also hook into our display protocol for objects. We support the IPython display protocol, but we also have some additional hooks. And then in terms of the file format itself, you can actually write Marimo notebooks by hand in Vim or whatever text editor of choice. It's not an extension point, but it is designed well, so that the file format will guarantee you still get code completion and IntelliSense, all these
things. But we haven't opened up an actual... Put another way, Jupyter is famously, I think, well designed in terms of the internal protocol. I guess not internal protocol: they have a wire protocol, they've got a bunch of things that developers can hook into. We haven't done that yet, just because it was too early for us. I think things are moving really quickly, and it's starting to change. We now have our own still-internal but semi-public APIs for ourselves, because we're now consuming Marimo in many different ways. And I think eventually, over time, some of these will be opened up to the public. But right now, in terms of the DAG, our file format specification is public, so people can target that with code generation tools. But that's about the extent of it.

That makes sense. Cool. Let's look a little bit at anywidget, actually. You mentioned that as one of the places that you plug in. It's open source, you hired the developer, or whatever the sequence was. I'm a big fan of that type of thing. What is it, how does it interact with the reactive data model that you have, and are there any constraints or things to know going into it?

Yeah, so I'm going to caveat this by saying that Trevor is a far better spokesman for anywidget than I am, but I will try to channel him. He was just on, I think, Talk Python To Me and had a great hour-long conversation about anywidget. But it is both a spec and a tool set for making reusable widgets, really focused on using them in interactive notebook environments. My understanding from talking to Trevor of what anywidget was born from is this: Trevor also has a PhD, and serving the biocomputation space was originally his focus. He found himself having to make widgets for all kinds of domain-specific tasks in the Jupyter ecosystem, and they were really difficult to build and maintain and test. Web programming has advanced a lot in recent years, and IPython widgets
had not kept up. And so I think out of some of those difficulties and frustrations, Trevor built anywidget to make it a lot easier to implement and maintain these widgets, and also to make it a lot easier for different front ends to consume them. Before anywidget, if you made some kind of domain-specific widget that worked in JupyterLab, then you would have to go and customize it to work in Colab, and then also make sure it worked in the VS Code extension. And now, with the spec, you can make an anywidget just once, and because people have agreed to support it, it can work anywhere. So that's been really valuable. It was really valuable for us. My co-founder Myles discovered it pretty early in its life cycle, and he's like, "Oh, we should use this." I'm like, "I've never heard of this." It was a really good bet, because I think it has emerged as the standard for interactive notebooks.

And in terms of hooking into the reactivity model, it's actually quite nice. Basically, any anywidget, you can wrap it in a mo.ui.anywidget wrapper, and that binds it to our reactivity model and makes it just like any other UI element that's first party in Marimo. So it hooks into the dataflow graph in the same way. And I think the value of it is, I mean, people make all kinds of really, really cool widgets, and there's no way that we, as a team of seven, would be able to satisfy everyone. But take widgets that were originally developed for Jupyter: there's a scatterplot widget called jupyter-scatter, which lets you see 10 million points on a scatterplot really efficiently, zoom in, zoom out, et cetera. That now works in Marimo today, and it's also reactive, which gives it superpowers that you might not have had in a traditional notebook environment.

Got it. So yeah, it looks to me like it's essentially a vanilla JavaScript spec that, if
you meet it, then you can wrap it up and it'll just plug in. You have a wrapper, Jupyter has a wrapper, other folks who support this have a wrapper, and it'll just kind of work anywhere.

Yeah. And we're going to be focusing a lot on anywidget this quarter as well. One of our employees, Vincent, runs a YouTube channel and does a bunch of things, and he never ceases to amaze me with how far he can get vibe coding these really cool anywidgets. I don't know, he vibe coded these robotic simulations, like this humanoid person. It was really cool. So it really allows you to expand your imagination and get creative.

Nice. Well, we're getting kind of close to the end of our time here. Is there anything we haven't talked about yet that we should talk about before we wrap?

I think we've covered all the basics. I guess we didn't really talk about Marimo's origins, so a little bit about origins and a little bit about where we're going, at least in the next few months. I started Marimo after my PhD. Like I mentioned, I had both appreciation for notebooks and frustration with them. And I actually originally got funding from a national lab at Stanford. It's a lab called SLAC, which is a particle accelerator lab, and there are a bunch of scientists there who use Python and had basically the same gripes as I did. So they were really excited to partner with us to bring a new open-source programming environment into the world. So we have our roots in academia in that sense. And this quarter, one thing that we're really interested in doing is engaging a lot more with universities, to help them try out Marimo for education and support them in incorporating it into their classes. Because I really do feel that a combination of reactivity and interactivity can make concepts a lot more intuitive. If you learn a numerical algorithm, it's just so much easier to just change
a parameter and see what happens, as opposed to just extrapolating through it. And in fact, Marimo's main inspiration, Pluto for the Julia language, was originally designed exclusively for education. It still is actually advertised in that way. It was designed at MIT for a computational thinking class. I don't know, it's just something I care a lot about and something that we really want to support. So to the extent anyone in your audience is at the intersection of software engineering and education and finds Marimo interesting, please try it out, or better yet, reach out to me and my team. We'd be happy to chat with you and support you.

All right, let's call that a wrap.
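
The two graph rules described in the conversation (no cycles across cells, and no variable defined in more than one cell) can be illustrated with a small standalone sketch. This is not Marimo's implementation: the `check_cells` function, the cell names, and the representation of a cell as a pair of "variables defined" and "variables read" sets are all hypothetical, chosen only to show the idea.

```python
from collections import defaultdict


def check_cells(cells):
    """Check a toy notebook for the two rules discussed above.

    `cells` maps a (hypothetical) cell name to a pair (defs, refs):
    the set of variables the cell defines and the set it reads.
    Returns a list of problem descriptions; an empty list means the
    cells form a valid DAG with unambiguous variable definitions.
    """
    problems = []

    # Rule 1: a variable may be defined by at most one cell, so that
    # reordering cells never changes which definition a reader sees.
    definers = defaultdict(list)
    for name, (defs, _refs) in cells.items():
        for var in defs:
            definers[var].append(name)
    for var, names in sorted(definers.items()):
        if len(names) > 1:
            problems.append(
                f"variable '{var}' is defined in multiple cells: {sorted(names)}"
            )

    # A cell depends on every other cell that defines a variable it reads.
    deps = {name: set() for name in cells}
    for name, (_defs, refs) in cells.items():
        for var in refs:
            for definer in definers.get(var, []):
                if definer != name:
                    deps[name].add(definer)

    # Rule 2: no cycles. Depth-first search with three states per cell.
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {name: UNSEEN for name in cells}

    def has_cycle_from(name):
        state[name] = IN_PROGRESS
        for dep in sorted(deps[name]):
            if state[dep] == IN_PROGRESS:
                return True  # reached a cell that is still on the current path
            if state[dep] == UNSEEN and has_cycle_from(dep):
                return True
        state[name] = DONE
        return False

    if any(state[n] == UNSEEN and has_cycle_from(n) for n in sorted(cells)):
        problems.append("cells form a dependency cycle")
    return problems


# A valid three-cell notebook: load -> filter -> plot.
valid = {
    "load": ({"df"}, set()),
    "filter": ({"subset"}, {"df"}),
    "plot": (set(), {"subset"}),
}
# Two cells that both define `x`.
redefined = {"a": ({"x"}, set()), "b": ({"x"}, set())}
# Two cells that read each other's variables.
cyclic = {"a": ({"x"}, {"y"}), "b": ({"y"}, {"x"})}

print(check_cells(valid))      # []
print(check_cells(redefined))  # one "defined in multiple cells" problem
print(check_cells(cyclic))     # one "dependency cycle" problem
```

Because each variable has exactly one defining cell, the dependency edges depend only on the variable names and not on where a cell sits on the page, which is consistent with the point made above about reordering cells for presentation.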

Frequently Asked Questions

How long is this episode of Podcast Archives - Software Engineering Daily?

This episode is 46 minutes long.

When was this Podcast Archives - Software Engineering Daily episode published?

This episode was published on March 10, 2026.
