I sure hope you're hungry. Cool, I'm starving. Wash those hands, pull up a chair, and secure that feed bag. Because it's time to listen to Scott Tolinski and Wes Bos attempt to use human language to converse with and pick the brains of other developers.
I thought there was gonna be food. So buckle up and grab that old f***ing handle, because this ride is going to get wild. Oh, this is the Syntax Supper Club. Welcome to Syntax.
On this Friday's Supper Club, we're gonna be talking to Yagiz Nizipli all about Node.js, performance, all kinds of stuff. And Yagiz has been doing a ton of really amazing things. He's a voting member of the Node.js Technical Steering Committee, a voting member of the OpenJS Foundation Cross Project Council, the founder of the Node.js performance team, the author and maintainer of fast-querystring, has 150-plus commits on Node.js, and he's the reason why we have .env in Node.js. So welcome to the show.
How's it going? What's up? How you doing? Good, good.
Yeah, thank you for inviting me and hosting me today. It's good. I was just getting used to my new job at Sentry. So mostly learning new stuff.
How's everything? Yeah, going great over here. And I've been seeing your PRs on Twitter. It seems like you hit the ground running over there, submitting some pretty major PRs right out of the gate. So that's pretty incredible.
The easiest thing that an engineer could do is to replace or refactor or rewrite something. So that's basically what I'm doing: replacing old tools with new tools so that you get the performance boost and maybe save some money while doing that. Nice. Nice.
Oh, that's awesome. Yeah, we're gonna talk about, I think, two main things today, and then we'll see where that goes. First, I wanna talk about performance: how you figure out that things are slow and how you figure out how to make them faster. I also wanna talk a lot about Node.js in general. Maybe you can explain to us your involvement in Node.js, but we know you were a founding member of the perf team and you're pretty heavily involved. And we have some pretty exciting announcements of stuff coming to Node that you're gonna leak to the audience, right?
Yep. That's right. So what's your involvement with Node.js? Right now I'm a Node.js Technical Steering Committee member, and on top of that, around December last year, a year ago, I founded the Node.js performance team with the focus of improving Node.js performance in the crucial parts.
Up until August or September 2023, I was the performance strategic initiative champion, and I resigned from that two or three months ago. But yeah, I mostly work on performance, and if I have some time and the tasks are not controversial, I try to add new features. But I mostly work on performance stuff. Awesome. Let's talk about that real quickly.
It's like you implemented .env support into Node.js and that was like so well received by the community. It was one of those like, yes, thank you. And it feels like Node lately has been saying, you know what? Maybe we should do some of these features that people are doing.
Like, I feel like a long time ago it was, no, this is part of the community, the tooling. You can install a package to do that type of thing. And recently I feel like Node has switched to saying, you know what? Maybe we should improve the DX a little bit and bake a few things in, like .env.
Has there been a shift in Node for that? Yeah. So, I started contributing to Node a year and a half ago, maybe around two years ago.
So I didn't see any specific shift from my side. But what I could say is that because Node is an open source project maintained by random people around the globe, and because it runs on democracy and people can either approve or reject any change, I think the general opinion of Node core collaborators has changed, and that's why we're seeing those kinds of new features right now. On top of that, we have attracted more C++ developers in the past couple of months. And as a result, we have these drastic changes, because of the performance gains and the ease of implementing those kinds of features.
But yeah, .env is one of those features that I've developed and got really positive feedback on. I initially started contributing to Node because I rewrote the URL parser. It had much more impact than the .env file support, but because it wasn't as visible or easily noticeable by engineers, it didn't get that much attention. The URL parser is the new URL constructor, and then you get everything: the port, the pathname, the search params, all of that.
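For reference, here is a minimal sketch of the URL API being described. It uses the global URL class available in Node.js and browsers; the example address is made up for illustration.

```typescript
// Parse a made-up URL with the global WHATWG URL class.
const parsed = new URL("https://example.com:8080/docs/guide?lang=en&page=2#intro");

console.log(parsed.protocol);                 // "https:"
console.log(parsed.hostname);                 // "example.com"
console.log(parsed.port);                     // "8080"
console.log(parsed.pathname);                 // "/docs/guide"
console.log(parsed.searchParams.get("lang")); // "en"
console.log(parsed.hash);                     // "#intro"
```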
And initially I was like, well, isn't that a browser API, right? That's been implemented in Node? Yes. So that's a Chromium implementation.
And because we don't have Chromium and we have V8, we basically implemented it with the help of Daniel Lemire, a professor from Canada, and released a library called Ada, which is named after my daughter. So yeah, every time you call new URL, it actually runs 20,000 lines of C++ code. So it's not as simple as it looks, and the impact... Wow.
It's extremely huge. Because if you run node index.js and the file doesn't even have any code in it, it actually creates a new URL five times. If you run a fetch, it creates five more. If you import any modules, yes, it initializes more than I can count.
So URL is everywhere. And we just don't know it because there isn't anybody like me who tells you that those kinds of small things are extremely impactful for performance. And it's mostly like that: death by a thousand cuts is the correct explanation of the whole situation. What kind of sick person writes 20,000 lines of C++ for a URL?
What could it possibly be doing in all of those lines? Like, what does it do? Yeah. URL.
So there's a group called WHATWG, which is the Web Hypertext Application Technology Working Group. So it's like Safari, Chrome, Firefox developers. They come together and they release this URL specification. It basically contains a state machine, but a really glorified and extremely unperformant state machine. And in order to make that performant, we use the most up-to-date technologies and methodologies to speed things up. That's why we have this change.
So I can give you a quick example. If you have https://google.com, it goes character by character. It iterates through the whole string. When it sees the colon after https, it says, OK, the protocol state has ended.
So right now, I can have an authority state, a file state, and so on and so forth. In that scenario, it checks for an @ character, because if you have username:password@localhost, for example, then username is the username and password is the password. So it checks for the @ character, but it can't find it. So it goes back to the colon.
And then it sees that there are two slash characters. This means that right now, we are in the hostname state. So it stays in the hostname state until it sees a colon, which is the starting point of the port, or a slash, which is the starting point of the pathname, or a question mark, which is the starting point of the query, or a hash character, which is the starting point of the fragment. It differentiates and goes through all of those layers.
And each of those states has different encoding and decoding parameters. This means that if you have a space character in the query parameters, it's translated into %20. This is because a lot of people on Google Chrome go into the address bar, and instead of actually writing a URL, they type words separated by spaces, like "syntax fm". And that needs to be translated into something that is easily encoded.
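The pieces named in this walkthrough (scheme, credentials, host, port, path, query, fragment) can be observed from JavaScript. This is only a sketch of the visible result, not of the state machine itself, and the URL is invented for the example.

```typescript
// Made-up URL containing credentials, a port, a path, a query with a space, and a fragment.
const full = new URL("https://alice:secret@localhost:3000/a/b?q=syntax fm#top");

console.log(full.protocol); // "https:"   (scheme, ends at the first colon)
console.log(full.username); // "alice"    (before the colon in the credentials)
console.log(full.password); // "secret"   (between that colon and the "@")
console.log(full.hostname); // "localhost"
console.log(full.port);     // "3000"
console.log(full.pathname); // "/a/b"
console.log(full.search);   // "?q=syntax%20fm"  (the space is percent-encoded)
console.log(full.hash);     // "#top"

// Without an "@", the text after "//" is treated as the host, not as credentials.
const plain = new URL("https://google.com");
console.log(plain.username === "" && plain.hostname === "google.com"); // true
```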
So, I don't think people realize how complicated URLs can be. The URL is the original state machine, or the original state manager. You can put so much in it. And probably every developer in their life has said, ah, I'm just going to split it on the question mark, and then split it on the equals, and loop over it, and boom, I've got the query params.
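A small sketch of why the hand-rolled split goes wrong. The naiveQuery helper and the sample URL are hypothetical, written only to contrast with the built-in URLSearchParams.

```typescript
// Naive approach: split on "?", "&" and "=" by hand.
function naiveQuery(input: string): Record<string, string> {
  const out: Record<string, string> = {};
  const query = input.split("?")[1] ?? "";
  for (const pair of query.split("&")) {
    const [key, value] = pair.split("=");
    if (key) out[key] = value ?? "";
  }
  return out;
}

const sample = "https://example.com/search?tag=a&tag=b&q=hello+world&sym=%26";

const naive = naiveQuery(sample);
console.log(naive["tag"]); // "b"            (the first "tag" was silently overwritten)
console.log(naive["q"]);   // "hello+world"  ("+" was not decoded to a space)
console.log(naive["sym"]); // "%26"          (percent-encoding was not decoded)

const params = new URL(sample).searchParams;
console.log(params.getAll("tag")); // ["a", "b"]
console.log(params.get("q"));      // "hello world"
console.log(params.get("sym"));    // "&"
```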
And then you realize, oh, there's seven million edge cases. Is there like a test suite for all the different possible URLs? And do you know how many tests are in there? So just yesterday, I was invited to the Web Platform Tests organization, which is the organization that maintains the tests across all the browsers.
So URL has around maybe 2,000 or 3,000 tests. Ada, the URL parser, adds more tests on top of that, which influenced other libraries as well, including Boost. For example, Boost ported all of our Ada tests to their code base, and they found six different bugs in their code just by running them. So it's an extremely complex and underrated part of programming that we all accept as a cost that we always have to pay.
But it's not so. So you mentioned that those kinds of things might get overlooked, right? How do you go about finding those types of performance issues, or areas where you could even begin untangling the web of what makes it faster? So if you ask this to, like, 200 different engineers, 80 or 90% of them will say, I will just run a profiler, look into the flame graph, see which parts take the most time, and try to optimize that.
But personally, I'm more interested in having much more impact than the flame graph that I'm currently looking at. This means that I like to look at the code that no one else is looking into. People accept it because it's working, so it's OK. This is how I started working on URL, because it looked like a trivial task.
I also assumed that it was a really easy task, as others did, but it resulted in writing three or four different implementations. So, what is the most fundamental thing that runs in every part of the code base? That might be URL, or file system operations, or query string parsing, those kinds of things that we always have to run. And I try to optimize that.
And yeah, that's what I did. And that's typically done in C++, right? So for anyone who doesn't know, Node.js is not written in JavaScript. It's written primarily in C++.
And some of the APIs are written in JavaScript. And part of what you do is say, OK, this API is slow because it's been implemented in JavaScript. So you can then take a library that's been implemented in JavaScript and rewrite it to be native, meaning it runs in C++. Yeah, so the thing is, most of the APIs that are not performance-critical (still important, but not crucial) are written in JavaScript.
And most of the performance-critical tasks are written in C++. Basically, whenever you call a C++ function from JavaScript, there's this cost that you always have to pay, which is the serialization of the data that you're passing. Whether it's a string or a really complex data structure, we always pay this cost. It's like calling JSON.stringify to convert it to a string, passing that string somewhere else, and then calling JSON.parse.
That cost is the biggest one. And I basically try to find the most optimal way of paying this price while also making it fast. So moving things to the C++ side is not always a good idea, but most of the time, most optimizations in Node.js are a product of these moves.
For example, just recently, we had maybe 30 or 40 different PRs from different collaborators in Node improving the error path of the Node fs module. We were previously calling a C++ function from JavaScript, and that C++ function was returning an object. And if that object had an error key, that would be thrown as an error. But when we moved that error state to the C++ side, we realized that we could speed up the failing path of these functions by 100% or 150%.
So these kinds of things are what have the most impact. So what are you using to measure those changes to know if you're having an impact or not? So, benchmarks. We basically run thousands of benchmarks in Node. On top of that, it's important to find the edge cases that you pass.
So invalid states, valid states, and whatever different inputs it takes; we try to enumerate them. And we have a dedicated benchmark CI in the Node.js infrastructure where we run these kinds of things. And because that machine is extremely old, it reflects the worst-case scenario of the optimization, which provides a good baseline for giving proper reasoning for that particular change. And do you have to run those benchmarks on different hardware as well?
I remember probably 10 years ago, one of the Node team members was running a bunch of Raspberry Pis in their office as part of the test suite, because that was one of the things they targeted at the time: how fast does it run on a Raspberry Pi? So right now, I think we have maybe 20 or 30 different CI machines.
We have only one benchmark CI. This is mostly because Node.js is an open source organization and there's no company behind it that could give us those kinds of big machines to run those benchmarks. But mostly, we don't write hardware-specific code in Node.js, which means that we don't write specific instructions for NEON, which is for macOS, or for SSE2 on Windows, which is x86-64 machines. In particular for Ada, the URL parser part, we have made those optimizations, and that's why Ada is actually fast. But that requires direct access to the machine itself, and that's where Daniel's expertise comes in.
And because he's a professor and has all these different machines, he helps and optimizes for different architectures. So let's talk about, like, everybody listening to this is a web developer. They're writing JavaScript every day, and there are probably very few people listening who write C++ in their day-to-day. I'm just curious about at what point you should start caring about performance. So an example I have is, I'll often have an array, and I need to distill it down, so I'll run a filter on it, and then I'll need to map over it and add some data.
And then I might chain a couple of maps and a couple of filters together, because it's much easier for me to reason about it, to debug it, to read the actual code. I could probably do it in a single reduce, but it's going to be much more complex. And I'll often opt for the multiple maps and filters, and people will say, hey, what about performance? And my answer to that is, I'm looping over three spans on a page. And I don't think it matters.
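To make the trade-off concrete, here is a hedged sketch of the two styles producing the same result. The Product type and the data are invented for illustration.

```typescript
type Product = { label: string; price: number; inStock: boolean };

const products: Product[] = [
  { label: "a", price: 5, inStock: true },
  { label: "b", price: 12, inStock: false },
  { label: "c", price: 20, inStock: true },
];

// Chained version: two passes and one intermediate array, but easy to read.
const chained = products
  .filter((product) => product.inStock)
  .map((product) => ({ label: product.label, total: product.price + 3 }));

// Single-pass version: one loop, no intermediate array, but more to keep in your head.
const singlePass = products.reduce<{ label: string; total: number }[]>((acc, product) => {
  if (product.inStock) acc.push({ label: product.label, total: product.price + 3 });
  return acc;
}, []);

console.log(JSON.stringify(chained) === JSON.stringify(singlePass)); // true
```

For a handful of elements the difference is unlikely to be measurable; the extra pass and allocation only start to matter as the array grows or the code runs very often.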
So do you have any sort of guidelines as to when you should start caring about these things? So performance is a mindset. It's not something that I can turn on or turn off, even though that particular code is called maybe one or two times in the lifespan of the whole project.
I basically care about writing the best code that I could write for that particular portion. But in order to answer the question, let me rephrase it for a different industry. Let's say we talk about security. When should you care about security?
Would you care for security if you get hacked? Yes. Would you care for security if you don't get hacked at all? No, you won't care about it.
But because you don't get hacked, you don't look into that particular part of your code. And then you don't realize that you're already hacked. So this is like a prisoner's dilemma. And also a chicken-and-egg problem as well.
Because up until the point that you care about performance, you don't know that it's slow. And if you look into it and realize that it's slow, then you change how you write your code, and you try to optimize it. So it's not about iterating or calling map or filter multiple times.
It's about the principles. What is the Big O notation of this for loop that I'm iterating? Is it O(2n) or O(5n)? Or is it growing exponentially, and so on and so forth?
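As an illustration of the Big O point, here are two hypothetical functions that answer the same question with different growth rates. Neither comes from the conversation; they only show how the loop structure determines the complexity.

```typescript
// O(n^2): for every element, scan the rest of the array.
function hasDuplicateQuadratic(values: number[]): boolean {
  for (let i = 0; i < values.length; i++) {
    for (let j = i + 1; j < values.length; j++) {
      if (values[i] === values[j]) return true;
    }
  }
  return false;
}

// O(n): one pass, remembering what has been seen in a Set.
function hasDuplicateLinear(values: number[]): boolean {
  const seen = new Set<number>();
  for (const value of values) {
    if (seen.has(value)) return true;
    seen.add(value);
  }
  return false;
}

console.log(hasDuplicateQuadratic([1, 2, 3, 2])); // true
console.log(hasDuplicateLinear([1, 2, 3, 2]));    // true
console.log(hasDuplicateLinear([1, 2, 3]));       // false
```

Doubling the input roughly doubles the work for the linear version and roughly quadruples it for the quadratic one, which is the kind of growth the question is asking about.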
So the question is, if you know those kinds of things, and if you also make the deliberate choice of doing it, I think that's okay. But to summarize, it's not always a black and white scenario. But for me, when you learn those kinds of things, you can't turn them off. Yeah.
That's why I'm here. That's what it lets me talk to us very well with. Yeah. So if you're writing like a loop in JavaScript, then you think, hmm, should I use, should I spread this thing into an array and then slice the items out of it?
Or should I do it a different way? What would you do in that case? Would you go to, like, JSBench or jsPerf, something like that, and write a test case and run it 100,000 times to see if there's any difference between the two of them? If you iterate and then call slice, this means that you copy the memory twice.
So if you have an array of a thousand items, then you have another copy in memory as well. And if you are doing this a lot of times, then you are basically triggering garbage collection a lot of times, and that becomes a V8 problem. And because V8 is a JIT, which stands for just-in-time compiler, it's extremely hard for V8 to optimize all kinds of code. So what I basically do is try to write it as fast as I can, using as little memory as possible.
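A sketch of the copying concern, assuming the goal is "take the first few items of some iterable". Both helper names are hypothetical; the point is the number of allocations, not the API.

```typescript
// Copies the whole input once with spread, then copies again with slice.
function firstNWithCopies<T>(source: Iterable<T>, count: number): T[] {
  return [...source].slice(0, count);
}

// Walks the input and stops early, allocating only the result array.
function firstNSinglePass<T>(source: Iterable<T>, count: number): T[] {
  const result: T[] = [];
  if (count <= 0) return result;
  for (const value of source) {
    result.push(value);
    if (result.length >= count) break;
  }
  return result;
}

const bigSet = new Set([10, 20, 30, 40, 50]);
console.log(firstNWithCopies(bigSet, 2)); // [10, 20]
console.log(firstNSinglePass(bigSet, 2)); // [10, 20]
```

Whether the second version is actually faster in a given program is something to benchmark rather than assume; it only guarantees fewer intermediate allocations.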
And then I benchmark that particular portion. But the question is not about benchmarking one small thing. I benchmark the whole process, for example, parsing the URL or reading files. I basically do it like an integration test: instead of calling that particular function, you call everything.
And then you see what percentage of the time you spend in this particular function. That's what flame graphs give us. And if that portion corresponds to 10%, but you spend 90% of the time in V8 or some other function, you realize that there's not much that you can do. There are only limited things that you can do to improve this function.
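This reasoning is usually formalized as Amdahl's law (a framing added here, not named in the conversation). A minimal sketch with made-up numbers shows why a function that takes 10% of the runtime caps the overall gain at about 11%.

```typescript
// fraction: share of total runtime spent in the function (0..1)
// localSpeedup: how many times faster that function becomes
function overallSpeedup(fraction: number, localSpeedup: number): number {
  return 1 / ((1 - fraction) + fraction / localSpeedup);
}

console.log(overallSpeedup(0.1, 2).toFixed(3));        // "1.053"  (10% of runtime, made 2x faster)
console.log(overallSpeedup(0.1, Infinity).toFixed(3)); // "1.111"  (10% of runtime, made free)
console.log(overallSpeedup(0.9, 2).toFixed(3));        // "1.818"  (90% of runtime, made 2x faster)
```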
And then you realize that you give up. And that becomes an educated decision rather than an emotional one. I think learning how to read a flame graph is a very good skill for anyone to have. So I had some issues with a Cloudflare Worker and, like, a 50 millisecond limit.
And I was using some library and it kept going over. Like, it kept hitting 80 milliseconds. And I distilled it down to simply creating an object, which was taking 80 milliseconds. And I was like, this is ridiculous, right?
And that's a really good example of your mindset of performance first. Because whoever wrote the code that caused that 80 milliseconds in the instantiation probably was not intentionally trying to make it 80 milliseconds. But it sort of happened by accident and it wasn't caught. And I was posting on the GitHub issue and a bunch of people got the flame graphs out.
And it was one function that was being called that was taking 99% of the actual thing. And the flamegraph was very visual to show it. And I was like, man, I don't know if I would have ever been able to figure out, OK, it's obviously this part. I can comment out the function call.
But you got to go deeper than that. So the thing is, because I contributed to and read a lot of those Node.js functions, the Node.js internals, right now I know, not numerically, but I know the overhead of calling a function. And with this knowledge, I tend to think, OK, if I call this function, it will make two C++ calls. But if I do this one, it will make one.
And depending on the usage, this might be faster or not. And then the question becomes much more clear. And you start to look into the flame graph less, unless it's an extremely specific thing, or you have a really big project and you have no idea what's going on. Then you dive into the flame graph.
We talked to Jarred Sumner from Bun. And obviously, Bun is very performance-minded. And they always post these graphs that are 1,000 times faster than Node.js.
And we asked him, how is it possible that you can make it so much faster? And his answer to that is that he handles lots of common use cases in the code base. So as I understand it, there will be simply like an if statement for common use case. Oh, if it's an array of one thing we don't need to do this work.
Is that a common thing to do in performance? Is it just up-front checks for these perf wins? Yeah. So that's what we call a happy path, which means the most positive path through a function's execution.
And for example, if you call readFileSync, the default value of encoding is UTF-8. And you know that most people call readFileSync and they just want to get UTF-8. So if you add a specific if case that says, if encoding is equal to UTF-8, call a specific function super optimized for UTF-8, then you have a performance gain.
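The shape of that optimization can be sketched as an up-front check for the common case. This is not how Node.js implements readFileSync (that fast path lives in C++); decodeBytes and sharedUtf8Decoder are hypothetical names used only to show the pattern.

```typescript
// One decoder created up front and reused for the common case.
const sharedUtf8Decoder = new TextDecoder("utf-8");

// Hypothetical helper, not Node.js internals.
function decodeBytes(bytes: Uint8Array, encoding: string = "utf8"): string {
  // Fast path: the default and most common encoding skips the generic setup.
  if (encoding === "utf8" || encoding === "utf-8") {
    return sharedUtf8Decoder.decode(bytes);
  }
  // General path: any other encoding label pays for a fresh decoder.
  return new TextDecoder(encoding).decode(bytes);
}

const utf8Bytes = new TextEncoder().encode("héllo");
console.log(decodeBytes(utf8Bytes)); // "héllo"

const utf16Bytes = new Uint8Array([0x68, 0x00, 0x69, 0x00]); // "hi" in UTF-16LE
console.log(decodeBytes(utf16Bytes, "utf-16le")); // "hi"
```

The design choice is that the check costs one string comparison per call, while the common case avoids the more general and more expensive route entirely.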
This is what I did maybe six months ago in readFileSync, which improved readFileSync by 40 or 50% or something. So this is what we commonly do as well. You're basically looking at what users do. Or let's say you go to Webpack: if you want to optimize fs, just count how many times each fs function gets called and with what arguments. You try to find the most common values for that and optimize for them.
And then you will have a significant boost in the web. A happy path. And another one I hear, and maybe you can explain what this is in the performance world, is the hot path. Yeah.
If that particular path is called a lot of times, that's what we call a hot path. The happy path is the path towards the positive outcome of the function. For example, when you call fs.readFileSync, you don't assume that it will throw an error. Because if you do a probability analysis, you will see that 95% of the time, if you call readFileSync in any application, it will not throw an error, because the developer already checks whether the file exists or not.
But in that 5%, it will throw an error. So in that case, you ask yourself (and this is the more detailed explanation of the happy path): should you optimize for the error path? No, you should optimize for the happy path, because that's the success path. And then we have this hot path, which is the UTF-8 case.
So yeah, I should have used this example instead of the happy path. That's good. I recently saw that you made a PR on the Sentry code base to replace Webpack with Rspack. I've been seeing Rspack pop up a whole bunch.
Is that something that you think is ready for prime time use as a drop-in? To summarize for those who don't know, I replaced Sentry's Webpack build with Rspack, which comes with SWC, and it's all written in Rust. So it's extremely fast. The downside is that it has some weird bugs. And to answer your question:
I don't think it's ready to be used by a project the size of Sentry, because any sort of issue, with minimizers or with CSS or whatever, is going to have a huge impact on the company. So I wouldn't use it for any company the size of Sentry, but I would use it for my own personal projects. And because it's extremely easy to migrate from Webpack to Rspack, it's also a really good choice, because you don't need to change a lot about your deployment.
Oh, here's a question. And this is something that I saw on one of your pull requests: what's more important, and how do you decide? Is it how fast something runs or how big the bundle is?
Is it worth adding more code, therefore inflating the bundle size, as a trade-off for a faster code base? And I know the answer is probably "it depends", but I'd be curious to see if you have any thoughts on that. If it's a front-end application, yeah, bundle size is a lot more impactful. But I think the most recent example that I can give you is that Rust has this option to optimize for space or optimize for speed.
And if you optimize for speed, this means that you will have a really large bundle size, because you add all of these large binaries in order to optimize for each of those paths. But if you go for storage, then your application becomes significantly slower, but the bundle size is extremely low because you're using the most native APIs that the operating system offers. Mostly, depending on the usage, the API that we're calling, and the impact of that function, I go for speed. I don't go for storage because you can always buy more SSD.
You can always buy more hard disks and more storage. But even if you have the latest computers, like 32 gig machines with 10 or 15 core CPUs, if the application that you're running is not optimized for running fast, then there's not much that you can do, even with the fastest architecture and the fastest computer. So yeah, I would go for speed in the end. And this is what we did with Ada.
And I think this is the way to go for Node.js as well. Awesome. I just wanted to ask a quick question, because I know you mentioned that you have been committing to Node for, what did you say? How long was it?
Not that long, right? Two years? Yeah, almost two years, I guess. Almost two years.
What's it like working on a project of that nature? I get the vibe that working on Node can be kind of thankless at times. Do you get that?
Or what's it like? So it's extremely beneficial for personal development, because you basically have access to a lot of really smart and intelligent people who want to improve the code that you've written, because you share a common goal with them. And in terms of the impact, it's impactful both personally and worldwide. But mostly you never get any positive feedback from people around the globe.
And most of the time you even get crucified, which is the correct word, I guess, because of what you're doing. Because a lot of people use it, and because they're using it in their most crucial applications, they have these expectations towards it. And that makes it a really hard job, because you are contributing on your own time.
Like, I'm not getting paid to contribute to Node.js, but people don't understand that and assume that Node.js is a company and Node.js is an organization, but it's not. So it's extremely lonely, I can say that. Yeah, yeah, I get that. But if I look into the past, I can easily say that the changes that I or any other contributor made to Node.js impact the world on a scale that we can't comprehend.
This means that because of the performance improvements, the environment is a lot better off, because we are releasing less carbon dioxide from the usage of the computers. And because of the impact, because it's being used everywhere. So that's really good. Yeah, it's crazy the scale making a change can have on things like that.
I remember years ago, I had a friend who worked for a company that sent out millions and millions of emails for, I can say it, it was BlackBerry. And every time BlackBerry had an email for every single one of their users, they had to design an email template and whatever. And they distilled it down to every single character costing them an extra dollar in bandwidth to send it out from their data centers. And it's wild that such a small thing, one extra character multiplied by 200 million users or however many users they had, would cost them that much.
You think about that in perf. If a CPU is literally running 20% less because you've improved this common use case, that's major, both in terms of how much energy it's using as well as the amount of money people are spending to run this compute. Yeah, and just recently, I realized that because Electron uses Node.js, by improving URL parsing or fs operations by even 5%, you directly have an impact on a billion people, because WhatsApp Web, Discord, Microsoft Teams, they all use Node.js, and their applications start a little bit faster because of the change that you made to that particular code. And that's what keeps Node.js going.
And because it's not company-owned, it's community-owned, that makes Node.js really crucial and important. Yeah, I was thinking how funny that is with, you know, like when you have a piece of hardware and it has a licenses section that shows everything. Often I'll go through the licenses of the Instagram app or the infotainment of my desktop. And I'll be like, there's people's names in this piece of hardware that we've had on the podcast. And I would just love to one day be able to go to a friend, like people who work on browsers do, and be like, hey, do you see how that text is aligned there?
I worked on that. And every single person in the world who is holding an iPhone and seeing left-aligned text or Flexbox or Grid, or using a library to fetch data; to be like, I wrote that code and now it's being used by a good chunk of the world. So the funny thing is, around three or four months ago, I was applying for a green card. And because I was on the O-1 visa, which is an extraordinary talent visa type for the US, I was applying for a green card for extraordinary talent.
And one of the requirements is that your licensed software needs to be used by a really well-known company. So I basically spent two or three months googling every day, trying to find my name in one of those license lists so I could show it to the reviewers at the US government. And I recently found out that Oracle is using Fastify, and because Fastify uses fast-querystring, my name is included in one of Oracle's manuals. It's extremely funny.
And we were talking with Matteo in North Carolina a month ago. And I was like, hey, man, your name is on Oracle. And he was like, yeah, so what? You also used it.
And I was like, oh my god, this is really, really bad. Oh, that's great. This is the reality. Everyone uses open source like this and they put it nice.
But the weird thing is, the Node.js license is not included in those pages. Oh, yeah. So even though a lot of people use Node.js, you will never see the Node.js license in those pages. They include the license of a 300-line npm package, but not Node.js, which is sort of weird.
That's the reality. I remember years ago, there was a blog post from Daniel, who is the creator and maintainer of curl. And just the amount of emails he gets from random people who don't understand what curl is, because they find it on their device or they find a file on their computer somewhere, being like, you hacked my computer. Yeah.
Yeah. That's really the reality of the situation. And even my own parents or my friends don't realize the impact of the things that I do. But in the end, whenever I put my head on the pillow, I'm extremely satisfied, because I did what I could do for the betterment of the world.
And that's something that no amount of money can pay for. What do you think about TypeScript support in Node? What are your personal thoughts on it? I know Deno has TypeScript support, Bun has TypeScript support.
Is that something that you think Node will ever adopt? So before answering what I think: the reason we haven't included it yet is that the TypeScript team doesn't want us to vendor TypeScript, because TypeScript doesn't follow semantic versioning. So any changes can come in any version. And that's a really scary situation for Node.js collaborators and maintainers.
On top of that, I think there's no discussion about the benefits of TypeScript. With JSDoc and IntelliSense, those kinds of benefits are there up to a certain point as well. And I think there's a new proposal at TC39 for treating TypeScript types as comments, so that you don't need to build and transpile your code to, like, ESM or CJS, so that's solved as well. I think without vendoring TypeScript, we are extremely close to supporting TypeScript in Node.js: you just need to install a loader, pass node --loader=<whatever that package name is>, and you can run any TypeScript module right now.
And these changes are done by the Node.js modules team, which includes several really smart people. And without vendoring TypeScript, we could literally do anything because of all of those changes that are happening to ESM modules right now. Yeah, when loaders came out, this was probably three years ago, but that's immediately what I thought. Like, maybe you can describe what loaders are because I think it's sort of an unknown feature of Node.js, and it can make things like supporting TypeScript maybe a bit easier than people think.
So, loaders work like this: whenever you import or require a module, before resolving that particular file, we pass the contents of the file to a hook, and you can tap in and mutate the contents of the file. So, if that file is a TypeScript file, you can dynamically compile it into JavaScript and then return the JavaScript, without updating the file or running a build. This is extremely beneficial because it means that the contents don't matter. They can be mutated or updated or changed dynamically by a third-party application, which enables those kinds of improvements.
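As a rough sketch of the hook idea described here: the code below is a toy model, not Node's real hook API. The names Hook, stripTypeAnnotations, and loadModule are hypothetical, and the "compiler" is a naive regex that only stands in for a real TypeScript compile step.

```typescript
// Toy model of a loader hook chain. This is NOT Node's real API:
// Hook, stripTypeAnnotations and loadModule are hypothetical names.
type Loaded = { format: string; source: string };
type Hook = (url: string, source: string) => Loaded | undefined;

// Deliberately naive stand-in for a compiler: it only removes
// `: string`, `: number` and `: boolean` annotations.
function stripTypeAnnotations(source: string): string {
  return source.replace(/:\s*(string|number|boolean)\b/g, "");
}

// A hook that claims .ts files and returns JavaScript instead.
const typescriptHook: Hook = (url, source) =>
  url.endsWith(".ts")
    ? { format: "module", source: stripTypeAnnotations(source) }
    : undefined;

// The "runtime": give each hook a chance to rewrite the file contents
// before they would be evaluated; fall back to the original source.
function loadModule(url: string, source: string, hooks: Hook[]): Loaded {
  for (const hook of hooks) {
    const result = hook(url, source);
    if (result !== undefined) return result;
  }
  return { format: "module", source };
}

const out = loadModule(
  "file:///app/add.ts",
  "export function add(a: number, b: number): number { return a + b; }",
  [typescriptHook],
);
console.log(out.source);
// prints: export function add(a, b) { return a + b; }
```

The point of the model is only that the hook sees the file contents before the runtime does, so the file on disk never has to change.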
One of the loaders that I think, I'm not sure if it's being written or not, but I just found out that CSS modules, I didn't even realize it, CSS modules is an ECMAScript spec, and the way that you can import CSS modules into JavaScript is via a loader. Is that right? Maybe it's as fast, what you're doing, yeah. I thought that was kind of thing.
If it's a CSS file or if it's something else, you can basically do anything: you can load a macOS application, you can load Rust code in Node.js. So right now we have an FFI, but it's an open PR, and you can basically compile Rust dynamically and enable it as a module by using FFI with Node.js. Oh. So you can make it run Rust, or whatever language that you want.
That's cool. Because Scott just wrote a really cool loader for our own website. You want to talk about that? Or not a loader, a Rust file?
Yeah. Yeah, just trying to get my hands dirty a little bit in the Rust world. So I wrote a script that checks to make sure we have the right things set up, the right env variables, duplicating from an example if the example has been updated but the personal env hasn't, basically just trying to handle some of the onboarding for the site, seeding the database, those types of things. Yeah.
Oh yeah. That's pretty cool. But you have to compile that and ship that with the repo, right? Because it's native.
So this FFI feature might help us. So basically, if there's a Rust compiler on npm, you can call that compiler and make it execute that function through a child process or whatever. You don't even need to call FFI. And then return the response of that to the user.
Oh. So you could write a loader that requires a compiler that you npm installed and then return the compiled Rust from that. Yes. So basically, because it's async, you can do that: whenever you require a Rust function, it might compile it.
If it's not compiled, it can create a child process or a worker thread and execute that. If it's a child process, it could execute it and return the stdout as the output of the loader. And then you would just get the response. You would just get it.
Cool. So yeah, there's like, regarding this, the limit is yourself. Like you can still do anything. And that's what frightens me the most.
And also makes it. Yeah. And Node.js, people are going to groan when they hear this, but stay with us because I think it's actually really necessary: Node.js is getting a config file.
Is that what you're working on right now? Yes. Right now I'm working on a config file, which basically is a JSON file that you can pass any Node options to. The main reason for this change is that there's a limit to what you can pass as arguments because of operating system limitations.
So if you have a config file, it's extremely easy. And because I recently added simdjson, which is a super fast library written by Daniel Lemire, it's really easy and super fast to parse a JSON file. Oh. And people are probably thinking, don't we already have a config file, which is the package.json?
Why do we need another config file? And also real quick, the idea with the config file is that you often have different NPM scripts, right? And you have a dev command and you have a production command and maybe you'll have another command that runs it with a bunch of different flags added to it. And like you said, that gets kind of unruly where you have all of these flags.
So with a config file, I can say, like, all right, I want debug, maybe a port value, anything specific to that type of running instance. Is that right? So, yeah, the issue is that even though we bundle npm, the Node project and the npm project are extremely different projects, and they have different goals. So I just want to clarify that.
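To make the flags-versus-file trade-off concrete, here is a minimal sketch. The config shape (a flat object of option names to values) is a hypothetical illustration and is not the schema of the Node.js config file; NodeConfig and toCliFlags are made-up names. The three option names used as sample data are existing Node.js command-line flags.

```typescript
// Hypothetical shape: option name -> value. NOT the real Node.js config schema.
type NodeConfig = Record<string, string | number | boolean>;

// Expands a config object into the equivalent long list of CLI flags,
// which is what an npm script would otherwise have to spell out by hand.
function toCliFlags(config: NodeConfig): string[] {
  return Object.entries(config).flatMap(([name, value]) => {
    if (value === false) return [];            // disabled option: emit nothing
    if (value === true) return [`--${name}`];  // boolean switch
    return [`--${name}=${value}`];             // option with a value
  });
}

// Sample "dev" configuration, assumed for illustration.
const devConfig: NodeConfig = {
  inspect: true,
  "max-old-space-size": 4096,
  "enable-source-maps": true,
};

console.log(toCliFlags(devConfig).join(" "));
// prints: --inspect --max-old-space-size=4096 --enable-source-maps
```

With one such object per environment, each script only has to name a file instead of repeating every flag.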
The other thing is that, because of how the ESM and CJS loaders are implemented, if you execute a node index.js command, it will check for a package.json from your current directory all the way up to your root directory. So if you are under slash Users, Desktop, coding, blah, blah, it will make seven or eight different file system operations trying to find the package.json. This is mostly done because that package.json contains either "type": "module" or "type": "commonjs". So you know how to load the module, whether you should load it as an ESM module or a CommonJS module.
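The directory walk described here can be sketched as a pure function that lists every package.json path that could be checked in the worst case, when no file is found along the way. This is a model of the idea only, assuming POSIX-style paths; the function name and the sample path are hypothetical, and the real lookup would stop early once a package.json exists.

```typescript
// Lists each package.json location from startDir up to the root.
// Pure string handling for POSIX-style paths; nothing touches the disk.
function packageJsonCandidates(startDir: string): string[] {
  const parts = startDir.split("/").filter((p) => p.length > 0);
  const candidates: string[] = [];
  for (let depth = parts.length; depth >= 0; depth--) {
    const dir = "/" + parts.slice(0, depth).join("/");
    candidates.push(dir === "/" ? "/package.json" : dir + "/package.json");
  }
  return candidates;
}

// Hypothetical project location, five directories deep.
const lookups = packageJsonCandidates("/Users/me/code/app/src");
for (const path of lookups) console.log(path);
console.log(lookups.length); // prints: 6
```

Five nested directories plus the root give six candidate lookups, which is why a deeply nested project pays for several file system operations before any code runs.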
So if you have lots of configurations in there, which we tend to do: ESLint has support for having a key in package.json, Prettier has it, Volta has it, a lot of different packages have it. And if you make this package.json file really big, then it becomes a performance bottleneck, because then you need to parse that huge JSON file while trying to find the values that you care about most, which are name, type, exports, imports, and one more thing that I forgot.
So, using a config file is extremely optional. I don't want to affect any existing applications in terms of performance. So we shouldn't encourage people to use package.json for this. We don't want to use package.json as a key-value storage, because that's not the intention of it.
I don't know the backstory of how package.json became what it is, but for the sake of it, it's important. But if you want to use package.json, of course you can: you just need to run node with the config file option set to package.json. And if it has Node-specific key-value pairs, then you can use it as a config file as well. But that's not the intention; it works because we basically check for a JSON file, and package.json is technically also a JSON file.
So it can be used, but it's highly unlikely and I don't recommend it at all, for performance reasons. And that, what was it? The JSON lib that you included? What was that called again?
simdjson. Yeah, I'm just quickly Googling. Does that support JSON5, which is not a standard but is the better way of writing JSON? Because I've always wanted to be able to use trailing commas and put comments in my JSON in Node.js. But I've made issues many times, specifically in the npm repo, and they've closed them saying it's too much breakage for the community.
So the thing is, JSON is a specification; JSON5 isn't, as far as I know. And yeah, simdjson doesn't support it because it goes for validity of the JSON file. So I think Biome and other implementers support these kinds of things. They even support having faulty JSON while continuing to parse the remaining values.
But we don't. And I don't think simdjson supports it either. But simdjson is the fastest JSON parser in the world right now, I think. So yeah.
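The difference between strict JSON and the JSON5-style conveniences mentioned here can be shown with the built-in JSON.parse, used below purely as a stand-in for a spec-compliant parser (it is not simdjson). The helper name and the sample documents are made up for illustration.

```typescript
// Returns true when the text is valid according to the strict JSON grammar.
// JSON.parse is only a stand-in here for any spec-compliant parser.
function parsesAsStrictJson(text: string): boolean {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

const strictDoc = '{"name": "app", "type": "module"}';
const trailingComma = '{"name": "app", "type": "module",}';
const withComment = '{\n  // entry point\n  "name": "app"\n}';

console.log(parsesAsStrictJson(strictDoc));     // prints: true
console.log(parsesAsStrictJson(trailingComma)); // prints: false
console.log(parsesAsStrictJson(withComment));   // prints: false
```

A parser that goes for validity has to reject the last two inputs, which is why trailing commas and comments need a separate, more lenient parser.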
6 gigabytes a second to minify JSON. Validate at 13 gigs a second. It's true. You think about podcast, people who run podcast studios or podcast apps, they have to download gigs and gigs of XML.
There are probably other people that have to download gigs and gigs of JSON as well. So simdjson is written by, again, Daniel Lemire, the professor that I've worked with on Ada, the URL parser. So it's an extremely optimized library. So we are currently using simdjson with the package.json resolver.
So whenever we find the correct one, we parse that and retrieve only the important fields. So with that PR, I think ESM and CJS applications just got 5% faster. So that's the impact of a library like this. It's amazing.
It's amazing to have such a big impact with a small change. I know that's been a kind of a theme throughout this conversation. But yeah, wow. It's really, really awesome.
Yeah. Well, by the way, that PR was 120,000 lines of code because I included SIMD JSON as a dependency. The actual code change was around maybe 1,000 or 800 lines. But it's extremely small.
And we just need to look into the code base and try to find those kinds of bottlenecks. And even small things can have a pretty huge impact. But unfortunately, because companies don't want to spend money and they don't want to sponsor, or whatever their internal agenda is, we don't have that much improvement. Thank you so much for all of that stuff on perf in general.
I mean, I think a lot of it is pretty eye-opening to the challenges that, you know, the Node team or any developers face here. I think now where we want to go with the show is taking it into the part called Sick Picks and Shameless Plugs, where you can bring us a sick pick, which is something that you're just liking right now, and a shameless plug, something you want to plug, something you want people to check out. Okay. So for sick picks, I can say that I'm a huge fan of Rust-based linters, including Biome, but there's also a new kid in town, which is Oxc. It's written by Boshen and is maybe three times faster than Biome. It doesn't have a formatter right now, but the linter is extremely fast.
So I really like that. I've never heard of this. JavaScript Oxidation Compiler. I'll tell you what, I heard about it this morning while I was looking through Yagiz's Twitter and found a retweet about this.
And I was like, I've never heard of this. Yeah. I'm a sucker for performance tools. So I feel like, you know, those kinds of things.
Yeah. I've been hearing just this week, everybody's talking about Biome and now I'm hearing about this. I think it's really cool because, linters and formatters. And the good side is Boshen.
Yeah, Boshen was a contributor to Biome in the past, back when it was Rome. Okay. So yeah, the performance, if you look into the benchmarks, it's extremely good. And what I really like about the project is that
there's a quote saying that performance issues are considered as a bug in this project. That's what he said. And that tells something about the character of a project. And I really respect that.
Yeah. Yeah. If somebody can figure out how to parse and maybe replicate some of the ESLint configs, but in Rust, I don't know if that's possible. They did.
They did it. Yeah, Rome did it. And Oxc. So Boshen is also moving ESLint rules to Oxc one by one.
I think he just did unicorn. ESLint config unicorn. And there's like maybe 500 different rules or whatever. I don't know.
The impact is huge. And I think one of the lead engineers of Vue wrote on Twitter saying that running Oxc on the Vue repository takes 50 milliseconds. Oh my gosh. With 200 rules enabled.
It's like, how is that possible? That's 200 rules on 590 files. Yeah. 200 rules on 590 files.
That's true, because I'm starting to see sometimes in my editor, specifically in our Svelte project, that I have to hit save twice in order for the linter to catch up. It'll show me an old error. Yes. And I'm like, no, I fixed that.
And I have to hit save again. I don't know if that's in the saving or in the parsing of it. But that's awesome. Node.js has to initialize.
So if you run a console.log hello world in Node, it takes 150 milliseconds because of the V8 bottleneck. Yeah. With Rust applications, it's extremely optimized.
So you don't need to do it. And if you're a good engineer, it's like a question. Then you can basically go for IO bound, which is the limitation of the computer itself. And this is what's happening here.
Holy. So it's three times faster than a console.log to lint your entire project. Oh, that's awesome. This is why, even though I really like ESLint and projects like it, and they really revolutionized the industry and not just Node.js applications, they are unfortunately doomed to rot.
Rot is not the correct word, but doomed to lose that particular point because of how these super optimized applications are coming. But one bottleneck in this scenario is that because they run on Rust and they don't have any JavaScript context, all of the configurations in all of these projects are not pluggable. I can't add a new rule without contributing to Biome. So hopefully they will find a solution, which is worth it.
Like even if he writes a rule in Rust, I'm okay with it. But yeah. Yeah. Yeah.
We had Nicholas Zakas, who's the author of ESLint, on, and he's like, we can't ever rewrite the whole thing in Rust because you need to be able to let the community author the stuff in JavaScript, but there are many parts of ESLint that can be rewritten in Rust, especially parsing all of the files and parsing the JavaScript. So this is what I did with pnpm. Five months ago, I started rewriting pnpm in Rust. It's called pacquet, P-A-C-Q-U-E-T, which is under the pnpm organization right now. But eventually, yeah, those kinds of projects need more than one collaborator to finish them.
And the question becomes, so for the case of Rust, even writing super fast Rust is extremely hard, because right now, if you're comparing single-threaded Rust and single-threaded Node.js, Node.js is almost as fast if you write the code really well. But if you go multi-threaded, then the question becomes, how can I write a really performant multi-threaded application? And then the real engineering comes in, which you don't have to think about in any Node.js application because it's single-threaded. Yeah. I was trying to write a video encoding app and I was going frame by frame through raw video.
And I mean, those are just massive vectors. When I went to do it single-threaded in Rust, I hit immediate frame rate bottlenecks and all sorts of things. And then okay, now I have to get into multithreading, which is something I've never had to think about as a web developer in my entire life. And it was shocking to me how much even a small dive into that world was able to unlock a lot more performance for me in my project and get me to the 60 FPS that I was trying to hit.
And it was just, yeah, it was a nuts experience to actually see that type of world in action. So there is this particular gray area that most people don't know or don't want to get into before going into multithreading. You could just use SIMD instructions, which is single instruction, multiple data, which means that if you have a for loop and you have a hundred elements, instead of going one by one, if your instruction set, if your computer allows it, you can go like 16 by 16 by 16 and iterate it in the least amount of time on a single thread. So I think on Rust, SIMD is still experimental, which is an extremely bad case, but on C++, on those kinds of things, this is what people use.
For video processing, SIMD is extremely useful because of all of these really big matrices that you need to go through, anyways. Interesting. Yeah. Mine is definitely more of a, I'm exploring rather than trying to make something useful out of this, but yeah, interesting.
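The "16 by 16" iteration described above can be modeled by counting loop steps. The sketch below is plain TypeScript and does not execute real SIMD instructions; it only imitates the access pattern, with 16 independent lane accumulators that hardware would update in a single instruction. LANES, scalarSum, and laneWiseSum are names chosen for illustration.

```typescript
const LANES = 16; // assumed vector width, matching the "16 by 16" example

// One element per step: 100 elements means 100 steps.
function scalarSum(values: number[]): { sum: number; steps: number } {
  let sum = 0;
  let steps = 0;
  for (const v of values) {
    sum += v;
    steps++;
  }
  return { sum, steps };
}

// One block of LANES elements per step, with one accumulator per lane.
function laneWiseSum(values: number[]): { sum: number; steps: number } {
  let lanes: number[] = new Array<number>(LANES).fill(0);
  let steps = 0;
  for (let i = 0; i < values.length; i += LANES) {
    const block = values.slice(i, i + LANES);
    // Real SIMD hardware would perform these LANES additions as one instruction.
    // A short final block leaves the remaining lanes unchanged.
    lanes = lanes.map((acc, lane) => acc + (block[lane] ?? 0));
    steps++;
  }
  return { sum: lanes.reduce((a, b) => a + b, 0), steps };
}

const data = Array.from({ length: 100 }, (_, i) => i + 1);
console.log(scalarSum(data));   // sum 5050 in 100 steps
console.log(laneWiseSum(data)); // sum 5050 in 7 steps
```

A hundred elements at 16 per step is six full blocks plus a tail of four, so seven steps instead of a hundred, all on a single thread.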
Cool. All right. Last thing is a shameless plug. What would you like to plug to the audience?
I don't know. My Twitter account: if you care about performance, I recommend following me. I like to write some blogs on my personal website, which is yagiz.co. You could just follow there.
You're a great follow on Twitter, by the way. Yeah. Yeah. That's why I don't BS.
I only care about posting quantifiable data. So that's why I get, but yeah, thank you. It's good to hear that because that's where the real engineering comes in. And I want to be known for the engineering, not for the BS.
Yeah. Appreciate that. I know it sometimes brings the heat, but it's a good follow. Well, and also for everyone listening, we have a Twitter list of all of our guests.
So we'll throw you on there as well. Thank you so much for coming on. Appreciate all your time. This was awesome.
And we'll catch you later. Thank you. Head on over to syntax.fm for a full archive of all of our shows. And don't forget to subscribe in your podcast player or drop a review if you like this show.