
EPISODE · Jun 30, 2023 · 59 MIN

Supper Club × Messaging Queues and Workers with Armin Ronacher

from Syntax - Tasty Web Development Treats · host Wes Bos & Scott Tolinski - Full Stack JavaScript Web Developers

In this supper club episode of Syntax, Wes and Scott talk with Armin Ronacher about his contributions to open source, queues and messaging in apps, scaling up a queue, and how it all works at Sentry.

Show Notes

00:35 Welcome
01:49 Who is Armin Ronacher?
04:11 What are queues and what are they used for?
08:02 Do you listen or poll for updates in the queue?
12:49 Does this help when a provider goes down?
18:31 How do you architect a queue?
20:20 How does it scale up?
27:05 How does Sentry manage all the data flowing in from events?
33:45 How do you visualize the data?
37:15 Edge case that Sentry had to fix
40:22 How are you using Rust?
43:32 Why is Python so popular in the AI space?
45:17 What do you think about JavaScript on the server?
48:02 Supper Club questions
50:44 How do you stay motivated with programming?

Links: Armin Ronacher (@mitsuhiko); Apache Kafka; Redis Message Broker (Redis Enterprise); RabbitMQ; Using RabbitMQ (Celery 5.3.1 documentation); Rust Programming Language

××× SIIIIICK ××× PIIIICKS ×××
Bilderbuch

Shameless Plugs
Rye - An Experimental Package Management Solution for Python

Tweet us your tasty treats. Make sure to include @SyntaxFM in your tweets.



TRANSCRIPT · AUTO-GENERATED

I sure hope you're hungry. Cool, I'm starving. Wash those hands, pull up a chair, and secure that feedbag, because it's time to listen to Scott Tolinski and Wes Bos attempt to use human language to converse with and pick the brains of other developers. I thought there was gonna be food, so buckle up and grab that o*** handle, because this ride is going to get wild.

This is the Syntax Supper Club. Welcome to Syntax, folks. We've got a very good episode for you today. We have Armin Ronacher.

How did I do? Perfect. Pretty good. Perfect.

Wow. I don't know if he's being nice, but pretty stoked about that. So Armin is principal architect at Sentry, and I've been following his work for quite a while, even before Syntax joined Sentry. And so the other day, I was in the chat, and I was like, we've got to do a show on message queues, and basically, how do you deal with getting lots of requests at once, where you maybe can't handle it all at once?

Or messaging queues is just kind of a general idea. That's something that we haven't really covered too much. So I was just like, hey, you know who gets a lot of requests and probably knows a lot of those things? Sentry.

So I was like, who at Sentry can talk to us about messaging? And that's kind of one of the things that I'm excited about joining Sentry is that you have access to some really cool people. So welcome, Armin. Thanks so much for coming on.

Yeah, thank you for having me. It's a good topic to talk about. I like queues. Oh, good.

That's good. So give us a quick rundown of who you are, what you do. I think you might be the first person on this podcast that has a Wikipedia page, which is unbelievable. I don't know if President Warner has one.

Oh, you're right. Maybe second. Yeah. Okay.

Yeah. So I like open source. That's kind of why I'm at Sentry, too. Most of my background is in Python.

This is also why we have a lot of Python at Sentry. I built a bunch of, let's call it, frameworks and utilities for developers. Originally, Python web frameworks. I built a WSGI library for building web apps in Python called Werkzeug, and then I built a web framework on top of it called Flask, which was quite popular.

And then I built a bunch of template engines over here. I built one in Python called Jinja, and then its second version called Jinja2. I also built the PHP version of it called Twig, which later on turned into, I think, the Symfony project. And these days, mostly, I have a strong interest in Rust, and I also do, on the side now, sort of an attempt at fixing Python packaging.

But I've been at Sentry for, I think, nine years now, trying to make everything work in one way or another. And I'm based in Vienna, and we have the teams here which generally produce data. So it's the SDKs, and it's also the ingestion pipeline. So everything that is getting event data into Sentry is happening here, up to the point where it hits the database, and then other folks take over.

So I basically have nine years' worth of experience of feeding Sentry event data into the system in one form or another, both from the client SDK side and the queue side of things. That's awesome. Well, I'm glad to hear that. So it seems like you're the right person to talk about messaging queues.

So in most of the applications that I've built, and probably the same for Scott, and probably for a lot of people listening, is that often they're just sending a request. They're sitting there, they're waiting for the request to come back, and they sort of deal with that. And they've never gotten into sort of like queuing or things that can't happen immediately or hoping that something comes back at some point. So can you give us an idea of what are queues and what are they used for?

So the short version of how people, I think, get into queuing stuff is they build an app, it takes an HTTP request, and then it takes a while. And then for one reason or another, they really want to basically pretend that the work is already done, but it's not done yet. And so the way this is typically described is someone goes to Stack Overflow and they're like, how do I run this in the background? This is, I think, how people get to queues.

And then usually the answer is, well, it sounds like the kind of problem to which you need a message queue. And the idea is that you take the kind of work that you want to do, you throw it in a message queue, and then some workers are going to pick it up and work on it, and then maybe eventually produce a result somewhere. So this is sort of the typical reasons. Rather than doing a thing right now, you want to postpone this problem for a little bit later or very much later.

And so the idea is basically to take a bunch of work and distribute it to workers that are targeted to solve this kind of problem. And one of the reasons why you want to be doing this is because, in a good situation, you free up the work that your HTTP request handlers might be doing so that they can do HTTP request handling. And then when a huge inflow of data comes, they are happy because they are mostly done with their work, just getting the data into the queue, and then you sort of work off the backlog as you have capacity. Then you can kind of scale this up too. That's sort of the basic reason why you have a queue, I guess.
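To make the handler-and-worker split concrete, here is a minimal sketch using only Python's standard library. It is not code from the episode: the names `handle_request`, `worker`, and `run_demo` are made up for illustration, an in-process `queue.Queue` stands in for a real broker, and uppercasing a string stands in for the slow work.

```python
import queue
import threading

def handle_request(task_queue, payload):
    """Pretend HTTP handler: enqueue the slow work and return right away."""
    task_queue.put(payload)
    return "202 Accepted"

def worker(task_queue, results):
    """Background worker: pull items off the queue until told to stop."""
    while True:
        item = task_queue.get()
        if item is None:              # sentinel value: shut down
            task_queue.task_done()
            break
        results.append(item.upper())  # stand-in for the slow work
        task_queue.task_done()

def run_demo():
    task_queue = queue.Queue()
    results = []
    thread = threading.Thread(target=worker, args=(task_queue, results))
    thread.start()
    # The "handlers" return immediately; the worker catches up afterwards.
    responses = [handle_request(task_queue, p) for p in ["a", "b", "c"]]
    task_queue.put(None)
    task_queue.join()
    thread.join()
    return responses, results
```

The handler only pays the cost of a `put`, which is the point being made: the request path stays fast and the backlog is worked off separately.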

So when you have these processes that need to be moved into a queue, is that typically offloaded onto totally separate servers, totally separate infrastructure, or can that often be run with the same... In the Sentry case or in the general case? Just in a general case, I'd say. I think in the general case, there are a bunch of different reasons why people build a queue.

The biggest one is just the initial one. I want to just have the ability to create a backlog. And then it's often the same code. So particularly in Python, you have systems like Celery that sort of advocate or advertise the idea that you just have a function here that's going to be invoked later with the data on the queue.

And it's a different process running, but you wouldn't really notice. You can use all of your same Python code and that kind of stuff. And then in, I think, recent years, quite popular... I don't know if it's a good plan or not, but a very popular design element has become building ridiculously tiny microservices.

And so messages are a really good way to sort of make them talk to each other in one form or another. And so then you end up in a situation where maybe you are sending stuff from Python, picking it up in Node, sending it further to a Go backend, and you just build yourself into a crazy unmaintainable mess this way. That's kind of why I'm glad I had you on, because at the end of the show here, I just have a whole bunch of questions. What are your thoughts on X?

Because I know from following you on Twitter and whatnot that you've got opinions on stuff. So I love when we have guests on like that. So I'm like, tell me what you really think. So with queues, you're able to, like, if you publish something into a queue, would I be able to, like, listen in my Node app for when something gets added to the queue, or do I poll it, or how does that work?

It depends a little bit on it, because, like, in... So first of all, you need to pick a system that... Well, you need a bunch of things. So first of all, you need a queue.

And so that is, for instance, it could be RabbitMQ. It could be Redis, and a bunch of different things with different qualities and benefits. And they behave in a certain way. And then you have new-fangled stuff like Kafka, which is also queue-ish in kind, but actually behaves quite a bit differently.

And so the way you interact with the thing on a low level is quite a bit different. But generally speaking, you have, in addition to your queue, some sort of utility that helps you manage this a little bit better. And so in Python, for instance, this is sort of what Sentry uses. You have RabbitMQ sitting behind an abstraction called Celery, which then sits on top of a sort of lower-level library called Kombu.

And the idea is that you don't really have to deal with all these intricate parts of it, because there's a lot of things that you might want to deal with in a queue. You need to serialize the data, you need to deserialize the data, there have to be some policies about where you route this kind of stuff, and what happens if the task doesn't get acknowledged. You want to retry it, you want to compose these kinds of things together. There's a lot of stuff that you can do. And so depending on the ecosystem that you're sitting in, you might have this kind of thing going on. You have some sort of framework that helps you with this.

And so in Node, I'm actually not sure what the most popular way is of talking to queues, but in the Python ecosystem, I would say that historically, Celery was the way to go, and typically you have some sort of decorators, like, hey, this function is a task, and then something else says, okay, I want to produce an item on the queue that eventually gets handled by this task, and sort of magic makes it get picked up. Yeah, and let's talk more about what types of stuff people generally put into queues. So one example I have is Amazon. You buy something on Amazon, and I once had an expired card in there.
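The decorator pattern described here can be sketched without any framework. This is a toy illustration, not Celery's implementation: `REGISTRY`, `BROKER`, `task`, and `run_worker_once` are hypothetical names, and a `deque` of JSON strings stands in for a real broker. The `.delay()` method only mimics the shape of the call Celery exposes on its tasks.

```python
import json
from collections import deque

REGISTRY = {}      # task name -> function (hypothetical, for this sketch only)
BROKER = deque()   # stand-in for a real message broker

def task(func):
    """Register a function as a task and attach a .delay() method to it."""
    REGISTRY[func.__name__] = func

    def delay(*args, **kwargs):
        # Only the task name and its arguments are serialized, never the code.
        message = {"task": func.__name__, "args": args, "kwargs": kwargs}
        BROKER.append(json.dumps(message))

    func.delay = delay
    return func

@task
def send_email(to, subject):
    return f"sent {subject!r} to {to}"

def run_worker_once():
    """Worker side: deserialize one message and route it to the right function."""
    message = json.loads(BROKER.popleft())
    func = REGISTRY[message["task"]]
    return func(*message["args"], **message["kwargs"])
```

Calling `send_email.delay("a@example.com", subject="hi")` puts a message on the broker, and a later `run_worker_once()` in any process that shares the registry picks it up. Serialization, routing, and retries are exactly the parts a real framework handles for you.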

I tried to buy something, it's like, great, it works good, your order is done. And then, like, half an hour later, I got an email that says, hey, your credit card declined because it was expired, or whatever, you have to go in. So I thought, oh, that's interesting. They don't process the transaction while I'm sitting there waiting for the request to come back.

They'd probably throw it into some sort of queue, and then process them either as they have time, or I'm not sure really why they do that. Do you have any other examples of common stuff that gets thrown into a queue? I would say, like, the most common thing is usually anything that talks to an external service that is kind of fire and forget. So a classic example here is, if I want to send you an email, very often I just say, like, send this email, but put it in a queue.

Because I don't really care if it goes out straight away. I can think about it a little bit later. And my email delivery might depend on the availability of my own email server or something else going on. So, like, any sort of, I need to notify an external service, like my local email service or, like, some external service on, like, an outgoing thing, a queue is a good example of how you would probably relatively naturally try to solve it.

I think anything that's sort of related to external services, in particular payments, very often goes through a queue. And on payments in particular, it depends a little bit on how the abstraction works. Stripe, in particular, sort of hides a lot of this away from you. But in the past, I had to implement payment processing with a company called Global Collect, which I don't know if they're still around, but there was no abstraction, and it was very, very tricky. And so, even if you only did a credit card transaction, which typically would be processed immediately, that same kind of interface might also send you through PayPal.

And PayPal had this awesome payment flow at the time where, like, 90% of transactions would go through immediately, but then 10% of them might go to a flow where the customer can wire money into a reference account. And it might take days to come back, right? And so then you want to keep the state somewhere, like, hey, the transaction's still not done. And so you have to periodically check if the thing went through, so you might keep putting tasks into the queue until this thing eventually transfers.

And so you start to maintain some sort of external state machine with this, and then you have these tasks sitting there trying to do stuff. Is that also very helpful for when an external provider could possibly go down? Because, like, Amazon went down last week, and there was kind of two emails I got, which is, one, our service went down, anything that happened in the three-hour window is gone forever, and they're probably not using a queue. And then the other one was, as soon as Amazon went back up, I got a bunch of, like, delayed emails of, like, X, Y, and Z is now done, or it's processed, or it's saved.

And I thought, oh, interesting, they probably just filled up their queue, and then once whatever service they needed was back online, they were able to process it through. Is that what people do to avoid going offline? So a queue is a good way of doing that. At the time that you put a thing in the queue, you have to figure out what it should do when it doesn't work.

For instance, payment transactions typically are the kind of thing that you want to give a bunch of tries until you give up, right? So, like, let's say you have 10 retries, and then maybe you space them over multiple days. Very classic example, because even on credit cards, people might max out their credit cards by the end of the month, and so if you only give it a single try, then if the card is maxed out, you're just going to lose out on the transaction. So if you keep trying a little bit more, then maybe you try it for five more days, once a day, maybe you make it so that eventually the card has balance on it, and you can charge it.
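A rough sketch of that retry policy, under stated assumptions: `process_with_retries` is a made-up helper, `charge` is any callable that returns `True` on success, and the wait between attempts is only counted, not actually slept. A real system would re-enqueue the task with a delay instead of looping.

```python
def process_with_retries(charge, max_attempts=5, delay_days=1):
    """Try `charge` up to max_attempts times.

    Returns (succeeded, attempts_used, days_waited).
    """
    waited = 0
    for attempt in range(1, max_attempts + 1):
        if charge():
            return True, attempt, waited
        if attempt < max_attempts:
            # A real system would put the task back on the queue with a delay.
            waited += delay_days
    return False, max_attempts, waited
```

With a card that only has balance on the third day, this returns `(True, 3, 2)`: three attempts, two days of waiting. A card that never works is given up on after the last attempt, which is the "until you give up" part.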

So that's a classic case where, even if the task fails on the queue, maybe you kind of put it back in one form or another. Typically, like, how does that type of task system work? Is that usually just done through, like, a job that's scheduled at various times to process the queue? Is that... It depends, like, the queue is...

Postgres actually is a pretty decent queue for the kind of behavior where you want strong persistence. If you have these tasks that take a really long time to execute because you might have multiple days of retries, you can actually store it in Postgres. Postgres has a built-in sort of system that can be used for that. And these kind of things where you have, like, these long-running tasks, like jobs that are sort of addressed and, like, want to introspect and that kind of stuff, you would often use a system like this.

And usually you have some sort of extra component sitting around that helps you execute these long-scheduled things. In Python, for instance, you have Celery beat, which can also be used to periodically schedule tasks onto the queue, like you run this once a minute, something like this. So there are very different ways in which you can do this. Some queues are not naturally able to delay. Redis, for instance, doesn't have much of it. It has a queue.

It has a list; you push onto a list. You can sort of build your own queuing behavior. But then if you want to do things like execute things an hour in the future, you have to reach for more abstract ways of implementing it yourself. And then there are certain systems on top that might help you with that.
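One common way to build "run this an hour from now" on top of a plain queue is to keep tasks ordered by their due time. The sketch below uses an in-memory heap; `DelayedQueue` and its methods are hypothetical names for this illustration and not the API of Redis, Celery, or any other system mentioned here. Times are plain numbers (seconds) passed in by the caller.

```python
import heapq
import itertools

class DelayedQueue:
    """Tasks ordered by due time; only tasks that are due get handed out."""

    def __init__(self):
        self._heap = []
        # Tie-breaker so tasks with the same due time keep insertion order.
        self._counter = itertools.count()

    def schedule(self, run_at, task):
        heapq.heappush(self._heap, (run_at, next(self._counter), task))

    def pop_due(self, now):
        """Return every task whose due time has passed, earliest first."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due
```

A scheduler process would call `pop_due(now)` periodically and push whatever comes back onto the normal work queue.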

I'm not sure right now what the latest flavor of queuing brokers on top of Redis is, but there are very different ways in which you can implement it. And Celery, for instance, has solutions; depending on which queue you use, they will have something for you. It can almost feel like a cron job in your queue, would you say? Yeah, I think cron jobs on a queue is a very common kind of thing that you do.

Yeah, that kind of answers one of the questions I was going to ask next was, because a lot of the services you mentioned do seem like, you know, maybe very specialized, whether that is the Kafka, or you mentioned sometimes putting in Redis. Do people put queues into databases? So you did just mention, so that is kind of the answer. They do potentially put queues into databases sometimes as well.

You can think of a queue as like a very simple thing. Like you put an item in on the left, and then first in, first out. That's sort of the theory. But depending on what you want to do with the things on it, that problem turns complicated really fast.

And so there's a whole bunch of very basic queuing theory that is worth having in mind before you actually start going on an adventure of trying to build something. Because at scale, all of these things matter a lot. And so depending on what you want to do with this, the very basic things you have to keep in mind is like, is my task idempotent? That means if the task were to run twice, is it a problem if it runs a second time?

If it's idempotent, then running it a second time will not create a different result. It may not run at all because it detected it already ran, or it will do the same kind of action, but not in a destructive way. But there are certain tasks that maybe are hard to implement that way. If you were to run them a second time, then it will actually count twice or something like this.
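Here is a small sketch of the "detected it already ran" variant of idempotency. The `Ledger` class and its task ids are invented for this example, and a real system would keep the set of seen ids in durable storage rather than in memory.

```python
class Ledger:
    """Applies each payment task at most once, keyed by task id."""

    def __init__(self):
        self.balance = 0
        self._seen = set()   # ids of tasks that were already applied

    def apply_payment(self, task_id, amount):
        """Return True if applied, False if this task id was seen before."""
        if task_id in self._seen:
            return False     # redelivered task: do nothing, do not count twice
        self._seen.add(task_id)
        self.balance += amount
        return True
```

If the queue delivers task `"t1"` twice, the balance only moves once. Without the check, the same task would "count twice", which is the failure mode described above.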

And so another very important point is like, if I'm accumulating a large backlog of items, because I cannot actually process all the items on the queue fast enough, what do I want to do then? Do I want to throw them away? Do I actually want to scale up and actually commit to processing this down? Do I want to slice the queue in half?

Or is it like, actually, I built such a big backlog, I want to eventually drain it down. But I want to skip ahead and process the items from right now. There are many different ways. And depending on how you implement all of this, certain things become possible or not.
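As one example of such a policy, the sketch below implements "throw the oldest items away once the backlog is too big". It is only one of the options listed, and `DropOldestBuffer` is a made-up name; it relies on the standard behavior of `collections.deque` with `maxlen`, which discards from the opposite end when full.

```python
from collections import deque

class DropOldestBuffer:
    """Bounded backlog that discards the oldest item when it is full."""

    def __init__(self, capacity):
        self._items = deque(maxlen=capacity)
        self.dropped = 0

    def push(self, item):
        if len(self._items) == self._items.maxlen:
            self.dropped += 1   # the deque evicts the oldest item on append
        self._items.append(item)

    def drain(self):
        """Hand out everything currently buffered, oldest first."""
        items = list(self._items)
        self._items.clear()
        return items
```

Pushing five items into a buffer of capacity three leaves the three newest and records two drops. Whether dropping is acceptable at all depends on the kind of task, which is the point of the questions above.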

And so depending on the kind of thing that you want to put in the queue, there are many different ways in which you can go down and which solutions are better and which ones are worse. And how do you typically architect something like that? Is it like a state machine? Is it a bunch of code?

Do you have like whiteboarding diagrams where you've got arrows pointing everywhere? So I think the problem essentially is that if you have a hammer, every problem looks like a nail, I guess. So we have a very specific kind of problem, which is we have a lot of incoming events. We need to process all of those.

And so all of our solutions, more or less, are built around the very fundamental part that backlogs are terrible. Because if I press pause on our system for like a minute and then I press play again, the amount of incoming events that have accumulated in this one minute is a sizable backlog. That's going to take a while to crunch down. And this is, at scale, like you have this kind of problem where like backlogs are really, really bad and you want to avoid them.

Whereas in many other systems, backlogs are actually kind of the point. This is why you build this. You build this so it can sweep up maybe days' worth of backlog and then process it down, one item after another. And so it really depends on what the problem looks like that you want to solve and how you do it. And so even the process of solving it, I guess, comes down to what the specific problem is that you have.

Because a lot of those things come down to, like, they look like one of a couple of different types of problems, and then you don't have to go deep into it. You're just like, okay, this is this kind of problem, so I'm going to use this type of queue. And then from there, you maybe go more into the depth of it.

But it's not that you have to over-engineer the whole queueing story. There's some very basic principles, and if your problem looks like one of those, then there's some best practices to think of. And do you typically configure concurrency? I'm thinking about like a serverless function.

Like, let's say I'm generating PDFs and all of a sudden my thing gets super popular and my queue goes from 8 in the backlog to 8,000. How do you typically deal with that? Do you just go and turn the knob on your servers or do you change the queuing number? I mean, it really depends, because Kafka is very hard to scale up.

And Kafka is not really a queue in that sense. But certain systems you can sort of almost naturally scale up. Like, RabbitMQ is a really decent kind of system to auto-scale because there's one component that sort of gives out tasks to a bunch of workers and you can just scale them up, right? Where it's tricky is if the end result of a task then feeds into another thing.

Because if you say, okay, I'm auto-scaling based on some sort of really primitive parameter like CPU load, you might go into a system where, well, now I have a backlog because of my, whatever, credit cards. Like, on a launch day, a lot of people put in credit cards and all of a sudden you're going to spend all the time on credit card processing, right? And so you're scaling up the workers automatically because you want to handle all of those credit card transactions as quickly as possible. And then let's say after you're done with the credit card transaction, you're creating another task and that other task is doing something else that's really slow. Let's say, I don't know what it does afterwards.

Maybe it just, I don't know, creates an account and provisions like virtual machines. I don't know what's happening. But there's something slow happening after the credit card. Now the problem you might have is you're burning down this backlog of credit card transactions really quickly now because you actually scaled up the whole thing.

But then you overwhelm the next system in line, and it actually turns out that it isn't possible to scale that to the same amount. So scaling these kinds of queues can be tricky because nothing is infinite. And the way you can think of backpressure is sort of like a bathtub. The water goes in and water goes out, right?

You have water flowing in at the top and then water leaves through a hole in the bottom. And you can always think of it like this: after that bathtub, there's another bathtub, so you have a connected set of bathtubs, right? So for whatever reason, when the first bathtub empties out it goes to the next bathtub. Scaling up means making the first bathtub bigger, right?

It doesn't mean much more than that. It's just like, okay, I'm poking, in this case, a second hole, so it drains twice as fast. But if the next bathtub is smaller, then it doesn't help me to empty the entire first bathtub into the second one because the second one is going to overflow.

And then I could sort of make the second bathtub bigger so that it can hold the whole volume of the first bathtub and the second one. But eventually the size of my bathtub is finite. I don't want to have an infinite sized bathtub. And backlogs are kind of like this.

The idea is that if I don't have backpressure, I have infinite-sized bathtubs. And that's a problem because it means that the first inflow never gets throttled. And that means I'm committing myself to all of the water in all the bathtubs. And that's a bad idea.

What you really have to do is you have to communicate eventually to the beginning of the system that, well, I'm actually overwhelmed. I don't want to deal with this right now. So that eventually the pain stops and someone just doesn't give you any more water. The problem with queues is that you kind of get this idea of committing yourself to all this kind of work that you want to do.
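In code, the simplest form of "I don't want any more water" is a bounded queue that refuses new items instead of growing. The sketch below uses the standard library's `queue.Queue` with a `maxsize`; the helpers `try_enqueue` and `ingest` are made up for illustration, and what the producer does with a rejection (back off, drop, return an error upstream) is left open.

```python
import queue

def try_enqueue(q, item):
    """Return True if accepted, False if the queue is full.

    A False result is the backpressure signal: the caller should slow down.
    """
    try:
        q.put_nowait(item)
        return True
    except queue.Full:
        return False

def ingest(items, capacity):
    """Feed items into a bounded queue and report what was refused."""
    q = queue.Queue(maxsize=capacity)
    accepted, rejected = [], []
    for item in items:
        (accepted if try_enqueue(q, item) else rejected).append(item)
    return accepted, rejected
```

With a capacity of three and no worker draining the queue, the first three items are accepted and everything after that is refused. The bathtub has a finite size, and the inflow finds out about it.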

But sometimes, and in fact, most of the time, it's really important to have a system in place that says, I'm actually overwhelmed. I don't want any more water. And that sort of very important backpressure design is often forgotten in this kind of system. And it's especially stupid if you have a system that sort of has really big queues.

Because if you designed Sentry from scratch and you had an infinite queue, and let's say you have an hour and a half worth of downtime and you send every single event into the system, you commit yourself to doing all this work. And it's like, before you can get to any of the new events, you have to burn through an hour and a half worth of old events, because that's what the system did. You have accepted all this work already. And maybe that's not what you want.

Maybe you want to say, hey, I actually want to prioritize new events now and then I want to have a second system in place that burns through the backlog that we have accumulated. There are all these kinds of really important basic ideas of how you deal with this in the face of stuff not working well. And usually the answer is not to blindly scale it up. You kind of have to understand if it's just a few people working.
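One possible shape for "prioritize new events, burn through the backlog on the side" is to reserve only a small share of each processing batch for old events. This is an illustrative policy, not a description of how Sentry does it; `pick_batch` and the 25% `backlog_share` default are assumptions made for the sketch.

```python
from collections import deque

def pick_batch(fresh, backlog, slots, backlog_share=0.25):
    """Fill up to `slots` places, mostly from fresh events.

    `fresh` and `backlog` are deques and are consumed from the left.
    """
    backlog_slots = min(len(backlog), int(slots * backlog_share))
    fresh_slots = min(len(fresh), slots - backlog_slots)
    # Hand unused fresh slots back to the backlog so capacity is not wasted.
    backlog_slots = min(len(backlog), slots - fresh_slots)
    batch = [fresh.popleft() for _ in range(fresh_slots)]
    batch += [backlog.popleft() for _ in range(backlog_slots)]
    return batch
```

While new events keep arriving, they take most of each batch and the backlog drains slowly. Once the inflow quiets down, the leftover capacity goes to the old events.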

So what I'm getting is that there typically are, or can be, multiple queues, right? You're not just looking at one big line, essentially. Yeah, usually you have like multiple things that are working in parallel and independent of each other, but then every once in a while work is passed from one queue to the next queue. Is there like an inflection point at which you would add another queue instead of tossing more resources at it?

Like what is that inflection point? Does that go along with the bathtub metaphor you were using? I mean, if you have things that are completely independent of each other, you try to keep them separate, at least from a configuration point of view, so that you can split them onto independent queues. Whether you actually have to keep them independent, or whether you can throw them on one big thing in RabbitMQ, for instance, is something you can sort of observe.

Like this depends a little bit on how these different things interact. On Kafka, it's much more complicated. There you generally have to separate this out from the beginning. You have to spend a lot of time thinking about how you're going to scale it up because Kafka doesn't have this kind of fair distribution of work that you have going on with RabbitMQ.

Because if I throw a thousand tasks into RabbitMQ, then they're going to be dished out to one or the other of the workers as they become available. And in Kafka, I predetermine which worker is going to process which items. So if a worker doesn't make progress on any one of those items on a partition, none of the items that are in line afterwards are going to be processed either. So it depends very much on what you want to do with this.
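The difference between the two dispatch models can be shown with a deliberately simplified simulation. Nothing here talks to Kafka or RabbitMQ: `consume_partitioned` and `consume_shared` are invented functions, partitions are chosen as `key % num_partitions`, and a "stuck" item is modeled as one that never finishes.

```python
def consume_partitioned(items, num_partitions, is_stuck):
    """Partition-style: each partition is read in order by a single consumer."""
    partitions = [[] for _ in range(num_partitions)]
    for key, payload in items:
        partitions[key % num_partitions].append(payload)
    done = []
    for partition in partitions:
        for payload in partition:
            if is_stuck(payload):
                break            # everything behind it in this partition waits
            done.append(payload)
    return done

def consume_shared(items, is_stuck):
    """Shared-queue style: any free worker takes the next item.

    A stuck item ties up only the worker that picked it up.
    """
    return [payload for _, payload in items if not is_stuck(payload)]
```

With one slow item in partition 0, the partitioned version never reaches the item queued behind it, while the shared version completes everything except the slow item itself. That is the head-of-line blocking being described.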

Wow. So let's talk about how you do it at Sentry. Specifically, I'm curious, how many, do you even know how many events Sentry gets? Because I think about like, I write one incorrect console log or I have one error on my thing.

And if I have a thousand people visiting my website, it's sending many events to Sentry, right? Like how do you possibly handle that much traffic and data coming your way? So Sentry is interesting because if you look at an event in Sentry, there are actually different kinds of things that can be sent to us. There are errors, there are session replays, there's performance metrics, there's session data.

So they all are different. And so at any point in time, I think the load balancers for pure event ingestion, including all of these different kinds of things, handle, at peak on a regular day, I think around 300,000 requests a second. A little bit more than that, I think. And this is why, like, it gets really annoying, because if you just wait a little bit, it's going to be a lot, right?

But not all of them are going to be immediate items that make it onto a queue. So as an example, every error typically makes it onto the queue, but not every error that makes it to our infrastructure is kept. As an example, a customer is over their quota or didn't pay or anything like this. We don't want to send this event onwards because it would be pointless.

The way I would describe this is we have a system in place where we extend our queue all the way to the client. So we write our own client SDKs, and the client SDKs cooperate with the rest of the system to implement backpressure management all the way to the client. So as an example, if I have a mobile app and I go viral with that mobile app, I might only pay $29.99 to Sentry or whatever our cheapest price plan is. But that app is viral, so it's installed on 2 million devices and, I don't know, 0.1% of them crash.

That's going to be a lot of traffic to us and it will be more traffic to us than I'm probably willing to entertain for that small developer. And so what I actually do in that case is our ingestion system keeps track of what the quota is that every customer has. And for a particular customer that is already over quota, I will communicate this all the way to the client, which in this case will eventually stop sending until, let's say, 30 minutes in the future. So I can sort of communicate backpressure all the way to the client to make the pain stop.
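A hedged sketch of what client-side cooperation could look like. This is not Sentry's SDK or wire protocol: `QuotaAwareClient` is a made-up class, the transport is any callable that returns either `None` (accepted) or a number of seconds to stay quiet, and the clock is passed in explicitly so the behavior is easy to follow. The client also keeps a count of what it discarded.

```python
class QuotaAwareClient:
    """Stops sending when the server says the customer is over quota."""

    def __init__(self, transport):
        # transport(event) -> None if accepted, or seconds to back off.
        self._transport = transport
        self._disabled_until = 0.0
        self.dropped = 0

    def capture(self, event, now):
        if now < self._disabled_until:
            self.dropped += 1        # discarded locally, never hits the network
            return False
        retry_after = self._transport(event)
        if retry_after is not None:
            # Server signalled "over quota": stay quiet until the deadline.
            self._disabled_until = now + retry_after
            self.dropped += 1
            return False
        return True
```

If the server answers the first event with "back off for 1800 seconds", later events within that window are dropped on the device without a request being made, and sending resumes once the window has passed.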

Important rule number one is you kind of have to tell things to shut off. Because if I were to not have this system in place it would be way more than 300,000 per second. This is already after we lose a lot of events. In order for us to know how much we lose, we actually count.

Every SDK, when it doesn't send an event, counts it up and will periodically send us how much it didn't send, so we can extrapolate for a customer what the true volume would be if we weren't doing this kind of stuff. But once all of those events make it into the first point of ingestion, which is a system called Relay, we split them up into different kinds of events. So errors will route differently from, say, session metrics. And session metrics in that case would be pre-aggregated on Relay and then flushed out once every 10 seconds.
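To make the pre-aggregation idea concrete, here is a small Python sketch that counts session updates per project in 10-second windows and only hands completed windows onwards. The class, the field names and the bucket key are assumptions made for illustration; only the 10-second flush interval comes from the conversation, and Relay itself is not implemented this way as far as this transcript says.

```python
from collections import defaultdict

FLUSH_INTERVAL = 10  # seconds, the interval mentioned in the conversation


class SessionAggregator:
    """Hypothetical pre-aggregator: many session updates in, few counters out."""

    def __init__(self):
        # (project_id, bucket_start, status) -> number of sessions seen
        self._buckets = defaultdict(int)

    def add(self, project_id, status, timestamp):
        # Round the timestamp down to the start of its 10-second window.
        bucket_start = int(timestamp) // FLUSH_INTERVAL * FLUSH_INTERVAL
        self._buckets[(project_id, bucket_start, status)] += 1

    def flush(self, now):
        """Return and remove every bucket whose window has fully elapsed."""
        ready = {
            key: count
            for key, count in self._buckets.items()
            if key[1] + FLUSH_INTERVAL <= now
        }
        for key in ready:
            del self._buckets[key]
        return ready
```

Three updates inside the same window leave the aggregator as at most one counter per status, which is the reduction in individual events being described.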

So we reduce the total amount of individual events that make it somewhere. In any one of those cases, though, once they go through all of the system they will end up on Kafka, which is its own beast. But one of the things about Kafka is that it basically goes to disk. So if we have, for whatever reason, an extended downtime between one of those systems, we can buffer effectively without limits until the disk goes and says, like, I'm full.

But we have a lot of space to keep data if something goes really wrong. With RabbitMQ, on the other hand, we don't have that benefit, because the way we operate RabbitMQ, it doesn't scale for us anymore if we use disk storage. So it has to work purely out of memory. And so we have to limit how many events can actually make it into RabbitMQ.

So there's another system in place later on that sort of tries to prevent us from putting too many events into RabbitMQ, so that RabbitMQ can operate properly and is sort of kept in a reasonable state. And then all the events make it through a really elaborate processing pipeline where certain events go through immediately. Other events, like minidumps, might require downloading debug files that can be gigabytes in size. And so that means that for an individual customer, one crash comes in, but the first time we want to handle that crash we might have to download two gigabytes worth of debug files.
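The idea of limiting what goes into a memory-only queue can be sketched in Python with a bounded queue and an admission check. This is an illustration under assumptions, not a description of Sentry's system: the `admit` helper and the `held_back` list are hypothetical, and the list simply stands in for whatever durable buffer holds the overflow.

```python
import queue


def admit(in_memory, event, held_back):
    """Put the event on the bounded in-memory queue, or hold it back.

    Returns True if the event was admitted. Events that do not fit are
    appended to held_back, a stand-in for a disk-backed buffer.
    """
    try:
        in_memory.put_nowait(event)
        return True
    except queue.Full:
        held_back.append(event)
        return False


# Usage: a queue that may hold at most two events in memory.
work_queue = queue.Queue(maxsize=2)
overflow = []
results = [admit(work_queue, n, overflow) for n in range(3)]
```

The point of the bound is that the memory-only broker never sees more than it can hold; everything else waits somewhere that can spill to disk.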

And so then a worker has to pick that up, do all this stuff, and hopefully keep the caches around. And so it gets quite elaborate because we don't know ahead of time how long an individual request is going to take. We can make some educated guesses: for instance, a Python event will be quick because it doesn't require a lot of processing, and a C++ event will always be slow. But the difference between cached and uncached on a lot of those events is multiple orders of magnitude. As a good example, if you have a JavaScript event, you can think of it this way quite easily.

It's a stack trace, and it's minified, and it looks like garbage. So we have to find the source map for it. And in the worst case we have to go to the internet and download the source map, because you have the minified JavaScript file sitting on the internet, so we fetch that, and then there's the source map reference in it, and you didn't upload it to us, so we also have to go to your server and fetch it. And maybe the reason your website is crashing right now is because your server is overloaded, so we come in and try to fetch even more from the server. We try to get the source map and it's going to take 30 seconds.

That means that one event doing this keeps us busy for 30 seconds not doing anything valuable. That's very unpredictable for us. Once we have this stuff cached it will not take 30 seconds anymore it will take milliseconds. But the difference between milliseconds and 30 seconds is for this kind of system really, really annoying.
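The cached versus uncached gap can be shown with a tiny Python sketch: the slow fetch happens once per URL, and every later event that needs the same source map is served from memory. `SourceMapCache` and its `fetch` argument are hypothetical names for illustration; a real system would also need timeouts, size limits and expiry, which are left out here.

```python
class SourceMapCache:
    """Hypothetical in-memory cache in front of a slow source map fetch."""

    def __init__(self, fetch):
        self._fetch = fetch  # assumed callable: fetch(url) -> source map contents
        self._cache = {}
        self.misses = 0      # how many times the slow path was taken

    def get(self, url):
        if url not in self._cache:
            # Slow path: in the story above this is the 30-second download.
            self.misses += 1
            self._cache[url] = self._fetch(url)
        # Fast path: a dictionary lookup.
        return self._cache[url]
```

Only the first event for a given file pays for the download, which is why the processing time of two otherwise identical events can differ by orders of magnitude.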

And so it makes it annoyingly hard to predict how it's going to be.

Wow. With this much information coming in and out, are there standard ways of visualizing it and being able to see what's going on? Because any queue that I've worked on has been small enough that you can visualize it in a table. You can see here are the things that are processing, here's their status, whatever.

But with this many events you can't possibly do that. So what is an actual useful visualization?

Yeah. The way you would ideally visualize it, and this is not how we do it but this is how we wish we could do it, is a form of cohort tracking. So basically the way you visualize this queue is just a bunch of numbers.

Time series over time: it's like how many events do you have at certain points that you measured. And so ideally what you would be doing is you would say, okay, this is the time when the thing was first put into the queue, like when I first saw it, and I'm going to take, I don't know, the timestamp modulo 30 or something. So I have 30 buckets per hour, like every two minutes I have a bucket, and it would sort of track this. And then I could in theory see 30 segments over time making it through my system, and see if one of the cohorts is doing worse than the others, and then it kind of tracks what my average latency is. There are ways in which you could do that at scale quite nicely.
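One way to read the bucketing idea is sketched below in Python: each event is assigned to one of 30 two-minute cohorts based on when it was first enqueued, and latency is tracked per cohort. The exact formula (integer-dividing the timestamp by 120 seconds, then taking it modulo 30) and all names are an interpretation for illustration, and the speaker is explicit that Sentry does not actually do this.

```python
from collections import defaultdict

COHORTS = 30          # 30 buckets per hour
COHORT_SECONDS = 120  # so each bucket covers two minutes


def cohort_of(enqueued_at):
    """Map an enqueue timestamp (in seconds) to a cohort number 0..29."""
    return int(enqueued_at) // COHORT_SECONDS % COHORTS


class CohortLatency:
    """Tracks average queue latency per enqueue-time cohort."""

    def __init__(self):
        self._total = defaultdict(float)
        self._count = defaultdict(int)

    def record(self, enqueued_at, finished_at):
        cohort = cohort_of(enqueued_at)
        self._total[cohort] += finished_at - enqueued_at
        self._count[cohort] += 1

    def average(self, cohort):
        return self._total[cohort] / self._count[cohort]
```

Plotting `average` for each cohort over time would show whether events enqueued in one window are moving through the pipeline more slowly than the others.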

We definitely don't do that, and part of the problem is we lose a lot of knowledge in the system. So I think the difference is that in Relay we do this pre-aggregation, so metrics that come in basically get flushed every 10 seconds into something. But we cannot really flush it every 10 seconds because it depends a little bit on the project. So let's say we collect your data, and then after 10 seconds we want to flush this project: all the metrics captured for this project on this particular Relay should be sent forward.

It could be that we cannot actually forward this project because we're waiting for the config of the project. Relay basically fetches, for every customer's project, the config that influences how it behaves. And so it might be that we have a connection error between Relay and Sentry and then we cannot send this. And so we would have to, in theory, keep track of every single incoming metric and when it was sent to Relay, and not just the 10-second bucket that we want to send on from Relay.

But every time we buffer things together we lose all this information about how long something took, because this package of stuff contains data where some of it came from this minute and some of it came from that minute, or something like this. Whenever we do this sort of batch processing where we merge multiple items together, we lose all the information that was there about how old it is and where it came from. So we have very, very questionable visibility into this really complex queuing system. And with the power of hindsight and many years of building this, I would probably spend more time on making it visualizable, but it kind of is what it is, and you have very crude tools only to deal with this.

Unbelievable. It just blows my mind how complex stuff can get when you start scaling. I can't imagine all the little edge cases. Like even just like I had a doctor's appointment that got moved and I got a text message about it being moved and then an hour later I got a text message for the old appointment as a reminder.

And I was like, that's just one example of a... I don't even know if that's messaging queues, but it's just one example of an edge case that someone has not taken care of. I can't even imagine the type of stuff that goes on. Even just sending an email notification, I'm sure the logic behind that is...

So I'll give you a really cool example. You didn't pay enough, and the message that we really want to tell the client is: until the point in time comes where quota is available, don't even try to hold on to those messages.

Throw them away. This is the contract that we have. It's like: if for the next 30 seconds you cannot send data, then after 30 seconds please don't send me 30-second-old data. That's the contract.

There was an SDK that didn't uphold this contract. It was basically retrying. It said, like, okay, I'm going to buffer these 30 seconds worth of data and then when I get quota I will send it. And the problem is this was a mobile app, and so all of those devices were buffering everywhere and they were trying to send their ever-growing pile of old events.

But we never saw this. The only way in which we figured out that this is happening is that on some of the largest customers the events that they saw on the dashboard were collectively older. Eventually everything was days old because we had all of those devices with the local buffers trying to send age-old event data. That was many years ago but this is why we have so much more data today on what these clients are doing because they are completely out of our control.

We deploy SDKs to them in one sense, and we're hoping that the customers update those SDKs, but they are an integral part of our queuing system because they literally are a queue on the client. And if they misbehave we can only guess what they're doing to us. You can bring them into horrible states and then you're really screwed, because they're just out there doing stuff to you. You can self-DDoS yourself if you do something stupid.

You can't tell it to stop sending you stuff. That's actually crazy, where a part of the service you have to just give to somebody else and say, run this for me, and hope you got it right.

I guess that's how JavaScript works, but at least with JavaScript you refresh the page and you can at least update it.

Mobile is hard because it's multiple layers away from you. First of all, you fix a bug in the SDK, and then you have to hope that your customer updates and puts it in their app, and then you hope that every one of your customer's users updates the app on the phone. And it has to get approval through the app market.

Yeah, so it takes a lot longer for this fix to go out.

Oh, man.

So the way we fixed this originally was we lied to this particular SDK version and didn't send the 429. We just lied and said, okay, actually, it went through, just to drain out these distributed queues that we had everywhere.
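The workaround described here, answering a misbehaving SDK version with a success status so that it stops retrying, could look roughly like this in Python. The version string, the function name and the use of 200 as the success status are placeholders invented for the sketch; only the 429 and the idea of special-casing one SDK version come from the conversation.

```python
# Hypothetical identifier for the SDK release that buffered and retried.
BUGGY_SDK_VERSIONS = {"example-sdk/1.2.3"}


def respond(sdk_version, over_quota):
    """Return (http_status, event_is_kept) for an incoming event."""
    if not over_quota:
        return 200, True
    if sdk_version in BUGGY_SDK_VERSIONS:
        # Pretend the event was accepted so the client drops it from its
        # local buffer instead of retrying it later. The event is discarded.
        return 200, False
    # Well-behaved clients are told to back off.
    return 429, False
```

The event is thrown away in both over-quota branches; the only difference is what the client is told, which is what lets the buffers on the devices drain.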

Oh, that's wild. I had a question about Rust, and you mentioned that you're doing a lot of Rust right now. I was wondering, what kind of projects or things are you finding interesting in the Rust space right now, and what type of work are you doing there?

So at Sentry, Rust sort of came naturally, in a sense.

Like, it worked. Maybe it didn't work perfectly, but it was good enough to solve a particular problem. Our client-side CLI tool is written in Rust. It makes it very easy to distribute.

We have the core ingestion system written in Rust for performance reasons. It's quite nice. Typing makes it nice. There's a bunch of reasons.

It's a pretty good choice, I would say. And then, historically, Rust at Sentry was in a space where we needed to do native crash reporting. And so dealing with binary data, dealing with native PDBs, DWARF files, source maps even, there Rust is just a really good language, because Rust is written in Rust, and so there's a lot of tooling around compilers and compiler ecosystems. And really the only competitor in that space is C++, which we used earlier.

We used LLVM for this, and it's not nearly as nice to use. The developer experience of Rust is so much better than the developer experience of C++. And so I'm mostly paying attention to anything in that space, which is high throughput backend processing, data processing, that kind of stuff. I'm not paying too much attention to what Rust is doing in gaming or what it's doing.

And even WebAssembly, while I find it quite fascinating and I'm toying with it a lot, I'm not paying that much attention to it compared to, say, distributed tracing, queues, services, that kind of stuff. But I think what I find most interesting at the moment in the Rust ecosystem is that there's a growing set of Python projects going on in the Rust ecosystem. So there's a library called PyO3 and there's a project called Maturin, I think. Maturin, I don't know, I have no idea how to pronounce it. But basically, these two things together let you write Python extension modules in Rust.

And that is actually quite interesting because Python is growing, I would say, mostly in the data science space or the data processing space, and there you historically wrote a lot of stuff in C and C++; like SciPy and NumPy all have a lot of C and C++ code in them. And now you can write Rust in it, which is a lot more fun. And so there's a growing number of tools in the Python ecosystem that are written in Rust, and that I think is quite interesting. It lets you do a lot of interesting things where performance historically was a problem and where writing C++ wasn't that much fun.

And what do you think about Python in the AI space? It seems to be that almost all the AI stuff is written in Python. Do you think that will continue to be so? Or why is everything in AI written in Python?

I don't know why everything is written in Python, but there is definitely a lot of Python code in that space. And that's just the reality. And so I think it's probably mostly just an effect of where all of this stuff is coming from and that there were good libraries to do this kind of processing. And so it's a case of there was already stuff there.

And I feel like while there were some competitors for Python, and there still are, Julia and others didn't catch on quite as much as everybody was thinking. That's sort of my interpretation of what's going on. And so it's a fact that Python is used a lot in that space. I think that the warts of Python are clearly not problematic enough to completely ruin that experience.

But I mean, I hate packaging in Python. It makes me angry, and I even started my own Python packager for that reason, because the developer experience around Python is just really quite from the last century in some sense. And I know that a lot of AI folks are also complaining about this. There are a lot of competing Python ecosystems now, with Anaconda, with a different kind of packager.

So some of the machine learning AI kind of stuff is definitely running into splitting into many small communities as a result of this.

Let's hear it straight from you. What do you think about JavaScript and JavaScript on the server? Don't hold back on us.

I don't like to hate on JavaScript, but I don't understand modern JavaScript anymore. And the problem with modern JavaScript is basically that backend JavaScript, from what I can tell, is like React Server Components, and I'm so confused. It's so complex. I understand that it's supposed to make everything simple.

And I feel like there's a certain type of programmer with whom it really, really resonates. But if you've been writing applications in a certain way over multiple years, there are certain things where you feel like, okay, as me writing this kind of application, that's the kind of problem that I keep in mind. Like, this is how you do authentication. Like, you write things in a certain way because of best practices.

And then you look at React Server Components, which is, I guess, how you write backend JavaScript these days. And all of those things are just completely underexplored and it's unclear how to do them. And it's so complex, and you try to integrate with this in ways that are non-obvious. It's very complex.

I'm shocking you so. So like the hello world of React server components looks really nice, but I have no idea how it's going to work in practice. The growing complexity is just really, really high. And I think now there's also a little bit of churn in that space.

Like I definitely noticed that we have SDKs for Next.js and others. And there's a lot more breakage now compared to historically with this kind of stuff, because of how quickly this ecosystem is growing.

Yeah, I always wonder about that because it does feel like there's updates all the time to your SDKs. And I wonder just what type of workforce it takes to keep all that stuff up.

I mean, it's really exciting, but it's also shocking. I would love to see that at the end of the year as a roundup of the most broken frameworks in Sentry. Like I don't know if you can run stats on that. I think we asked David something like this.

I would guess Next.js is the one that has the most churn. Not necessarily because it's the most unstable, but it also has a lot of usage. And so the bugs scale with utilization too. I think it's by far the one, that would be my guess.

Should we get into supper club questions here, Scott?

Yep. All right, these are a set of questions we ask everybody who comes on the episode. First one is, what computer, mouse, and keyboard do you use?

So I have a bunch of MacBooks.

I have Sentry-issued hardware, which is a 14-inch MacBook Pro, and I have a 16-inch MacBook Pro. I also have a Windows computer, which is sort of a self-built thing. Keyboard, usually a Filco tenkeyless, to the annoyance of my wife because it's very clicky-clacky. I think you can't make it more clicky-clacky than it is.

And I actually mostly, unless I'm playing computer games, use a Magic Trackpad thing from Apple.

What about your text editor, theme, and font?

Text editor is either Vim, or now I use a little bit of Helix, but I have my own fork of it that's a little bit more like Vim, or VS Code with a Vim plugin. Font, I use MonoLisa because a friend of mine made it.

So I quite like the font. Theme, for Vim, I have a really old one called Fruity, which I have had for ages. And in VS Code, I think it's Night Owl or something. I don't know.

It's just pretty blue. Dark blue. Yeah, that's Sarah Drasner. We had her on the podcast.

Yeah, I think that's the one. Night Owl, it's called. That's a really nice one.

What about terminal and shell?

Zsh. Z-S-H, I think you call it this. Zed-S-H. Zed-S-H in Canada.

And on Mac, iTerm2; on Windows, whatever the Windows terminal thing is called these days, the new one. And I actually don't use Linux at all at the moment. But I used to just use MonoVisa.

I'm surprised by that.

I definitely, the first question I was like, this guy definitely uses Linux, but no. Yeah, I don't know. There's too many other things to worry about, I guess.

No, the problem is, I mostly use my computer, other than a little bit of open source hacking, for sharing pictures and stuff with my wife.

And Mac is impossible to beat, and it's also impossible to use on Linux. Like, the Mac ecosystem of apps is impossible to use on Linux.

Yeah, totally. Yeah, totally.

Because you think, oh, it'd be great running a Linux system full time, and then the moment comes where you're like, oh wait, everything that I actually use...

I think the only sort of family-friendly ecosystem that actually works on Linux is Google stuff. And I've been burned so hard so many times by Google things that I no longer trust the company.

Yeah, everybody's scared of it.

Yeah, topical too right now. I have a question here that's not on our list, but I'm very curious. So you have a GitHub that's one of those like walls of green where it's just constant. And obviously that's part of your job is that you have to commit code.

But do you have any advice for staying motivated on contributing to open source?

I don't actually think that my GitHub is particularly wall of green. Maybe this year again, but historically it hasn't been. I think I'm very bad at keeping myself motivated.

Most of the green is probably pull request reviews and not actual commits. I don't know. I'm very bad at staying motivated. Like I have probably many more projects that I started that I've never pushed anywhere.

I think I'm sort of embarrassed, so I don't put projects on GitHub anymore unless I feel like they're going somewhere, because I had a long time where I was just dumping pieces of uninteresting stuff on there. Yeah, I don't know. I find Sentry problems really interesting still, because they have grown to be bigger problems. And they're sort of motivating out of principle.

Staying motivated for open source libraries is, to be honest, a function of adoption. And whether you're going to get adoption for something or not doesn't correlate with how much you like the project. I have some projects that I would really like to work on but nobody else cares.

Where do you go to stay up to date with stuff?

I don't stay up to date with stuff.

I think that's the short answer. So I use Twitter and I use Reddit, and I maintain a long mute list of topics on Twitter because it kind of makes me not emotionally healthy to read about certain things there. But churn in certain ecosystems is really demotivating. And a lot of modern open source projects are very pushy and, like, marketing-y and that kind of stuff.

And I just mute this out of personal health, and then I miss out on a lot of new things. So I don't know if that's a good answer, but I really try not to, in some ways.

It's working for you. I have a pretty long mute list as well, but I don't do it with tech topics because I'm anxious that I'm going to miss something.

It's more just the other stuff. Your job is to stay up to date with this kind of stuff. I can get a little bit of fun to that. I think there's specific creators and companies that I'll mute just because I know the vibe is too in-your-face marketing to me.

And it's like I'm not going to gain anything from that. But generally, yeah, my mute list is straight up topics I don't want to see.

I have a theory of staying up to date, because basically you get good at something, say it's server-side rendering, it doesn't matter, like a topic, and it's going to go out of favor for eight years, and in eight years it's going to come back. So just ride the wave. Just wait for eight years.

I like that.

That's great. All right. Last section we have here is sick picks and shameless plugs. Did you come prepared with a sick pick and shameless plug?

Yeah. So I guess the pick that I would make here is there's an Austrian band called Bilderbuch. They just released a new song, I think last Friday or something. And they kind of do songs that are the kind of music you would listen to in the summer, and I think they mostly release their albums around the summertime. But they released a song again, and I think the album is going to come next, and whenever summer is coming up I'm listening to the latest stuff they released.

It's really good. What kind of music is it? I have no idea. It's like I actually should Google this.

Art pop? I don't know. Honestly, I don't know that many bands that have that kind of sound so it's very hard to relate to anything. I just really like them.

They're not well known, but I mean, half the lyrics are German so it doesn't really translate that well, but it's more about the vibe; the lyrics are pointless anyway. Yeah. Yeah. Oh, Scott found it.

Oh yeah I found it. Hold on. We need to just say this to anyone listening. B-I-L-D-E-R-B-U-C-H.

Beautiful. And shameless plug?

Shameless plug: my package manager. It's called Rye.

I had to write it because we use Python a lot at Sentry, and we use it in a specific way, and all that infrastructure isn't there for my hobby projects. I kept maintaining on the side just a way to make pip suck less, or getting Python binaries onto your system, and then I just wrote Rye, and I actually released it on GitHub for others to look at. But it's quite a bit older than that, and it got quite popular in the last month or so. And so if you want Python packaging to be fixed, maybe look at the project, chat at me, and say if it's a good idea or not, but it's reasonably fun to make it.

That's great, because I dipped into Python a couple weeks ago because I needed something that was a Python project, and it's wild that by default it just globally installs everything, and sometimes there's a list of dependencies that you need, but is there like a package.json equivalent in Python?

Yeah, there is now. It's called pyproject.toml, but it's very, very light. I think the Python community likes to have opinions, but it doesn't hold them very strongly. So there's a standard called pyproject.toml, but rather than there being, like with package.json, one tool that works with it, Python now has 13 tools working with it, and they're all out in the world somewhere, like Hatch, PDM. I realize that there's an enormous list of them, and they all have slightly different interpretations of what you can do with this file format, but technically it's there now.

Sounds familiar.

But it is wild. It's really wild. Python has a lot of opportunity to make a better developer experience, I would say.

Sweet, I'm going to check that out, because I was struggling with virtual envs a couple weeks ago and I finally just gave up and used a hosted service, just trying to get the right version of Python. You had to run Python 3.7 and then you have to alias it.

This is great, because the way Rye works, and this is why I built it, is I basically declared bankruptcy on everything. So you install Rye and it manages Python for you in a way that you just say I want 3.7 and you get 3.7.

That was always the hardest part of Python to me. That's exactly what I need.

I tried to emulate Rustup and Cargo for Python. That's what I did.

Okay, definitely got to check this out, because I have patience for figuring issues out with Node, but when you're in Python, this is not my space, and then it's just like nothing works and stack traces everywhere.

It's so close to working now. The whole ecosystem is converging onto standards and opinions and stuff, but nothing wins. Everybody has their own independent version of it and nothing is fully, fully executed to being good. I can't promise that this thing is going to eventually solve it, but maybe it can. We'll see.

Awesome, well, thank you so much for coming on. I appreciate all your time and insights into this.

You're welcome.

All right, peace.
Head on over to syntax.fm for a full archive of all of our shows and don't forget to subscribe in your podcast player or drop a review if you like this show


Frequently Asked Questions

How long is this episode of Syntax - Tasty Web Development Treats?

This episode is 59 minutes long.

When was this Syntax - Tasty Web Development Treats episode published?

This episode was published on June 30, 2023.
