Welcome to The Changelog, episode 0.5.5. I'm Adam Stacoviak. And I'm Wynn Netherland. This is The Changelog. We're talking about what's fresh and new in the world of open source.
If you found us on iTunes, we're also on the web at thechangelog.com. We're also on GitHub. Head to github.com/explore. You'll find some trending repos, some featured repos from the blog, as well as our audio podcast.
And if you're on Twitter, follow @changelogshow and me, @adamstac. And I'm @pengwynn, P-E-N-G-W-Y-N-N. This episode is sponsored by GitHub Jobs. Head to thechangelog.com/jobs to get started.
If you'd like us to feature your job on the show, choose to advertise on The Changelog when you post your job, and we'll take care of the rest. MOBY's looking for an iOS, Android, and Windows mobile app developer. MOBY's backed by Marc Andreessen's Ning.
And they're looking for someone who is familiar with a mobile platform, preferably with Java or C++ experience. A BS or MS in computer science is a plus. The position is full-time in Palo Alto. Apply at lg.gd/9L.
Python is in big demand over at Urban Mapping. They're looking for developers for the core team of Mapfluence, their hosted mapping analytics platform. They're also looking for a Bachelor of Science in computer science and an expert in Python, Django, and RESTful web services.
Also, a big plus if you know MapReduce, Pig, Cascading, Hadoop, there it is, all sorts of NoSQL stuff. If you're interested, lg.gd/9E. Fun episode this week. Talked to Ilya Grigorik over at PostRank.
Got the scoop on Goliath, their evented, non-blocking, asynchronous Ruby framework built on top of EventMachine, which is really, really cool. That's a mouthful. It is a mouthful.
I got the scoop on why our PostRank numbers don't show any interaction with our feed. So there might be some things we can do to fix up our Tumblr feed so that we can see who's interacting with our content. All 12 of you. We've had a couple of design episodes.
But I have to comment on their design. Their design is phenomenal. PostRank, yeah, we got into that. Ilya said he started with a Photoshop background.
He was a designer first and got into development out of necessity and made a career out of it. He's a founder of PostRank. They do some really, really cool things around social media analytics, with some really high-volume throughput. And they do it all in Ruby. Who says Rails can't scale?
That's right. Who says that stuff? That's some other podcast. That's for some other podcast.
Yeah. Well, what do we have to promote this week? Me? Me?
You? Oh, Red Dirt RubyConf. Oh, yes, a little birdie told me there's a special bare-bones package that just went on sale today, $109.
Gets you into the conference if you don't eat anything. There you go. And we're also ordering another packet of stickers. So stay tuned for that as well.
Cool. If you are at CodeConf this weekend, catch, I believe, Kenneth and Steve, who are going to be out there. And if you are at Red Dirt RubyConf, as we mentioned, we'll be doing a special live episode on the 21st. Looking forward to that. And stay tuned for some other great stuff this summer.
Cool. If you're ready to get to it, let's do it. Chatting today with Ilya Grigorik from PostRank. So Ilya, why don't you introduce yourself and tell us a little bit about your role at PostRank?
Sure. So I'm the founder and CTO, I guess, of PostRank. We're a fairly small company, a startup of about 15 people at this point, up in Waterloo, Canada.
And we're aggregating quite a bit of data from the social web, and we ended up building a framework called Goliath to do a lot of our API serving. So here we are today. I think your name in Ruby circles has become almost synonymous with performance and high-performance Ruby scaling and things of that sort. So what's your journey to performance been like with Ruby and web frameworks?
Well, that's an interesting and loaded question. And as far as Ruby and performance and, you know, so I think a lot of that work, especially stuff that you read on my blog, has come around by necessity more so than anything. It certainly wasn't a motivated or coordinated move towards that. It's just when we started PostRank, our focus has been around aggregating lots and lots of data.
That's what today, I guess, is often called big data: archiving it, and then processing it for a variety of internal uses and all sorts of clients. And it just so happens that Ruby was kind of my favorite language at the time, so we chose it as the primary platform. And throughout that whole experience, we basically tried to figure out, you know, how do we make use of Ruby? Because we were using it on the front end for stuff like Rails and everything else.
And we loved the productivity that it enabled us to have in terms of developing new products and just iterating very fast, being able to reliably and quickly test all this stuff, you know, unit testing, integration testing and all the rest. And we wanted to propagate all of that experience throughout our entire infrastructure. So that led to lots of interesting kind of optimization work, in that we needed to build fast crawlers to collect that data. So how do you do that with Ruby?
And that, frankly, is what got me started. We made our way down this whole path of web servers and clients and all the rest. And then extending that to, okay, well, we downloaded this data now. We need to push it through five or six stages of processing.
So let's say you downloaded an RSS feed, which is something that smells like XML. It's not quite RSS. It's malformed XML at that point. Let's transform it into something like JSON, which is something that we can actually work with.
And then let's run it through language analysis and all of these different steps. So just trying to coordinate all of those steps and how do you do that? What is the architecture that makes sense? What is the right choice of language or library for all of those things?
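The multi-stage flow described here (download a feed, turn it into JSON, then run further analysis) can be sketched as a chain of small stages. This is a minimal illustration only: the stage names, the fields, and the regex-based title extraction are assumptions made for the example, not PostRank's actual pipeline.

```ruby
require "json"

# Each stage is a callable that takes a Hash and returns a new Hash
# (the last stage returns a JSON string). All names are hypothetical.

# Pull a title out of feed markup that may not be well-formed XML.
extract_title = lambda do |item|
  title = item[:raw][%r{<title>(.*?)</title>}m, 1].to_s.strip
  item.merge(title: title)
end

# Stand-in for a later analysis step: count the words in the title.
count_words = lambda do |item|
  item.merge(word_count: item[:title].split.size)
end

# Final step: emit JSON, which downstream consumers can work with.
to_json = lambda do |item|
  JSON.generate(title: item[:title], word_count: item[:word_count])
end

# Run the raw input through every stage in order.
def run_pipeline(raw, stages)
  stages.reduce({ raw: raw }) { |acc, stage| stage.call(acc) }
end

stages = [extract_title, count_words, to_json]
# Note the unclosed <item> tag: the input is deliberately malformed.
json = run_pipeline("<item><title> Hello Goliath </title><item>", stages)
```

Keeping each stage independent makes it possible to test a step in isolation or to swap its implementation without touching the others.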
So long story short, I think almost everything you'll find, for example, on my blog is directly correlated to what we've been doing or at some point researching or trying to improve within our infrastructure. And that's, quite frankly, been more by necessity than for any specific reason: okay, I need to optimize this specific step of the infrastructure. You know, your blog, igvita.com, has been a great resource for me learning different tools in the Ruby stack.
And a set of those has been NoSQL options. I think you've played with every one of them out there. Do you have a favorite? I do and I don't.
There's ones that we use and there's ones that we don't. Like everybody else, I'm, I think, at this point quite fascinated with everything that's going on in the space. There's definitely been a bit of an explosion. And just trying to dig in beyond just a feature list, right?
And trying to really understand what's going on, what's the data schema? How does it actually affect how you are? Because ultimately, I think a lot of these solutions come down to, you really need to put a lot of thought up front in terms of what you're designing for or what you're optimizing for, because frankly, MySQL is probably the right answer in most of the use cases, I'd say, for most people.
And as developers, we may not like that because it's not the shiny new thing, but usually, when you align the business goals with what you actually should be doing, that's usually the right solution. But having said that, at PostRank specifically, we've deployed, oh, let's see. So we definitely have a lot of MySQL. We're running and scaling up a fairly large Cassandra cluster at this point in time.
We're logging about 50 or 60 gigs of data into it every day today. We have MongoDB for some highly unstructured data and it's great for that. We have Redis for some of the data structure stuff. Definitely have Memcache.
So it's a mixed bag of tools. And I think you need to pick the right tools for the right job. It's not just a matter of having a favorite. You just need to know what each tool is good for.
Let's switch over and talk about Goliath, a new project that runs on top of EventMachine. So how did this project come about? Yeah, so Goliath is definitely not new from our perspective. And the background on this guy is we actually started work on, I guess, the first version of Goliath back in, oh boy, early 2008.
So this has actually been something, a framework that we've been using and iterating on for a while. And what we released recently is technically version four of our internal API stack. And back when we started in 2008, one of the first things that we realized was that the ecosystem around Ruby web servers wasn't that great. Mongrel was, I believe, effectively kind of the de facto deployment target.
And we wanted something that wouldn't lock us into the thread model. We wanted something that would give us higher concurrency. And we started looking around at the available alternatives. Thin was just coming around.
It wasn't, I wouldn't even call it production-ready at that point. And Ebb, if you remember that guy, which later evolved into Node.js, of course, made some rounds. But none of the solutions were really there in terms of providing a full stack for testing, development, or even a sensible DSL at that point. They were all pretty raw.
So given all of that, we effectively started our own project around it. And the first version of Goliath started as just one file. It was very simple. It was fast.
It served just our needs and nothing else, as most projects start. And then over time, we started iterating and made a lot of different mistakes along the way, hence the version 4 by the end. We had a mixed model: it was first fully evented, then we went to a mix of threads and events, which worked.
But there were lots of lessons learned there. We did a complete rewrite with version 3, which was completely evented, didn't like where it actually ended up, and then ended up with version 4, which is the most recent one, and the one we open sourced. Today, I'm going to call Goliath the 85%, maybe approaching the 90%, solution. It's very simple to write a Hello World app from scratch.
That's very fast. That runs on a raw TCP socket and serves, I don't know, some insane number of requests per second. It's fairly hard to get to an 80% solution. You really need to start to put some thought into how you handle all the edge cases in each space, how you develop a good DSL around it, and all the rest.
And then getting to 90% and 100% is very hard. That takes literally years. And I think Goliath is kind of getting to that point, even though it's new in terms of being an open source project.
It's definitely been something that we've worked on and spent a lot of time working on for the past couple of years. So at its core, Goliath is a non-blocking framework. How much of a barrier to entry is that for the average Rubyist, do you think? Well, that's an interesting question.
I'm not sure that it's much more of a barrier than any other framework, because what we tried to do with Goliath is actually to simplify or hide almost the fact that it's completely asynchronous under the hood. So, of course, the first thing that you should think about when you hear asynchronous is what does that mean for the programming style? Usually when you think about asynchronous, you end up having to define callbacks and functions which fire at some later time when the event completes. So, Node.js is something that you guys have discussed at length on this show before, and that's definitely a great example of that.
With Goliath, we actually tried to take advantage of some of the features that Ruby 1.9 exposes to hide some of that complexity. And maybe I should step back here and say that the version three that we wrote internally for Goliath was actually completely asynchronous, and it was very much the same flavor as Node.js, with all the callbacks, but in Ruby. And what we found, though, was after we ran with that for about six months, the APIs that we were building were getting complicated enough that the testing and the maintenance of them was becoming very, very expensive for us. The code became complex, and it was very hard to maintain on an ongoing basis.
So, we took a step back and said, look, this is not gonna scale, how do we solve this problem? And we started looking around and realized that Ruby 1.9 has this really nice feature called fibers, which are continuations. And if we were to do some extra work under the hood within the actual library, we could actually hide a lot of the complexity of these callbacks. So, on behalf of the developer, effectively, instead of you having to define a callback, we could do it for you and then make it look as if you have a completely synchronous API.
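The idea of hiding callbacks behind fibers can be illustrated with a toy example. This is a minimal sketch under stated assumptions: `PENDING` is a stand-in for a real reactor such as EventMachine, and `fetch_async` and `fetch` are hypothetical helpers invented for the illustration. It is not Goliath's implementation.

```ruby
require "fiber" # provides Fiber.current on older Rubies; a no-op on newer ones

# Toy "reactor": callbacks queued here fire later, like completed I/O events.
PENDING = []

# Callback style: the result is delivered to a block at some later time.
def fetch_async(key, &callback)
  PENDING << -> { callback.call("value-for-#{key}") }
end

# Fiber style: looks synchronous to the caller. It parks the current fiber
# until the callback fires, then resumes the fiber with the result.
def fetch(key)
  fiber = Fiber.current
  fetch_async(key) { |value| fiber.resume(value) }
  Fiber.yield # control returns to the reactor until the callback resumes us
end

results = []
handler = Fiber.new do
  a = fetch("a") # reads top to bottom, with no nested callbacks
  b = fetch("b")
  results << a << b
end

handler.resume                           # runs until the first Fiber.yield
PENDING.shift.call until PENDING.empty?  # drive the toy reactor to completion
```

The handler body contains no callbacks at all, yet each `fetch` gives control back to the loop while it waits. The catch is that every library called this way must cooperate with the reactor rather than block the thread.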
So, at the end of the day, when you look at a Goliath, when you look at the code that you write for a Goliath API, it looks completely synchronous. So, you could, in fact, take your Rails code and pretty much copy it over and not worry about having to define extra functions, callbacks, and all the rest. You have very logical flow, if else, you don't have to worry about callbacks and all this kind of stuff. So, our goal is to actually simplify.
It's actually that you don't have to think about it. And I think we succeeded at that because, you know, for new guys that start with us at PostRank, we just give them the framework. And they're pretty much oblivious to the fact that underneath it's running on this asynchronous core. The only thing they have to pay attention to is, of course, the fact that they're using the right libraries.
So, they're not using a blocking library. So, let's talk about that for a moment. That's gonna be my next question. What's the Ruby landscape look like for non-blocking libraries?
Is it pretty good? Is it growing? Compared to Node.js, which is, like, non-blocking, you know, by default, right? And so the whole ecosystem that grew up around it has been non-blocking. So, in Ruby, are we getting there?
Or is there still a lot of work to be done to take advantage of this style of programming? To be honest, I'm not sure how to answer that exactly, because I think, so, I think the most prevalently used framework within Ruby for doing this kind of programming is EventMachine. And EventMachine does have quite a bit of work and drivers that have been built around it for all of your usual suspects. So, anything from Memcached to MySQL to Cassandra to everything else, HTTP clients and so forth.
So, as far as getting good coverage in terms of your most common apps, I think it's all there. And I think most of the clients are in good functioning state, and I haven't had so many problems with that. Now, it's interesting that you compare that to Node.js because, intentionally or not, I think when Ryan picked JavaScript, right, he basically made a break with everything. He basically said, look, we're gonna have to write completely new drivers for just about everything.
And there's been a lot of work that's been done in that space now. And I think now, if you're just starting with Node today, you already have a pretty good ecosystem of drivers for virtually all of the major components that you would need. But in the process of doing so, because he completely broke away from any other language, he basically forced the user to always make the right choice, in some sense, because you can't really make the mistake of picking the wrong driver. Whereas if you're developing in Ruby, you have to be very conscious of what it is that you're doing, because you could pull in some driver that all of a sudden is doing the wrong thing, and your performance goes out the door.
So I think both are comparable. There's obviously a reason why we chose to stick with developing Goliath. And fundamentally, I think there's no reason to break apart from the Ruby language, and force yourself down the JavaScript path. And I should say, I love JavaScript, there's nothing wrong with it, it's a great language.
But I just enjoy Ruby so much more. And the type of code that you can write with stuff like fibers and all the rest is to me much more readable and maintainable. And hence our development and all of the work around Goliath. And the fact that we can reuse components like RSpec, Cucumber, and all the rest to drive the tests, and we have access to all of the Ruby standard library.
It's a double-edged sword, right? On one hand, you break apart from bad gems and libraries, which are blocking where they shouldn't be. But at the same time, you do have the full capability and library of all of the Ruby gems. So you just have to be a little bit more careful.
Speaking of the Ruby library and the standard library and the ecosystem of Ruby gems around it, as a community, how do you think we're adapting to the move to 1.9? I'm actually really pleased to see that a lot more people are migrating. Just a couple of days ago, I saw some announcements from the Rails core team saying that the next version of Rails will require Ruby 1.9. So it's no longer a suggested option.
It's a required option. And I think that's obviously big news. And I think overall, even though it seems like it took a little bit longer than it should have to start moving the community to 1.9, there seems to be a fairly big shift that has happened, I'm gonna say, in the last six to eight months, where more and more people are adopting 1.9 as their default platform.
And I think there are many different reasons for that. Some of it is just availability of better tooling around this, like RVM and everything else, that just makes it much, much easier to both develop and deploy against multiple runtimes. And then just the fact that more and more gem authors are paying attention to 1.9 now. So I've been running on 1.9 as my primary platform for almost a year and a half or two years at this point. I develop all my gems on 1.9.
I only switch back to 1.8 to run the spec tests. And I think that's becoming the default now. So I'm happy to say that we're getting there. So in the README for Goliath, you mention performance numbers on MRI, JRuby, and Rubinius. How important was it to you to publish those and support Goliath on multiple Ruby stacks?
So I think this is one area that I'd love to explore in the future with Goliath. So initially we developed Goliath to run on 1.9 MRI specifically, so the C Ruby. And we had a couple of dependencies in there which were specifically C extensions. So for example, Thin can only run on MRI because it uses the Mongrel parser and some C code under the hood.
And of course EventMachine itself has a C++ core. But EventMachine also has a Java version. So when we were developing Goliath, we tried to find and remove any bottlenecks that would not allow us to run on multiple runtimes. So we wanted to be able to run on JRuby.
And part of the reason for that is MRI has a global interpreter lock. And you know, you're basically stuck on a single core, which is the same story for Node.js and virtually all other evented servers out there. But if you could imagine running Goliath on, let's say, JRuby, which doesn't have a global interpreter lock, then in theory, nothing stops us from spinning up a bunch of operating system threads and running multiple reactors within the same process. And that of course opens up a lot of interesting opportunities for simplifying the deployment and doing all this kind of stuff.
So to be honest, when we were removing these bottlenecks, we were looking a little bit more to the future, with the hope that as these alternative runtimes develop (and I know many people would consider JRuby to be their primary runtime, not an alternative runtime), we can take advantage of the performance that they can offer us with Goliath. And for example, JRuby is a very interesting one that I'm looking forward to investigating in the future, because at the moment, fibers, which we depend on fairly heavily in Goliath, are pretty slow in JRuby. They are mapped directly to operating-system-level threads.
So they're expensive to spin up and maintain. But there are some patches in the works in JRuby that should change that dramatically, to the tune of making it even faster than kind of the lightweight processes that we have currently on MRI. And when that happens, it could well be the case that Goliath will run several times faster on JRuby than it does on MRI. And I think that's a great story, that we don't have to lock ourselves to a specific runtime.
So you mention in the README that you suggest standing this up behind HAProxy or an Nginx equivalent. What do you guys run? Primarily HAProxy, that's kind of our primary weapon of choice, though we do have some Nginx processes deployed. The reason we prefer HAProxy is that it allows us to have much more control over the load balancing and all the other parameters. So more intelligent failover and all the rest.
And when we need additional features that Nginx can expose, for example, doing gzip compression for us or something else, then we deploy it as needed. Talk a bit, if you would, about how you're using it at PostRank. Goliath? Yes.
So Goliath, we have deployed for a number of different applications. One of the choices that we made very early on in terms of architecture was to build a lot of our own infrastructure within PostRank around the idea of web services. So instead of specifying or using some sort of an RPC mechanism, let's just use HTTP as a primary source. So everything should talk over JSON and over HTTP.
So we rely on a lot of very high-performance endpoints within our system, which are serving hundreds of requests a second for our own internal use and for our clients. So we share the same endpoints. So to do that, obviously, we need something that is able to handle the concurrency and also to handle features like HTTP pipelining and keep-alive, to minimize the overhead. So besides internal services for request-response-style requests, we have streaming APIs.
So for example, if you've ever worked with the Twitter search API, you open a connection and it just feeds you data, JSON data. We have some of those deployed as well. So we're streaming data over Goliath. Goliath is also capable of doing streaming uploads, which is something that we added fairly recently, such that, for example, if a client is pushing you a, I don't know, let's say, a 5 megabyte image and you want to store that into S3, you don't have to buffer that in memory, which is what most web servers do today, at least in the Ruby space.
They give you the whole image, and then you can push it to S3. Goliath actually allows you to progressively load that and push it directly to S3. So those would be the primary use cases. But between the keep-alive support, pipelining, and the streaming APIs, we easily push tens of gigabytes of data through that stack every day.
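The difference between buffering an upload and forwarding it progressively can be sketched as follows. This is an illustration under stated assumptions: `ChunkSink` is a hypothetical stand-in for a remote store such as S3, and `stream_upload` is an invented helper. Neither is part of Goliath or of any S3 client.

```ruby
require "digest"
require "stringio"

# Hypothetical sink standing in for a remote store. It keeps only running
# totals and a checksum, never the full body.
class ChunkSink
  attr_reader :bytes_written, :parts

  def initialize
    @bytes_written = 0
    @parts = 0
    @digest = Digest::SHA256.new
  end

  # Accept one piece of the upload.
  def write(chunk)
    @digest.update(chunk)
    @bytes_written += chunk.bytesize
    @parts += 1
  end

  def hexdigest
    @digest.hexdigest
  end
end

# Forward the body piece by piece. At most chunk_size bytes are held in
# memory at any time, regardless of how large the upload is.
def stream_upload(io, sink, chunk_size: 16 * 1024)
  while (chunk = io.read(chunk_size)) # read returns nil at end of input
    sink.write(chunk)
  end
  sink
end

body = StringIO.new("x" * 100_000) # stands in for an incoming request body
sink = stream_upload(body, ChunkSink.new)
```

With this shape, memory use is bounded by the chunk size rather than by the size of the upload.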
So there's sort of a client library you're using, and I assume you're doing some sort of parallel network transport for each of these. So what's your favorite transport library? So I'm not sure if this is actually what you're asking, but a lot of the messaging and communication that we do in terms of coordinating web services within PostRank is done over AMQP. So for example, some of the HTTP streaming web services that we have quite literally act as direct front ends to AMQP queues, where we would connect to some endpoint after all the data has been processed and just stream that data to our clients.
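A streaming endpoint that fronts a message queue can be sketched roughly like this. The example makes two assumptions: Ruby's in-process `Queue` stands in for an AMQP queue, and newline-delimited JSON stands in for the wire format. `stream_from` is a hypothetical helper, and a real service would use an AMQP client and write each line to an open HTTP connection.

```ruby
require "json"

# Turn a queue into a lazy stream of newline-delimited JSON lines.
# Each message is serialized as soon as it is popped, so nothing is
# accumulated on the server side.
def stream_from(queue)
  Enumerator.new do |out|
    # pop returns nil once the queue is closed and drained
    while (message = queue.pop)
      out << JSON.generate(message) + "\n"
    end
  end
end

queue = Queue.new # in-process stand-in for an AMQP queue
queue << { url: "http://example.com/a", event: "tweet" }
queue << { url: "http://example.com/a", event: "bookmark" }
queue.close # signals end of stream for this example

lines = stream_from(queue).to_a
```

In a long-lived service the queue would stay open and the consumer would keep reading lines as they arrive.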
OK. So all of your HTTP transport is then just a long persistent connection, a streaming sort of API? Right, yep. So PostRank, for those that don't know, is, among other things, a way to show what's popular on your particular blog.
We're dying to use this on The Changelog, but until we get off Tumblr, we can't. We've hit a snag. So PostRank uses the URLs that are in your feed to determine, I guess, what sort of participation your audience is having with your content by matching it to what's bookmarked on Delicious and other social venues, but Tumblr does not include the slug on the post items.
Right? So they have the integer ID at the end. So none of our content matches, so every day I get emails saying that my PostRank content is so sad, because nobody's marking our stuff. Well, we can probably fix that.
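The mismatch described here, where the same post is known under a URL with a slug and a URL without one, is the kind of problem URL canonicalization addresses. The sketch below assumes Tumblr's `/post/<id>/<slug>` path layout. `canonical_key` is a hypothetical helper written for illustration and says nothing about how PostRank actually matches URLs.

```ruby
require "uri"

# Reduce a Tumblr-style post URL to host plus numeric post id, dropping
# any trailing slug. Other URLs fall back to host plus path.
def canonical_key(url)
  uri = URI.parse(url)
  host = uri.host.to_s.downcase
  if (match = uri.path.match(%r{\A/post/(\d+)}))
    "#{host}/post/#{match[1]}"
  else
    "#{host}#{uri.path.chomp('/')}"
  end
end

with_slug    = canonical_key("http://blog.example.com/post/123/some-slug")
without_slug = canonical_key("http://BLOG.example.com/post/123")
```

Both calls produce the same key, so activity recorded against either form of the URL can be counted for the same post.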
Actually, so the crazy thing that we do at PostRank is, as you mentioned, we aggregate what we call engagement activity, which is effectively, anytime somebody shares or does something around a piece of content on the web, we want to know about it. So we aggregate, for example, every tweet that contains a URL, or every vote from Digg or Reddit or Hacker News and all these other sites, and every comment from all these sites as well. So one way to picture what we're doing is we're trying to assemble a fire hose of all the different fire hoses of the activities around all this content. And we don't collect that data for specific URLs that we care about.
We collect that data for all of the URLs. So as you can imagine, that's quite a bit of data. So even though the plugin that you're referring to, which is the top posts widget that we have, is not picking up the right URL, we have all the tweets and everything else for content around The Changelog show. So you can actually use our API and just send it all the URLs that you guys have created.
And you can get the actual metrics. Or you can actually get the full conversation as well. This is something that I alluded to earlier, where we're pushing a lot of data into Cassandra. That's what we're using it for.
We launched this project four or five months ago, where for every activity that we collect, so for example, if somebody today shares a tweet with a link to The Changelog, one of The Changelog episodes, we'll actually store the content of that tweet and all the associated metadata about it and then allow you to look it up on a per-URL basis. So you can actually say, well, I have this URL, show me all the activity. So there's people bookmarking it on Delicious, there's tweets, there's Hacker News comments and all the rest, and you can see that as just one stream. Now, to switch topics for a moment, I've seen you guys hire from time to time.
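The per-URL lookup described here, where activity from many sources is returned as one stream, can be sketched with an in-memory index. This is purely illustrative: `ActivityIndex` and its field names are hypothetical, and a plain Hash stands in for the Cassandra store mentioned in the conversation.

```ruby
require "time"

# In-memory stand-in for a store that indexes activity by URL.
class ActivityIndex
  def initialize
    @by_url = Hash.new { |hash, url| hash[url] = [] }
  end

  # Record one activity (a tweet, bookmark, comment, ...) against a URL.
  def record(url:, source:, text:, at:)
    @by_url[url] << { source: source, text: text, at: Time.iso8601(at) }
  end

  # All activity for one URL as a single stream, oldest first.
  def stream(url)
    @by_url.fetch(url, []).sort_by { |event| event[:at] }
  end
end

index = ActivityIndex.new
url = "http://example.com/episode"
index.record(url: url, source: "twitter", text: "great show", at: "2011-04-10T12:00:00Z")
index.record(url: url, source: "delicious", text: "bookmarked", at: "2011-04-09T08:30:00Z")
index.record(url: url, source: "hackernews", text: "nice interview", at: "2011-04-11T09:15:00Z")

sources = index.stream(url).map { |event| event[:source] }
```

Keying everything by URL is what lets a single query return tweets, bookmarks, and comments together, merged by time.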
What would you tell the job candidate that was looking to come on at PostRank, or that may be new to the Ruby community or new to even open source development? What, as an employer, do you look for in a developer? Well, let's see. A GitHub account, that's always a good place to start, and a blog, right?
At the end of the day, and I've interviewed a lot of students, specifically, so we're located in Waterloo, in Canada. And Waterloo has a fairly well-known computer science program, at the University of Waterloo. So we interview a lot of co-op students, basically every semester we have at least a couple. And honestly, one thing that always surprises me as I go through a pile of resumes, 50 to 100 each time, is the fact that out of those 50 or 100, they're all bright computer science students, very smart guys, usually guys with good marks.
Very few of them actually have something that they're passionate about. Very few of them have a blog or something that they've written or contributed to. Very few of them have a GitHub account. So frankly, my first pass over that stack of resumes is always just to look for, do you have a blog and do you have a GitHub account?
And usually, there's at least three or five that match. And I immediately put them to the side and I know that I'm gonna interview them, even without considering or looking at the marks, because they're already showing something that most people don't. But overall, I think the best people that we've hired, they've all had a consistent streak of having projects that they're passionate about, that they contributed to, and having a history of open source contribution. So how did you come to Ruby and what language background did you come from?
I think, like many people, I started with PHP and Perl. I actually, I was never much of a computer geek, if you will. I got into web development through web design. I was one of the Photoshop wranglers for a while. And I effectively got into the whole programming world by learning HTML and then learning that my clients wanted more dynamic sites.
So I started doing PHP and then Perl and then before I knew it, I was in computer science. And then before I knew it, I was doing Ruby. So it's kind of an odd path. You know, it's very similar to my own path.
And I tell folks that I feel like Merlin, living my life backwards: I started out on the front end and keep going deeper into the stack, just trying to deliver on things that are in my head. And I think your blog just oozes that design. What sort of commonality do you see between design as a communication medium and programming as a communication medium? I think they're one and the same in many ways.
To me, presentation is at least 50% of the actual deliverable product, whatever that product may be. And depending on the context, that could be a nice packaging around your product. It could be a nice DSL project that you built. It could be a well-structured readme, right?
The ability to actually communicate something to another person that's kind of, I think is the most important aspect that you really have to pay attention to. What is the most important aspect? Because ultimately the process of design is more about subtraction than adding stuff. So you really need to be clear about what it is that you're trying to communicate, whatever it is that you're working on.
Your open source project or a new design template. Do you have a programming hero? A programming hero. Honestly, there's probably too many.
Give us one. And don't say Linus. Give us one. I think one person that impressed me early on was Brad Fitzpatrick, so LiveJournal, Memcached, and all the rest.
And I can't even say specifically why, but I remember reading some interviews very early on about just how he started LiveJournal and the work that they were doing around Memcached, Perlbal, and all the other projects that came out of that, which a lot of us don't even think about today, but run a lot of our infrastructure on. And for him, it was always about just solving his own problem. He never started with some grandiose vision of, I need to build a really fast memory cache server. It's just, I have this specific problem at my company.
I started this project on a whim because my friends said I should, and here I am just slugging it out. Are we in a golden age of web development and perhaps just don't know it? Golden age of web development. Has there been a better time to be a bit pusher on the web?
I think it's getting better and better, right? So when I think about the skill set that you have, I think it's incredibly valuable skill set as a web developer and I think it's only gonna get more and more important, especially with the spread of technologies like HTML5 and everything else. When I think about one area that I haven't done much work on and I really want to kind of get into is mobile and just based on my own observations and kind of research around that area, it seems like more and more larger organizations that have spent a lot of time and effort developing custom apps for each platform are now migrating to HTML5. Facebook is a great example.
Twitter, all these guys are converting their mobile clients to HTML5. And when you think of HTML5, of course, you're doing CSS, JavaScript and all the rest. So I think it's only gonna get more and more important. In some ways it's gonna get more complicated, but it's also gonna get more interesting as well. You know, every time I go to your site, I see the tagline, a goal is a dream with a deadline, and you're one of the most productive developers that I follow.
Are you goal-oriented? Definitely, yes. So how do you manage that workflow? Well, let's see.
I don't know if you've used Remember the Milk. Oh, yeah. But I live and die by that thing. I don't think it's anything specific about Remember the Milk.
It's just a great, well-built app. It's very clean. It knows its purpose. It doesn't get in the way.
But I definitely love my checklist. Are you a GTD guy, or do you have your own workflow inside there? I am definitely familiar with all the GTD stuff. Over time, I think I realized that it's not the process, right?
I think a lot of people spend a lot of time focusing on how to improve their process instead of actually doing stuff. So I can't say I'm a die-hard GTD person, but I definitely follow my inbox-zero rules and make sure that I review my goals for the day or for the week and so on and so forth. You know, if there's any advice I could give to my college-age, twenty-something self, it would be that a little effort every day will always outshine these big bursts of productivity. What are some of the habits that you have that you think have made you more productive as a developer?
Well, I think it's exactly what you said. It's the small little things that add up over time. I don't remember the exact quote, but the general message is we tend to overestimate what we can get done in a day and underestimate what we can get done in a week or a month. So it's not about doing heroic things on any given day as much as it's just having a clean path towards what's the next thing I need to do to move this thing along.
So a couple of closing questions. Are you a Vim, a TextMate, an Emacs, or a BBEdit guy? So I don't have any religious allegiance to any one of the editors. I do spend probably 50% of my time in Vim and TextMate.
So I switch between the two quite a bit. This is where I outsource a lot of my discovery to my guests. So what one project do we need to post on the Changelog that we haven't covered yet? One project.
Does it count if I don't give you a project but a technology instead? Sure. So I've been digging into SPDY. And I don't know if you've paid attention to this, but about a year ago or so, Google released this project, or rather a study that they did around a new protocol that they were trying to define, called SPDY.
And their goal was to see, how can we speed up the loading of web pages, the common web pages that we all visit, yahoo.com, MSN.com, or even Google.com, by over 50%? And they took a low-level approach and said, well, of course there's JavaScript optimization and compression and all the rest, but what can we do at the protocol level? And they basically came up with a whole bunch of ideas around, well, HTTP's maybe not the ideal transport.
When it was designed at the beginning, we didn't pay much attention to latency. And later, we've introduced functionality like HTTP pipelining and keep-alive and all the rest, which frankly don't even work most of the time. So this is a little-known fact, but HTTP pipelining is disabled in all browsers except Opera. And even Opera only uses it in some very weird edge cases where it can actually do so.
And that's primarily because a lot of the servers don't support pipelining, or when they claim to support it, they don't actually do it properly. And then of course, there are all the cache servers in between, which tend to break this kind of stuff. So it's not a great protocol in the end, it turns out. So SPDY is about redoing a lot of that work and basically building a new protocol to use instead of HTTP.
And so they did this study about a year ago, released some numbers, and basically showed that, yes, given some of these optimizations that they propose, we can actually get over 60% improvement in latency for delivering these web pages. They posted some source code, and the client is now available in Chromium. And after that, I didn't see much coverage around this at all. And just recently, a thread popped up where they basically said that if you're running Chrome and you're talking to Google web services, then 90% of the traffic is going over SPDY, right?
So if you're a web developer today, there's a high likelihood that you actually are using Chrome. And if you're using a Google web service, chances are you're not running over HTTP, you're running over SPDY, which is really, really interesting. That's amazing. Yeah, exactly.
And I guess Google can actually do that because they control their own servers and they control the browser. So they're able to make this sort of change. But of course, it's not a proprietary protocol. The spec is out there.
So can we make use of that for our own web services? I'd love to make PostRank web pages 50% faster without actually modifying any of our UI code or anything in that respect. I'd love to just replace the web server and make it talk SPDY and off we go. Has anything materialized as far as an Apache module or anything like that to make it a little bit more palatable for the actual average developer?
Yeah, so they actually released an Apache module. I'm not sure how, I actually haven't tried it with something like, let's say, Passenger. I wonder if we can make that work. But what I've been digging into myself is I've been trying to build an actual parser for SPDY in Ruby, in pure Ruby.
And this was more for kind of my own education. I find that the best way for me to learn is to actually try and build something, because I can read the spec and I kind of nod along and I think I understand it. And then I start to write the code and I realize that I didn't get it at all. So I'm actually working on one right now.
And it's both very simple and very interesting in how they've made some of the decisions around how the packet exchange should be done, the fact that you can send multiple streams over the same TCP channel and they can be intermixed and all the rest. So it's definitely a project or technology for a lot of other developers to look into, I think, because even though it's a fairly low-level web server type of technology, it's something that we should be paying attention to because it's a significant improvement. You know, we've had pretty much the same transport stack for years. I can remember, I guess 15 years ago or so, maybe more, having to download and install a PPP stack for my operating system just to connect to the internet.
So, you know, maybe we're due for the next evolution on top of TCP for the basic dial tone of the web. Yeah, absolutely. And in fact, as I'm working on implementing this parser for SPDY, the crazy thought that's scrolling through my head is this: one of the core concepts behind SPDY is that the same channel, the same TCP channel, can transport multiple data streams at the same time. So that means when a packet arrives, it actually tells you, I belong to this specific stream.
So you can request, for example, two images and the data can be fetched in parallel. Which is not something that you can do with HTTP, because HTTP forces a strict FIFO order: you send your requests and then you have to wait until you fetch the first full image. And then the server will start sending you the second image. With SPDY, you can actually intermix that data.
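To make the idea of intermixed streams concrete, here is a minimal sketch in Python. The frame layout is an assumption made up for illustration (a 4-byte stream id and a 4-byte length ahead of each payload); it is not the actual SPDY wire format, and the names `encode_frame` and `demultiplex` are hypothetical.

```python
import struct
from collections import defaultdict

# Hypothetical frame header, NOT the real SPDY format:
# 4-byte stream id followed by 4-byte payload length, both big-endian.
HEADER = struct.Struct(">II")


def encode_frame(stream_id: int, payload: bytes) -> bytes:
    """Tag a chunk of data with the stream it belongs to."""
    return HEADER.pack(stream_id, len(payload)) + payload


def demultiplex(data: bytes) -> dict:
    """Split one byte sequence of interleaved frames back into per-stream data."""
    streams = defaultdict(bytearray)
    offset = 0
    while offset < len(data):
        stream_id, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        streams[stream_id] += data[offset:offset + length]
        offset += length
    return {sid: bytes(buf) for sid, buf in streams.items()}


# A slow response (stream 1) is interrupted by a quick one (stream 3)
# on the same connection, then resumes.
wire = b"".join([
    encode_frame(1, b"slow-"),
    encode_frame(3, b"image-bytes"),
    encode_frame(1, b"page"),
])
result = demultiplex(wire)
```

Because every frame carries its stream id, the receiver can reassemble each response no matter how the frames were interleaved, which is the property that plain HTTP's request-then-wait ordering lacks.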
So if you have a slow resource, it doesn't block everybody else. So you can make a request to a slow dynamic resource but then fetch quick images in parallel. So you take that and then you take a look at technologies like ZeroMQ, right? And ZeroMQ is trying to do something similar but something more generic.
They're saying, okay, look, TCP is great but we need message-oriented messaging. We shouldn't have to worry about parsing out where the message ends. All messaging should be message-oriented. And it should also be done as fast as possible.
And you should have all these different transports. It shouldn't matter if you're sending it over TCP, UDP, or Unix pipes. So I think if you think about what SPDY is doing and what ZeroMQ is doing, there's a really interesting opportunity there to connect the two and build something very interesting.
You could build a web server that is completely message-oriented and you wouldn't need an HAProxy or an Nginx or anything else in between. You could just bring up a Ruby process. It would know where to connect. It would know how to parse that message without having to implement an entire parser in C just to parse out the boundaries of the message, and respond quickly without having to register with anybody or say that I'm up or down.
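The point about not having to parse out message boundaries can be sketched with simple length-prefixed framing. This is a Python illustration under stated assumptions: the 4-byte length prefix and the names `frame` and `MessageReader` are hypothetical, and this is not ZeroMQ's actual wire protocol or API.

```python
import struct

# Hypothetical framing: each message is preceded by a 4-byte big-endian length.
LENGTH = struct.Struct(">I")


def frame(message: bytes) -> bytes:
    """Prefix a message with its length so the receiver knows where it ends."""
    return LENGTH.pack(len(message)) + message


class MessageReader:
    """Turns arbitrary chunks of a byte stream back into whole messages."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, chunk: bytes) -> list:
        """Add received bytes and return every message that is now complete."""
        self._buffer += chunk
        messages = []
        while len(self._buffer) >= LENGTH.size:
            (length,) = LENGTH.unpack_from(self._buffer, 0)
            end = LENGTH.size + length
            if len(self._buffer) < end:
                break  # the rest of this message has not arrived yet
            messages.append(bytes(self._buffer[LENGTH.size:end]))
            del self._buffer[:end]
        return messages


# The stream is delivered in two arbitrary chunks, as TCP might do.
stream = frame(b"GET /feed") + frame(b"GET /image.png")
reader = MessageReader()
first = reader.feed(stream[:7])   # only part of the first message so far
second = reader.feed(stream[7:])  # the remainder completes both messages
```

With the boundaries carried by the framing itself, the application only ever sees whole messages, regardless of how the underlying transport happened to split the bytes.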
Definitely exciting stuff. We learned about ZeroMQ on the Zed Shaw interview. That was the first time we'd heard of it, and we got a quick look at it there. We need to get somebody from the Chromium project to talk about SPDY, which, when I first saw it, I guess when it first came out last year sometime, I thought was "S-P-D-Y". For those that are listening at home and don't have access to the show notes,
it's pronounced "speedy", right here in the executive summary. Well, yeah, thanks for joining us. It's definitely been fascinating to talk about Goliath and this non-blocking async style of programming and some other things. Great, thanks a lot.