Hey, how's it going? This is Craig Cannon, and you're listening to Y Combinator's podcast. Today's episode is with João Batalha and Luís Batalha, co-founders of Fermat's Library. Fermat's Library is a platform for annotating papers.
Each week, they send out a paper annotated by their community. Some recent ones include Birds and Frogs by Freeman Dyson and von Neumann's First Computer Program by Donald Knuth. They've also built a Chrome extension for arXiv called Librarian, which allows you to get direct links to references, do BibTeX extraction, and make comments on papers. You can find them at fermatslibrary.com.
All right, here we go. You guys are brothers, right? Yeah, we got it. Yeah, okay.
He's the older one. I'm two years younger. Okay, and what made you want to start Fermat's Library? So just for the people that don't know what it is, Fermat is a platform for annotating papers, and so if you want to think about it, you imagine a PDF viewer in your browser, and then you have annotations on the side that support LaTeX and Markdown, and so you can add annotations in parts of papers that you think are particularly tough to understand or where you think you could add more content.
But so it's something that we've done. The four of us that started Fermat, we all have a technical background, and so after college, we kept on reading papers, and every once in a while, we had this internal journal club where we'd read a paper and present it to the others. So I remember, for instance, a few years back, presenting the Bitcoin paper to Luís and Micael, who don't have a CS background, and so you kind of have to go into, for instance, for Bitcoin, you might have to go into, okay, what's a hash function? What's public-key encryption?
So we were already doing this, and we knew that you also have this behavior offline in places like universities, and so we wanted to take that experience and bring it online. We thought there was a lot of content that you end up producing while you're trying to read a paper, which can be the most dense piece of content that a human can read sometimes, right? The language can be incredibly spartan, and sometimes there's a step in some paper where they say, oh, this should be obvious, but then you look at it and say, okay, I don't get it. And so we knew that there was a lot of content there that you end up producing while trying to understand a paper, and we wanted to bring that online.
Luís, you were in physics before. I studied physics together with Micael, and João and Tymor went to MIT. Tymor studied economics, and he studied CS. So a lot of the papers are around physics, math, economics, biology, CS, right?
Yeah, because you kind of like solved the cold start by just annotating yourself, right? And now it's more about getting the author in there. Exactly. That was kind of the growth hack.
Our first paper was the Bitcoin paper. Yep. Still the most commented, right? Yeah, that one has a good number of comments.
It's been there for the longest, and it was quoted, or rather there are a bunch of news sites that pointed back to it. Oh, okay. It's like, okay, if you want to read it, go to the annotated version. But we had a few cool people come up there.
Yeah, Lawrence Lessig commented on the Bitcoin paper. A bunch of people from the Bitcoin community. But the larger goal with Fermat is to try to move things in the right direction, meaning move science towards what people call open science. And so that encompasses a number of things from open data, which means just sharing the data that you've used for publishing, or whatever research you might be publishing, and you want to share that and make that easily accessible to people so that if they want to replicate the results that you got or use it in their own research, they have an easy time doing that.
So that's open data. You also have just publishing the code that you've used or the algorithms that you've used, making those more easily available to people. There's also open publishing, which means just publishing in papers that are not behind, or in journals that are not behind paywalls. So there's a lot of things that are within open science, all of those.
And then there's also, so we want to push things in that direction and also try to build a platform that makes it easier for people to collaborate. And we think that there are a lot of things that could be happening nowadays where people could be collaborating, scientists could be collaborating remotely a lot more than they are, or that's at least the way we think. But it's starting to change where we've had the paper there. I think this is actually a trend where we're seeing more and more people collaborating online around papers.
So, for instance, there's this famous example around a problem called the Erdős discrepancy problem. And this is a famous problem that was posed 80 years ago by Paul Erdős, this famous mathematician. And Terence Tao, the Fields Medalist, was trying to solve the problem. And he put it on his blog that he was trying a certain approach to solve the problem.
And then there was this guy from Germany that just wrote a comment there, like the size of a tweet. And he said that the Erdős problem had a Sudoku-like flavor. And that some of the machinery that they were using to solve the Sudoku problem could be used there. And that was actually the key to cracking the problem.
And they ended up publishing a solution to the Erdős discrepancy problem, which was probably one of the biggest milestones in number theory in 2016. And that was all thanks to a comment on his blog and to the fact that they were collaborating online around solving that problem, which was also a Polymath problem. The Polymath Project was a project started by this other Fields Medalist, Tim Gowers. And it was actually a social experiment to see if it was possible to solve math problems by collaborating online.
And they were able to solve it thanks to that comment. Because you kind of see, right, you look at GitHub, and then you think of the impact that GitHub has had for open source. Open source, of course, existed much before GitHub, but it has really allowed a lot more people to come in and be able to get into open source and start contributing. And there are a number of other really interesting platforms.
You have Wikipedia, just for more general knowledge, or you have Stack Overflow, which is programmers helping each other. And we think that there could be something similar to that, but for science in general. Right, because did you listen to the Rogan with Peter Attia? No, parts of it.
Nico listened to that. Yeah, that was a really good one. And he talks about, I don't know if they're talking about arXiv in particular around publishing papers, but he talks about having full-time staff just scrubbing the data looking for interesting information coming out. And again, in the context of Stack Overflow, that's the place where programmers find specific answers to problems.
Whereas with arXiv, good luck. Good luck finding that stuff. And so have you guys thought about addressing discoverability in the context of particular fields? It's a really tough problem.
For instance, paper recommendations, it's really hard to... Because you're just doing one a week right now, in addition to the browser extension. And we also have our tool that is used internally at universities and research groups for people that are reading papers together, and they add annotations. But for now, we have the weekly journal, so we release a paper every week that we select, and we annotate it, or somebody in the community annotates it. And then we have the arXiv extension that adds a bunch of features on top of arXiv, like BibTeX extraction, reference extraction, and comments.
And eventually, definitely, like a recommendation engine, and making it easier to discover papers that are relevant to you, that's something we definitely want to add onto our arXiv extension. But it's a tough problem. It is. Initially, we started Fermat as a journal club.
And then we saw that people liked the interface, the commenting interface, and liked reading the annotations. So now we are starting to expand and turn Fermat into more of a platform. And that's why we decided to do the arXiv Chrome extension. Because arXiv, for people that don't know what it is, is basically a place where papers live before they go to journals in the form of preprints.
So they're like drafts before they go to journals. And what we did is we built a Chrome extension that basically allows people to see all the commenting interface on arXiv papers. And so you don't have to go to another website. You're just reading arXiv papers, and you see the comments on the side if you have the Chrome extension installed.
Well, and a lot of these papers don't even have comments. They don't. It's like, best case, you're emailing the author? Exactly.
Yeah, they don't have. So what arXiv does is basically just host papers. That's the core functionality of arXiv. And so one of the things that we noticed is that, especially for areas like machine learning and deep learning, arXiv is super important.
Because the new papers are coming out at such a high rate that people don't wait for the papers to go to journals before they start working on top of them and using the stuff that other people discover. So all the papers are published on arXiv. And so you need a way to distinguish good quality work from bad work if you are reading a paper on arXiv, about machine learning or something, that hasn't been peer-reviewed. And I think that's why the Librarian extension is so important.
For fields such as machine learning. So does the Librarian extension have a rating mechanism as well? Like, how do you distinguish good from bad work? Right now, it's only through the comments.
But we are actually thinking about implementing some sort of rating system for papers. We've been thinking about that for a while now. And it's not, we're probably going to run a few surveys to our audience. Because you could do it in a number of ways.
Like, rating a paper, you could do it. Obviously, there's likes or dislikes or upvotes and downvotes. So you could just have a holistic rating for the whole paper. You could also imagine rating it on a number of different aspects of the paper.
It could be about, okay, how big is their data set if they're using some data set? Or what do you think about their method? So you could have a more complex rating system. And so we've been thinking about that a lot.
And we're just trying to figure out what makes the most sense there. But that's also definitely, like, we'd love to add that to our extension. Yeah, so how do you think the collaboration plays out then? Because I understand how, you know, say, for instance, you're a physicist.
You start commenting on someone else's paper. You start a discussion. That creates a new project, right? Do you think you'll go further than there?
Like, are you talking about, like, forking and that kind of stuff? Yeah, that's, I think you could. There's a lot of things that you could do once you have a platform that has more people in it and that they're doing more stuff in it. And so that's why the way we've been growing Fermat is with a goal far in the future where we are a much broader platform.
But right now we're focused mostly on solving problems that people have nowadays. And actually, we were largely inspired for our arXiv extension by the survey that the arXiv guys did where they had, I don't know how many people, but they surveyed the people that used arXiv and then published a paper where they described the problems that those people reported while using arXiv and the features that they most wanted to see. And then the arXiv folks just said, hey, we're just going to be the platform to build upon and we're not going to do all these things that people would like us to do. But here it is.
This is what people want to see. If there's anybody else that wants to work on this, here are the results of the survey. And since then, they've actually done a pretty great job of, like, building an API and wanting to become more of a platform. And so there's a lot of ways that we envision that you could have collaboration around science.
And so, yeah, like, forking a paper or forking some type of research. Exactly, or data. There's a lot of things that you could do there. It's not something that we're focused on right now.
Right now, we're just trying to solve these problems that people find and create a place where people can just post comments and discuss around a paper. An example of the problems that people mentioned was, like, for instance, reference extraction. So if you go to a PDF, at the bottom of the paper you have the references that they used. And most of the time, when people want to search the references, they have to copy the text in the PDF, put it on Google, and try to find a link to the paper.
And one of the things that we did with our Chrome extension is we automated that. They just click on a button in the Chrome extension, and then they see a list of references with links to the papers. So that was one of the features that was most requested by the arXiv users. And our idea was, initially, we wanted really to convince people to install the Chrome extension.
And so let's solve the hair-on-fire problems that they are describing here. And then once we have people using the Chrome extension, then we can expand into, like, open collaboration around papers, since they're already there. Yeah, so that was it. Do you guys know of anyone working on publishing negative results?
This is something I was fascinated with. And, like, basically, the problem is that, like, as an academic, you're not incentivized to publish negative results because you want to publish things that have high impact, so you can get a job or a tenure position or just get people to even care about your work, right? So they don't publish. Do you know anyone, like, working on that?
Yeah, and I know of researchers that are studying that field a lot. But, unfortunately, for some of these things, that's a very large problem. And people are becoming more aware of that. And with that, you get negative results.
You also have, like, people doing a lot of research into, like, p-value hacking. Yeah, explain that. Yeah, so p-value is essentially a standard that people use in order to know if the results that you have obtained out of some experiment that you've run are worthy of being published. And so that has worked, for the most part, that has worked fine until now.
Or, I mean, that's arguable. But people are looking into it and thinking, okay, should we do things differently and should we be much stricter with what's considered the gold standard for publishing? And we've thought of doing things there with Fermat, just so that if you're looking at a paper, to have an idea, okay, how relevant is this paper? This is more specific for certain areas, like if you're talking about medicine or biology, where that is really important, like the statistical significance of results that you're presenting.
That's all, right? That's the most important thing. So we've thought of doing something with Fermat there, either via some API where you could, like, send us the DOI of a paper and we would send you, like, some information regarding the p-value or something, or with a Chrome extension where you'd see that information displayed very prominently, saying, hey, like, there might be some p-value hacking here or this is very solid research. Because there is a very big problem and people are realizing how prevalent it is, especially in things like economics and biology, nutrition.
I mean, it came about, I was just talking to a friend who's doing a PhD at Cambridge in bio. Yeah, that's a big thing. Yeah, and only by attending a conference in the States did he realize that there was someone in Australia working on the exact same problem as him concurrently. And they're failing at the same types of experiments, but because they don't publish them, like, no one knows the results, no one knows the methods.
And essentially, like, these, you know, traveling salesman-type problems that people are so excited about quantum for, like, trying all these permutations are happening at a smaller scale, but no one's publishing anything. So, like, the progress isn't happening. Yeah, and part of it is just the way research is done and you come into it and you're trying to find some correlation, usually. Yeah, and you'll be trying to find some trend in the data.
And whether, you know, you're going to usually have that bias. You're trying to find some correlation in publishing that. And so, yeah, you might need to change things dramatically in order to get people to start publishing negative results, which are, like, could be incredibly useful for other researchers. Yeah, but there are a bunch of people working on that.
There's this researcher at Stanford. I'm forgetting his name. It's John, and then I forget his last name, but he actually just went on this podcast, EconTalk. Oh, really?
I love EconTalk. Yeah, so you should listen to that podcast. And actually, Tymor has been talking to the professor. I think he's a professor at Stanford, and he has analyzed this subject more, but more relating to economics, I believe.
But, yeah, he's found a lot of the things that we're talking about here that are prevalent also in economics. Cool. Let's go to the Twitter questions. Sorry about that.
You guys are very popular on Twitter. So congrats on your great following. Let's see. Let's start with something broad.
Tanner Goblinstein asks, what are the most interesting papers you've read in the past couple of years that are not widely known? Yeah, that's interesting. I end up reading all sorts of papers from different areas. I can get the papers, actually.
It's just like a random walk. It's really good to be a random walk. It's funny. Or sometimes you'll think, for instance, a few months ago, I got, like, a Fitbit to track my sleep.
And so I wanted to read papers about sleep. And so that just got me into, like, a random walk around, like, research around sleep. And then I found a bunch of interesting things. I ended up annotating a paper about a big study in Finland that was done in regards to the association between sleep and mortality.
There were a bunch of really interesting things that I learned from there, for instance, that, like, if you sleep less than seven hours, that's associated with higher mortality. But if you sleep more than eight hours, that is also associated with higher mortality. Really? Yeah.
So have you changed your life based on that? Yeah. Well, not that I was usually more on the end of not sleeping enough. But there's also another thing from that research that apparently sleep quality doesn't matter as much, at least for mortality, which is kind of counterintuitive.
But it seems that your sleep quality is very closely related to the amount of sleep that you're getting. So, like, seven hours of, like, okay sleep versus seven hours of great sleep, that's kind of hard to distinguish. So you, like, sleep on an airplane your whole life. Apparently.
Not as long. Yeah. Yeah. Apparently.
Maybe your life will be a little bit more miserable. But so it's hard sometimes to pick the favorites. But there's one, for instance, there's one that's also kind of random, but it's a paper published in the 90s about Simpson's paradox and the hot hand phenomenon in basketball. So the hot hand phenomenon in basketball is, right, you think that, okay, because they just made a field goal, like the next one, they have a higher chance of making it.
And so there's this researcher that in the 90s looked at a data set from the Celtics to see if, for free throws, if that was true. And so before they had asked students at Stanford and Cornell, like 100 students, if they thought that, okay, if they just made the first free throw, is it for the second one? Are they higher? Did they have a higher chance of making it or not?
And there was something like 68 of the 100 students that were asked that agreed, and they thought that that was true. And these are, like, people from Stanford and Cornell. And so then they looked at this, and so what they found back in the 90s, what they found was that actually that seemed not to be the case, right, that from your second free throw, is not, you're not more likely to make it if you made the first one. But what they found is that you're just more likely to make it on your second one.
Objectively. Significantly, yeah. Okay. And so this was done in the 90s with, like, I don't know how many free throws, maybe, like 5,000.
They looked at some data from the Celtics. Across the Celtics. Yeah. And then I went and got a data set from Kaggle with, like, 600,000 free throws.
And I re-ran the same, right, re-ran the same algorithms that they ran for the study in the 90s and then looked at what the results were. And, yeah, and so the pattern is pretty clear that just on their second free throw, they're just much better at it, significantly, regardless of their first one. And, yeah, it doesn't matter as much. It doesn't matter if they made their first one or if they missed.
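To make the comparison described above concrete, here is a minimal sketch of how one might check it. It is not the method from the 90s paper or from the re-run on the Kaggle data; the function name `free_throw_rates` and the sample pairs are hypothetical and invented purely for illustration. Each pair records whether the first and the second free throw of a two-shot trip were made.

```python
from typing import Dict, Iterable, Tuple


def free_throw_rates(pairs: Iterable[Tuple[int, int]]) -> Dict[str, float]:
    """Compute make rates for two-shot trips given (first_made, second_made) pairs."""
    pairs = list(pairs)
    after_make = [second for first, second in pairs if first]
    after_miss = [second for first, second in pairs if not first]
    return {
        # Overall make rate on the first and on the second attempt.
        "first": sum(first for first, _ in pairs) / len(pairs),
        "second": sum(second for _, second in pairs) / len(pairs),
        # Second-attempt make rate, split by the outcome of the first attempt.
        "second_after_make": sum(after_make) / len(after_make),
        "second_after_miss": sum(after_miss) / len(after_miss),
    }


# Made-up sample: 15 trips, NOT real NBA data.
sample = [(1, 1)] * 8 + [(1, 0)] * 2 + [(0, 1)] * 4 + [(0, 0)] * 1
rates = free_throw_rates(sample)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

In this invented sample the second attempt goes in more often than the first, while the second-attempt rate is the same whether the first was made or missed, which is the shape of the pattern described in the conversation. A hot hand would instead show up as `second_after_make` being clearly higher than `second_after_miss`.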
Yeah. Yeah. And then that paper kind of then tried to explain why people think that there is a hot hand phenomenon, and that is related to Simpson's paradox, which for people that don't know what Simpson's paradox is, it's also really kind of changed my worldview a little bit once I learned more about it. But it's basically, what it says is that you can get two valid conclusions out of the same data depending on how you split it.
So an example is, for instance, that between 2000 and, like, 2013, the average or the median wage for high school dropouts in the U.S. dropped. For high school graduates, it also dropped. For people with an undergrad degree, it dropped.
And for people with a graduate degree or higher, it also dropped. So across the board, for all of those segments, the median wage dropped. But in aggregate, it went up. And so you look at it and it's like, okay, what's going on here?
And it turns out that what happens is that a lot more people got a degree. So they just shifted towards higher education. So that's why you get, on average, it going up. And then for each one of these segments, it goes down.
And so Simpson's paradox is that depending on how you cut the data, you might get different results. But both could be valid. In this case, it's pretty easy to understand what's the right way to look at this data. But in some other cases, it's not clear whether or not you should include this variable and cut the data in some different way.
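The wage example can be reproduced with a few lines of arithmetic. The sketch below uses mean wages and numbers that are entirely hypothetical (they are not real U.S. wage data); it only shows how every segment can drop while the aggregate rises once people shift toward the higher-paid segments.

```python
# Hypothetical numbers, invented for illustration only (not real wage data).
# Each entry maps an education level to (share of workers in percent, mean wage).
year_2000 = {
    "dropout": (30, 20_000),
    "high_school": (40, 30_000),
    "undergrad": (20, 50_000),
    "graduate": (10, 70_000),
}
year_2013 = {
    "dropout": (10, 19_000),
    "high_school": (30, 29_000),
    "undergrad": (35, 48_000),
    "graduate": (25, 68_000),
}


def overall_mean(groups):
    """Aggregate mean wage: each segment's mean weighted by its share of workers."""
    return sum(share * wage for share, wage in groups.values()) / 100


# Within every segment the mean wage is lower in 2013 than in 2000...
every_group_dropped = all(
    year_2013[level][1] < year_2000[level][1] for level in year_2000
)

# ...yet the aggregate mean is higher, because the mix shifted toward
# the better-paid segments.
print("every segment dropped:", every_group_dropped)
print("aggregate 2000:", overall_mean(year_2000))
print("aggregate 2013:", overall_mean(year_2013))
```

With these made-up shares the aggregate moves from 35,000 to 44,400 even though each segment's wage fell, so both statements ("wages dropped in every segment" and "wages went up overall") are true of the same data.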
And so relating it back, like, for this basketball issue, what it was is that if you looked, the results were different whether you looked on a player-by-player or if you looked at the aggregate. Once you collapse it all into the same table, you get different results rather than when you looked at it player-by-player. And so if you collapse it, I think, I forget exactly the way it went, but if you collapse it, it might have been that you indeed saw. You didn't see the hot-hand phenomenon, but if you look at it player-by-player, you saw it.
And so they're arguing that that's why people had the idea. That's why you get, like, 68 students out of 100 saying that they believe in the hot-hand phenomenon. Yeah, yeah, yeah. And so, yeah, so some of the papers, like, that's really random.
It's just, like, it's funny. You're getting these just, like, little tidbits of trivia. Yeah, absolutely. But has it been relevant to you in terms of physics?
I mean, you're basically working on software now, right? Yeah, but I also end up discovering really cool physics papers. So, for instance, my two favorite papers are actually, they were written by Freeman Dyson. One of them is when he proposed the concept of a Dyson sphere.
It's just one page. And he basically explained how an advanced civilization would need more energy than the energy that we can generate on Earth. So we would have to go to a star and build a cap around the star to extract the energy of a star. But it's funny because it's, like, with really simple math and physics equations, he was able to derive, okay, is this sphere stable?
Is it going to heat indefinitely? And so it's a really interesting paper. And the other one that I really like is one about Feynman's derivation of the Schrödinger equation, also written by Freeman Dyson. And it just shows, you know, Feynman's intuition about quantum mechanics.
And it's also really simple and easy to read, even if you don't have a physics background. But one of the things that I noticed from, like, trying to find papers and annotating all these papers was that, you know, in the 60s and all, like, through the 20th century, all these discoveries and all these papers were mostly, like, one, two pages. And, yeah, like, it's so funny. And also fairly simple to read.
But the discovery of the neutron is, like, maybe one column just. The discovery of the positron, like, the Dyson sphere paper, they're really, really short papers and fairly accessible. Why do you think they've gotten so long? Is it sort of like, you know, David Foster Wallace citing a million things because he doesn't have confidence or anything?
I think it's also a consequence of a field developing. You just have, you know, more complex questions, and so it's harder to write. They're also a little bit more detailed as to the methodology. And the format of papers has gotten a little bit more formal in that sense where people follow a very specific format.
And I think that has added onto it. But, yeah, nowadays they tend, like, the gravitational wave paper that we annotated. That's relatively, that's what, like, 15 pages? Maybe.
It would be interesting to analyze, like, the constraints in terms of size that journals were imposing, like, 50 or 60 years ago compared to what they're doing now. If they are, like, forcing people to write, they were forcing people to write shorter pages, shorter papers back then. Not sure. But, I mean, like, if the discovery of the positron paper was published today, I bet it wouldn't be just a single column.
Well, are they intended to be more reproducible now? Good question. Maybe. Maybe.
Yeah. I think, or maybe it's just more complex problems that they're attacking now. Yeah. It might be the case.
Yeah. Yeah. It's definitely not going back, it seems. You don't really see a trend anywhere of shorter papers.
But, yeah, it's interesting. Yeah. You go back to the 60s and 50s and it was pretty nice. Of course it is.
Yeah. All right. Cool. So let's go to another question.
Polaris7 asks, what are the necessary ingredients in a good and impactful science writing? This is also a good question. I don't think that I'm qualified to, or, like, I haven't published that many papers to know that. But one of the things that we noticed, or at least I noticed from reading papers, is that sometimes it's not like the discovery paper that is the most impactful paper.
So, for instance, I just remember when quantum electrodynamics was discovered, there were three guys working on that problem. So Feynman, Schwinger, and Tomonaga. And they were sort of working independently on that problem and publishing papers on quantum electrodynamics. And the most impactful paper was actually published by Freeman Dyson, who at the time took the time to analyze all the work and kind of unified the work of Feynman, Tomonaga, and Schwinger.
He wrote a paper that helped other researchers understand what quantum electrodynamics was back then and helped really spread their work. So it was actually the most impactful paper. So, in other words, clear writing. Exactly.
Yeah. Clear writing. Yeah. It's also, I mean, the question here is impactful scientific writing.
And so you have, of course, writing papers, and then you also have just scientific writing in the sense of explaining some concept to a more general audience. And so I think it's also the same where you want to make it clear and you want to make it accessible. But, for instance, even like something like the Bitcoin paper, where it is like, I mean, I studied cryptography in college, and even, like, it took me a few reads through it to actually get it. And it's a beautiful paper, but it's very spartan language, and you want to read every sentence in it.
And so it can be very challenging to approach it. And I think definitely you always benefit if you can make it as clear and accessible as possible, because you never know, like, the audience that is going to end up reading your paper. Of course, you can expect other people in your field are going to read it, but sometimes things can be useful, especially, like, interactions between math and physics. Things can be useful in different fields.
And so I think it's always beneficial for science if you try to make it as accessible as possible. What does impact mean? Well, that's a question as well. Yeah.
Did you see that one? From Adam. Adam Babot asks, basically, the metrics for value bad. Yeah, exactly.
What does impact mean? You know, if it's the number of citations that you get or just the number of people that, you know, learn about a certain subject because of a paper. So in that way, a review paper can have a really big impact compared to a discovery paper. And so it's one of the problems that we also think about a lot, these metrics and what are the incentives in science and what makes people, you know, want to publish a paper or, you know, why should people worry about clarifying a paper and making it understandable to as many people as possible?
Do they have the incentives to do that? How can you create incentives to do that? Right. And then sometimes, you know, if you're just, the metric is just number of citations.
Sometimes it's not aligned to making the paper understandable and comprehensible to a large audience. Right. I mean, is that a question that you guys have to tackle? Because, you know, on one hand, you want to illuminate these papers that people could potentially learn from.
And then on the other hand, you're running a site with content, right? And you want things that are going to capture attention. So I saw you had a Charlie Munger post on there, right? Mika annotated the Charlie Munger paper.
Okay. Our other co-founder. Yeah, yeah, yeah. So it's like squarely non-technical paper, but Charlie Munger has millions of fans across the world.
Exactly, yeah. So you kind of have to balance those two things. Yeah. And, yeah, it's not easy.
And citations are definitely a proxy, right? If the paper is getting cited a lot, it has some sort of importance. But it's definitely not perfect. And if you look at the most cited papers in these different fields, you might be surprised that they might not be the ones that you expected them to be.
I certainly remember looking at, like, the most cited papers in computer science. And they're definitely very impactful. But you might have some of them, I remember reading through those statements, some of them I'd never heard about before. And so, yeah, and sometimes very important, well, this is more specific for certain fields, very important concepts or discoveries never really get published in one paper that then gets a ton of citations.
That knowledge gets spread in some other way. And so there are, yeah, citations are not perfect. But I wouldn't say that we have a great answer for that, what's a better proxy and how you should go about it. And I don't think anybody really right now has a better answer to, or not that we've heard about.
But, yeah, it's an interesting problem. So we'll see what people start using in the future because, yeah, you can measure impacts or how many people are talking about it on social media. Or if you have code, you know, if you have a public repo, how many forks do you have on your repo? Yeah, or like, and then it depends on field by field, right?
So if you take bio, then bio papers can have a very direct, can be used very directly, say, in industry. You can publish a paper about a drug and then that can be used worldwide and save lives. So there, like, for that field, maybe there are a bunch of other metrics that you could use there to calculate the impact of a paper. But for the more traditional science, like physics and math, sorry, yeah, it's hard.
A question up top, Arshalan Yarvesi asks, it's basically about working in public and the speed of publishing. They say, since scientific papers usually go through scrutiny and evaluation before getting published, how do you cope with not being always updated and up to speed in the world with daily news and contributions? It's kind of really what we were talking about before in relation to people publishing to arXiv before they really test it out. Where do you guys fall in that dynamic of, like, publishing as soon as possible, like with something like machine learning where things are just getting put out all the time, versus going through a peer review before getting something out?
And this kind of loops into peer review, which is a whole world unto itself that people are talking a lot about. Now, for us generally, or say for a weekly journal, we generally are not publishing the most recent research. And there is definitely, like, sometimes there's a lot of us having to catch up to even, I remember annotating a paper about, like, this machine learning algorithm to play one-on-one poker. And this was, like, out of my league, I had to go, like, spend a good amount of time there researching it and also figuring out, okay, how relevant is this?
I also don't know, because, you know, I'm not in the field, so it's hard for me to gauge, okay, what's the impact of this paper? So, yeah, sometimes it takes us a lot of reading up before we can actually say, okay, this is worth publicizing to our audience, or it's worth our stamp of approval and saying, hey, you should read this. I think you'll like it. And it can take a while sometimes.
But in the future, like, looping back to peer review, that's also something where I think the system nowadays does not seem to be perfect, the way things work nowadays. And we would love to see either Fermat's or some other platform try to tackle that and try to do something to make peer review a better system or to change it significantly. I think there's a lot of work left to be done there, which can have a very significant impact in science, right? That's, like, one of the most important aspects of science, just, okay, having a very skeptical mindset, looking at it with a very critical eye and seeing, okay, is this something that we can build upon?
Is this something that we're going to add to our foundations to build more science upon? And so that's a very important aspect of science, and I think it's not perfect. It could be better. So Anvil Rotterdam asks, have you ever thought about building a tool for annotating books?
Something like what Patrick Collison was talking about in the thread where he basically says, I'd pay a lot more for books if I could see the highlights, annotations, and marginalia of friends or people I follow. Yeah, I think it's actually a really good question. And we have a friend, Jess Riedel, from the Perimeter Institute, he's a researcher there, who wrote about this on his blog. And I think that besides annotating academic papers, it also makes total sense to annotate books.
And especially kind of introductory books about science. And he gives this example of a book that is used by thousands of students to learn classical mechanics called Goldstein. They talk about this transformation called the Legendre Transform. And he does a bad job at explaining what it is.
But apart from that section, the rest of the book is awesome. It's really nice if you want to learn classical mechanics. But if I want to write a book that does a better job at explaining the Legendre transformation, it has to be net better than the Goldstein book so that anyone will adopt that book. Otherwise, people will just keep using the Goldstein book.
So it would make sense for books to be annotated and also be open source so that in that sense you would just commit a new chapter, a new explanation for that, and keep all the other chapters and then just change that bit instead of having to write a new book and then convince people to adopt your book just because of that. So I think it makes a lot of sense to do more introductory. And we've thought about that, the type of things that you could do. If you had some platform where you could have books that kept being updated and you could have, okay, this is the standard for learning calculus.
This is constantly being kept up to date. You're adding exercises to it. People are forking it. If you need more information about this and you're not understanding it, you could deep dive into it and you have a bunch of additional content that is attached to it.
It really feels like something that should exist. And we've thought about doing something with Fermat's for that. Yeah, it's just so many things. But just in terms of copyright, are there massive issues there or is that possible?
So I think some of them, you might be facing some of the same challenges that Wikipedia is facing to an extent. Then, yeah, I think it would depend a lot on the format that is used. I do think there's, for something like this, you'd probably benefit from having some editor or like a team of editors to curate and to see, okay, what, like, should we add this, should we not, to an extent, to be a curating voice. In terms of copyright, yeah, you could run into some issues.
Well, some of these, especially the classic books on electromagnetism are like... They're out of copyright, yeah. Yeah, my impression is that these are maybe even like current books coming out, like popular fiction even, as annotated by a famous person. So, I mean, maybe if they gave away their notes for free and they were just a layer on top, but if you wanted to, you know, resell your own version of the book.
Yeah, that's interesting. There's also some, right, there's some legislation, well, there's fair use, right? You can use a piece of content if you're adding onto it or like, right, this is why you can have like a video on YouTube with a snippet from a movie if you're reviewing it. There's some precedent there for doing this type of thing.
But yeah, but for more general books, I also agree that it'd be amazing because we were just talking about this. We've talked about this for a while now, right? Because you read a book and the purpose of that book is not only for you to absorb all the knowledge that is there, but it's also to get you thinking about what's being talked about in the book and then you might reach some other conclusion and you might go on a tangent and then when you're reading it, that knowledge might never be shared with anybody else. You might just read it yourself and you think, okay, this just made me think about something else.
And it would be really, like there's a lot of knowledge that is being lost and it would be great if you could capture it in some way. The Amazon Kindle highlights site is one of the saddest things I've ever seen. Yeah, have you ever done that? We have Kindles.
Oh yeah, so there's a whole web interface for looking at all of your highlights and across all of your Kindle books. It's not good. So do you use it for anything? I mean, sometimes I go back.
So like the best way that I've found for me personally to retain is to buy the audio book and go through a book a couple times and then my retention goes way up. But occasionally I'll be just like, oh, what was that passage in whatever book? And I'll go back onto Amazon and you can dig through your highlights from your Kindle. I think I've seen like a startup that does that in a better way.
It includes all your highlights and organizes them. Yeah, I remember looking into this. But what I started doing is I also use Kindle and so I usually don't write annotations via Kindle somewhere or I usually don't use it for that. But if I'm reading a physical book over the past, whereas before maybe I would never write anything, now I try to like write a lot more there.
And then at some point, if I have time to try to go through the books, see where I wrote things and then write that in some notebook. And because there is like, just going through that exercise of looking at what you highlighted can be very helpful. Yeah, I mean, I was an English major in college so like I've forgotten more books than a lot of people ever read in college. And one of my professors actually recommended this which is basically take a five by seven index card and as you're reading the book, you're making little notes, right?
You're like, all right, this character does this, or, like, this is an important point. And then at the end, you basically write a paragraph to your future self describing your memories of the book and what happens and, like, important ideas, and that can really, like, trigger it for you to retain it past that. But I remember in school, like, back in Portugal, we all had to read this epic poem that is, like, it's called Os Lusíadas, and it was written by a poet back in the day, and it's about the Portuguese going from Portugal all the way to India and the Portuguese discoveries. And so I remember we had a version, you had the original version, which is pretty thick, and then we also had the version that had annotations on the side for each verse, or not for all of them but for a lot of them. And that makes such a big difference, because you're reading in old Portuguese, which by itself is already hard to follow, and he's making references that you have no clue about. There's so much historical context in every word.
The names of all, India was not called India, so everything is different. And you're reading it through, and the first time you go, it sounds great, it rhymes, but you don't understand a lot of the context behind it. And if you go through it and you read through it, and then on the side you have all this rich content, that really only adds on to your experience and makes it much more memorable. You can map it out in your mind and create many more connections.
It really enriches your experience. And of course you have this because, in this case, this is an epic poem that everybody has to read, and so there's a large incentive to publish the annotated version of this book, which is no longer under copyright. So you can have those types of things. But for a lot of more recent books, I think you could benefit a lot from that to some extent, where if you're reading through these few pages and you love what the author is talking about here, and you want to dig deeper into this topic that he's talking about right now, there should be some place where you could do that. But yeah, nobody has actually built this.
I think that defaults toward the blogosphere for most people. Some people summarize, like, write Amazon reviews. But then the thing there is, sometimes that content does exist, but being able to find it easily, having that, like, at your fingertips can make the whole difference. Even if, yeah, maybe you could spend, like, a minute searching on Google and you'll find the content that you're looking for, if it was right there, you could just click it and it would pop up and you'd see it, and it would be much more likely that you would end up reading the content. Those types of things make a big difference, being right there.
Do you find that annotations sometimes are best done by someone who is not the author of a paper? What's interesting is that the authors of the paper sometimes, you know, are not going to know where people are going to struggle when they're sending the paper out, oftentimes. I remember when I was annotating the Ethereum white paper written by Vitalik. I went through it and then I emailed him, and he's super quick to reply, and he replied back with some of the questions that he gets the most about Ethereum. But when you're writing it, you have no clue. You've worked it out in your mind, and some steps you might skip because you have just internalized them so much. So you only know where people are going to struggle once you put it out there and you start getting questions.
And so, yeah, sometimes authors are not the best. Every time we talk with an author, I think it's easier for them to answer questions about their papers than to annotate the paper. But then if you have another person annotating a paper, I think it's easier for them. Because with authors, we see that a lot: just ask me questions, I'll answer them, but sometimes I don't know how to enhance or add content to my own paper. You guys can provide that service, for sure. You can reverse engineer clear papers.
It's kind of worth noting that this is a side project for you guys. I have so many questions about how you go about building this thing. That's definitely consuming a lot of your time. It has to, between finding papers, making all those graphics and tweets and stuff that you guys do. How do you find that balance? What's your whole philosophy around this?
Yeah, it definitely takes its time. It is something that we actively tried to do after college, reading papers and staying up to date. It's something that we tried to do anyway, so we were already looking into research before. It's just something that we would enjoy. And then we found it good to have some sort of peer pressure amongst ourselves to present papers to each other, because that really forces you to understand something well. I think it was Feynman, he has some quote where you don't understand something until you can explain it to freshmen in college. And so that's very true, and so we tried to do that amongst each other.
And so then we got to Fermat and we thought, okay, maybe we can bring this online. So we were already spending a healthy amount of time doing this type of stuff. But with Fermat, like, the first version of Fermat, we kind of built it over the weekend and we tried to just put it out there as fast as possible. And then it's mostly, like, late at night I'll be trying to fix bugs. People in acronyms don't seem to think that it's a side project and everybody hurts on it. So yeah, there are definitely bugs. Sorry about that. We try to fix them when we have time.
Yeah, but it definitely takes its time. But I think it's also something that all of us really like doing. And I mean, I start looking at Wikipedia articles about quantum computing, and then I, like, spend three hours clicking on articles and articles and articles, and then I've found, like, five papers to annotate and I've produced, like, 10 or 15 tweets. So it's something that we really enjoy doing. I think that's the real genius of it, right? It's like basically figuring out a way to turn, I mean, what would be your hobby anyway, and having a forcing function. Because this type of thing is really easy to let go, right? Because sometimes you might not feel like understanding a paper to the point where you could annotate it. It takes a while to get a good grip
especially if it's not an area that you're super familiar with. So it's definitely not the type of effort that you do on a Saturday night unless you add a forcing function, where you know that in a couple of weeks you're going to be putting this out to a lot of people.
That's my favorite part of the podcast. Like, with the software stuff, it's pretty easy for me to just, like, it could be anyone in the room and we can do a podcast. But when we do physics ones or math or something, I'm just like, oh my god, I have to take a couple of days just to read it. Obviously I couldn't even become an expert if I dedicated a week to it, but I want to be conversant to a certain extent, and that part's fun.
Yeah, you definitely feel the pressure when you're writing these annotations, because people call you out and they're like, okay, this is wrong, or you missed this. And so when you're writing it, you want to be really careful and make sure that what you're saying is correct. And you know that you might have somebody, a college kid or whoever, that is reading through that paper and then is going to use your annotation to help them understand. So we feel that responsibility towards those people to do a good job at it. And when we put out an annotation, we want to stand by it and we want it to be of quality.
And it's funny, the more you annotate a paper, this is like a circle: the more you annotate a paper, the more people there are that are at the edge of starting to understand what the paper is about. So you start getting more and more questions, because the circle expands and you just have more people that are starting to understand these topics about number theory or physics or whatever. So you get more and more questions about the paper. And then, when do you stop explaining a certain concept? So it's like, you want to annotate a paper about number theory. Okay, do you have to explain what a prime number is, for instance, or do you have to
explain what a rational number is? So it's really interesting once you start thinking about that, like, how deep you go. Well, you've got to be careful about those videos then, because if you get discovered on YouTube as an explainer series, good luck. People will start to see it. Yeah, yeah, yeah. No, we've done a few of those.
But yeah, we've annotated a paper that, I think it was a proof of the irrationality of the square root of two. And then there was this, I think it was a 14-year-old kid from Russia that, because of that paper, came up with an alternative proof for that. And he sent us that proof, and I read the proof, and it was apparently legit. And I told him to submit that to a journal, a math journal, and I think he did. I haven't heard back from him, but I should reach out to him to see if he actually was able to publish it. So it's also nice to see, you know, we can inspire people sometimes to do these types of things.
And I also think, especially with Twitter, one of the things that we learned is that learning something, learning a concept or learning a fact, is really, really addictive. And we see that on Twitter almost every day. People come back, and we have hundreds of thousands of users that read our tweets. And I think that's why people really like it when they have a good teacher and when they can go to a class and really learn something. I think the problem is that usually that requires a lot of effort from people, who either have to go to a class or have to read a book to learn something. And I think what we're able to do with our Twitter account was to provide that same feeling of acquiring a quantum of knowledge, but at the cost of reading a tweet, which is really easy for the reader. Sometimes it's really hard to make those tweets. It requires a lot of reading and thinking, how can you explain something with just these characters and an image, maybe? But, you know, once you get to that and once you're able to teach someone a fact or something, people really like that
and I think it's something that there should be more people exploring on Twitter. It's a very particular medium. There's a lot of people that are attracted by that. A few years ago I would have been very surprised, but now you have all these scientific, be it explainers, but you have people that have millions of followers, and what people are following them for is scientific content, and they just want to learn. So that's something very uplifting that we've learned, that there's a lot of people out there that want to learn.
And it's too easy to get down on those people. They're just like, oh, this is like basic fun facts or whatever. But at the end of the day, that's good. People are excited to learn. They want to learn. And then you extrapolate it out a little bit more and you look at someone like Dan Carlin doing the Hardcore History podcast. Look, I think if you had objectively, like, written that down, you're like, all right, I'm going to produce 25 hours of content about the Khans and people are going to be into it.
I would have told you no fucking way. And then you look at it and it's like millions and millions and millions of downloads. Yeah. That's pretty cool.
There's some things that you look at and it really catches you by surprise. I mean, this is parallel, but it's like Wikipedia, for instance, if somebody had pitched Wikipedia to me before Wikipedia existed, I would have never guessed that it would be possible. Yeah. Because right.
Like, how are you going to do this? Like no incentive, just, just people are going out of goodwill. They're going to add content to it and it's going to be good content reliable, things that you can use to learn. And that's just, right.
That's not something that you would initially think would fit with human nature. But people surprise you positively. Right. And the same goes for like Stack Overflow.
Like people just out of goodwill, they will go out and explain, you know, or try to help you solve your problems. Like there's, there's something to be said that like humans have like some, some, some untapped fountains of good will that, that we might not be leveraging as much as we could. You know, you see it bright spots here and there and like Wikipedia or Stack Overflow is some projects that if you pitched them to me before they existed, I would be very skeptical that they would be able to get to the point where they are today. Of all the parallel universes, we are in the universe where Wikipedia exists.
Exactly. There's got to be a lot of parallel universes where Wikipedia doesn't survive. Yeah. Yeah.
Well, I mean, it's like when you talk about you guys expanding, you almost don't have to over-engineer the incentive mechanism. You know, if you believe that it's true, like annotating more papers is objectively interesting. Exactly. Yeah, you, yeah, for sure.
We have people, I think, you know, we always have people that are going to be interested in consuming the content and reading. Then you have the other side, how do you create incentives for people to annotate the papers? That's a different, a different game. But yeah, some things is just that it takes some time and we are totally, when we started this, we, we, we knew that it would take time until people cared at all about what you were doing.
And then it takes even more time to make any sort of impact on the issues that we care about. But for a lot of these things, even say, if you look at arXiv, arXiv was started in, like, August 1991. And it has taken a long time to get to where it is today. And if you look at the graph of submissions for arXiv, it's almost completely linear.
It's, there's no startup exponential growth. It's like completely linear, but it's arguably one of the things that has had the most, or that has impacted the making of science or the distribution of science the most, but it just, it just took a while to grow. And it seems like it's just going to keep growing linearly, but sometimes that's, that's what you need. And so, so we are totally mindful of that.
And we know that like this might take a really long time until you can get to do what your, what our ultimate vision is and to build it out. But you know, some things that they just take some time. So do you feel, do you feel pressure to achieve like profitability or even like sustainability in the business? Not at all.
We never really thought about that because, also, probably because this is a side project, we never really thought about monetizing or achieving profitability. So it is like for some of these communities, you know, like, like it's a for-profit company and I think it does a great job at what it does. And I'm probably happy that it is a for-profit company because they're just more independent. And if they have a good leadership that takes it in the right direction, it's great because they don't need to ask for donations to keep going.
Wikipedia is a nonprofit and they've been doing great. So it's possible to do it both ways. It's just, because we have very limited resources, we try to focus all of our attention on the areas that are the most important to work on for what we're trying to achieve.
So, so right. So that means, like, we have to prioritize. And so meaning, like, our next step is going to be building the Chrome extension for arXiv versus doing anything else, because we think that's what has the biggest impact. So that's why we never delved into profitability and we just pay the costs ourselves.
It's just server costs because we do all the work. So it's never something that has been in our minds a lot. And we think you could build these types of platforms, either for-profit or nonprofit. So yeah, just something we'll kind of defer it further down into the future.
It's a good question for us. Would arXiv have survived if they were a startup, for instance? Yeah. Right.
If they were for profit. Yeah. Right. If they raise money with that kind of linear growth.
If they were not inside a university. Right. Yeah. It's a good question.
Yeah. I mean, plenty of companies without startup growth raise money and become profitable or sustainable. Right. You're just like, okay, what are you going to charge for?
Yeah. Because yeah, I mean, arXiv is great because it's open. Of course. Right.
And so many other journals may be dying out because they're not. Yeah. Absolutely. Yeah.
So one of the trends that we've also noticed is a lot of people building journals on top of arXiv. And we are even collaborating with a few journals, one of them being the Quantum journal, which is an overlay journal on top of arXiv for the quantum physics category. And what they do is basically, so what the journal is is just a list of links to papers. And so they don't have any hosting costs.
They just have a page where they have the links to all the papers that they decided to publish, and all the papers are on arXiv. So it's completely open. And our partnership with them is basically that all the papers have the Fermat's Library commenting interface. But we're seeing more and more of these journals popping up.
So for instance, the Erdős discrepancy solution was published in one of these open journals, called Discrete Analysis. And I think it's totally possible that these open journals get to a point where they have, you know, a reputation like Science or Nature, as long as you convince people to, you know, publish their papers in these journals. There's nothing about Science or Nature that, you know, is unique to them and that prevents these open journals from getting to that point. Of course, it's also going to take time, but I think it's totally possible.
Yeah, exactly. It is. I mean, a lot of people talk about this, right, where you have journals that put content behind paywalls and that might have been funded with public funds. And so, right, there's the whole discussion about that.
And it is a tricky system to get out of because it is sort of in a stable equilibrium, in a sense, right? Because if you're a researcher, you need the publication in Nature or whatever to get your postdoc position in a renowned university. And so you have incentives for the status quo to persist. But there are a few ways that you could get out of it, right?
As Luis was mentioning, one way is for these open journals to start gaining more reputation, right? So that if you're getting published in Discrete Analysis, it's a big deal. It has a lot of reputation attached to it. And once that starts to happen, like, you get more and more people just putting it all out there on arXiv and publishing it all in open journals.
The other way that you could get out of the system would be, for specific fields, like what we were talking about in machine learning, where you have an incentive to publish as fast as possible because the field is just moving so quickly. And nowadays, with journals or big conferences, it takes a lot of time from submitting it until it actually gets out there. If you're submitting to NIPS or whatever for machine learning, it takes a long time for it to actually be officially published. And so you have that incentive that if it's open publication, you can move much, much faster. And so it is sort of a tricky equilibrium to get out of.
And that's why you have these companies that make billions of dollars in revenue. And one way to get out of it, and I think it's probably the way to get these open journals to be as popular as Nature or Science, is to convince people that already have tenure, or really famous scientists, to publish in those journals. You already have your position, you already have your Fields Medal, your Nobel Prize, just publish in an open journal. And that's what Terence Tao did with the Erdős discrepancy.
And I think that's what other people are doing. And Tim Gowers, who is a Fields medalist also, he's a mathematician who founded the Discrete Analysis open journal. And I think he wrote a blog post a while ago. And his mission was to convince famous mathematicians and people in these situations to publish in open journals.
Yeah. Because for the, right, for the young researcher that is trying to get a position in an uber competitive field, then you need to, right? Because if you want to get your postdoc in a renowned university, you need to have that. So that's, that's what's keeping it alive.
So these big names endorsing the open journals, I think that's going to be the growth hack to increase the reputations of these open journals. Absolutely. And it was interesting because it is a problem.
And we definitely believe that that's the right direction. And while you're in the U.S., right, like while I was studying at MIT, you don't even realize it because if you're within the MIT network, everything is open. Yeah. Right.
So you just, you're accessing it. And when I was an undergrad, I didn't even realize. In other words, the research groups might not be able to afford all the journals. And so you just sometimes just have a lot of trouble accessing research.
Yeah. And so this is not an issue in the U.S., where, like, big institutions have access to it, but in a lot of other parts of the world, the fact that a lot of research is being published in non-open journals has a significant impact. Especially when, like, legit CS papers are written by people who aren't associated with any university, right? They're just, like, hobbyists writing things.
Like why would they have a hundred journal subscriptions? Exactly. I remember even like researchers, other researchers in my research group, sometimes they would have to go through CERN to, to get VPN through CERN to, to get access to these papers. Yeah.
Or like I would have to email you and ask you to send me some PDFs. So if someone wants to contribute or help out, what can they do to help you guys? I think there are a few ways that you can help us out. You can annotate a paper on Fermat's Library.
And so email us, team at Fermat's Library. Exactly. If you want to annotate a paper there, you can spread the word. And if you're at a university, then if you have a journal club, if you have a research group and you want to annotate papers and share them among your peers.
When you create an account on Fermat's, like, now you can also upload your own papers. You have that option, and then you can share with whoever and you can create your own lists. And so we have people at universities that use this already, like, be it for classes where students have to read papers. And so they will post annotations on Fermat's, or just within research groups where they all decide to read a paper.
So if you're at a university and if you want to use this, it's completely free. So you just need to sign up. Yeah, those are the two main ways that you can help us out. We're also taking cryptocurrency donations.
So there's that. But really, like, most of our costs are just server costs. So we don't have to pay salaries to anybody. So yeah, that's about it.
That's the way to help us. Cool. All right. Thanks, guys.
Thank you for having us. All right. Thanks for listening. So as always, you can find the transcript and video at blog.ycombinator.com.
And if you have a second, it would be awesome to give us a rating and review wherever you find your podcasts. See you next time.