🤗 All things transformers with Hugging Face

What this episode covers

Sasha Rush, of Cornell Tech and Hugging Face, catches us up on all the things happening with Hugging Face and transformers. The last time we had Clem from Hugging Face on the show (episode 35), their transformers library wasn't even a thing yet. Oh how things have changed! This time Sasha tells us all about Hugging Face's open source NLP work, gives us an intro to the key components of transformers, and shares his perspective on the future of AI research conferences.

Sponsors:

DigitalOcean – DigitalOcean's developer cloud makes it simple to launch in the cloud and scale up as you grow. They have an intuitive control panel, predictable pricing, team accounts, worldwide availability with a 99.99% uptime SLA, and 24/7/365 world-class support to back that up. Get your $100 credit at do.co/changelog.
The Brave Browser – Browse the web up to 8x faster than Chrome and Safari, block ads and trackers by default, and reward your favorite creators with the built-in Basic Attention Token. Download Brave for free and give tipping a try right here on changelog.com.
Fastly – Our bandwidth partner. Fastly powers fast, secure, and scalable digital experiences. Move beyond your content delivery network to their powerful edge cloud platform. Learn more at fastly.com.

Featuring:

Sasha Rush – Website, X
Chris Benson – Website, GitHub, LinkedIn, X
Daniel Whitenack – Website, GitHub, X

Show Notes:

Giveaway details!! Check this blog post for all the details to win a free copy of Dracula PRO && 14 Habits of Highly Productive Developers
Hugging Face
Transformers library
Tokenizers library
NLP (data and evaluation metrics) library
Previous Practical AI episode with Hugging Face
TechCrunch announcement about Hugging Face's recent fundraising
The Annotated Transformer
2000+ models in Hugging Face's model hub
Attention Is All You Need paper
MiniConf tools

Upcoming Events: Register for upcoming webinars here!

TRANSCRIPT · AUTO-GENERATED

Bandwidth for Changelog is provided by Fastly. Learn more at fastly.com. We move fast and fix things here at Changelog because of Rollbar. Check them out at rollbar.com. And we're hosted on Linode cloud servers at linode.com/changelog. This episode is brought to you by DigitalOcean: Droplets, Managed Kubernetes, Managed Databases, Spaces Object Storage, Volume Block Storage, advanced networking like Virtual Private Clouds and Cloud Firewalls, and developer tooling with a robust API and CLI to make sure you can interact with your infrastructure the way you want to.

DigitalOcean is designed for developers and built for businesses. Join over 150,000 businesses that develop, manage and scale their applications with DigitalOcean. Head to do.co/changelog. Again, do.co/changelog. Welcome to Practical AI, a weekly podcast that makes artificial intelligence practical, productive and accessible to everyone.

This is where conversations around AI, machine learning and data science happen. Join the community and Slack with us around various topics of the show at changelog.com/community, and follow us on Twitter. We're @PracticalAIFM. Welcome to another episode of Practical AI. This is Daniel Whitenack.

I am a data scientist with SIL International and I'm joined as always by my co-host Chris Benson who is a principal AI strategist at Lockheed Martin. How are you doing Chris? I'm hanging in there. How are you doing Daniel?

Doing pretty good. As we've talked about the last couple of weeks, I've been ordering parts for an AI workstation computer, and it's sitting next to me and it's running. Oh, nice. I'm successfully, or at least it appears that I'm successfully, overfitting a model on the GPU.

I'll have to deal with that after recording, but it's running and it's not overheating yet. It's kind of stable at a, I think, reasonable temperature. So I'm happy on that front. So it's funny because as we're on this call, we're on Zoom and in the video you have the data center in the background.

So I just find it funny. A bunch of DGXs, and there you go. There you go. But that's not what you're using for the workstation.

No, mine is much smaller, although it's bigger than I thought, because after I put the GPU in, the case would not close. Oh, I guess that just helps with airflow. Perfect. It solved itself.

Yeah. I'm just doing okay. I did something really stupid this morning. I was reminded that I'm a klutz.

I fell when I was running, and it looks like I broke a rib, and you'd think that I'd do something about that. But I'm lucky. I have a fourth-year med student for a stepdaughter. So I called her up, and because COVID is running rampant, we agreed that I was not going to go to the emergency room. So, wow. She said the treatment would be the same either way. So I'm just kind of cranking through the day doing my thing. And now we're recording.

We're having fun, man. You're really pushing through pain for an AI podcast. There you go. You've got to be practical.

Yeah, that's something. Okay. You've got to be able to mute yourself and scream a couple of times, or whatever you need to do. Okay.

We'll do. Well, going from that note to something completely different as the show might say. We're really excited today because we have a follow up on a show that we did quite a while ago. Actually, this was episode 35.

So quite a while ago we had Clem on from Hugging Face to talk about what they were doing. And now we're very excited to have Sasha Rush joining us, who is an associate professor at Cornell Tech and is also working at Hugging Face on a bunch of different things and is involved in the Transformers library. And so we're really excited to have you, Sasha, to hear more about Hugging Face. Oh, thanks.

Yeah, before we jump into all of that, could you just give us a little bit of a sense of your background and how you came into the field of AI and eventually into NLP and what you're doing now? Sure. Yeah, so I've been at Cornell Tech for the last year. And if you don't know about Cornell Tech, it's a new university.

It's about seven years old, but we've had buildings for the last two years. Our buildings are right in the center of New York City, on an island in the middle of the East River. So every day we kind of take a little gondola over to the island and teach our courses there.

Ah, how romantic. Yeah, it's a pretty fun place. And yes, I've been a professor here for the last year. Before that, I was a professor at Harvard for about four and a half years.

And before that, I was a postdoc at Facebook AI Research, also in New York. So that's my background. These days I have a lab here at Cornell Tech, and I work with the team at Hugging Face, who are in Brooklyn. So everything's kind of centered in New York City.

Lots of interesting AI and machine learning going on around here these days. As for my background, after college I worked as a software engineer for about three years. I'm a person who very much enjoys coding, and that's kind of the first part of my personality. I then went to graduate school to study natural language processing.

When I got into natural language processing, I think I really got into it because I was very interested in language and particularly kind of algorithms and data structures involved in studying and understanding how language works. At that time, I did a lot of machine learning, but machine learning wasn't kind of the primary way we studied language. There were all sorts of other aspects about kind of how computers and language interacted. And actually my dissertation was much more about, say, the optimization aspects of language in a discrete sense.

Kind of how you construct trees that represent different linguistic phenomena. And how these interact with kind of classical computer science algorithms. And when I graduated my PhD, I kind of graduated right into the beginning of really kind of intense deep learning for language. And doing my postdoc at Facebook, everyone was kind of intensely interested in how we could do translation, how we could do question answering, kind of completely from data using deep learning based systems.

So I kind of dove right into that world. I sat next to the folks who were working on Torch at the time. Back then it was written in Lua, and a couple of years later they converted it to Python and it became PyTorch. So I've always been very fascinated by the tools and structures that make it possible to build these sorts of systems in an open source way.

Some other things I've worked on in the past: I worked on a library called OpenNMT, which was an open source translation library written in PyTorch and TensorFlow. And we worked with a lot of translation companies, particularly in Europe, to try to build open source tools to let them build their own custom Google Translate-like services. And that was a really fun project. It tied together the research we were doing in my lab, which was on questions of how to improve translation, how to speed it up, and how to make it work on devices, with questions of how these tools were used in the open source world.

So I'm kind of curious since you kind of alluded a little bit to one thing that's kind of happened in recent years in terms of how I guess people maybe used to think about NLP and still do for many tasks as far as like computational linguists have been thinking about these things for a very long time. But now there's been all of this focus on kind of extending these tasks to maybe generalized machine learning type problems. Could you give your perspective on kind of how that shift has happened and like what that's meant both in terms of momentum in the field and people getting involved in the field and all of that? What are your thoughts on that?

Yeah, let's see. So I think there are a couple of different perspectives. I don't want to make it seem like data-driven or machine learning systems were new to NLP. There's a long history of the use of learning in NLP, and also of learning systems from NLP being used in other areas.

So I think it's a field that's always kind of interacted with these methods in a kind of open dialogue. I think the phenomenon we're seeing now is kind of more extreme and it's extreme for a couple of reasons. I mean, one is the sheer growth of all these fields. We're seeing kind of exponential growth in conference sizes and paper submissions in kind of usage of this technology, which I honestly think is a great problem to have, but it obviously brings with it a lot of challenges.

So there are organizational questions of running communities and trying to make progress in this world. I think the other question is what it means in terms of methods. We're seeing lots of interesting things along those lines. I think that people in the field are adapting to the challenges that come from the world around them.

Like as researchers, we're interested in solving the problems that exist now and a lot of the problems in NLP are suddenly kind of data set problems. How do we construct interesting novel and difficult data sets? How do we analyze models to understand what they're doing and how they're structured and what they're learning? And kind of societal questions of how do we understand what biases they might have or what issues they might bring or even how they might learn, like from what signals are they picking up on?

And so there's no shortage of interesting research going on. It's just that what's interesting is maybe less about how you make the benchmark number go up another few points. So I'm kind of curious. I've been thinking about this while listening, and, you know, we had Clem on the show way back.

I think it was episode 35. Yeah, going way back. It was before the Transformers Library came out, which we'll definitely talk about later. Yeah, totally.

I think what I was thinking about was the fact that when we were talking to Clem, we were really focused on social AI and chatbots and similar tools and approaches. And then in the time between talking to Clem and talking to you today, you know, transformers came out, and you guys really created the definitive transformer library. And, you know, we've been talking about Hugging Face in the context of transformers since then. And I guess, how did Hugging Face make that transition?

What caused that? And it's an interesting turn for the history of the company. Yeah, so I mean, I guess I should give some perspective. So I've actually only worked with hugging face for about eight months now.

And honestly, I ended up working there because I was such a fan. I observed them in the same way that you did, which was as an external observer, seeing them make this transition so impressively from kind of working on chatbots to being this kind of open source powerhouse. And I guess as someone who I guess, I mean, who knows what it means in open source, but as a competitor, someone building his own libraries in this space, they were just doing it so much better than I was. And so I think that that always impressed me.

Now, I should say even before Transformers came out as an official library, I have memories of... well, I guess now we're getting into some of the technical terminology. When BERT came out as a paper, there was a kind of rush to port BERT to a PyTorch version. I was working a little bit on this at my own pace, and Hugging Face very, very quickly put out their own version of it. Maybe it was part of their chatbot library, or maybe it was a separate thing.

And I think it was really useful just to have that immediately after the research came out. And so I was really appreciative of that even at the time. What's the state of Hugging Face now? I know that they raised a round of funding. It seems, from what I'm picking up on Twitter, that the team is growing a little bit, but from chatting with you before, it seems like it's still also very distributed.

There might be some creative relationships, like, of course, you're in academia, but you're also with Hugging Face. So what's the state of the Hugging Face team now, and how is it growing to support this really rich ecosystem of tools? Yeah, so we have about 15 to 20 people, depending on how you count, who are mainly or entirely focused on open source tool development. The main library is Transformers, which we've talked about and which is the center of what we're developing.

But now there are also several other really interesting open source projects going on. We have a project based on NLP data sets that now has almost 150 different open data sets that you can easily browse, download and use, in a very efficient and easy-to-extend way. We also have a library of tokenizers written in Rust, a low-level library that lets you do very fast tokenization and tokenizer training.
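To give a feel for what "training" a tokenizer means, here is a toy pure-Python sketch of byte-pair encoding (BPE), one commonly used subword algorithm. It is not the Tokenizers library's API or code (that library is written in Rust and supports several algorithms); the function names are hypothetical, ties are broken by first occurrence as an arbitrary choice, and real implementations typically add details such as end-of-word markers and much faster data structures.

```python
from collections import Counter

def pair_counts(words):
    """Count adjacent symbol pairs across the corpus, weighted by word frequency."""
    counts = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            counts[(a, b)] += freq
    return counts

def merge_pair(words, pair):
    """Rewrite every word so each occurrence of `pair` becomes one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-separated corpus."""
    words = dict(Counter(tuple(word) for word in corpus.split()))
    merges = []
    for _ in range(num_merges):
        counts = pair_counts(words)
        if not counts:
            break
        best = max(counts, key=counts.get)  # ties go to the pair seen first
        merges.append(best)
        words = merge_pair(words, best)
    return merges, words

merges, vocab = train_bpe("low low low lower lowest", num_merges=3)
print(merges)
print(vocab)
```

Each learned merge becomes a rule that is replayed, in order, when tokenizing new text, which is why frequent fragments like "low" end up as single tokens.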

And then all of this is joined together by a hub of different models and structures that people have uploaded. If you go to the website, you see this really rich ecosystem of different models, different data sets and different tokenizers that people have built, all together. Practically, it is an interesting question what the company is like. I mentioned earlier that by now I've probably been there longer during COVID than before it.

Yeah, that's true. But it's always been a distributed company. There's a team in Paris and a team in New York. It's about half and half.

But now we also have interns in California and interns in China and people in different places. So we mostly communicate through Slack and other distributed means. Changelog News is the best way to keep up with the fast-moving software world. We track, blog and contextualize the coolest projects, the best practices and the biggest stories each and every week.

Make changelog.com your daily destination, or hit the snooze button and subscribe to our weekly newsletter that hits inboxes on Sunday mornings. Join more than 15,000 enthusiastic readers. It'll cost you exactly zero dollars, and you can subscribe right now at changelog.com/weekly. So I guess we've alluded to transformers several times now and kind of talked around them a little bit.

For those who are new to the topic, could you define what a transformer is? It's been a big, big deal in recent months and has really changed NLP, but a lot of people may not be familiar with it or may not have kept up to date. Could you just give us a basic rundown of the way you see it? Sure.

So I think the term transformer really implicitly applies to two different innovations. Both of these are actually connected to each other, but both are pretty transformative in their own right. So I'll start with the first. The first is the transformer as an architecture.

This is the development of a very specific type of architecture. The dominant architecture in natural language processing for about five years had been recurrent neural networks, particularly the LSTM network, which was used for basically everything we did in the field. The transformer proposed a different, and in fact simpler, architecture that, instead of relying on recurrent connections over time, uses a random-addressing style of architecture based on a mechanism called attention. The way it works is that you basically have everything you've seen in the past ready to access at every point in time.

And the main kind of neural network step that you take is a kind of soft random addressing over all your previous history. And you use that in order to compute the next stage in your sequence. So instead of kind of keeping a fixed length vector that gets transformed over time, you keep around everything and you basically search through it at every stage in the process. And this architecture wasn't kind of new on its own right, but kind of demonstrating that it was more effective than recurrent neural networks.
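A minimal sketch of the idea described here: keep the whole history around and, at every position, do a soft lookup over it, repeated for several rounds (later in the conversation Sasha puts the full model at roughly six to 24 rounds). This is a deliberately stripped-down illustration under stated assumptions: plain Python lists as vectors, a causal (look-back-only) variant to match the next-word framing, and no learned query/key/value projections, multiple heads, feed-forward sublayers or layer normalization, all of which real transformer layers have. The function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention_round(seq):
    """One round: every position does a soft lookup over itself and all earlier positions."""
    d = len(seq[0])
    out = []
    for t, query in enumerate(seq):
        history = seq[: t + 1]  # the whole past is kept, not a fixed-size summary vector
        # Scaled dot-product score between the current vector and each past vector
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in history]
        weights = softmax(scores)
        read = [sum(w * h[i] for w, h in zip(weights, history)) for i in range(d)]
        # Residual connection: add what was read back onto the current vector
        out.append([x + r for x, r in zip(query, read)])
    return out

def stacked_attention(seq, rounds=6):
    """Repeat the lookup for several rounds, each reading the previous round's output."""
    for _ in range(rounds):
        seq = causal_attention_round(seq)
    return seq

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 2-d vectors for a 3-token sequence
print(stacked_attention(tokens, rounds=2))
```

The contrast with a recurrent network is in `history`: an RNN would carry a single fixed-length state from step to step, while here each step can weight any earlier position directly.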

And that particularly could scale to both kind of fast training and also very, very large models better than recurrent networks was kind of a big breakthrough in the field. And the first results showed kind of large improvements on translation accuracy. Just a quick question. You mentioned attention and you sort of defined it in the larger thing, but just because it's kind of a key aspect of that, can you talk about what part of that was attention just to differentiate it from the larger process?

Sure. Yeah, I should say the original transformer paper has the title "Attention Is All You Need." Attention is the key aspect of what makes the transformer. Attention itself is actually quite simple, and it's a very intuitively appealing idea.

So imagine you have a set of objects, say, five different objects, and you want a neural network to decide which one of those objects to use. You might have a softmax layer, where the softmax gives you a probability distribution over which one you want to pick, so which of the five things you should choose. You could just end there.

And if you end it there, we would just call it multi-class classification. What attention does is it uses that distribution, the probability of each of the five things, and feeds that probability back into the model itself. So it would give a weight to each of the five items and then feed them back in with that weight. So imagine I had a sentence like, the man walked with a dog and I want to predict the next word in that sentence.

Those previous five words would be the five items I'd want to choose from. And attention would say, how much weight should I give each of those previous five words when trying to decide on the next word? So maybe I'll give 80% to "man," 5% to "the," etc., and use those in the next step of the process.
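The five-word example can be sketched in a few lines: a softmax turns scores into weights, and attention then uses those weights to mix the items back together. This is an illustration, not library code: the context words, the relevance scores and the one-hot vectors are all made up, and in a real transformer the scores would come from learned query and key projections.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(scores, values):
    """Weight each item's vector by its softmax probability and sum the results.

    Returning only softmax(scores) would be plain multi-class classification;
    attention goes one step further and feeds the weighted mix back into the model.
    """
    weights = softmax(scores)
    dim = len(values[0])
    mixed = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, mixed

context = ["the", "man", "walked", "with", "a"]
scores = [0.1, 3.0, 1.0, 0.5, 0.2]  # hypothetical relevance of each word
values = [[float(i == j) for j in range(5)] for i in range(5)]  # one-hot toy vectors
weights, mixed = attend(scores, values)
for word, p in zip(context, weights):
    print(f"{word:>6}: {p:.2f}")
```

With one-hot vectors the mixed output simply equals the weights, which makes it easy to see that most of what gets passed on comes from "man".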

All the transformer is, is a repeated version of that game over, say, six to 24 different rounds, where each time you look back at what you've previously decided, feed it back into your network, and then use that to try to predict the next step along the line. So you mentioned that this architecture, in addition to having this new structure, also allowed some performance benefits and scaling. I was wondering if you could give a sense of that, because I know this is something people see out there, and in particular, I think there was a thread on Twitter about how many parameters are in the Hugging Face model hub and all of that. So I was wondering if you could give us a sense of the scale of the models that are out there. People hear about BERT and GPT-2, and now, of course, we're getting flooded with GPT-3 things.

What is the scale of these models, both in terms of parameters and also the data needed to actually train them? It's a good question. I don't have all of these numbers offhand, but models range from 50 million parameters to tens of billions of parameters at the top end. In practice, for some of the larger models, it's unclear how you would even use them on standard GPU hardware, but scale has been a main aspect of transformer usage.
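Where do counts like these come from? A BERT-style encoder's size follows directly from a few configuration numbers. The sketch below counts weights and biases for the embeddings, each layer's attention and feed-forward blocks, the layer norms and the pooler (no task-specific head). The hyperparameters plugged in are the published BERT-base and BERT-large configurations, an assumption not stated in the episode; the function name is hypothetical.

```python
def bert_style_param_count(vocab_size, hidden, layers, intermediate,
                           max_positions, type_vocab=2, pooler=True):
    """Count weights + biases in a BERT-style encoder (no task-specific head)."""
    # Token, position and segment embedding tables, plus one LayerNorm (scale + shift)
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + 2 * hidden
    # Query, key, value and output projections (hidden x hidden + bias each), plus LayerNorm
    attention = 4 * (hidden * hidden + hidden) + 2 * hidden
    # Feed-forward: hidden -> intermediate -> hidden with biases, plus LayerNorm
    ffn = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden) + 2 * hidden
    total = embeddings + layers * (attention + ffn)
    if pooler:
        total += hidden * hidden + hidden  # dense layer applied to the first token
    return total

configs = {
    "BERT-base-like": (30522, 768, 12, 3072, 512),
    "BERT-large-like": (30522, 1024, 24, 4096, 512),
}
for name, cfg in configs.items():
    n = bert_style_param_count(*cfg)
    print(f"{name}: {n:,} parameters (~{n / 1e6:.0f}M)")
```

These settings land at roughly 110 million and 335 million parameters, in line with the commonly cited BERT-base and BERT-large sizes, and show how quickly the count grows as `hidden` and `layers` increase.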

But actually, maybe let me use that question to talk a little bit about the second main innovation of the transformer. I talked about the architecture, but I think it's important to also get a sense of the second innovation. I think it actually matters even more. This is an innovation that started around the use of a model called ELMo.

There were a couple of other variants, one called CoVe, and then it all kind of peaked with the release of a model called BERT. The idea behind these models is to take a neural network, in the case of BERT a transformer, and train it on a very simple task at a very, very massive data scale. In the case of BERT, the task is similar to the one I described previously. You're given a bunch of words, you randomly remove some of the words, and you try to predict them back.
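The "remove some words and predict them back" game can be sketched as a data-preparation step: hide a random subset of tokens and keep a record of what was hidden, which becomes the training target. As an assumption drawn from BERT's published recipe, the default masks about 15% of tokens; that recipe also sometimes substitutes a random or unchanged token instead of [MASK], which this simplified sketch skips. The function name is hypothetical.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Hide a random subset of tokens and record what was hidden.

    Returns the corrupted sequence plus a {position: original_token} map,
    which is what the model is trained to predict back.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_prob))
    positions = rng.sample(range(len(tokens)), n_mask)
    corrupted = list(tokens)
    targets = {}
    for i in sorted(positions):
        targets[i] = corrupted[i]
        corrupted[i] = MASK
    return corrupted, targets

sentence = "the man walked with a dog in the park today".split()
corrupted, targets = mask_tokens(sentence, seed=0)
print(" ".join(corrupted))
print(targets)
```

Because the labels come from the text itself, this step needs no human annotation, which is what lets the task run at the massive data scale described here.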

It's a game that you can play yourself to get a sense of how easy or hard it is. Sometimes it's really easy, sometimes it's really challenging. But the point of the task is to give the model something that requires it to know something about language in order to complete, and then train it at as big a scale as you can. So it's hard to give you a sense of this.

I mean, one thing that's nice about language is you can store a ton of it in very little space. You can basically fit all of Wikipedia on your computer. And companies like Google basically have a non-trivial fraction of all the text that's ever been produced. And so you can take all that text, throw it into one of these models, and then train it on a simple task.

And it turns out that in the process of trying to complete this task, the model learns a lot about how language works. We say it learns very good features for language. So once you've done that, once you've trained it on all the language that you have, you can then apply it to a much smaller task that you maybe have a small amount of supervised data for. This idea, which people call pre-training, is central to how a lot of NLP systems work these days, and also to how the Transformers library is designed.
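To make the pre-train-then-adapt split concrete, here is a toy linear probe: a frozen feature extractor plus a small logistic-regression head trained on a handful of labeled examples. Everything here is a hypothetical stand-in so the sketch runs with the standard library alone; in particular, `frozen_features` plays the role of a pretrained transformer and is just a hand-written word counter. A linear probe is one simple form of transfer learning; full fine-tuning, as typically done with the Transformers library, also updates the pretrained weights.

```python
import math

# Hypothetical stand-in for a frozen pretrained encoder: in practice a large
# transformer whose weights are not updated; here a tiny hand-written feature function.
POSITIVE = {"great", "fun", "love"}
NEGATIVE = {"awful", "terrible", "boring"}

def frozen_features(text):
    tokens = text.lower().split()
    return [sum(t in POSITIVE for t in tokens), sum(t in NEGATIVE for t in tokens)]

def train_head(examples, lr=0.5, epochs=200):
    """Fit only a small logistic-regression head on top of the frozen features."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for text, label in examples:
            x = frozen_features(text)
            p = 1.0 / (1.0 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
            err = p - label  # gradient of the log loss with respect to the logit
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / len(examples) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(examples)
    return w, b

def predict(w, b, text):
    x = frozen_features(text)
    return int(w[0] * x[0] + w[1] * x[1] + b > 0)

# A handful of labeled examples: the "small amount of supervised data"
labeled = [("great movie", 1), ("terrible movie", 0),
           ("great fun", 1), ("awful and boring", 0)]
w, b = train_head(labeled)
print(predict(w, b, "I love it"), predict(w, b, "so boring"))
```

Only three numbers are learned here; the point is that the heavy lifting sits in the (frozen) features, so very little labeled data is needed on top.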

So yeah, I think that's such a great and important point is that people kind of get hung up on the size of these models. And it's kind of cool to talk about those things. And in some cases, annoying to work with them because they're so large, and in some cases, hard to perform inference with. But yeah, I guess what you're saying is, you know, the task that they're trained on is just intended to help them learn good features.

And then the task that you actually want to use them for involves some fine-tuning or transfer learning. Is that right? Yeah, I mean, I don't want to claim that this is finished as an idea. I think a lot of the tasks we work on now have a fine-tuning stage, where you take the model and train it further for a given task.

OpenAI has a slightly different model of what they're trying to achieve, which is that they're not super interested in fine-tuning. They want to just use the model directly: feed it some more sentences and try to get the answer directly. Yeah, so there is this, like... maybe you could help us through some of this jargon. It seems like people talk about some of these models.

They just, like, have so much knowledge that they can perform a task right off the bat, whether it's question answering or information retrieval or whatever it is, without really much fine-tuning. Is that what you're getting at with that other model? Well, you want to distinguish two aspects.

I think that all the state-of-the-art models on standard benchmark tasks use some sort of fine-tuning. That's become a very standard procedure, and we understand how that works. But if you do fine-tuning, you still need some amount of supervised data. I guess I would say small to medium amounts, but you need something in-domain for the task you're interested in.

I think there's a lot of recent excitement for a kind of crazy idea, which is the zero-shot or one-shot idea: the model should just know how to do your task immediately, right off the bat. Yeah, I think that's where I was going, because they throw around this idea of zero-shot, and to some degree it seems sort of magical in many ways to people, I think. Yeah, I don't want to say anything on record. It's on the research frontier.
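In the zero-shot and one-shot setting, the "training data" for a task is just text placed in front of the model. A minimal sketch of assembling such a prompt is below; the layout with "Input:" and "Output:" is an arbitrary, hypothetical convention, and the call to an actual language model is deliberately left out, since any text-completion model would simply be asked to continue after the final "Output:".

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble instruction + worked examples + the new input into one prompt.

    With no examples this is a zero-shot prompt; with one, a one-shot prompt.
    The 'Input:'/'Output:' layout is an arbitrary, hypothetical convention.
    """
    lines = [instruction, ""]
    for source, target in examples:
        lines += [f"Input: {source}", f"Output: {target}", ""]
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)

zero_shot = build_few_shot_prompt("Translate English to French.", [], "cheese")
one_shot = build_few_shot_prompt("Translate English to French.",
                                 [("sea otter", "loutre de mer")], "cheese")
print(one_shot)
```

Nothing here updates any weights, which is the key difference from the fine-tuning workflow discussed just before.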

Yeah, it might turn out that that's the way to do lots of language tasks, but I think it's still an open question. That's right. So turning to the Transformers library itself, I'm kind of curious.

Recognizing that you've only been at the company for a limited amount of time, do you have any insight into the motivation that moved the company toward the Transformers library itself? Was it supporting the other operations, or was it just an opportunity that came up? What took the company there, as far as you know? It's a good question.

The graph of the usage of this library on GitHub kind of blows me away. It went from no users to about 30,000. So I think we just hit on something, and I guess when you hit on something like that, maybe that changes the motive. Yeah, so maybe you could describe, along with that, what the main usage pattern is that people are grabbing onto Transformers for?

I know that there are multiple, of course, like quite a few of different things that you could use the library for. But what do you see as the sort of like the main thrust of what people are grabbing Transformers for? What is that and how is that being supported, I guess? Yeah, this is a great question.

And I think in some ways you guys may have more insight into this, which I would also be interested to hear about. Let me start at the high level. One thing that fascinates me about current usage of deep learning is that you have people who approach it from many different angles. In one of our papers, we broke this down into three different classes.

We talk about there being architects, trainers, and end users. And I think within the ecosystem, Transformers has different meanings to all three of those groups. So if you're a company like OpenAI or AllenAI, a company that's at the cutting edge of research training, you use Transformers or related libraries to try to build the next architecture or the next pre-trained model. That often means running very large training jobs on multiple GPUs over many days, and then using Transformers as a way to distribute your model through our hub and make it easy for people to use it or adapt it for their tasks.

If you're an expert, but maybe not at the frontier of research, another use case is fine-tuning, where you have data for your company or for a given problem that you want to solve, and you bring that data into the library and use it in training mode to fine-tune on your data set. It may take a couple of hours and require some GPUs, but out of that you get a really accurate model for the task you're interested in. Then at the other end, you have end users who want to use the library as a way of performing standard NLP tasks. You might want to use it to do summarization or translation or named entity recognition or question answering, and you can often use it completely in inference mode, maybe not even using Python, just taking a pre-trained model and using it directly for your task.

So I think all of these people are within the machine learning ecosystem, but they kind of have different end goals or different use cases, and I think we're kind of trying to aim to support any of those kind of outcomes. So I know you have a model hub, and I was wondering if you could kind of talk about what users can find there and start incorporating into their own projects, what does the growth of that hub look like, just what kind of ecosystem has developed around it? Yeah, so the model hub is kind of part of the open source library. If you want to use a model in the library, you say model.load, and you pull off, you just pull directly down from the model hub.
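In the released Transformers library, the loading step described here is spelled `from_pretrained`, for example `AutoModel.from_pretrained("bert-base-uncased")`, which downloads and caches a model's files from the hub. The sketch below only shows where those files live, using the Hub's `resolve` URL pattern (huggingface.co/<model_id>/resolve/<revision>/<filename>); treat the exact pattern as an assumption to check against current Hub documentation, and prefer the library's own loaders in real code. The helper names are hypothetical.

```python
import json
import urllib.request

HUB = "https://huggingface.co"

def hub_file_url(model_id, filename, revision="main"):
    """URL of one file in a model repository on the Hugging Face Hub."""
    return f"{HUB}/{model_id}/resolve/{revision}/{filename}"

def fetch_config(model_id, revision="main"):
    """Download and parse a model's config.json (requires network access)."""
    with urllib.request.urlopen(hub_file_url(model_id, "config.json", revision)) as resp:
        return json.load(resp)

print(hub_file_url("bert-base-uncased", "config.json"))
```

Because every model repository follows the same layout (a config, weights, tokenizer files), one loader can serve any model on the hub, which is the uniformity the conversation turns to next.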

You can do that with any of the models that are there. We have a set of models with brand names that are used very often, including GPT-2, variants of BERT, RoBERTa, or newer models like BART or T5, but the hub also includes a long tail of other models from the community. That includes models pre-trained to target, say, biomedical text or extraction from scientific documents; models trained in many different languages by the communities interested in those languages; models that are experimental or try to do other things; and, one popular category, models that are very small, that you can run on your phone. The idea of the model hub is that all of these have the same API and the same easy way to use them.

And one thing that we think is really interesting is that, unlike generic model hubs like TensorFlow Hub or PyTorch Hub, because our models are all of the same form, we can build a lot of tools and machinery around using them. For instance, we have a visualizer that works for all of our models. You can upload your own model and get a really interesting visualization of its internal structure. Or an open source project called TextAttack built an adversarial attack system that's able to generically build attacks against any of the models in our hub.
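Why a shared interface matters for tools like this: if every model exposes the same call, one generic tool can run against all of them without knowing anything about their internals. The sketch below illustrates that with a hypothetical `predict(text) -> label` interface and a very simple word-deletion attack run against two toy models. It is not TextAttack's API; real adversarial attacks typically use richer perturbations such as synonym swaps and constraints on meaning.

```python
from typing import Optional, Protocol

class TextClassifier(Protocol):
    """Hypothetical shared interface: any model with predict(text) -> label."""
    def predict(self, text: str) -> str: ...

def word_deletion_attack(model: TextClassifier, text: str) -> Optional[str]:
    """Delete one word at a time; return the first edited text whose label flips."""
    original = model.predict(text)
    words = text.split()
    for i in range(len(words)):
        candidate = " ".join(words[:i] + words[i + 1:])
        if model.predict(candidate) != original:
            return candidate
    return None  # no single-word deletion changed the prediction

class KeywordModel:
    """Toy model: 'pos' if the word 'good' appears."""
    def predict(self, text: str) -> str:
        return "pos" if "good" in text.split() else "neg"

class LengthModel:
    """A second toy model with a completely different internal rule."""
    def predict(self, text: str) -> str:
        return "long" if len(text.split()) > 3 else "short"

# The same attack code runs unchanged against both models.
for model in (KeywordModel(), LengthModel()):
    print(type(model).__name__, "->", word_deletion_attack(model, "a good film overall"))
```

The attack only ever calls `predict`, so adding a third model to the loop requires no change to the tool itself.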

So because they all have the same interface, it allows people to do these really longitudinal research projects across everything that's going on in the hub itself. And I should mention that we now have an inference API. On any of the model pages, you can just type in some text and it will run against that model. And you can even call it from your own code directly, without ever running anything on your machine; it just runs on one of our servers.
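Calling a hosted model "from your own code directly" amounts to an authenticated HTTP POST. The sketch below builds such a request with the standard library only. It assumes the classic Inference API layout (`https://api-inference.huggingface.co/models/{model_id}`, a JSON body `{"inputs": ...}`, and a Bearer token), which may not match what was live in 2020 or what is live today; `build_inference_request` and `run_inference` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: classic hosted Inference API endpoint layout; check the
# current Hugging Face documentation before relying on it.
API_BASE = "https://api-inference.huggingface.co/models"


def build_inference_request(model_id: str, text: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that runs `text` through `model_id`."""
    payload = json.dumps({"inputs": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/{model_id}",
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_inference(req: urllib.request.Request) -> object:
    """Send the request and decode the JSON response (requires network and a valid token)."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_inference_request(
        "distilbert-base-uncased-finetuned-sst-2-english",
        "I love this podcast!",
        token="YOUR_TOKEN_HERE",  # placeholder, not a real token
    )
    print(req.get_method(), req.full_url)
```

Separating request construction from sending keeps the network call in one place, so the payload and headers can be inspected or tested offline.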

And we even have a Twitter bot that we just put up last week, where you can tweet text at it and it will run a model against your tweet. Yeah, that's great. Before we leave the topic of the open source projects, you also mentioned these other libraries, tokenizers and NLP, which includes the data sets and evaluation metrics. How do those fit into the puzzle, and how do they interact with and influence one another?

Yeah, I mean, at the end of the day, our interest is in building open source NLP. I think there will continue to be new variants of transformers and new pre-trained models. But, as I mentioned earlier, an increasing area of innovation in NLP is finding the right data sets to challenge these models in interesting ways. So there's a lot of energy in data set construction these days, and a proliferation of really interesting data sets of different sizes and scopes.

And so Tom Wolf, who's our main open source engineer, got very passionate about building up open source data sets and built a library that makes it very easy to use these data sets in Python, and makes it extremely efficient to use complex data sets directly within your code across many different areas of NLP. There's a website you can go to where you can browse through any of these data sets and use them for various tasks. One nice aspect of this is that we have a lot of examples of how to use transformers, and they had a lot of custom data set code just to run the examples.
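Alongside data sets, the NLP library mentioned here bundles evaluation metrics (the library was later renamed `datasets`). As an illustration of what such a metric computes, here is a minimal, standalone re-implementation of SQuAD-style exact match and token-level F1, following the answer normalization used by the official SQuAD v1.1 evaluation script. It is a sketch of the metric, not the library's own code.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in PUNCTUATION)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, ground_truth: str) -> bool:
    """True if the normalized answers are identical."""
    return normalize_answer(prediction) == normalize_answer(ground_truth)


def token_f1(prediction: str, ground_truth: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)  # multiset intersection
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower!", "eiffel tower"))
    print(token_f1("in Paris, France", "Paris"))
```

For `"in Paris, France"` against `"Paris"`, one of three predicted tokens matches, so precision is 1/3, recall is 1, and F1 is 0.5. Normalizing first keeps trivial differences in casing, punctuation, or articles from counting as errors.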

But now that code is all factored out. You can just pull it in from NLP and then run the examples, focusing on the machine learning parts.

We deserve a better internet, and the Brave team has the recipe for bringing it to us. Start with Google Chrome; keep the extensions, the dev tools, and the rendering engine that make Chrome great.

Rip out the Google bits. We don't need them. Mix in ad and tracker blocking by default, quick access to the Tor network for true private browsing, and an opt-in reward system.

So you can get paid to view privacy-respecting ads and turn around and use those rewards to support your favorite web creators, like us. Download Brave today using the link in the show notes and give tipping a try on changelog.com.

So to take the conversation in a slightly different direction for a moment: I know from talking before the show that you put together ICLR, which is a research conference, and managed that process this year. And I'm really interested at this point.

We're in the time of COVID-19, and so much has changed across all of work, but particularly conferences; many of them are going online and becoming virtual. I'm really interested in what that was like, what your experience doing it this way was, and what worked and what didn't. I'm curious because I think a lot of people are waiting to see what conferences are turning into, and whether they want to continue down that route. Yeah.

This year I was the general chair of ICLR, the International Conference on Learning Representations. It's a big machine learning conference, and really the only one focused completely on deep learning. Interestingly, I had the chance to be the program chair for the conference last year, when we held it in New Orleans, and then general chair this year. By about December we were getting prepped, and by February or March it became increasingly clear that we weren't going to be able to have the conference live. So I think we were the first AI conference to really have to go completely virtual.

We had about a month and a half before the conference to come up with something new. We had this wonderful team led by this year's program chair, Shakir Mohamed, and we wanted to do something that fit the spirit of the conference. So we sat down and wrote a website for the conference from scratch, built around the idea that everyone at the conference would be in a Zoom-like, Slack-like chat room. We used an open source platform for that, and every paper had its own page with a video of the work and a chat room for that paper, so people could talk about it and discuss it within that setting. In addition, we built out a bunch of social gatherings people could join and a calendar for the whole event. The main challenge is how you run a conference asynchronously in this way.

We didn't really think it was possible to have everyone in the same place at the same time, so we wanted to use things like chat rooms that feel more asynchronous, particularly with an international audience. And the conference itself was actually really fun. We had a pretty large increase in attendance over past years, with people from all over the world, particularly from some places where it would have been difficult to attend a conference in other years, and a ton of engagement: a lot of the posters were viewed a large number of times, and there were maybe about a hundred thousand messages over the chat system over a couple of days. I think there were challenges.

I think it's hard to get the same spirit of having coffee or just chatting informally at this sort of event. Things like Twitter are helpful but don't have the same kind of image. But there were also nice things: we ran mentorship sessions where one person was able to chat with about 20 folks who were interested in mentorship, and that kind of one-to-many model might actually have been difficult at an in-person conference, but it works pretty nicely over Zoom. Anyway, it was an experimental setup. Since then we've open sourced all the tools that we built for the conference.

You can get it online if you search for Mini-Conf, and the software has been used for about five or six other major conferences since then, including ACL this year, which is a big NLP conference, and ICML, which is another machine learning conference. I don't think we've cracked it, but in the meantime it's nice to have something we built as a community. Yeah, I attended the conference, ICLR, and I was super impressed with everything that was put together, especially given the time frame.

I know you must have had some very late nights fueled by lots of coffee, so congratulations on putting together something so good in such a short time period. One of the things I appreciated: I've been to other research conferences in person, and with posters or talks there's just so much going on that you can't go to this talk at the same time as that talk, and it's hard to find that person afterwards to ask them some questions about their work, or maybe walk by the poster. So it was nice to just scroll through and look at the different videos, especially given the time zone differences, and shoot the authors a message they could respond to asynchronously, so that the question didn't get lost. I found that extremely useful.

What are your thoughts, assuming that at some point in the future research conferences will have an in-person component again? Do you see a sort of hybrid scenario developing? I know one of the things that was a struggle for so many years with NeurIPS and others was people getting visas, which is such a shame: so many people from Africa or from Asia were doing amazing work but couldn't actually be at the conference because of visa issues or cost issues or whatever it is. So how do you see that future happening? Yeah, it's a question we're actually talking a lot about at ICLR right now. I don't think we have an answer, and I think a lot of it will depend on what the world looks like in a couple of years.

So one thing we're committed to at ICLR is holding the conference in locations that haven't been visited as much in the past. One thing that was very disappointing was that this year's ICLR was supposed to be in Addis Ababa, Ethiopia, and we were all really disappointed that we couldn't make it out there; it would have been a really interesting event. So hopefully we'll continue to have conferences in a wider range of locations. But, as I was saying earlier, all these areas are experiencing such hyper growth that finding ways of dealing with scale without losing a sense of interaction is a major challenge for the community. So I think we need to be creative about ways to handle that problem, and about ways of giving people the same experience I feel like I had when I was first a graduate student, which inspired me to continue in the field. I don't know what that looks like; maybe it looks like something more distributed with a virtual component.

So I'm also wondering, turning the corner a little bit to NLP in general: with the work that you're doing, you're right at the center of the NLP world, and Daniel and I talk all the time on these episodes about how the last couple of years have felt like NLP has really come of age. You might say it feels like we're in a golden age of NLP, and before that we had seen CNNs have their moment. Given where we've arrived so far in NLP, what does the future look like to you? What big challenges are open and should be focused on? What are your thoughts there from this point forward? That's a hard question. Big one.

Yeah, in some ways, as somebody who's been working on NLP for a while, it's been really neat. I think it's way better than I could have possibly expected; seeing translation get to the point where it is now is just inspiring to me. It's such a useful thing, and the work that it does is awesome. So what are the challenges now? I think there are a bunch. Computer vision, for all its successes, has also had a lot of issues, and there's a lot of conversation in NLP about how to avoid some of those issues, or to have those conversations earlier rather than later. Things like what we've seen with facial recognition as a technology, and questions about efficacy there, are I think a challenging point. And we've somehow managed to solve a lot of the natural language processing questions without solving some of the computational linguistics questions: things work, but we have no real sense of why, and as a scientist that can be a little bit frustrating. We don't really know what signals these models are using to make predictions, and it's very hard to know, or even to ask that sort of question in a falsifiable way: why did this model classify the sentence this way, why did it make this decision? These models are, at least in a practical sense, completely opaque, so it's challenging to do any analysis along those lines. Then, more practically, I think there are a lot of questions that are not solved yet. You mentioned this idea of dealing with massive, massive models: it's not clear whether we're going to need hardware that is a hundred times bigger to run these models, or whether you can use pruning and distillation to make them super small, or what it means to run them locally, or whether it just makes us more reliant on cloud systems. I think these all become interesting systems research questions in the short term.

Awesome. Well, we appreciate you taking a stab at future predictions, because I think we've said on the podcast before that any predictions we make are definitely going to be false, since it's always something unexpected that happens. But I appreciate you giving your perspective as someone at the center of all of this work, and taking time to talk with us and explain a bit about the Transformers library and the things that are going on in NLP. Thank you so much for your contributions to the community as well, in terms of helping with conferences and really pushing forward open source. Looking forward to digging into all the great things that Hugging Face is releasing and doing. Thanks so much. Thanks for having me.

If you're listening to this in the month of July, you've got a shot at some free goodies. We're doing a giveaway in celebration of our friend and open source whiz Zeno Rocha's new book, 14 Habits of Highly Productive Developers. If you don't know Zeno, you may have heard of his wildly popular Dracula theme, an awesome dark mode theme for text editors, terminals, etc. We have three bundles of Dracula Pro and 14 Habits to give away absolutely free; that's a $60 value. There are three ways to enter: you can be the reviewer, the socializer, or the recommender. Hit up the link in the show notes to get started. There will be three lucky winners, and you can be one of them. Thanks to our longtime sponsors Fastly, Linode, and Rollbar for their continued support, to Breakmaster Cylinder for our amazing beats, and to you for listening to Practical AI. We appreciate your time and attention. That's all for this week; we'll talk to you next time.


Frequently Asked Questions

How long is this episode of Changelog Master Feed?

This episode is 46 minutes long.

When was this Changelog Master Feed episode published?

This episode was published on July 27, 2020.

What is this episode about?

Sasha Rush, of Cornell Tech and Hugging Face, catches us up on all the things happening with Hugging Face and transformers. Last time we had Clem from Hugging Face on the show (episode 35), their transformers library wasn't even a thing yet. Oh how...
