Taming arXiv with Natural Language Processing w/ John Bohannon - TWiML Talk #136 episode artwork

EPISODE · May 7, 2018 · 54 MIN

Taming arXiv with Natural Language Processing w/ John Bohannon - TWiML Talk #136

from The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence) · host Sam Charrington

In this episode i'm joined by John Bohannan, Director of Science at AI startup Primer. As you all may know, a few weeks ago we released my interview with Google legend Jeff Dean, which, by the way, you should definitely check if you haven’t already. Anyway, in that interview, Jeff mentions the recent explosion of machine learning papers on arXiv, which I responded to jokingly by asking whether Google had already developed the AI system to help them summarize and track all of them. While Jeff didn’t have anything specific to offer, a listener reached out and let me know that John was in fact already working on this problem. In our conversation, John and I discuss his work on Primer Science, a tool that harvests content uploaded to arxiv, sorts it into natural topics using unsupervised learning, then gives relevant summaries of the activity happening in different innovation areas. We spend a good amount of time on the inner workings of Primer Science, including their data pipeline and some of the tools they use, how they determine “ground truth” for training their models, and the use of heuristics to supplement NLP in their processing. The notes for this show can be found at twimlai.com/talk/136

In this episode i'm joined by John Bohannan, Director of Science at AI startup Primer. As you all may know, a few weeks ago we released my interview with Google legend Jeff Dean, which, by the way, you should definitely check if you haven’t already. Anyway, in that interview, Jeff mentions the recent explosion of machine learning papers on arXiv, which I responded to jokingly by asking whether Google had already developed the AI system to help them summarize and track all of them. While Jeff didn’t have anything specific to offer, a listener reached out and let me know that John was in fact already working on this problem. In our conversation, John and I discuss his work on Primer Science, a tool that harvests content uploaded to arxiv, sorts it into natural topics using unsupervised learning, then gives relevant summaries of the activity happening in different innovation areas. We spend a good amount of time on the inner workings of Primer Science, including their data pipeline and some of the tools they use, how they determine “ground truth” for training their models, and the use of heuristics to supplement NLP in their processing. The notes for this show can be found at twimlai.com/talk/136

NOW PLAYING

Taming arXiv with Natural Language Processing w/ John Bohannon - TWiML Talk #136

0:00 54:17

No transcript for this episode yet

We transcribe on demand. Request one and we'll notify you when it's ready — usually under 10 minutes.

Frequently Asked Questions

How long is this episode of The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)?

This episode is 54 minutes long.

When was this The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence) episode published?

This episode was published on May 7, 2018.

What is this episode about?

In this episode i'm joined by John Bohannan, Director of Science at AI startup Primer. As you all may know, a few weeks ago we released my interview with Google legend Jeff Dean, which, by the way, you should definitely check if you haven’t already....

Can I download this The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence) episode?

Yes, you can download this episode by clicking the download button on the episode player, or subscribe to the podcast in your preferred podcast app for automatic downloads.
URL copied to clipboard!