Building the HowTo100M Video Corpus

Video annotation is an expensive and time-consuming process. As a consequence, the available video datasets are useful but small. The availability of machine-transcribed explainer videos offers a unique opportunity to rapidly develop a useful, if dirty, corpus of videos that are "self-annotating", as hosts explain the actions they are taking on the screen. This episode is a discussion of the HowTo100M dataset, a project which has assembled a video corpus of 136M video clips with captions covering 23k activities.

Related Links: The paper will be presented at ICCV 2019. @antoine77340, Antoine on GitHub, Antoine's homepage.

Episode metadata supplied by the publisher feed · Published Aug 19, 2019


Frequently Asked Questions

How long is this episode of Data Skeptic?

This episode is 22 minutes long.

When was this Data Skeptic episode published?

This episode was published on August 19, 2019.

