Published 2019 | Version v2
Open dataset

YouTube-8M Dataset

Description

YouTube-8M is a large-scale labeled video dataset consisting of millions of YouTube video IDs with high-quality machine-generated, partially human-verified annotations drawn from a diverse vocabulary of 3,800+ visual entities.

It comprises two subsets:

YouTube-8M Segments Dataset: 230K human-verified segment labels, 1000 classes, 5 segments/video
YouTube-8M Dataset: May 2018 version (current): 6.1M videos, 3862 classes, 3.0 labels/video, 2.6B audio-visual features

The dataset comes with precomputed audio-visual features from billions of frames and audio segments and is designed to fit on a single hard disk. This makes it possible to train a strong baseline model on it in less than a day on a single GPU. At the same time, the dataset's scale and diversity enable deep exploration of complex audio-visual models that can take weeks to train, even in a distributed fashion.

YouTube offers the YouTube-8M dataset for download as TensorFlow Record (TFRecord) files on its website. Starter code for the dataset can be found on the project's GitHub page.
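
To give a feel for what a TensorFlow Record file looks like on disk, the following is a minimal pure-Python sketch that walks the record framing and yields each raw payload. The function names `iter_tfrecord_payloads` and `frame_record` are hypothetical helpers, the checksums are skipped rather than verified, and the demo stream holds invented bytes rather than real dataset records. In practice one would more likely read the files with TensorFlow itself, for example through `tf.data.TFRecordDataset`.

```python
import io
import struct
from typing import BinaryIO, Iterator


def iter_tfrecord_payloads(stream: BinaryIO) -> Iterator[bytes]:
    """Yield the raw payload of each record in a TFRecord stream.

    Each record is laid out as: uint64 length, uint32 CRC of the length,
    `length` payload bytes, uint32 CRC of the payload (all little-endian).
    The CRCs are skipped here, not verified.
    """
    while True:
        header = stream.read(12)  # 8-byte length + 4-byte length CRC
        if not header:
            return  # clean end of stream
        if len(header) < 12:
            raise ValueError("truncated record header")
        (length,) = struct.unpack("<Q", header[:8])
        payload = stream.read(length)
        footer = stream.read(4)  # 4-byte payload CRC, not checked
        if len(payload) < length or len(footer) < 4:
            raise ValueError("truncated record body")
        yield payload


def frame_record(payload: bytes) -> bytes:
    """Wrap a payload in TFRecord framing with zeroed CRCs (demo only)."""
    return struct.pack("<Q", len(payload)) + b"\x00" * 4 + payload + b"\x00" * 4


# Demo on an in-memory stream; the payloads are placeholders, not real examples.
demo_stream = io.BytesIO(frame_record(b"first") + frame_record(b"second"))
payloads = list(iter_tfrecord_payloads(demo_stream))
```

Each payload in a real file would be a serialized protocol buffer that still has to be decoded into the named variables; this sketch stops at the framing layer.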

Variables

Name                 Description
id                   Video ID
labels               Video-level labels
segment_start_times  Start time of a given segment
segment_end_times    End time of a given segment
segment_labels       Label of the segment
segment_scores       Whether the segment's label is present (1) or not (0)
rgb                  Frame-level RGB features
audio                Frame-level audio features
mean_rgb             Average of all RGB features for the video
mean_audio           Average of all audio features for the video
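
The relationships between these variables can be illustrated with a small sketch. It assumes that labels are integer class indices and that frame-level features are lists of equal-length float vectors; the helper names (`multi_hot`, `mean_pool`, `positive_segments`) and every value in the toy record are invented for illustration and do not come from the dataset. Multi-hot encoding is shown only as one common way to represent several labels per video, not as something the dataset prescribes.

```python
from typing import List, Sequence, Tuple


def multi_hot(labels: Sequence[int], num_classes: int) -> List[int]:
    """Encode class indices as a 0/1 vector of length num_classes."""
    vector = [0] * num_classes
    for label in labels:
        vector[label] = 1
    return vector


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame feature vectors into one video-level vector."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def positive_segments(
    starts: Sequence[int],
    ends: Sequence[int],
    labels: Sequence[int],
    scores: Sequence[int],
) -> List[Tuple[int, int, int]]:
    """Return (start, end, label) for every segment whose score is 1."""
    return [
        (start, end, label)
        for start, end, label, score in zip(starts, ends, labels, scores)
        if score == 1
    ]


# Toy record: all values are made up; real feature vectors are much longer.
record = {
    "id": "abcd",
    "labels": [0, 3],
    "segment_start_times": [30, 60],
    "segment_end_times": [35, 65],
    "segment_labels": [3, 3],
    "segment_scores": [1, 0],
    "rgb": [[1.0, 2.0], [3.0, 4.0]],
    "audio": [[0.5], [1.5]],
}

mean_rgb = mean_pool(record["rgb"])        # video-level RGB vector
mean_audio = mean_pool(record["audio"])    # video-level audio vector
targets = multi_hot(record["labels"], num_classes=5)
kept = positive_segments(
    record["segment_start_times"],
    record["segment_end_times"],
    record["segment_labels"],
    record["segment_scores"],
)
```

With the toy values above, `mean_rgb` is `[2.0, 3.0]` and only the first segment is kept, since its score is 1.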