YouTube-8M Dataset
Creators
- Google Research
Description
YouTube-8M is a large-scale labeled video dataset consisting of millions of YouTube video IDs with high-quality machine-generated and partially human-verified annotations drawn from a diverse vocabulary of 3,800+ visual entities.
It comprises two subsets:
- YouTube-8M Segments Dataset: 230K human-verified segment labels, 1000 classes, 5 segments/video
- YouTube-8M Dataset, May 2018 version (current): 6.1M videos, 3862 classes, 3.0 labels/video, 2.6B audio-visual features
The dataset comes with precomputed audio-visual features from billions of frames and audio segments, designed to fit on a single hard disk. This makes it possible to train a strong baseline model on the dataset in less than a day on a single GPU. At the same time, the dataset's scale and diversity enable in-depth exploration of complex audio-visual models that can take weeks to train, even in a distributed setup.
YouTube offers the YouTube-8M dataset for download as TensorFlow Record (TFRecord) files on its website. Starter code for the dataset is available on its GitHub page.
Variables
| Name | Description |
|---|---|
| id | Video ID |
| labels | Video-level labels |
| segment_start_times | Start time of each human-verified segment |
| segment_end_times | End time of each human-verified segment |
| segment_labels | Label of each human-verified segment |
| segment_scores | Whether the segment label was verified as present (1) or absent (0) |
| rgb | Frame-level visual (RGB) features |
| audio | Frame-level audio features |
| mean_rgb | Video-level average of the frame-level RGB features |
| mean_audio | Video-level average of the frame-level audio features |
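To make the variable definitions more concrete, here is a minimal pure-Python sketch of two relationships the table implies: a video-level mean feature as the element-wise average of frame-level vectors (rgb to mean_rgb, audio to mean_audio), and the four `segment_*` variables read as parallel lists with one entry per human-verified segment. Both are assumptions about the data layout made for illustration. The function names, the `Segment` class, and the sample values are hypothetical, and the averaging is not guaranteed to reproduce the released `mean_rgb`/`mean_audio` values exactly.

```python
from dataclasses import dataclass
from typing import List, Sequence


def mean_features(frames: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise average of frame-level vectors (e.g. rgb -> mean_rgb).

    Assumes every frame vector has the same length.
    """
    if not frames:
        raise ValueError("need at least one frame")
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


@dataclass(frozen=True)
class Segment:
    start: int    # from segment_start_times
    end: int      # from segment_end_times
    label: int    # from segment_labels
    score: float  # from segment_scores: 1 = label present, 0 = absent


def to_segments(starts, ends, labels, scores) -> List[Segment]:
    """Zip the parallel segment_* lists into Segment records.

    Assumes index i in every list refers to the same segment.
    """
    if not (len(starts) == len(ends) == len(labels) == len(scores)):
        raise ValueError("segment_* lists must have the same length")
    return [Segment(s, e, lab, float(sc))
            for s, e, lab, sc in zip(starts, ends, labels, scores)]


# Hypothetical values for a single video, not taken from the dataset.
print(mean_features([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
segments = to_segments([0, 50, 120], [5, 55, 125], [12, 12, 40], [1, 0, 1])
present = [s for s in segments if s.score == 1.0]
print([(s.start, s.label) for s in present])  # [(0, 12), (120, 40)]
```

Keeping the segments as small immutable records makes it easy to filter on `score`, for example to separate verified-present from verified-absent segments when building evaluation sets.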
Details
| Field | Value |
|---|---|
| Resource type | Open dataset |
| Title | YouTube-8M Dataset |
| Creators | Google Research |
| Research Fields | Business Administration, Economics, Psychology, Sociology, Political Science, Economic & Social History, Communication Sciences, Educational Research, Other |
| Size | 1530 GB |
| Formats | TensorFlow Record (TFRecord) files |
| License(s) | Creative Commons Attribution 4.0 International |
| External Resource | https://research.google.com/youtube8m/download.html |
| Companies | Google, YouTube |
| Industries | Social Media |
| Dates of collection | June 1, 2019 |