IMDb Movie Reviews Dataset
Creators
- Stanford University
Description
The IMDb Movie Reviews dataset is a binary sentiment analysis dataset consisting of 50,000 reviews from the Internet Movie Database (IMDb) labeled as positive or negative. The providers also include an additional 50,000 unlabeled documents for unsupervised learning.
The labeled set contains equal numbers of positive and negative reviews. Only highly polarized reviews are included: a negative review has a score ≤ 4 out of 10, and a positive review has a score ≥ 7 out of 10. No more than 30 reviews are included per movie. See the README file contained in the release for more details.
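The labeling rule above can be sketched in a few lines. This is only an illustration of the thresholds stated in the description; the function name `polarity_from_rating` is hypothetical and not part of the release.

```python
from typing import Optional


def polarity_from_rating(rating: int) -> Optional[str]:
    """Map a 1-10 star rating to a polarity label using the stated thresholds.

    Returns "neg" for ratings <= 4, "pos" for ratings >= 7, and None for
    ratings in between, which fall outside the labeled set.
    """
    if not 1 <= rating <= 10:
        raise ValueError(f"rating must be between 1 and 10, got {rating}")
    if rating <= 4:
        return "neg"
    if rating >= 7:
        return "pos"
    return None  # ratings of 5 or 6 are not polarized enough to be labeled


if __name__ == "__main__":
    for stars in (1, 4, 5, 7, 10):
        print(stars, polarity_from_rating(stars))
```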
The labeled data is split into a training set (25k reviews) and a test set (25k reviews). A preview file cannot be provided; please download the data directly from the data provider's website.
When using the dataset, please cite: Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. (2011). Learning Word Vectors for Sentiment Analysis. The 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011).
Files
Size | Checksum |
---|---|
4.0 kB | md5:caa4504e603928cc7c7a1abf204bc64f |
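A downloaded copy of the listed file can be checked against the MD5 checksum shown above. The sketch below uses Python's standard `hashlib`; the helper names and the local file path are placeholders, since the record does not show the file name.

```python
import hashlib

# Checksum as listed in the Files table (without the "md5:" prefix).
EXPECTED_MD5 = "caa4504e603928cc7c7a1abf204bc64f"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path: str) -> bool:
    """Compare a local file's MD5 digest with the listed checksum."""
    return md5_of_file(path) == EXPECTED_MD5
```

For example, `matches_listed_checksum("downloaded_file")` returns `True` only if the local copy is byte-identical to the listed file; `"downloaded_file"` is a hypothetical path.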
Variables
Name | Description |
---|---|
id | Review ID |
rating | Star rating of a review on a 1-10 scale |
pos / neg / unsup | Polarity of the review (positive / negative / unlabeled). Reviews intended for unsupervised learning carry no polarity label. |
text | Review text |
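The variables above can be collected into records with a short loader. This sketch assumes a layout not described on this page: one text file per review, named `<id>_<rating>.txt`, stored in `pos`, `neg`, or `unsup` subdirectories of each split directory. Please check the README in the release before relying on it. The function names are hypothetical.

```python
from pathlib import Path
from typing import Dict, Iterator, Union

Record = Dict[str, Union[int, str]]


def parse_review_path(path: Path) -> Record:
    """Build one record from a review file.

    Assumes the file stem has the form "<id>_<rating>" and that the parent
    directory name is the polarity ("pos", "neg", or "unsup").
    """
    review_id, rating = path.stem.split("_")
    return {
        "id": int(review_id),
        "rating": int(rating),
        "label": path.parent.name,
        "text": path.read_text(encoding="utf-8"),
    }


def iter_reviews(split_dir: Union[str, Path]) -> Iterator[Record]:
    """Yield records for every review file under one split directory."""
    for label in ("pos", "neg", "unsup"):
        label_dir = Path(split_dir) / label
        if not label_dir.is_dir():
            continue  # e.g. a split without unlabeled reviews
        for path in sorted(label_dir.glob("*.txt")):
            yield parse_review_path(path)
```

Under these assumptions, `list(iter_reviews("aclImdb/train"))` would yield one dictionary per review with the keys `id`, `rating`, `label`, and `text`; the directory name `aclImdb/train` is itself an assumption about the extracted archive.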
Details
Resource type | Open dataset |
Title | IMDb Movie Reviews Dataset |
Creators | Stanford University |
Research Fields | Business Administration, Economics, Psychology, Sociology, Political Science, Economic & Social History, Communication Sciences, Educational Research, Other |
Size | 0.08 GB |
License(s) | Custom license |
External Resource | https://ai.stanford.edu/~amaas/data/sentiment/ |