Published 2015 | Version v2
Yelp Business Review & Images Dataset
Description
The Yelp dataset is a subset of Yelp's business, review, and user data, provided for personal, educational, and academic use. It contains 6.9M online reviews of 150k businesses and more than 200,000 images related to the reviews.
The data consists of multiple sub-datasets:
- Yelp Business data: Contains business data including location data, attributes, and categories.
- Yelp Review data: Contains full review text data including the user_id that wrote the review and the business_id the review is written for.
- Yelp User data: Contains user data, including the user's friend mapping and all metadata associated with the user.
- Yelp Checkin data: Contains check-ins on a business.
- Yelp Tip data: Tips written by a user on a business. Tips are shorter than reviews and tend to convey quick suggestions.
- Yelp Photo data: Contains photo data including the caption and classification (one of "food", "drink", "menu", "inside" or "outside").
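Review records connect the other sub-datasets through the shared keys `business_id` and `user_id`. A minimal sketch of that join in plain Python, assuming the records are already loaded as dictionaries; the sample records, their short IDs (real IDs are 22 characters), and the helper names are hypothetical:

```python
from collections import defaultdict

# Hypothetical sample records shaped like the Business, User, and Review files.
businesses = [
    {"business_id": "b1", "name": "Taco Spot", "stars": 4.5},
    {"business_id": "b2", "name": "Burger Barn", "stars": 3.0},
]
users = [{"user_id": "u1", "name": "Ana"}, {"user_id": "u2", "name": "Ben"}]
reviews = [
    {"review_id": "r1", "user_id": "u1", "business_id": "b1", "stars": 5},
    {"review_id": "r2", "user_id": "u2", "business_id": "b1", "stars": 4},
    {"review_id": "r3", "user_id": "u1", "business_id": "b2", "stars": 2},
]


def join_reviews(reviews, businesses, users):
    """Attach business and user names to each review via the shared ID keys."""
    business_by_id = {b["business_id"]: b for b in businesses}
    user_by_id = {u["user_id"]: u for u in users}
    joined = []
    for r in reviews:
        business = business_by_id.get(r["business_id"])
        user = user_by_id.get(r["user_id"])
        joined.append({
            "review_id": r["review_id"],
            "business_name": business["name"] if business else None,
            "user_name": user["name"] if user else None,
            "stars": r["stars"],
        })
    return joined


def mean_review_stars(reviews):
    """Average review stars per business_id, computed from the reviews themselves."""
    totals = defaultdict(lambda: [0, 0])  # business_id -> [sum of stars, count]
    for r in reviews:
        totals[r["business_id"]][0] += r["stars"]
        totals[r["business_id"]][1] += 1
    return {bid: s / n for bid, (s, n) in totals.items()}


if __name__ == "__main__":
    for row in join_reviews(reviews, businesses, users):
        print(row)
    print(mean_review_stars(reviews))
```

Building a dictionary index per file keeps each lookup constant-time, so the join is a single pass over the reviews; for the full dataset, a database or dataframe library may be more practical.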
The data is available as JSON files. You can use it to teach students about databases, to learn NLP, or as sample production data while you learn how to build mobile apps.
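The review file runs to several gigabytes, so streaming it is usually preferable to loading it whole. The sketch below assumes each file stores one JSON object per line (JSON Lines), which is how the Yelp download has typically been packaged; `iter_records` and `count_reviews_by_stars` are illustrative names, not part of any Yelp tooling:

```python
import io
import json
from typing import Dict, Iterator, TextIO


def iter_records(fp: TextIO) -> Iterator[dict]:
    """Yield one dict per non-blank line of a JSON Lines stream (assumed format)."""
    for line_no, line in enumerate(fp, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"bad JSON on line {line_no}: {exc}") from exc


def count_reviews_by_stars(fp: TextIO) -> Dict[float, int]:
    """Tally reviews per star value while holding only one record in memory."""
    counts: Dict[float, int] = {}
    for record in iter_records(fp):
        counts[record["stars"]] = counts.get(record["stars"], 0) + 1
    return counts


if __name__ == "__main__":
    # Hypothetical in-memory sample standing in for the review file.
    sample = io.StringIO(
        '{"review_id": "r1", "stars": 5, "text": "Great tacos"}\n'
        '{"review_id": "r2", "stars": 3, "text": "OK"}\n'
        "\n"
        '{"review_id": "r3", "stars": 5, "text": "Loved it"}\n'
    )
    print(count_reviews_by_stars(sample))
```

With the real data, pass a file handle such as `open(path, encoding="utf-8")`, where `path` points at the review file from the download; the exact filename depends on the archive.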
Variables
Sub-dataset | Name | Description |
---|---|---|
Business | business_id | 22-character unique business ID |
Business | name | The business's name |
Business | address | The full address of the business |
Business | city | The city where the business is located |
Business | state | 2-character state code, if applicable |
Business | postal_code | The postal code of the business |
Business | latitude | Latitude of the business |
Business | longitude | Longitude of the business |
Business | stars | Star rating of the business, rounded to half-stars |
Business | review_count | Number of reviews of the business |
Business | is_open | 0 if the business is closed, 1 if it is open |
Business | attributes | An object mapping business attributes to values, e.g., RestaurantsTakeOut and BusinessParking |
Business | categories | An array of business category strings, e.g., "Mexican", "Burgers", "Gastropubs" |
Business | hours | An object mapping each day to its opening hours, e.g., "Monday": "10:00-21:00" |
Review | review_id | 22-character unique review ID |
Review | user_id | 22-character unique ID of the user who wrote the review |
Review | stars | Star rating given in the review |
Review | date | Review date, formatted YYYY-MM-DD |
Review | text | The review text |
Review | useful | Number of useful votes received |
Review | funny | Number of funny votes received |
Review | cool | Number of cool votes received |
User | name | The user's first name |
User | review_count | Number of reviews the user has written |
User | yelping_since | When the user joined Yelp, formatted YYYY-MM-DD |
User | friends | An array of the user's friends as user_ids |
User | useful | Number of useful votes sent by the user |
User | funny | Number of funny votes sent by the user |
User | cool | Number of cool votes sent by the user |
User | fans | Number of fans the user has |
User | elite | The years in which the user had Elite status |
User | average_stars | Average rating across all of the user's reviews |
User | compliment_hot | Number of "hot" compliments received by the user |
User | compliment_more | Number of "more" compliments received by the user |
User | compliment_profile | Number of "profile" compliments received by the user |
User | compliment_cute | Number of "cute" compliments received by the user |
User | compliment_list | Number of "list" compliments received by the user |
User | compliment_note | Number of "note" compliments received by the user |
User | compliment_plain | Number of "plain" compliments received by the user |
User | compliment_cool | Number of "cool" compliments received by the user |
User | compliment_funny | Number of "funny" compliments received by the user |
User | compliment_writer | Number of "writer" compliments received by the user |
User | compliment_photos | Number of "photo" compliments received by the user |
Checkin | date | A comma-separated list of check-in timestamps, each formatted YYYY-MM-DD HH:MM:SS |
Tip | text | Text of the tip |
Tip | date | When the tip was written, formatted YYYY-MM-DD |
Tip | compliment_count | Number of compliments the tip has received |
Photo | photo_id | 22-character unique photo ID |
Photo | caption | The photo caption, if any |
Photo | label | The photo's category, if any, e.g., "food" |
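Two fields in the table pack structured values into strings: the check-in `date` field and the business `hours` values. A minimal parsing sketch, assuming the formats given in the table, and additionally assuming that hour strings may omit leading zeros (e.g. "8:0"); the function names and sample values are hypothetical:

```python
from collections import Counter
from datetime import datetime
from typing import Dict, List, Tuple

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def parse_checkins(date_field: str) -> List[datetime]:
    """Split a check-in `date` string into datetime objects.

    Assumes comma-separated timestamps (optionally followed by spaces),
    each formatted YYYY-MM-DD HH:MM:SS.
    """
    return [
        datetime.strptime(stamp.strip(), "%Y-%m-%d %H:%M:%S")
        for stamp in date_field.split(",")
        if stamp.strip()
    ]


def checkins_per_weekday(date_field: str) -> Counter:
    """Count check-ins by weekday name, independent of the system locale."""
    return Counter(WEEKDAYS[ts.weekday()] for ts in parse_checkins(date_field))


def _to_minutes(hhmm: str) -> int:
    """Convert "H:M" or "HH:MM" to minutes after midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def parse_hours(hours: Dict[str, str]) -> Dict[str, Tuple[int, int]]:
    """Map each day to (open, close) in minutes after midnight.

    Spans that cross midnight (close < open) are returned unchanged;
    interpreting them is left to the caller.
    """
    result = {}
    for day, span in hours.items():
        open_str, close_str = span.split("-")
        result[day] = (_to_minutes(open_str), _to_minutes(close_str))
    return result


if __name__ == "__main__":
    print(checkins_per_weekday("2021-03-01 09:15:00, 2021-03-02 18:40:12, 2021-03-08 12:00:00"))
    print(parse_hours({"Monday": "10:00-21:00", "Tuesday": "8:0-18:30"}))
```

Counting weekdays through `weekday()` and a fixed name list avoids `strftime("%A")`, whose output depends on the locale.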
Details
Resource type | Open dataset |
Title | Yelp Business Review & Images Dataset |
Research Fields | Business Administration, Economics, Psychology, Sociology, Political Science, Economic & Social History, Communication Sciences, Educational Research |
Size | 8.9 GB; 6,990,280 reviews; 200,100 pictures |
Formats | JSON (.json), comma-separated values (CSV, .csv) |
License(s) | Custom License by Yelp |
External Resource | https://www.yelp.com/dataset/download |
Companies | Yelp |
Industries | Social Media |
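The formats list both JSON and CSV. If a CSV copy of a JSON file is needed, nested fields such as `attributes`, `categories`, and `hours` have no natural flat form. One option, sketched here as an assumption rather than how any official CSV export is produced, is to store them as JSON text inside a CSV cell. `records_to_csv` is a hypothetical helper built on the standard `csv` and `json` modules:

```python
import csv
import io
import json
from typing import Iterable, List, TextIO


def records_to_csv(records: Iterable[dict], fieldnames: List[str], out: TextIO) -> None:
    """Write the chosen fields of each record as one CSV row.

    Lists and dicts (e.g. categories, hours) are stored as JSON text so they
    can be decoded again; missing fields become empty cells, and fields not
    named in `fieldnames` are dropped.
    """
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for record in records:
        row = {}
        for key in fieldnames:
            value = record.get(key)
            if isinstance(value, (list, dict)):
                value = json.dumps(value, ensure_ascii=False)
            row[key] = value
        writer.writerow(row)


if __name__ == "__main__":
    # Hypothetical business record shaped like the Variables table.
    sample = [{
        "business_id": "b1",
        "name": "Taco Spot",
        "categories": ["Mexican", "Tacos"],
        "hours": {"Monday": "10:00-21:00"},
        "stars": 4.5,
    }]
    buf = io.StringIO()
    records_to_csv(sample, ["business_id", "name", "categories", "hours"], buf)
    print(buf.getvalue())
```

When writing to a real file instead of `io.StringIO`, open it with `newline=""`, as the `csv` module documentation recommends, so row terminators are not translated twice.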