Published 2020 | Version v2
Open dataset

Instagram Influencer Posts and Image Dataset

  • 1. University of California Los Angeles
  • 2. Sungkyunkwan University

Description

This dataset contains 33,935 Instagram influencers classified into nine categories: beauty, family, fashion, fitness, food, interior, pet, travel, and other. We collected 300 posts per influencer, so the dataset contains 33,935 x 300 = 10,180,500 Instagram posts.

The dataset includes two types of files, post metadata and image files. 

1) Post metadata files are in JSON format and contain the following information: caption, usertags, hashtags, timestamp, sponsorship, likes, comments, etc. Their total size is about 37 GB.

2) Image files are in JPEG format. The dataset contains 12,933,406 image files, since a post can have more than one image. The total size of these image files is 189 GB.
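As a quick sanity check on the figures above, the following sketch recomputes the post count and derives the average number of images per post. The constant names are chosen here for illustration; the numbers are the ones quoted in this description.

```python
# Figures quoted in the dataset description.
NUM_INFLUENCERS = 33_935
POSTS_PER_INFLUENCER = 300
NUM_IMAGES = 12_933_406

# Total number of posts in the dataset.
total_posts = NUM_INFLUENCERS * POSTS_PER_INFLUENCER

# Average number of image files per post (a derived figure, not stated in the source).
mean_images_per_post = NUM_IMAGES / total_posts

print(total_posts)                     # 10180500
print(round(mean_images_per_post, 2))  # 1.27
```

On average a post therefore has roughly 1.27 image files, which is why a separate post-to-image mapping is needed.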

If a post has only one image file, the JSON file and the corresponding image file have the same name. However, if a post has more than one image, the JSON file and the corresponding image files have different names. Therefore, we also provide a JSON-Image_mapping file that lists the image files corresponding to each post metadata file.
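The naming rule above could be applied roughly as follows. This is a minimal sketch, not the dataset's official tooling: it assumes the mapping has already been parsed into a dictionary from a metadata file name to a list of image file names, that metadata files end in `.json` and images in `.jpg`, and that posts missing from the dictionary are single-image posts. The actual layout of the JSON-Image_mapping file is not specified here, and the function name `images_for_post` is hypothetical.

```python
import os
from typing import Dict, List


def images_for_post(post_json_name: str, mapping: Dict[str, List[str]]) -> List[str]:
    """Return the image file names belonging to one post metadata file.

    `mapping` is an assumed, already-parsed form of the JSON-Image_mapping
    file: metadata file name -> list of image file names.
    """
    if post_json_name in mapping:
        # Multi-image post: names differ, so rely on the mapping.
        return list(mapping[post_json_name])
    # Assumed fallback for single-image posts: same base name, .jpg extension.
    stem, _ = os.path.splitext(post_json_name)
    return [stem + ".jpg"]


# Hypothetical file names, for illustration only.
example_mapping = {"post_a.json": ["img_1.jpg", "img_2.jpg"]}
multi = images_for_post("post_a.json", example_mapping)
single = images_for_post("post_b.json", example_mapping)
```

Here `multi` holds the two mapped image names and `single` falls back to `post_b.jpg`.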

If you use this dataset, please cite the paper below. The data can be accessed through the website link below.

"Multimodal Post Attentive Profiling for Influencer Marketing," Seungbae Kim, Jyun-Yu Jiang, Masaki Nakada, Jinyoung Han and Wei Wang. In Proceedings of The Web Conference (WWW '20), ACM, 2020.

Variables

Name             Description
caption          Caption of an Instagram post
usertags         User names tagged in a post
hashtags         Hashtags related to a post
sponsorship      Sponsorship information related to the post
likes            Number of likes of the post
comments         Comments related to a post
post_image_link  Link between post and image files
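The variables listed above could be read into a small typed record as sketched below. This assumes, purely for illustration, that each metadata file is a flat JSON object whose keys match the variable names; the real files may nest these fields differently, so the key lookups would need adjusting. `PostRecord` and `parse_post` are hypothetical names.

```python
import json
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class PostRecord:
    """Subset of the listed variables; the types are assumptions."""
    caption: str = ""
    usertags: List[str] = field(default_factory=list)
    hashtags: List[str] = field(default_factory=list)
    sponsorship: Any = None
    likes: int = 0
    comments: Any = None


def parse_post(raw: str) -> PostRecord:
    """Parse one post metadata JSON string, assuming flat top-level keys."""
    data = json.loads(raw)
    return PostRecord(
        caption=data.get("caption", ""),
        usertags=list(data.get("usertags", [])),
        hashtags=list(data.get("hashtags", [])),
        sponsorship=data.get("sponsorship"),
        likes=int(data.get("likes", 0)),
        comments=data.get("comments"),
    )


# Made-up sample record, not taken from the dataset.
sample = '{"caption": "hello", "hashtags": ["travel"], "likes": 12}'
record = parse_post(sample)
```

Missing keys fall back to empty defaults, so `record.usertags` is an empty list for the sample above.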