Published 2010 | Version v2

Twitter Dataset

  • 1. Texas A&M University

Description

This dataset is a collection of scraped public Twitter updates, gathered for an academic project studying the geolocation data associated with tweets. It provides both the training set and the test set used in the paper "You Are Where You Tweet: A Content-Based Approach to Geo-locating Twitter Users" (CIKM 2010). The training set contains 115,886 Twitter users and 3,844,612 updates from those users. The locations of these users are self-labeled, within the United States, at city-level granularity. The test set contains 5,136 Twitter users and 5,156,047 tweets from those users. The locations of these users were uploaded from their smartphones in the form "UT: Latitude,Longitude".
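The "UT: Latitude,Longitude" form described above can be turned into numeric coordinates with a small parser. The following is a minimal sketch, not part of the dataset: the function name `parse_ut_location`, the tolerance for extra whitespace, and the range check are assumptions made for illustration, and the sample coordinates in the usage lines are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed pattern: the literal prefix "UT:" followed by two decimal numbers
# separated by a comma, as in the form "UT: Latitude,Longitude".
_UT_PATTERN = re.compile(
    r"^\s*UT:\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*$"
)


def parse_ut_location(text: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude), or None if text is not a valid UT string."""
    match = _UT_PATTERN.match(text)
    if match is None:
        return None
    lat, lon = float(match.group(1)), float(match.group(2))
    # Reject values outside the valid latitude/longitude ranges.
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return None
    return lat, lon


# Hypothetical inputs, for illustration only.
print(parse_ut_location("UT: 30.6280,-96.3344"))
print(parse_ut_location("College Station, TX"))
```

Free-text locations such as city names return `None`, so the same function can be used to filter for users whose location field carries device-reported coordinates.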

Files

test_set_tweets.txt__100lines.txt

Files (30.0 kB)

Name Size
md5:30b9e00a2b3aff3b8e69435840930256 11.2 kB
md5:a3a9a5283f39d1046504c9202f520b8f 3.4 kB
md5:9aaa0f281b66cf17669298dd2b05984f 13.3 kB
md5:8fd8e37784a6d0bb58726effa90dd36d 2.1 kB

Details

Resource type Open dataset
Title Twitter Dataset
Creators
  • Cheng, Zhiyuan (1)
  • Caverlee, James (1)
  • Lee, Kyumin (1)
Research Fields Business Administration; Economics; Psychology; Sociology; Political Science; Economic & Social History; Communication Sciences; Educational Research; Other
Size 30.0 kB
License(s) Creative Commons Attribution Noncommercial 3.0 United States License
External Resource https://archive.org/details/twitter_cikm_2010
Companies Twitter
Dates of collection September 2009 – January 2010