Published October 20, 2023 | Version v1

Tweet Annotation Sensitivity Experiment 1

  • 1. LMU Munich
  • 2. Munich Center for Machine Learning
  • 3. University of Maryland
  • 4. RTI International

Description

We drew a stratified sample of 20 tweets that were pre-annotated as Hate Speech / Offensive Language / Neither in a study by Davidson et al. (2017). The sample was stratified by majority-voted class and level of annotator disagreement.

We then recruited 1,000 Prolific workers to annotate each of the 20 tweets. Annotators were randomly assigned to one of six experimental conditions, as shown in the following figures. In all conditions, they were asked to assign the labels Hate Speech / Offensive Language / Neither.

In addition, we collected a variety of demographic variables (e.g., age and gender) and some paradata (e.g., duration of the whole task and duration per screen).

Variables

Name: Description

id: annotator ID
age: Age
gender: Gender (1 = Female; 2 = Male; 3 = Something Else; 4 = Prefer not to say)
afam: African-American (0 = No; 1 = Yes)
asian: Asian-American (0 = No; 1 = Yes)
hispanic: Hispanic (0 = No; 1 = Yes)
white: White (0 = No; 1 = Yes)
race_other: Other race/ethnicity (0 = No; 1 = Yes)
race_not_say: Prefer not to say race/ethnicity (0 = No; 1 = Yes)
education: Highest educational attainment (1 = Less than high school; 2 = High school; 3 = Some college; 4 = College graduate; 5 = Master's degree or professional degree (Law, Medicine, MPH, etc.); 6 = Doctoral degree (PhD, DPH, EdD, etc.))
sexuality: Sexuality (1 = Gay or Lesbian; 2 = Bisexual; 3 = Straight; 4 = Something Else)
english: English first language? (0 = No; 1 = Yes)
tw_use: Twitter use (1 = Most days; 2 = Most weeks, but not every day; 3 = A few times a month; 4 = A few times a year; 5 = Less often; 6 = Never)
social_media_use: Social media use (1 = Most days; 2 = Most weeks, but not every day; 3 = A few times a month; 4 = A few times a year; 5 = Less often; 0 = Never)
prolific_hours: Prolific hours worked last month
task_fun: Coding work was: fun (0 = No; 1 = Yes)
task_interesting: Coding work was: interesting (0 = No; 1 = Yes)
task_boring: Coding work was: boring (0 = No; 1 = Yes)
task_repetitive: Coding work was: repetitive (0 = No; 1 = Yes)
task_important: Coding work was: important (0 = No; 1 = Yes)
task_depressing: Coding work was: depressing (0 = No; 1 = Yes)
task_offensive: Coding work was: offensive (0 = No; 1 = Yes)
another_tweettask: Likelihood to do another Tweet-related task (not at all = Not at all likely; somewhat = Somewhat likely; very = Very likely)
another_hatetask: Likelihood to do another Hate Speech-related task (not at all = Not at all likely; somewhat = Somewhat likely; very = Very likely)
page_history: Order in which the annotator saw pages
date_of_first_access: Datetime of first access
date_of_last_access: Datetime of last access
duration_sec: Task duration in seconds
version: Version of annotation task (A = Version A; B = Version B; C = Version C; D = Version D; E = Version E; F = Version F)
tw1-20: Label assigned to Tweets 1-20 (hate speech = Hate Speech; offensive language = Offensive Language; neither = Neither HS nor OL; NA = missing or "don't know")
tw_duration_1-20: Annotation duration in milliseconds for Tweets 1-20
num_approvals: Prolific data: number of previous task approvals of annotator
num_rejections: Prolific data: number of previous task rejections of annotator
prolific_score: Annotator quality score by Prolific
countryofbirth: Prolific data: annotator country of birth
currentcountryofresidence: Prolific data: annotator country of residence
employmentstatus: Prolific data: annotator employment status (Full-time; Part-time; Unemployed (and job-seeking); Due to start a new job within the next month; Not in paid work (e.g. homemaker, retired or disabled); Other; DATA EXPIRED)
firstlanguage: Prolific data: annotator first language
nationality: Prolific data: nationality
studentstatus: Prolific data: student status (Yes; No; DATA EXPIRED)
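As a usage illustration (not part of the deposited materials), the integer codes in the table above can be mapped to readable labels with pandas. The small inline frame below stands in for the real CSV linked under External Resource, which in practice would be loaded with `pd.read_csv(...)`; column names follow the variable table.

```python
import pandas as pd

# Value-label mappings transcribed from the variable table above
GENDER = {1: "Female", 2: "Male", 3: "Something Else", 4: "Prefer not to say"}
TW_USE = {
    1: "Most days",
    2: "Most weeks, but not every day",
    3: "A few times a month",
    4: "A few times a year",
    5: "Less often",
    6: "Never",
}

# Tiny illustrative frame; with the actual data, replace this with e.g.
# df = pd.read_csv("dataset_1st_study.csv")
df = pd.DataFrame({"id": [1, 2, 3], "gender": [1, 2, 4], "tw_use": [1, 6, 3]})

df["gender"] = df["gender"].map(GENDER)
df["tw_use"] = df["tw_use"].map(TW_USE)
print(df)
```

Unmapped or missing codes become `NaN` under `Series.map`, which matches the dataset's convention of `NA` for missing or "don't know" responses.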

Details

Resource type Funded research project dataset
Title Tweet Annotation Sensitivity Experiment 1
Creators
  • Beck, Jacob1, 2
  • Eckman, Stephanie3
  • Chew, Rob4
  • Kreuter, Frauke1, 3, 2
Research Fields Other Psychology
Size 1.09 MB
Formats Comma-separated values (CSV) (.csv)
External Resource https://huggingface.co/datasets/soda-lmu/tweet-annotation-sensitivity-1/blob/main/dataset_1st_study.csv
Countries United States
Dates of collection December 2021

Additional Details

Related works

Is cited by
Conference paper: 10.1007/978-3-031-21707-4_19 (DOI)