Published September 14, 2024 | Version v1

Tweet Annotation Sensitivity Experiment 2

Description

The dataset contains annotations of tweets for hate speech (HS) and offensive language (OL), collected under five experimental conditions. The tweets were sampled from the corpus created by Davidson et al. (2017). We selected 3,000 tweets for annotation and developed five experimental conditions that varied the structure of the annotation task, as shown in the following figure. All tweets were annotated in each condition.

  • Condition A presented the tweet and three options on a single screen: hate speech, offensive language, or neither. Annotators could select hate speech, offensive language, or both, or indicate that neither applied.

  • Conditions B and C split the annotation of a single tweet across two screens.

    • For Condition B, the first screen prompted the annotator to indicate whether the tweet contained hate speech. On the following screen, they were shown the tweet again and asked whether it contained offensive language.
    • Condition C was similar to Condition B, but flipped the order of hate speech and offensive language for each tweet.
  • In Conditions D and E, the two tasks were treated independently: annotators first annotated all of their tweets for one task and then annotated the same tweets again for the second task.

    • Annotators assigned Condition D were first asked to annotate hate speech for all their assigned tweets, and then asked to annotate offensive language for the same set of tweets.
    • Condition E worked the same way, but started with the offensive language annotation task followed by the hate speech annotation task.
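
The five task structures can be summarized in a short sketch. It is an illustration of the design described above, not the code used to run the study: the layout names, the `"both"` marker for Condition A, and the function `screen_sequence` are assumptions introduced here. Given a list of tweets, the function returns the order in which an annotator would see (tweet, question) screens.

```python
HS, OL = "hate_speech", "offensive_language"

# Condition -> (layout, question order). Layout names are illustrative
# labels chosen for this sketch; they do not appear in the dataset.
CONDITIONS = {
    "A": ("single_screen", (HS, OL)),
    "B": ("two_screens_per_tweet", (HS, OL)),
    "C": ("two_screens_per_tweet", (OL, HS)),
    "D": ("two_passes", (HS, OL)),
    "E": ("two_passes", (OL, HS)),
}


def screen_sequence(condition, tweets):
    """Return the (tweet, question) screens in the order they are shown."""
    layout, order = CONDITIONS[condition]
    if layout == "single_screen":
        # Condition A: one screen per tweet, both questions answered together.
        return [(tweet, "both") for tweet in tweets]
    if layout == "two_screens_per_tweet":
        # Conditions B and C: both questions for a tweet, then the next tweet.
        return [(tweet, question) for tweet in tweets for question in order]
    # Conditions D and E: all tweets for the first question, then all
    # tweets again for the second question.
    return [(tweet, question) for question in order for tweet in tweets]


if __name__ == "__main__":
    for name in CONDITIONS:
        print(name, screen_sequence(name, ["t1", "t2"]))
```

The only difference between B/C and D/E in this sketch is the nesting of the two loops, which mirrors the distinction between alternating questions per tweet and completing one task before starting the other.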

We recruited US-based annotators from the crowdsourcing platform Prolific during November and December 2022. Each annotator annotated up to 50 tweets. The dataset also contains demographic information about the annotators. Annotators received a fixed hourly wage in excess of the US federal minimum wage after completing the task.

Variables

Name Description
case_id case ID
duration_seconds duration of connection to task in seconds
last_screen last question answered
device device type
ethn_hispanic Hispanic race/ethnicity
ethn_white White race/ethnicity
ethn_afr_american African-American race/ethnicity
ethn_asian Asian race/ethnicity
ethn_sth_else race/ethnicity something else
ethn_prefer_not race/ethnicity prefer not to say
age age
education educational attainment 1: Less than high school; 2: High school; 3: Some college; 4: College graduate; 5: Master's degree or professional degree (law, medicine, MPH, etc.); 6: Doctoral degree (PhD, DPH, EdD, etc.)
english_fl English as first language
twitter_use Twitter use frequency 1: Most days; 2: Most weeks, but not every day; 3: A few times a month; 4: A few times a year; 5: Less often; 6: Never
socmedia_use social media use frequency 1: Most days; 2: Most weeks, but not every day; 3: A few times a month; 4: A few times a year; 5: Less often; 6: Never
prolific_hours hours worked on the Prolific platform in the last month
task_fun task perception: fun
task_interesting task perception: interesting
task_boring task perception: boring
task_repetitive task perception: repetitive
task_important task perception: important
task_depressing task perception: depressing
task_offensive task perception: offensive
repeat_tweet_coding likelihood for another tweet task 1: Not at all likely 2: Somewhat likely 3: Very likely
repeat_hs_coding likelihood for another hate speech task 1: Not at all likely 2: Somewhat likely 3: Very likely
target_online_harassment targeted by hateful online behavior
target_other_harassment targeted by other hateful behavior
party_affiliation party identification 1: Republican 2: Democrat 3: Independent
societal_relevance_hs relevance perception of hate speech 1: Not at all likely 2: Somewhat likely 3: Very likely
annotator_id annotator ID
condition experimental conditions (A-E)
tweet_batch tweet ID in batch
hate_speech hate speech annotation
offensive_language offensive language annotation
tweet_id tweet ID
orig_label_hs number of persons who annotated the tweet as hate speech in the original dataset from Davidson et al. (2017)
orig_label_ol number of persons who annotated the tweet as offensive language in the original dataset from Davidson et al. (2017)
orig_label_ne number of persons who annotated the tweet as neither in the original dataset from Davidson et al. (2017)
tweet_hashed tweet with usernames hashed
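
As a minimal sketch of how these variables might be used, the following code tallies the raw annotation values per experimental condition and derives a majority label from the three `orig_label_*` vote counts. Several things here are assumptions rather than documented facts: the function names are hypothetical, the sample rows are invented, the coding of `hate_speech` and `offensive_language` is not specified above (so the values are counted as they appear, without interpretation), and the vote counts are assumed to be non-missing numeric strings.

```python
import csv
import io
from collections import Counter, defaultdict


def tally_by_condition(rows, label_column):
    """Count the raw values of `label_column` within each condition."""
    tallies = defaultdict(Counter)
    for row in rows:
        tallies[row["condition"]][row[label_column]] += 1
    return dict(tallies)


def original_majority_label(row):
    """Return the label with the most original votes, or None on a tie.

    Assumes the three orig_label_* fields hold non-missing counts.
    """
    votes = {
        "hate_speech": int(float(row["orig_label_hs"])),
        "offensive_language": int(float(row["orig_label_ol"])),
        "neither": int(float(row["orig_label_ne"])),
    }
    top = max(votes.values())
    winners = [label for label, count in votes.items() if count == top]
    return winners[0] if len(winners) == 1 else None


# Invented rows for illustration only; the "1"/"0" values are placeholders,
# not the documented coding of the annotation columns.
SAMPLE = """condition,hate_speech,offensive_language,orig_label_hs,orig_label_ol,orig_label_ne
A,1,1,2,1,0
A,0,1,0,3,0
D,0,0,1,1,1
"""

if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(SAMPLE)))
    print(tally_by_condition(rows, "hate_speech"))
    print([original_majority_label(row) for row in rows])
```

To run this against the published file, the `io.StringIO(SAMPLE)` object could be replaced with an open handle to a local copy of `publication_dataset.csv`; the actual value coding should be checked first.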

Details

Resource type Funded research project dataset
Title Tweet Annotation Sensitivity Experiment 2
Creators
  • Beck, Jacob (1, 2)
  • Eckman, Stephanie (3)
  • Chew, Rob (4)
  • Kreuter, Frauke (1, 3, 2)
  • Ma, Bolei (1)
  • Kern, Christoph (1)
Research Fields Other Psychology
Size 23.3 MB
Formats Comma-separated values (CSV) (.csv)
External Resource https://huggingface.co/datasets/soda-lmu/tweet-annotation-sensitivity-2/blob/main/publication_dataset.csv
Countries United States
Dates of collection December 21, 2022