Tweet Annotation Sensitivity Experiment 2

Beck, Jacob; Eckman, Stephanie; Chew, Rob; Kreuter, Frauke; Ma, Bolei; Kern, Christoph

Published September 14, 2024 | Version v1

Funded research project dataset Metadata-only

1. LMU Munich
2. Munich Center for Machine Learning
3. University of Maryland
4. RTI International

Description

The dataset contains annotations of tweets for hate speech (HS) and offensive language (OL) in five experimental conditions. The tweets were sampled from the corpus created by Davidson et al. (2017); we selected 3,000 tweets for annotation. We developed five experimental conditions that varied the annotation task structure, as shown in the following figure. All tweets were annotated in each condition.

Condition A presented the tweet and three options on a single screen: hate speech, offensive language, or neither. Annotators could select hate speech, offensive language, or both, or indicate that neither applied.
Conditions B and C split the annotation of a single tweet across two screens.
- For Condition B, the first screen prompted the annotator to indicate whether the tweet contained hate speech. On the following screen, they were shown the tweet again and asked whether it contained offensive language.
- Condition C was similar to Condition B, but flipped the order of hate speech and offensive language for each tweet.
In Conditions D and E, the two tasks were treated independently: annotators first annotated all of their tweets for one task and then annotated the same tweets again for the second task.
- Annotators assigned Condition D were first asked to annotate hate speech for all their assigned tweets, and then asked to annotate offensive language for the same set of tweets.
- Condition E worked the same way, but started with the offensive language annotation task followed by the hate speech annotation task.
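
The five task structures described above can be sketched as an ordering of screens. This is only an illustration of the description, not the survey software used in the experiment; the function name `screen_sequence` and the tuple layout are assumptions made for the example.

```python
from typing import List, Tuple

HS = "hate speech"
OL = "offensive language"

Screen = Tuple[str, Tuple[str, ...]]  # (tweet, questions asked on that screen)


def screen_sequence(condition: str, tweets: List[str]) -> List[Screen]:
    """Return the ordered screens an annotator would see (hypothetical helper)."""
    if condition == "A":
        # One screen per tweet; both labels are offered together.
        return [(t, (HS, OL)) for t in tweets]
    if condition in ("B", "C"):
        # Two consecutive screens per tweet; C flips the question order of B.
        first, second = (HS, OL) if condition == "B" else (OL, HS)
        screens: List[Screen] = []
        for t in tweets:
            screens.append((t, (first,)))
            screens.append((t, (second,)))
        return screens
    if condition in ("D", "E"):
        # All tweets for one task, then all tweets again for the other task.
        first, second = (HS, OL) if condition == "D" else (OL, HS)
        return [(t, (first,)) for t in tweets] + [(t, (second,)) for t in tweets]
    raise ValueError(f"unknown condition: {condition!r}")
```

For two tweets, Condition B interleaves the questions per tweet (HS, OL, HS, OL), whereas Condition D asks about hate speech for both tweets before asking about offensive language for both.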

We recruited US-based annotators from the crowdsourcing platform Prolific during November and December 2022. Each annotator annotated up to 50 tweets. The dataset also contains demographic information about the annotators. Annotators received a fixed hourly wage in excess of the US federal minimum wage after completing the task.

Name	Description
case_id	case ID
duration_seconds	duration of connection to task in seconds
last_screen	last question answered
device	device type
ethn_hispanic	Hispanic race/ethnicity
ethn_white	White race/ethnicity
ethn_afr_american	African-American race/ethnicity
ethn_asian	Asian race/ethnicity
ethn_sth_else	race/ethnicity something else
ethn_prefer_not	race/ethnicity prefer not to say
age	age
education	educational attainment (1: Less than high school; 2: High school; 3: Some college; 4: College graduate; 5: Master's degree or professional degree (law, medicine, MPH, etc.); 6: Doctoral degree (PhD, DPH, EdD, etc.))
english_fl	English as first language
twitter_use	Twitter use frequency (1: Most days; 2: Most weeks, but not every day; 3: A few times a month; 4: A few times a year; 5: Less often; 6: Never)
socmedia_use	social media use frequency (1: Most days; 2: Most weeks, but not every day; 3: A few times a month; 4: A few times a year; 5: Less often; 6: Never)
prolific_hours	workload on the platform Prolific in hours in the last month
task_fun	task perception: fun
task_interesting	task perception: interesting
task_boring	task perception: boring
task_repetitive	task perception: repetitive
task_important	task perception: important
task_depressing	task perception: depressing
task_offensive	task perception: offensive
repeat_tweet_coding	likelihood for another tweet task (1: Not at all likely; 2: Somewhat likely; 3: Very likely)
repeat_hs_coding	likelihood for another hate speech task (1: Not at all likely; 2: Somewhat likely; 3: Very likely)
target_online_harassment	targeted by hateful online behavior
target_other_harassment	targeted by other hateful behavior
party_affiliation	party identification (1: Republican; 2: Democrat; 3: Independent)
societal_relevance_hs	relevance perception of hate speech (1: Not at all likely; 2: Somewhat likely; 3: Very likely)
annotator_id	annotator ID
condition	experimental conditions (A-E)
tweet_batch	tweet ID in batch
hate_speech	hate speech annotation
offensive_language	offensive language annotation
tweet_id	tweet ID
orig_label_hs	number of persons who annotated the tweet as hate speech in the original dataset from Davidson et al. (2017)
orig_label_ol	number of persons who annotated the tweet as offensive language in the original dataset from Davidson et al. (2017)
orig_label_ne	number of persons who annotated the tweet as neither in the original dataset from Davidson et al. (2017)
tweet_hashed	tweet with usernames hashed
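
As a rough sketch of how the variables above might be used, the following computes the share of positive annotations per condition and the majority label from the original Davidson et al. (2017) counts. It assumes that `hate_speech` and `offensive_language` are coded 1 for "yes" and 0 for "no", which the variable list does not state; the sample rows are invented for illustration and are not taken from the dataset, and both helper names are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Invented rows for illustration only; not real records from the dataset.
SAMPLE = """condition,hate_speech,offensive_language,orig_label_hs,orig_label_ol,orig_label_ne
A,1,1,2,1,0
A,0,1,0,3,0
B,0,0,0,1,2
B,1,0,2,1,0
"""


def positive_share_by_condition(rows, column):
    """Share of rows per condition where `column` equals 1 (assumed 0/1 coding)."""
    counts = defaultdict(lambda: [0, 0])  # condition -> [positives, total]
    for row in rows:
        value = row[column]
        if value == "":
            continue  # skip missing annotations
        counts[row["condition"]][0] += int(float(value) == 1)
        counts[row["condition"]][1] += 1
    return {cond: pos / total for cond, (pos, total) in counts.items()}


def original_majority(row):
    """Label with the most original annotator votes; ties go to the first listed."""
    votes = {
        "hate speech": int(row["orig_label_hs"]),
        "offensive language": int(row["orig_label_ol"]),
        "neither": int(row["orig_label_ne"]),
    }
    return max(votes, key=votes.get)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
hs_share = positive_share_by_condition(rows, "hate_speech")
ol_share = positive_share_by_condition(rows, "offensive_language")
```

To run the same helpers on the published file, replace `io.StringIO(SAMPLE)` with an open handle on a local copy of `publication_dataset.csv`, after checking how the annotation columns are actually coded.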

Resource type	Funded research project dataset
Title	Tweet Annotation Sensitivity Experiment 2
Creators	Beck, Jacob (1, 2); Eckman, Stephanie (3); Chew, Rob (4); Kreuter, Frauke (1, 3, 2); Ma, Bolei (1); Kern, Christoph (1)
Research Fields	Other Psychology
Size	23.3 MB
Formats	Comma-separated values (CSV) (.csv)
External Resource	https://huggingface.co/datasets/soda-lmu/tweet-annotation-sensitivity-2/blob/main/publication_dataset.csv
Countries	United States
Dates of collection	December 21, 2022
