Published 2019 | Version v3

MUStARD: Multimodal Sarcasm Detection Dataset

Description

We release MUStARD, a multimodal video corpus for research in automated sarcasm discovery. The dataset is compiled from popular TV shows, including Friends, The Golden Girls, The Big Bang Theory, and Sarcasmaholics Anonymous. MUStARD consists of audiovisual utterances annotated with sarcasm labels. Each utterance is accompanied by its context, which provides additional information about the scenario in which the utterance occurs.

Files

bert-input.txt__100lines.txt

Files (11.9 kB)

Checksum Size
md5:acd065175cb3b51f7880790d1575f20b 8.4 kB
md5:2014df5b783f6fc7be77c3c51b44df1c 3.6 kB
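
The MD5 checksums listed above can be used to confirm that a downloaded file arrived intact. The following is a minimal sketch, not part of the dataset's tooling: the function names `md5_of_file` and `matches_listing` are hypothetical, and the local file path is whatever name the file was saved under.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listing entry such as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listing(local_path, "md5:acd065175cb3b51f7880790d1575f20b")` should return `True` for an uncorrupted copy of the 8.4 kB file, where `local_path` is the location of the download.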

Variables

Name Description
utterance The text of the target utterance to classify.
speaker Speaker of the target utterance.
context List of utterances (in chronological order) preceding the target utterance.
context_speakers Respective speakers of the context utterances.
sarcasm Binary label indicating whether the target utterance is sarcastic.
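
The variables above can be read into typed records with a few lines of Python. This is a minimal sketch under stated assumptions: it assumes the JSON file is an object mapping a record ID to a record with the five fields listed, which this page does not confirm. The `Utterance` class, the `parse_records` function, and the sample record are all made up for illustration and are not taken from the dataset.

```python
import json
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Utterance:
    utterance: str               # text of the target utterance
    speaker: str                 # speaker of the target utterance
    context: List[str]           # preceding utterances, chronological order
    context_speakers: List[str]  # speaker of each context utterance
    sarcasm: bool                # binary sarcasm label


def parse_records(raw: str) -> Dict[str, Utterance]:
    """Parse a JSON object mapping record IDs to records (assumed layout)."""
    data = json.loads(raw)
    records = {}
    for key, item in data.items():
        # Each context utterance should have exactly one speaker.
        if len(item["context"]) != len(item["context_speakers"]):
            raise ValueError(f"record {key}: context/speaker length mismatch")
        records[key] = Utterance(
            utterance=item["utterance"],
            speaker=item["speaker"],
            context=list(item["context"]),
            context_speakers=list(item["context_speakers"]),
            sarcasm=bool(item["sarcasm"]),
        )
    return records


# Made-up record for illustration only; not taken from the dataset.
SAMPLE = """
{
  "example_1": {
    "utterance": "Oh great, another meeting.",
    "speaker": "SPEAKER_A",
    "context": ["We have a meeting at five.", "It should take three hours."],
    "context_speakers": ["SPEAKER_B", "SPEAKER_B"],
    "sarcasm": true
  }
}
"""

records = parse_records(SAMPLE)
for key, rec in records.items():
    # Print the dialogue context, then the target utterance with its label.
    for who, text in zip(rec.context_speakers, rec.context):
        print(f"{who}: {text}")
    print(f"{rec.speaker}: {rec.utterance}  [sarcasm={rec.sarcasm}]")
```

Pairing `context` with `context_speakers` through `zip` relies on the two lists being aligned, which is why the sketch checks their lengths before building a record.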

Details

Resource type Open dataset
Title MUStARD: Multimodal Sarcasm Detection Dataset
Creators
  • Castro, Santiago
  • Hazarika, Devamanyu
  • Pérez-Rosas, Verónica
  • Zimmermann, Roger
  • Mihalcea, Rada
  • Poria, Soujanya
Size 11.9 kB
Formats JSON format (.json)
License(s) No license information available
External Resource https://github.com/soujanyaporia/MUStARD#mustard-multimodal-sarcasm-detection-dataset