Published 2023 | Version v2
Open dataset

2020 U.S. Election Emails

Description

This is a preliminary release of the code and data associated with the research paper "Manipulative tactics are the norm in political emails: Evidence from 300K emails from the 2020 U.S. election cycle".

The corpus contains emails from over 3,000 political campaigns and organizations in the 2020 U.S. election cycle. The corpus aims to be comprehensive and covers emails from candidates in prominent federal and state races as well as political organizations, such as Political Action Committees (PACs) and political parties, active in the 2020 cycle. We automated the process of signing up to receive emails from the websites of the political campaigns and organizations. For each entity's website, if the bot discovered an email sign-up form, it filled it in with the information of a fictional recipient. The entire dataset contains 317,366 emails.

Variables

Name Description
uid_email A unique id for each email
uid_inbox A unique id for the email's inbox
name The name of the entity (candidate or organization) we signed up for. Note that, as we document in the accompanying research paper, signing up for a particular entity can leak the recipient's email address to other entities, which may also send emails
source Indicates the entity's type: "ballotpedia-campaign" if the email was from a candidate running for office and "orgs" if the email was from an organization
office_sought For candidates, this describes the office pertaining to the election. E.g., "President of the United States". For organizations, this is not applicable and thus a missing value
party_affiliation For candidates, this describes the candidate's political party. E.g., "Democratic Party" and "Republican Party". For organizations, this is not applicable and thus a missing value
incumbent For candidates, this indicates whether the candidate was an incumbent in the election (i.e., "yes" or "no"). For organizations, this is not applicable and thus a missing value
office_level For candidates, this describes what level the office was sought at (i.e., "Federal" or "State"). For organizations, this is not applicable and thus a missing value
district_type For candidates, this describes the office's district. For organizations, this is not applicable and thus a missing value
state For candidates running for a state office, this describes the state. For organizations, this is not applicable and thus a missing value
type For organizations, this describes the type of organization. For candidates, this is not applicable and thus a missing value
subtype For organizations, this describes the subtype of the organization. We do not have this field for all organizations. For candidates, this is not applicable and thus a missing value
final_website The entity's website on which we signed up to receive emails
crawl_date The date we signed up to receive emails from the entity
from_name The sender's name displayed in the "from" field of each email
from_address The sender's email address
date The date we received the email, in Eastern Time (ET)
day The day of the week we received the email (Mon, Tue, etc.), in ET
hour The hour we received the email (00-23), in ET
subject The subject of the email
body_text The email body in text format. If an email contained both HTML and plain text parts, we extracted text from the HTML part only
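
As the variable list above indicates, candidate-only fields are missing for organizations and vice versa, so analyses usually need to branch on the source column first. The following is a minimal sketch of that pattern using only the Python standard library. It assumes that missing values appear as empty strings in the CSV; the sample rows, the "example-type" value, and the function name summarize are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import Counter

# Values of the "source" column, as documented in the variable list.
CANDIDATE_SOURCE = "ballotpedia-campaign"
ORG_SOURCE = "orgs"


def summarize(csv_file):
    """Count emails per entity type and, for candidates, per party."""
    by_source = Counter()
    by_party = Counter()
    for row in csv.DictReader(csv_file):
        by_source[row["source"]] += 1
        if row["source"] == CANDIDATE_SOURCE:
            # Assumption: a missing value is stored as an empty string.
            party = row.get("party_affiliation") or "(missing)"
            by_party[party] += 1
    return by_source, by_party


# Tiny made-up sample with a subset of the documented columns.
SAMPLE = """uid_email,source,party_affiliation,type
e1,ballotpedia-campaign,Democratic Party,
e2,ballotpedia-campaign,Republican Party,
e3,ballotpedia-campaign,,
e4,orgs,,example-type
"""

by_source, by_party = summarize(io.StringIO(SAMPLE))
print(dict(by_source))
print(dict(by_party))
```

To run this against the real file, pass a file object opened with newline="" so that line breaks inside quoted body_text fields are preserved. If some email bodies exceed the csv module's default field size limit, csv.field_size_limit can be used to raise it.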

Details

Resource type Open dataset
Title 2020 U.S. Election Emails
Alternative title Princeton Corpus of Political Emails
Creators
  • Mathur, Arunesh
  • Wang, Angelina
  • Schwemmer, Carsten
  • Hamin, Maia
  • Stewart, Brandon M
  • Narayanan, Arvind
Formats CSV
License(s) Custom license
External Resource https://electionemails2020.org/#data
Countries United States
Dates of collection Dec 3, 2019 – Nov 3, 2020