2020 U.S. Election Emails
Description
This is a preliminary release of the code and data associated with the research paper "Manipulative tactics are the norm in political emails: Evidence from 300K emails from the 2020 U.S. election cycle".
The corpus contains emails from over 3,000 political campaigns and organizations in the 2020 election cycle in the U.S. The corpus aims to be comprehensive and includes coverage of emails from the candidates in prominent federal and state races as well as political organizations such as Political Action Committees (PACs) and political parties active in the 2020 cycle. We automated the process of signing up to receive emails from the websites of the political campaigns and organizations. For each entity's website, if the bot discovered an email sign-up form, it filled it in with the information of a fictional recipient. The entire dataset contains 317,366 emails.
Variables
| Name | Description |
|---|---|
| uid_email | A unique id for each email |
| uid_inbox | A unique id for the email's inbox |
| name | The name of the entity (candidate or organization) we signed up for. Note that, as we document in our accompanying research paper, signing up for a particular entity can leak the email address to other entities, who may also send emails |
| source | Indicates the entity's type: "ballotpedia-campaign" if the email was from a candidate running for office and "orgs" if the email was from an organization |
| office_sought | For candidates, this describes the office pertaining to the election. E.g., "President of the United States". For organizations, this is not applicable and thus a missing value |
| party_affiliation | For candidates, this describes the candidate's political party. E.g., "Democratic Party" and "Republican Party". For organizations, this is not applicable and thus a missing value |
| incumbent | For candidates, this indicates whether the candidate was an incumbent in the election (i.e., "yes" or "no"). For organizations, this is not applicable and thus a missing value |
| office_level | For candidates, this describes the level of the office sought (i.e., "Federal" or "State"). For organizations, this is not applicable and thus a missing value |
| district_type | For candidates, this describes the office's district. For organizations, this is not applicable and thus a missing value |
| state | For candidates running for a state office, this describes the state. For organizations, this is not applicable and thus a missing value |
| type | For organizations, this describes the type of organization. For candidates, this is not applicable and thus a missing value |
| subtype | For organizations, this describes the subtype of the organization. We do not have this field for all organizations. For candidates, this is not applicable and thus a missing value |
| final_website | The entity's website on which we signed up to receive emails |
| crawl_date | The date we signed up to receive emails from the entity |
| from_name | The sender's name displayed in the "from" field of each email |
| from_address | The sender's email address |
| date | The date we received the email in ET |
| day | The day we received the email (Mon/Tue etc) in ET |
| hour | The hour we received the email (00-23) in ET |
| subject | The subject of the email |
| body_text | The email body in text format. If an email contained both HTML and plain text parts, we extracted text from the HTML part only |
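The variables above can be explored with a few lines of Python. The following is a minimal sketch, not the authors' code: it assumes the CSV header matches the variable names in the table and that missing values appear as empty fields. The sample rows and the helper names `load_rows` and `count_by` are hypothetical and exist only for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows using a subset of the documented columns.
# Real values, column order, and missing-value encoding may differ.
SAMPLE_CSV = """uid_email,uid_inbox,name,source,office_sought,party_affiliation,hour,subject
e1,i1,Candidate A,ballotpedia-campaign,President of the United States,Democratic Party,09,Hello
e2,i1,Candidate A,ballotpedia-campaign,President of the United States,Democratic Party,17,Deadline
e3,i2,Org B,orgs,,,09,Join us
"""


def load_rows(handle):
    """Read rows as dicts, mapping empty fields to None (assumed missing-value encoding)."""
    reader = csv.DictReader(handle)
    return [{k: (v if v != "" else None) for k, v in row.items()} for row in reader]


def count_by(rows, column):
    """Count how many emails share each value of the given column."""
    return Counter(row[column] for row in rows)


rows = load_rows(io.StringIO(SAMPLE_CSV))

# Emails per entity type ("ballotpedia-campaign" vs. "orgs").
by_source = count_by(rows, "source")

# Candidate-only fields such as office_sought are missing for organizations.
candidates = [r for r in rows if r["source"] == "ballotpedia-campaign"]

# Emails per hour of receipt (hour is kept as a zero-padded string here).
by_hour = count_by(rows, "hour")

print(by_source)
print(by_hour)
```

To run this against the released file, pass a handle opened with `open(path, newline="", encoding="utf-8")` to `load_rows` instead of the in-memory sample; the path and encoding are assumptions to verify against the actual download.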
Details
| Resource type | Open dataset |
| Title | 2020 U.S. Election Emails |
| Alternative title | Princeton Corpus of Political Emails |
| Creators | |
| Formats | CSV |
| License(s) | Custom license |
| External Resource | https://electionemails2020.org/#data |
| Countries | United States |
| Dates of collection | Dec 3, 2019 – Nov 3, 2020 |