SpeakGer: A meta-data enriched speech corpus of German state and federal parliaments

  • 1. TU Dortmund University

Published: September 22, 2023

Version v2

Description

A dataset of German parliamentary debates covering 74 years of plenary protocols from all 16 German state parliaments as well as the German Bundestag. The debates are split into individual speeches, which are enriched with metadata identifying the speaker as a member of parliament (mp).

When using this dataset, please cite the original paper: Lange, K.-R., Jentsch, C. (2023). SpeakGer: A meta-data enriched speech corpus of German state and federal parliaments. Proceedings of the 3rd Workshop on Computational Linguistics for Political Text Analysis@KONVENS 2023.

The metadata is split into two types: time-specific metadata, which holds for one legislative period but can change over time (e.g. the party or constituency of an mp), and metadata that is considered fixed, such as the birth date or the name of a speaker. The time-specific information is stored along with the speeches, as it describes the speaker at that point in time; it is additionally stored in the file all_mps_mapping.csv for cross-checking. The remaining metadata is stored in the file all_mps_meta.csv and can be matched with a speech via the speaker ID variable "MPID". The speeches of each parliament are saved in CSV format. Besides the speech text, these files contain the following metadata:

  • Period: int. The legislative period in which the speech took place
  • Session: int. The session in which the speech took place
  • Chair: boolean. Whether the speaker was chairing the plenary session
  • Interjection: boolean. Whether the speech is a comment or an interjection from the floor
  • Party: list (e.g. ["cdu"], or ["cdu", "fdp"] when an interjection comes from more than one speaker). The party of the speaker, or the parties to which the comment/interjection is attributed
  • Consituency: string. The constituency of the speaker in the current legislative period
  • MPID: int. The ID of the speaker, which can be used to look up further metadata in the file all_mps_meta.csv
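A minimal sketch of reading one parliament's speech file with Python's standard csv module and keeping only regular speeches (no chair remarks, no interjections). It assumes, without confirmation from the description above, that booleans are written as "True"/"False", that Party is serialized as a list literal, that interjections carry an empty MPID, and that the speech text sits in a hypothetical column named "Speech"; the sample rows are placeholders, so adjust these details to the actual files.

```python
import ast
import csv
import io

# Placeholder excerpt standing in for one parliament's speech CSV.
# ASSUMPTIONS (not confirmed by the dataset description): booleans are stored
# as "True"/"False", Party is a list literal, interjections have an empty
# MPID, and the speech text lives in a hypothetical "Speech" column.
SAMPLE = """Period,Session,Chair,Interjection,Party,MPID,Speech
19,12,False,False,"['cdu']",1001,(speech text)
19,12,False,True,"['cdu', 'fdp']",,(interjection text)
"""


def parse_bool(value: str) -> bool:
    """Interpret the assumed "True"/"False" encoding of Chair and Interjection."""
    return value.strip().lower() == "true"


def load_speeches(text: str) -> list[dict]:
    """Parse speech rows and convert the documented columns to Python types."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["Period"] = int(row["Period"])
        row["Session"] = int(row["Session"])
        row["Chair"] = parse_bool(row["Chair"])
        row["Interjection"] = parse_bool(row["Interjection"])
        # literal_eval safely parses both "['cdu']" and '["cdu"]' into ["cdu"].
        row["Party"] = ast.literal_eval(row["Party"]) if row["Party"] else []
        rows.append(row)
    return rows


def regular_speeches(rows: list[dict]) -> list[dict]:
    """Drop chair remarks and interjections, keeping ordinary speeches."""
    return [r for r in rows if not r["Chair"] and not r["Interjection"]]


speeches = regular_speeches(load_speeches(SAMPLE))
print(len(speeches), speeches[0]["Party"])  # 1 ['cdu']
```

For a real file, pass the file's contents (or swap io.StringIO for an opened file handle with newline="") to load_speeches.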

The file all_mps_meta.csv contains the following meta information:

  • MPID: int. The ID of the speaker, which can be used to match the mp with his/her speeches.
  • WikipediaLink: The link to the mp's Wikipedia page
  • WikiDataLink: The link to the mp's WikiData page
  • Name: string. The full name of the mp.
  • Last Name: string. The last name of the mp, as found on WikiData. If WikiData gives no last name, the full name was heuristically cut at the last space to obtain the last name needed for splitting the speeches.
  • Born: string, format: YYYY-MM-DD. Birth date of the mp. If an exact birth date is found on WikiData, this date is used. Otherwise, a date within the birth year given on Wikipedia is used.
  • SexOrGender: string. The sex or gender of the mp. Disclaimer: this information was taken from WikiData, which does not seem to differentiate between sex and gender.
  • Occupation: list. Occupation(s) of the mp.
  • Religion: string. Religious beliefs of the mp.
  • AbgeordnetenwatchID: int. ID of the mp on the website Abgeordnetenwatch
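The fixed metadata can then be attached to speeches through MPID. The sketch below, again standard-library Python, indexes a hypothetical three-column excerpt of all_mps_meta.csv (placeholder rows, not real records) by MPID and adds Name and birth year to each speech. Only the birth year is carried over, because for mps without an exact WikiData date the day in Born is merely a stand-in within that year. The function names are our own.

```python
import csv
import io
from datetime import date

# Hypothetical excerpt of all_mps_meta.csv restricted to three of its columns;
# the rows are placeholders, not real records.
META_SAMPLE = """MPID,Name,Born
1001,Erika Mustermann,1970-01-01
1002,Max Beispiel,1965-06-15
"""


def load_meta(text: str) -> dict[int, dict]:
    """Index the metadata rows by their integer MPID."""
    meta = {}
    for row in csv.DictReader(io.StringIO(text)):
        # Born is documented as YYYY-MM-DD; when no exact date was available the
        # day is only a stand-in within the birth year, so keep the year separately.
        born = date.fromisoformat(row["Born"]) if row["Born"] else None
        row["Born"] = born
        row["BirthYear"] = born.year if born else None
        meta[int(row["MPID"])] = row
    return meta


def attach_meta(speeches: list[dict], meta: dict[int, dict]) -> list[dict]:
    """Return copies of the speeches with Name and BirthYear from the meta file."""
    enriched = []
    for speech in speeches:
        mpid = speech.get("MPID")
        info = meta.get(int(mpid)) if mpid not in (None, "") else None
        enriched.append({
            **speech,
            "Name": info["Name"] if info else None,
            "BirthYear": info["BirthYear"] if info else None,
        })
    return enriched


meta = load_meta(META_SAMPLE)
enriched = attach_meta([{"MPID": "1001"}, {"MPID": ""}], meta)
print(enriched[0]["Name"], enriched[0]["BirthYear"])  # Erika Mustermann 1970
```

attach_meta accepts rows as produced by csv.DictReader over a speech file; speeches without an MPID, or with an MPID missing from the meta file, get None for the added fields.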

Files (9.8 GB)

Size        MD5 checksum
2.3 MB      969e708ff62f419a9cbd71fff222d08d
2.5 MB      5224c2cf5e55d4a13cebadb7f222f95a
698.8 MB    0adb6881c9c64124f6530d00ec5d9e93
721.4 MB    4bece6c975c614b61f634723a8e45f1e
547.0 MB    ba1f1895be5f99b87d060fc02d9e2ce6
279.1 MB    b02e7f906ad8fd02a9df55ed0438981e
390.8 MB    052889efe221b21959c2ceb72b2a907f
1.9 GB      88e9b29c29f36c37bb8c7e69cb0fc0f6
477.5 MB    512cd8df0bcc78283143f28a3ec4bee0
739.9 MB    2cb227a3bcca26de2e5a866ef5747c1e
388.0 MB    c282f4375e70006c0ad5109a8b1bee0f
508.0 MB    686e9c3ac03cd1d86ff18e2be512c61b
819.5 MB    d838c3ef704d0263a4a7b15de8d9bbfe
486.3 MB    567e697744b995a80a4cd965d15281ba
336.8 MB    13c8d3f3ef85177a9b94c828aefca768
259.8 MB    f10cd49589f1cab031cb4680e749a616
338.0 MB    bdeea7f59ac1744d5794875760aba8e1
601.8 MB    b692358f79e967ea568fe81f630ee515
343.7 MB    f9ba624598399a92a19c33223489b025
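Each downloaded file can be checked against its listed MD5 checksum. A minimal sketch using Python's standard hashlib, reading in chunks because the largest file is about 1.9 GB; the function names are our own, and matching each file to its checksum is done via the record page.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in 1 MiB chunks.

    Chunked reading keeps memory use flat even for multi-GB CSV files.
    """
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, listed_checksum: str) -> bool:
    """Compare a file with a listed checksum (with or without an "md5:" prefix)."""
    expected = listed_checksum.strip().lower().removeprefix("md5:")
    return md5_of(path) == expected
```

For example, verify(Path("downloaded_file.csv"), "md5:<checksum listed for that file>") should return True for an intact download; the path is a placeholder. str.removeprefix requires Python 3.9 or later.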

Details

Resource type Open dataset
Title SpeakGer: A meta-data enriched speech corpus of German state and federal parliaments
Alternative title SpeakGer
Creators
  • Lange, Kai-Robin (TU Dortmund University)
  • Jentsch, Carsten (TU Dortmund University)
Research Fields Political Science
Size More than 15,000,000 speeches; 10 GB
Formats Comma-separated values (CSV) (.csv)
License(s) Creative Commons Attribution 4.0 International
Countries Germany