SpeakGer: A meta-data enriched speech corpus of German state and federal parliaments

  • 1. TU Dortmund University

Published: September 22, 2023

Version v2


A dataset of German parliament debates covering 74 years of plenary protocols across all 16 state parliaments of Germany as well as the German Bundestag. The debates are separated into individual speeches, which are enriched with meta-data identifying the speaker as a member of the parliament (mp).

When using this data set, please cite the original paper "Lange, K.-R., Jentsch, C. (2023). SpeakGer: A meta-data enriched speech corpus of German state and federal parliaments. Proceedings of the 3rd Workshop on Computational Linguistics for Political Text Analysis@KONVENS 2023.".

The meta-data falls into two types: time-specific meta-data, which is valid for a legislative period and can change over time (e.g. the party or constituency of an mp), and meta-data that is considered fixed, such as the birth date or the name of a speaker. The time-specific meta-data is stored along with the speeches, since it describes the situation at that point in time; it is additionally stored in the file all_mps_mapping.csv in case something needs to be double-checked. The fixed meta-data is stored in the file all_mps_meta.csv and can be matched with a speech via the speaker ID variable "MPID".

The speeches of each parliament are saved in CSV format. Along with the speeches, they contain the following meta-data:

  • Period: int. The period in which the speech took place
  • Session: int. The session in which the speech took place
  • Chair: boolean. Whether the speaker was the chair of the plenary session
  • Interjection: boolean. Whether the speech is a comment or an interjection from the crowd
  • Party: list (e.g. ["cdu"], or ["cdu", "fdp"] when an interjection has more than one speaker). The party of the speaker, or the parties that the comment/interjection references
  • Consituency: string. The constituency of the speaker in the current legislative period
  • MPID: int. The ID of the speaker, which can be used to get more meta-data from the file all_mps_meta.csv
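The MPID matching described above can be sketched with Python's standard csv module. This is only an illustration under stated assumptions: the inline rows are invented stand-ins for a parliament's speech file and for all_mps_meta.csv, the column name "Speech" is hypothetical, and the helper names index_meta_by_mpid and attach_meta are not part of the dataset.

```python
import csv
import io

# Tiny made-up stand-ins for one speech file and for all_mps_meta.csv.
# The "Speech" column name and every row value are assumptions for illustration.
SPEECHES_CSV = """Period,Session,Chair,Interjection,Party,MPID,Speech
19,1,False,False,"[""cdu""]",101,Example speech text A
19,1,True,False,"[""spd""]",102,Example speech text B
19,2,False,True,"[""cdu"", ""fdp""]",999,Example interjection
"""

META_CSV = """MPID,Name,Born
101,Example Person A,1960-01-01
102,Example Person B,1975-06-15
"""


def index_meta_by_mpid(meta_file):
    """Build a dict that maps each MPID (as int) to its meta-data row."""
    return {int(row["MPID"]): row for row in csv.DictReader(meta_file)}


def attach_meta(speech_file, meta_by_id):
    """Yield each speech row together with the matching meta row (or None)."""
    for row in csv.DictReader(speech_file):
        yield row, meta_by_id.get(int(row["MPID"]))


meta_by_id = index_meta_by_mpid(io.StringIO(META_CSV))
matched = list(attach_meta(io.StringIO(SPEECHES_CSV), meta_by_id))

for speech, meta in matched:
    # MPID 999 has no entry in the made-up meta table, so meta is None there.
    name = meta["Name"] if meta else "(no meta-data entry)"
    print(speech["Period"], speech["Session"], name)
```

With the real files, the io.StringIO objects would be replaced by opened CSV files. Since the speech files are several hundred megabytes each, reading them row by row as above keeps memory use low.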

The file all_mps_meta.csv contains the following meta information:

  • MPID: int. The ID of the speaker, which can be used to match the mp with his/her speeches.
  • WikipediaLink: The link to the mp's Wikipedia page
  • WikiDataLink: The link to the mp's WikiData page
  • Name: string. The full name of the mp.
  • Last Name: string. The last name of the mp, found on WikiData. If no last name is given on WikiData, the full name was heuristically cut at the last space to get the information necessary for splitting the speeches.
  • Born: string, format: YYYY-MM-DD. Birth date of the mp. If an exact birth date is found on WikiData, this exact date is used. Otherwise, a day in the year of birth given on Wikipedia is used.
  • SexOrGender: string. Information on the sex or gender of the mp. Disclaimer: This information was taken from WikiData, which does not seem to differentiate between sex and gender.
  • Occupation: list. Occupation(s) of the mp.
  • Religion: string. Religious beliefs of the mp.
  • AbgeordnetenwatchID: int. ID of the mp on the website Abgeordnetenwatch
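The typed fields listed above arrive as plain strings when a CSV file is read. The following sketch shows one way to convert them. It assumes that list-valued cells are serialized as list literals like the ["cdu", "fdp"] example given for Party, which is not confirmed here; the helper names parse_list_field and parse_born and the sample row are hypothetical.

```python
import ast
from datetime import date


def parse_list_field(raw):
    """Parse a list-valued cell such as '["cdu", "fdp"]' into a Python list.

    Assumes the cell holds a Python/JSON-style list literal; empty cells give [].
    """
    if raw is None or raw.strip() == "":
        return []
    value = ast.literal_eval(raw)
    return list(value) if isinstance(value, (list, tuple)) else [value]


def parse_born(raw):
    """Parse a YYYY-MM-DD birth date; return None for empty cells."""
    if raw is None or raw.strip() == "":
        return None
    return date.fromisoformat(raw.strip())


# Made-up row in the shape of all_mps_meta.csv (values are illustrative only).
row = {"MPID": "101", "Born": "1960-01-01", "Occupation": '["lawyer", "politician"]'}

occupations = parse_list_field(row["Occupation"])
born = parse_born(row["Born"])
print(occupations, born.year)
```

Because the day in Born may be a placeholder when only the year of birth is known, analyses that need an age are probably safer using born.year than the full date.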



Files (9.8 GB)

19 files, ranging in size from 2.3 MB to 1.9 GB.


Resource type: Open dataset
Title: SpeakGer: A meta-data enriched speech corpus of German state and federal parliaments
Alternative title: SpeakGer
  • Lange, Kai-Robin (1)
  • Jentsch, Carsten (1)
Research Fields: Political Science
Size: More than 15,000,000 speeches; 10 GB
Formats: Comma-separated values (CSV) (.csv)
License(s): Creative Commons Attribution 4.0 International
Countries: Germany