SpeakGer: A meta-data enriched speech corpus of German state and federal parliaments
Published: September 22, 2023
Version v2
Description
A dataset of German parliamentary debates covering 74 years of plenary protocols across all 16 state parliaments of Germany as well as the German Bundestag. The debates are split into individual speeches, which are enriched with meta-data identifying the speaker as a member of the parliament (mp).
When using this data set, please cite the original paper "Lange, K.-R., Jentsch, C. (2023). SpeakGer: A meta-data enriched speech corpus of German state and federal parliaments. Proceedings of the 3rd Workshop on Computational Linguistics for Political Text Analysis@KONVENS 2023.".
The meta-data is separated into two types: time-specific meta-data, which applies to a single legislative period and can change over time (e.g. the party or constituency of an mp), and meta-data that is considered fixed, such as the birth date or the name of a speaker. The time-specific information is stored along with the speeches, as it reflects the state at that point in time, and is additionally stored in the file all_mps_mapping.csv in case something needs to be double-checked. The fixed meta-data is stored in the file all_mps_meta.csv. The meta-data from this file can be matched with a speech by comparing the speaker ID variable "MPID". The speeches of each parliament are saved in CSV format. Along with the speeches, they contain the following meta-data:
- Period: int. The period in which the speech took place
- Session: int. The session in which the speech took place
- Chair: boolean. Whether the speaker was the chair of the plenary session
- Interjection: boolean. Whether the speech is a comment or an interjection from the crowd
- Party: list (e.g. ["cdu"], or ["cdu", "fdp"] when an interjection comes from more than one speaker). The party of the speaker, or the parties that the comment/interjection references
- Consituency: string. The constituency of the speaker in the current legislative period
- MPID: int. The ID of the speaker, which can be used to get more meta-data from the file all_mps_meta.csv
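As a rough illustration of the column types listed above, the following sketch reads a speeches CSV with Python's standard library and converts each field. Several details are assumptions rather than documented facts: that the Party list is serialized as a string such as ["cdu", "fdp"], that booleans appear as "True"/"False", that MPID can be empty for some rows, and that the speech text sits in a column called "Speech" (a hypothetical name). The two sample rows are invented, and the constituency column is left out of the sample.

```python
import ast
import csv
import io


def parse_party(raw):
    """Turn a serialized list such as '["cdu", "fdp"]' into a Python list."""
    raw = (raw or "").strip()
    if not raw:
        return []
    try:
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return [raw]  # fall back to treating the cell as one party name
    return list(value) if isinstance(value, (list, tuple)) else [str(value)]


def parse_bool(raw):
    """Assumed encoding: 'True'/'False' (case-insensitive) or '1'/'0'."""
    return str(raw).strip().lower() in {"true", "1"}


def read_speeches(handle):
    """Read speech rows and convert the documented meta-data columns."""
    rows = []
    for row in csv.DictReader(handle):
        row["Period"] = int(row["Period"])
        row["Session"] = int(row["Session"])
        row["Chair"] = parse_bool(row["Chair"])
        row["Interjection"] = parse_bool(row["Interjection"])
        row["Party"] = parse_party(row["Party"])
        # Defensive: tolerate a missing ID or a float-formatted one ("1001.0").
        row["MPID"] = int(float(row["MPID"])) if row["MPID"] else None
        rows.append(row)
    return rows


# Invented sample data; "Speech" is a hypothetical column name.
SAMPLE = '''Period,Session,Chair,Interjection,Party,MPID,Speech
19,12,False,False,"[""cdu""]",1001,Example speech text
19,12,False,True,"[""cdu"", ""fdp""]",,Example interjection
'''

speeches = read_speeches(io.StringIO(SAMPLE))
```

For a real file, the in-memory sample would be replaced by something like open(path, newline="", encoding="utf-8"); the actual encoding and delimiter of the published files should be checked first.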
The file all_mps_meta.csv contains the following meta information:
- MPID: int. The ID of the speaker, which can be used to match the mp with his/her speeches.
- WikipediaLink: The link to the mp's Wikipedia page
- WikiDataLink: The link to the mp's WikiData page
- Name: string. The full name of the mp.
- Last Name: string. The last name of the mp, as found on WikiData. If no last name is given on WikiData, the full name was heuristically cut at the last space to obtain the information necessary for splitting the speeches.
- Born: string, format: YYYY-MM-DD. Birth date of the mp. If an exact birth date is found on WikiData, this exact date is used. Otherwise, a day in the year of birth given on Wikipedia is used.
- SexOrGender: string. Information on the sex or gender of the mp. Disclaimer: This information was taken from WikiData, which does not seem to differentiate between sex and gender.
- Occupation: list. Occupation(s) of the mp.
- Religion: string. Religious beliefs of the mp.
- AbgeordnetenwatchID: int. ID of the mp on the website Abgeordnetenwatch
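The matching via "MPID" described above could look roughly like the following standard-library sketch. It indexes the rows of all_mps_meta.csv by MPID and attaches selected fields to each speech row. The sample rows and the name "Erika Mustermann" are invented placeholders, and the handling of speech rows without an MPID is an assumption about the data, not something the description states.

```python
import csv
import io


def load_meta(handle):
    """Index the rows of all_mps_meta.csv by their integer MPID."""
    return {int(row["MPID"]): row for row in csv.DictReader(handle)}


def attach_meta(speech_rows, meta_by_id, fields=("Name", "Born")):
    """Yield copies of the speech rows with selected meta-data fields added.

    Rows whose MPID is empty or unknown get None for every requested field.
    """
    for row in speech_rows:
        mpid = row.get("MPID")
        meta = meta_by_id.get(int(mpid)) if mpid not in (None, "") else None
        enriched = dict(row)
        for field in fields:
            enriched[field] = meta[field] if meta else None
        yield enriched


# Invented sample data for illustration only.
META_SAMPLE = '''MPID,Name,Born
1001,Erika Mustermann,1960-01-01
'''
SPEECH_SAMPLE = '''Period,Session,MPID
19,12,1001
19,12,
'''

meta_by_id = load_meta(io.StringIO(META_SAMPLE))
enriched = list(
    attach_meta(csv.DictReader(io.StringIO(SPEECH_SAMPLE)), meta_by_id)
)
```

A dictionary lookup keeps memory use modest because only the small meta-data file is held in memory while the much larger speech files can be streamed row by row.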
Files
Total size: 9.8 GB
MD5 checksum | Size |
---|---|
md5:969e708ff62f419a9cbd71fff222d08d | 2.3 MB |
md5:5224c2cf5e55d4a13cebadb7f222f95a | 2.5 MB |
md5:0adb6881c9c64124f6530d00ec5d9e93 | 698.8 MB |
md5:4bece6c975c614b61f634723a8e45f1e | 721.4 MB |
md5:ba1f1895be5f99b87d060fc02d9e2ce6 | 547.0 MB |
md5:b02e7f906ad8fd02a9df55ed0438981e | 279.1 MB |
md5:052889efe221b21959c2ceb72b2a907f | 390.8 MB |
md5:88e9b29c29f36c37bb8c7e69cb0fc0f6 | 1.9 GB |
md5:512cd8df0bcc78283143f28a3ec4bee0 | 477.5 MB |
md5:2cb227a3bcca26de2e5a866ef5747c1e | 739.9 MB |
md5:c282f4375e70006c0ad5109a8b1bee0f | 388.0 MB |
md5:686e9c3ac03cd1d86ff18e2be512c61b | 508.0 MB |
md5:d838c3ef704d0263a4a7b15de8d9bbfe | 819.5 MB |
md5:567e697744b995a80a4cd965d15281ba | 486.3 MB |
md5:13c8d3f3ef85177a9b94c828aefca768 | 336.8 MB |
md5:f10cd49589f1cab031cb4680e749a616 | 259.8 MB |
md5:bdeea7f59ac1744d5794875760aba8e1 | 338.0 MB |
md5:b692358f79e967ea568fe81f630ee515 | 601.8 MB |
md5:f9ba624598399a92a19c33223489b025 | 343.7 MB |
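The md5: values in the file listing can be used to check that a download completed without corruption. The sketch below is one possible way to do this with Python's hashlib; the function names are illustrative, and reading in 1 MiB chunks is simply a choice that keeps memory use low for the larger files.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file without loading it all at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a local file against a listed value such as 'md5:969e...'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

A call would take the local path of a downloaded file together with the checksum shown for that file on the download page.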
Details
Resource type | Open dataset |
Title | SpeakGer: A meta-data enriched speech corpus of German state and federal parliaments |
Alternative title | SpeakGer |
Creators |
|
Research Fields | Political Science |
Size | More than 15,000,000 speeches; 10 GB |
Formats | Comma-separated values (CSV) (.csv) |
License(s) | Creative Commons Attribution 4.0 International |
Countries | Germany |