Published December 2023 | Version v3

"LaCour!" - a multilingual dataset of hearing transcripts with 2 million+ tokens

Description

Dataset Summary 

This dataset contains transcribed court hearings sourced from official hearings of the European Court of Human Rights (https://www.echr.coe.int/webcasts-of-hearings). The hearings are 154 selected webcasts (videos) from 2012-2022 in their original language (no interpretation). Language labels were annotated manually, and the extracted audio was processed automatically with pyannote and whisper-large-v2. The resulting dataset contains 4000 speaker turns and 88920 individual lines. It has two subsets: transcripts, and documents (hearing metadata with linked documents). The transcripts are additionally available as .txt or .xml.

Languages 

Most of the transcribed speech is in English and French.

A smaller portion is in the following languages:

Russian, Spanish, Croatian, Italian, Portuguese, Turkish, Polish, Lithuanian, German, Ukrainian, Hungarian, Dutch, Albanian, Romanian, Serbian

The collected metadata is in English.

Dataset Structure 

Data Instances 

Each instance in transcripts represents an entire segment of a transcript, similar to a conversation turn in a dialog.

{
 'id': 0,
 'webcast_id': '1021112_29112017',
 'segment_id': 0,
 'speaker_name': 'UNK',
 'speaker_role': 'Announcer',
 'data': {
  'begin': [12.479999542236328],
  'end': [13.359999656677246],
  'language': ['fr'],
  'text': ['La Cour!']
 }
}
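Because the fields inside data are parallel lists (one entry per spoken line), a segment can be unpacked into individual lines by zipping them. The following is a minimal sketch in Python that uses the instance above as input; the helper name iter_lines and the derived duration field are illustrative assumptions, not part of the dataset.

```python
from typing import Iterator

# Sample instance copied from the example above.
instance = {
    "id": 0,
    "webcast_id": "1021112_29112017",
    "segment_id": 0,
    "speaker_name": "UNK",
    "speaker_role": "Announcer",
    "data": {
        "begin": [12.479999542236328],
        "end": [13.359999656677246],
        "language": ["fr"],
        "text": ["La Cour!"],
    },
}


def iter_lines(segment: dict) -> Iterator[dict]:
    """Yield one dict per spoken line by zipping the parallel lists in `data`.

    `iter_lines` and the `duration` key are hypothetical helpers for this sketch.
    """
    data = segment["data"]
    for begin, end, language, text in zip(
        data["begin"], data["end"], data["language"], data["text"]
    ):
        yield {
            "speaker_role": segment["speaker_role"],
            "begin": begin,
            "end": end,
            "duration": end - begin,  # length of the line in seconds
            "language": language,
            "text": text,
        }


lines = list(iter_lines(instance))
for line in lines:
    print(f"[{line['begin']:.2f}-{line['end']:.2f}] ({line['language']}) {line['text']}")
```

Flattening segments this way makes it straightforward to filter by language or to compute speaking time per role.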

Each instance in documents describes a document in HUDOC that is associated with a hearing, together with the metadata of that hearing. The actual document is linked via case_url and can also be found in HUDOC using the case_id. Note: hearing_type states the type of the hearing, whereas type states the type of the document. If the hearing is a "Grand Chamber hearing", the "CHAMBER" document refers to a different hearing.

{
 'id': 16,
 'webcast_id': '1232311_02102012',
 'hearing_title': 'Michaud v. France (nos. 12323/11)',
 'hearing_date': '2012-10-02 00:00:00',
 'hearing_type': 'Chamber hearing',
 'application_number': ['12323/11'],
 'case_id': '001-115377',
 'case_name': 'CASE OF MICHAUD v. FRANCE',
 'case_url': 'https://hudoc.echr.coe.int/eng?i=001-115377',
 'ecli': 'ECLI:CE:ECHR:2012:1206JUD001232311',
 'type': 'CHAMBER',
 'document_date': '2012-12-06 00:00:00',
 'importance': 1,
 'articles': ['8', '8-1', '8-2', '34', '35'],
 'respondent_government': ['FRA'],
 'issue': 'Decision of the National Bar Council of 12 July 2007 "adopting regulations on internal procedures for implementing the obligation to combat money laundering and terrorist financing, and an internal supervisory mechanism to guarantee compliance with those procedures" ; Article 21-1 of the Law of 31 December 1971 ; Law no. 2004-130 of 11 February 2004 ; Monetary and Financial Code',
 'strasbourg_caselaw': 'André and Other v. France, no 18603/03, 24 July 2008;Bosphorus Hava Yollari Turizm ve Ticaret Anonim Sirketi v. Ireland [GC], no 45036/98, ECHR 2005-VI;[...]',
 'external_sources': 'Directive 91/308/EEC, 10 June 1991;Article 6 of the Treaty on European Union;Charter of Fundamental Rights of the European Union;Articles 169, 170, 173, 175, 177, 184 and 189 of the Treaty establishing the European Community;Recommendations 12 and 16 of the financial action task  force ("FATF") on money laundering;Council of Europe Convention on Laundering, Search, Seizure and Confiscation of the Proceeds from Crime and on the Financing of Terrorism  (16 May 2005)',
 'conclusion': 'Remainder inadmissible;No violation of Article 8 - Right to respect for private and family life (Article 8-1 - Respect for correspondence;Respect for private life)',
 'separate_opinion': True
}
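Several fields in a documents instance are stored as strings that usually need parsing before use. The sketch below, based only on the example above, assumes that dates follow the pattern "YYYY-MM-DD HH:MM:SS" and that strasbourg_caselaw uses ";" as a separator. Both assumptions are inferred from this single sample rather than from a formal specification. The record is abridged to the fields used, and the "[...]" elision marker from the sample is skipped, not expanded.

```python
from datetime import datetime

# Abridged copy of the documents instance shown above (only the fields used here).
document = {
    "webcast_id": "1232311_02102012",
    "hearing_date": "2012-10-02 00:00:00",
    "document_date": "2012-12-06 00:00:00",
    "strasbourg_caselaw": (
        "André and Other v. France, no 18603/03, 24 July 2008;"
        "Bosphorus Hava Yollari Turizm ve Ticaret Anonim Sirketi v. Ireland [GC], "
        "no 45036/98, ECHR 2005-VI;[...]"
    ),
}

# Assumed timestamp layout, inferred from the sample values.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"

hearing_date = datetime.strptime(document["hearing_date"], DATE_FORMAT)
document_date = datetime.strptime(document["document_date"], DATE_FORMAT)

# Days elapsed between the hearing and the date of the linked document.
days_between = (document_date - hearing_date).days

# Split the semicolon-separated citation string; drop empty parts and the
# "[...]" marker that only signals elided entries in the sample.
cited_cases = [
    part.strip()
    for part in document["strasbourg_caselaw"].split(";")
    if part.strip() and part.strip() != "[...]"
]

print(days_between)
for case in cited_cases:
    print(case)
```

Splitting on ";" is not safe for every field: the conclusion value in the example contains a semicolon inside parentheses, so a naive split would cut that entry in two.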

Files

lacour_linked_documents.json

Files (1.9 MB)

Checksum Size
md5:a1c39d25396287df8cf01bc002bbf947 1.7 MB
md5:d0c6d01137b5d37a0b1080e942a95222 223.2 kB

Variables

Name Description
id Transcript identifier
webcast_id The identifier for the hearing
segment_id The identifier of the current speaker segment in the current hearing
speaker_name The name of the speaker (not given for Applicant, Government or Third Party)
speaker_role The role/party the speaker represents (Announcer for announcements, Judge for judges, JudgeP for judge president, Applicant for representatives of the applicant, Government for representatives of the respondent government, ThirdParty for representatives of third party interveners)
data Parallel sequences of the following fields, with one entry per line: 1) begin: the start timestamp of the line (in seconds) 2) end: the end timestamp of the line (in seconds) 3) language: the language spoken (ISO 639-1 code) 4) text: the spoken line
id Document identifier
hearing_title The title of the hearing
hearing_date The date of the hearing
hearing_type The type of hearing (Grand Chamber, Chamber or Grand Chamber Judgment Hearing)
application_number The application numbers which are associated with the hearing and case
case_id The id of the case
case_name The name of the case
case_url The direct link to the document
ecli The ECLI (European Case Law Identifier)
type The type of document
document_date The date of the document
importance The importance score of the case (1 is the highest importance, key case)
articles The articles of the European Convention on Human Rights that the case concerns
respondent_government The code of the respondent government(s) (in ISO 3166-1 alpha-3)
issue The references to the issue of the case
strasbourg_caselaw The list of cases in the ECHR which are relevant to the current case
external_sources The relevant references outside of the ECHR
conclusion The short textual description of the conclusion
separate_opinion The indicator if there is a separate opinion
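Both example instances carry a webcast_id, so that field can serve as the key for linking transcript segments to document records. The sketch below shows one way to do this in Python. All records in it are made-up placeholders (the ids, case ids and the "GRANDCHAMBER" type value are hypothetical), and the function name linked_documents is an assumption for illustration. A hearing may map to more than one document, so the index stores a list per webcast.

```python
from collections import defaultdict

# Placeholder records for illustration only; none of these values come from the dataset.
transcripts = [
    {"webcast_id": "webcast-A", "segment_id": 0, "speaker_role": "Announcer"},
    {"webcast_id": "webcast-A", "segment_id": 1, "speaker_role": "JudgeP"},
    {"webcast_id": "webcast-B", "segment_id": 0, "speaker_role": "Announcer"},
]
documents = [
    {"webcast_id": "webcast-A", "case_id": "case-1", "type": "CHAMBER"},
    {"webcast_id": "webcast-A", "case_id": "case-2", "type": "GRANDCHAMBER"},
]

# Index the document records by hearing so each lookup is a dictionary access.
docs_by_webcast = defaultdict(list)
for doc in documents:
    docs_by_webcast[doc["webcast_id"]].append(doc)


def linked_documents(segment: dict) -> list:
    """Return all document records that share the segment's webcast_id."""
    return docs_by_webcast.get(segment["webcast_id"], [])


for segment in transcripts:
    case_ids = [doc["case_id"] for doc in linked_documents(segment)]
    print(segment["webcast_id"], segment["segment_id"], case_ids)
```

Using get with an empty-list default keeps segments without a linked document in the output instead of raising an error.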

Details

Resource type Open dataset
Title "LaCour!" - a multilingual dataset of hearing transcripts with 2 million+ tokens
Creators
  • Habernal, Ivan
Research Fields Political Science
Size 1.9 MB
Formats Text (generally ASCII or ISO 8859-n) (.txt), XML (.xml)
License(s) Creative Commons Attribution Share Alike 4.0 International
External Resource https://huggingface.co/datasets/TrustHLT/LaCour