Preamble 1.0
Please use the following text to cite this item:
Hladká, Barbora and Mírovský, Jiří, 2022,
Preamble 1.0, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL),
http://hdl.handle.net/11234/1-4912.
Date issued
2022-10-14
Size
10173 words, 522 items
Description
Preamble 1.0 is a multilingual annotated corpus of the preamble of the EU REGULATION 2020/2092 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL. The corpus consists of four language versions of the preamble (Czech, English, French, Polish), each of them annotated with sentence subjects.
The data were annotated in the Brat tool (https://brat.nlplab.org/) and are distributed in the Brat native format, i.e. each annotated preamble is represented by the original plain text and a stand-off annotation file.
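As an illustration of the stand-off layout described above, the following minimal Python sketch reads the text-bound ("T") lines of a Brat annotation file and checks that each character offset points at the annotated word in the plain text. The label name "Subject", the sample sentence, and the helper name parse_ann are assumptions made for this example only; the labels and file names actually used in the corpus may differ.

```python
from typing import NamedTuple


class TextBound(NamedTuple):
    ann_id: str   # e.g. "T1"
    label: str    # annotation type, e.g. "Subject" (assumed label name)
    start: int    # character offset of the first character
    end: int      # character offset one past the last character
    text: str     # the annotated string as stored in the .ann file


def parse_ann(ann_content: str) -> list[TextBound]:
    """Parse the text-bound ("T") lines of a Brat stand-off annotation file."""
    annotations = []
    for line in ann_content.splitlines():
        if not line.startswith("T"):
            continue  # skip relations, attributes, notes, and blank lines
        ann_id, type_and_span, text = line.split("\t", 2)
        label, span = type_and_span.split(" ", 1)
        # Discontinuous spans look like "0 5;10 15"; keep the outer bounds.
        fragments = [fragment.split() for fragment in span.split(";")]
        start = int(fragments[0][0])
        end = int(fragments[-1][1])
        annotations.append(TextBound(ann_id, label, start, end, text))
    return annotations


# Hypothetical sample data, not taken from the corpus.
sample_txt = "European leaders said that the budget must be protected."
sample_ann = "T1\tSubject 9 16\tleaders\nT2\tSubject 31 37\tbudget\n"

subjects = parse_ann(sample_ann)
for subject in subjects:
    # The offsets index directly into the plain-text file.
    assert sample_txt[subject.start:subject.end] == subject.text
    print(subject.ann_id, subject.label, subject.text)
```

For a real file pair, the same function could be applied to the contents of the .ann file, with the matching .txt file read in the same encoding so that the offsets line up.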
Acknowledgement
4EU+ European University Alliance
Project code: 2021_F3_10
Project name: @SWitCH: Crash Course on Data Analytics for Students of Social Studies and Humanities
This item is Publicly Available.
Files in this item
- Name: README.TXT
- Size: 2.92 KB
- Format: text/plain
- Description: Text
- MD5: bdf82b43c155d3bf117d509e8080d449

============
Preamble 1.0
============

Authors
=======

Barbora Hladká (hladka@ufal.mff.cuni.cz)
Jiří Mírovský (mirovsky@ufal.mff.cuni.cz)

Introduction
============

Preamble 1.0 is a multilingual annotated corpus of the preamble of the EU REGULATION 2020/2092 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL of 16 December 2020 on a general regime of conditionality for the protection of the Union budget.

The corpus consists of four language versions of the preamble (source texts downloaded from the following web pages):

Czech (https://eur-lex.europa.eu/legal-content/CS/TXT/PDF/?uri=CELEX:32020R2092)
English (https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32020R2092)
French (https://eur-lex.europa.eu/legal-content/FR/TXT/PDF/?uri=CELEX:32020R2092)
Polish (https://eur-lex.europa.eu/legal-content/PL/TXT/PDF/?uri=CELEX:32020R2092)

The language selection is based on the languages used in the course NPFL134 (Data Analytics for Students of Social Studies and Humanities) at the Institute of Formal and Applied Linguistics in the summer semester of 2022 (https://ufal.mff.cuni.cz/courses/npfl134).

Only sentence subjects are annotated. An annotated subject is always a single word; e.g., in the sentence "European leaders said that...", only "leaders" is annotated as a subject. In accordance with this rule, which follows the Universal Dependencies framework, each member of a coordinated subject is annotated separately, and articles are not included.

The annotations were produced by students of the NPFL134 course; each preamble was independently annotated by two students, and their automatically unified results were subsequently curated by an arbiter.

The four language versions contain the following numbers of words and annotated subjects:

Czech: 2283 words, 143 annotated subjects
English: 2687 words, 139 annotated subjects
French: 2941 words, 133 annotated subjects
Polish: 2262 words, 107 annotated subjects

Data Format
===========

. . .
- Name: Preamble1.0.zip
- Size: 34.14 KB
- Format: application/zip
- Description: Zip
- MD5: 6d9825ad77f165a80011e331be0a5e61
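
The MD5 checksum listed for Preamble1.0.zip can be used to check a downloaded copy. The sketch below is one way to do this in Python with the standard hashlib module; the helper names md5_of_file and matches_listing, and the assumption that the archive sits in the current directory under its listed name, are illustrative only.

```python
import hashlib

# Checksum as given in the file listing for Preamble1.0.zip.
EXPECTED_MD5 = "6d9825ad77f165a80011e331be0a5e61"


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Compare a file's MD5 digest with the expected value, ignoring case."""
    return md5_of_file(path) == expected.lower()
```

Usage, assuming the archive was saved as Preamble1.0.zip: matches_listing("Preamble1.0.zip") returns True when the download is intact.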


