
Annotated Corpus of Czech Case Law for Segmentation Tasks

Please use the following text to cite this item:
Harašta, Jakub; Šavelka, Jaromír; Kasl, František and Míšek, Jakub, 2019, Annotated Corpus of Czech Case Law for Segmentation Tasks, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11372/LRT-2901.
Date issued: 2019-05-23
Size: 350 articles
Language(s): Czech
Description
An annotated corpus of 350 decisions of the Czech top-tier courts (the Supreme Court, the Supreme Administrative Court, and the Constitutional Court). Of these, 280 decisions were annotated by one trained annotator and then manually adjudicated by one trained curator; the remaining 70 decisions were annotated by two trained annotators and then manually adjudicated by one trained curator. Adjudication was destructive, so the dataset contains only the adjudicated (correct) annotations and not all of the original ones.

The corpus was developed as training and testing material for text segmentation tasks. Each decision is segmented into Header, Procedural History, Submission/Rejoinder, Court Argumentation, Footer, Footnotes, and Dissenting Opinion. This segmentation makes it possible to treat different parts of a decision differently, even when they share similar linguistic or other features.
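To illustrate how the seven segment types could be used to treat parts of a decision separately, here is a minimal Python sketch. It assumes a hypothetical layout for each decision record (a "text" string plus an "annotations" list of character-offset spans with a "label"); the actual structure of corpus.json is not described on this page, so check ReadMe.pdf before relying on these field names. The function name `segments_by_label` is likewise illustrative.

```python
from __future__ import annotations

from collections import defaultdict

# The seven segment types named in the corpus description.
SEGMENT_LABELS = (
    "Header",
    "Procedural History",
    "Submission/Rejoinder",
    "Court Argumentation",
    "Footer",
    "Footnotes",
    "Dissenting Opinion",
)


def segments_by_label(decision: dict) -> dict[str, list[str]]:
    """Group a decision's annotated spans by segment label, in text order.

    ASSUMPTION (hypothetical schema, not confirmed by the repository):
    `decision` has a "text" string and an "annotations" list of
    {"start": int, "end": int, "label": str} items whose offsets index
    into "text". Consult ReadMe.pdf for the real layout.
    """
    text = decision["text"]
    grouped: dict[str, list[str]] = defaultdict(list)
    # Sort by start offset so each label's spans appear in reading order.
    for ann in sorted(decision["annotations"], key=lambda a: a["start"]):
        label = ann["label"]
        if label not in SEGMENT_LABELS:
            raise ValueError(f"unexpected segment label: {label!r}")
        grouped[label].append(text[ann["start"]:ann["end"]])
    return dict(grouped)


if __name__ == "__main__":
    # Toy record built by hand to exercise the function; not corpus data.
    toy = {
        "text": "HEADER. HISTORY. REASONING.",
        "annotations": [
            {"start": 8, "end": 16, "label": "Procedural History"},
            {"start": 0, "end": 7, "label": "Header"},
            {"start": 17, "end": 27, "label": "Court Argumentation"},
        ],
    }
    print(segments_by_label(toy))
```

With spans grouped this way, a downstream step could, for example, run an argument-mining model only on the "Court Argumentation" text while ignoring headers and footers.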
This item is publicly available and licensed under:
Files in this item
Name: corpus.json
Size: 13.83 MB
Format: application/octet-stream
Description: Corpus (gold)
MD5: 5de341ef2545591f10b5283ef322386e

Name: ReadMe.pdf
Size: 712.6 KB
Format: application/pdf
Description: ReadMe
MD5: ea46ea2b87576df5daedaecfcaf5e9e7
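The MD5 checksums listed for the two files can be used to confirm that a download is complete and uncorrupted. Below is a minimal sketch using only the Python standard library (`hashlib`, `pathlib`). It assumes both files were saved into a single directory under their original names; the helper names `md5_of` and `verify_downloads` are illustrative and not part of the repository.

```python
from __future__ import annotations

import hashlib
import sys
from pathlib import Path

# MD5 checksums as published on this item page.
EXPECTED_MD5 = {
    "corpus.json": "5de341ef2545591f10b5283ef322386e",
    "ReadMe.pdf": "ea46ea2b87576df5daedaecfcaf5e9e7",
}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        # iter() with a sentinel stops once read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory: Path) -> dict[str, bool]:
    """Map each expected file name to whether its checksum matches.

    ASSUMPTION: files sit directly in `directory` under their original
    names. A missing file is reported as False.
    """
    results: dict[str, bool] = {}
    for name, expected in EXPECTED_MD5.items():
        path = directory / name
        results[name] = path.is_file() and md5_of(path) == expected
    return results


if __name__ == "__main__":
    target = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for name, ok in verify_downloads(target).items():
        print(f"{name}: {'OK' if ok else 'MISMATCH or missing'}")
```

Reading in chunks keeps memory use flat regardless of file size; for a 13.83 MB file this matters little, but the same helper works unchanged for larger downloads.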