We present the Czech Court Decisions Dataset (CCDD) -- a dataset of 300 manually annotated court decisions published by the Supreme Court of the Czech Republic and the Constitutional Court of the Czech Republic.
The Czech Legal Text Treebank (CLTT) is a collection of 1133 manually annotated dependency trees. CLTT consists of two legal documents: The Accounting Act (563/1991 Coll., as amended) and Decree on Double-entry Accounting for undertakers (500/2002 Coll., as amended).
The Czech Legal Text Treebank 2.0 (CLTT 2.0) annotates the same texts as CLTT 1.0. These legal-domain texts are manually annotated for syntax. The syntactic annotation in CLTT 2.0 is more elaborate than in CLTT 1.0 in several respects. In addition, two new annotation layers were added to the data: (i) a layer of accounting entities and (ii) a layer of semantic entity relations.
Environmental impact assessment (EIA) is the formal process used to predict the environmental consequences of a plan. We present a rule-based extraction system for mining Czech EIA documents. The extraction rules operate on documents enriched with morphological information and on manually created vocabularies of the terms to be extracted, e.g. basic information about the project (address, company ID, ...), data on impacts and outcomes (waste substances, endangered species, ...), and the final opinion. Notice of Intent documents contain section BI2, which gives information on the scope (capacity) of the plan.
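To illustrate what rule-based extraction over lemmatized text with term vocabularies can look like, here is a minimal Python sketch. It is not the system described above: the `VOCAB` entries, the `Token` type, the `match_vocabulary` helper, and the `ICO_RE` pattern (which assumes the company ID is written as an 8-digit IČO) are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical vocabularies (category -> lemmas); not the system's actual lists.
VOCAB = {
    "waste_substance": {"asbest", "olej"},
    "endangered_species": {"vydra", "čáp"},
}

# Assumed pattern: company ID written as "IČ" or "IČO" followed by 8 digits.
ICO_RE = re.compile(r"IČO?:?\s*(\d{8})\b")


@dataclass(frozen=True)
class Token:
    form: str   # surface form as it appears in the document
    lemma: str  # lemma supplied by a morphological analyzer


def match_vocabulary(tokens, vocab):
    """Return (category, form) pairs for tokens whose lemma is in a vocabulary."""
    hits = []
    for tok in tokens:
        for category, lemmas in vocab.items():
            if tok.lemma in lemmas:
                hits.append((category, tok.form))
    return hits


def find_company_ids(text):
    """Return all 8-digit company IDs matched by the assumed pattern."""
    return ICO_RE.findall(text)
```

Matching on lemmas rather than surface forms is what makes the morphological enrichment useful here: an inflected form such as "vydry" is still found through its lemma "vydra".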
KUK 0.0 is a pilot version of a corpus of Czech legal and administrative texts intended as data for manual and automatic assessment of the accessibility (comprehensibility or clarity) of Czech legal texts.
KUK 1.0 is a corpus of Czech legal and administrative texts accompanied by extensive metadata for automatic assessment of the accessibility (comprehensibility or clarity) of Czech legal texts. It is the successor to the KUK 0.0 corpus, which was published in 2023 (http://hdl.handle.net/11234/1-5363).
KUK 1.0 enhances the texts from KUK 0.0 with automatic analysis in the Universal Dependencies framework (using UDPipe 2.0) and automatic marking of named entities (using NameTag 3.0), and adds new texts used in the KUKY 1.0 corpus.
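Universal Dependencies analyses are commonly exchanged in the CoNLL-U format (ten tab-separated columns per token, blank line between sentences). The following is a minimal reader sketch under the assumption that the analysed texts are available in that format; the corpus's actual file layout is not stated here, and `read_conllu` and `SAMPLE` are illustrative names, not part of the corpus.

```python
def read_conllu(text):
    """Parse CoNLL-U text into a list of sentences (lists of token dicts)."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment such as "# text = ..."
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip malformed lines, multiword ranges, empty nodes
        current.append({
            "id": int(cols[0]),
            "form": cols[1],
            "lemma": cols[2],
            "upos": cols[3],
            "head": int(cols[6]) if cols[6].isdigit() else None,
            "deprel": cols[7],
            "misc": cols[9],
        })
    if current:
        sentences.append(current)
    return sentences


# Hand-written sample sentence, not taken from the corpus.
SAMPLE = (
    "# text = Soud rozhodl.\n"
    "1\tSoud\tsoud\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\trozhodl\trozhodnout\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)
```

Calling `read_conllu(SAMPLE)` yields one sentence of three tokens, each carrying its lemma, part of speech, and dependency head.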
KUKY is a curated selection of 224 Czech administrative and legal documents for readability research, stored in two JSON files.
The documents come partly from public databases (Office of the Ombudsman, courts) and partly from private sources (letters, public local administration announcements). Some documents come in documented draft-revision pairs.
They are manually enriched with a two-level annotation: "Relevance Stoplight" and "Speech Acts". This annotation mimics the way a plain-language expert scrutinizes a document before redesigning it for better readability: first, they closely read the entire document and detect problematic passages ("Relevance Stoplight"), classifying them as either incomprehensible or superfluous, or approving them as relevant.
In a second step, the editor works with the relevant text according to a genre-specific template ("Speech Acts").
At the metadata level, the documents are graded with respect to their readability, as perceived by experienced plain legal writing teachers.
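The two-level annotation described above can be pictured as labelled text spans, where only spans approved as relevant proceed to the second level. The sketch below is one possible in-memory representation, not the schema of the KUKY JSON files: the `Stoplight` enum, the `Span` fields, the helper functions, and the example speech-act label are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Stoplight(Enum):
    """First-level judgement on a passage (the three classes named above)."""
    INCOMPREHENSIBLE = "incomprehensible"
    SUPERFLUOUS = "superfluous"
    RELEVANT = "relevant"


@dataclass
class Span:
    start: int                        # character offset, inclusive
    end: int                          # character offset, exclusive
    label: Stoplight
    speech_act: Optional[str] = None  # second-level label; hypothetical values


def relevant_spans(spans):
    """Keep only the spans that move on to the second annotation step."""
    return [s for s in spans if s.label is Stoplight.RELEVANT]


def span_text(text, span):
    """Return the passage of `text` covered by `span`."""
    return text[span.start:span.end]
```

Keeping the second-level label optional reflects the order of the procedure: a passage receives a speech-act label only after it has been judged relevant.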
Migrant Stories is a corpus of 1017 short biographical narratives of migrants, supplemented with metadata such as the countries of origin and destination, the migrant's gender, and the GDP per capita of the respective countries. The corpus was compiled as teaching material for data analysis.