What's New
corpus

Description:
We present a large corpus of Czech parliament plenary sessions. The corpus
consists of approximately 444 hours of speech data and corresponding text
transcriptions. The whole corpus has been segmented into short audio ...
This item contains 1 file (36.88 GB).
Publicly Available


lexicalConceptualResource

Description:
The SynSemClass synonym verb lexicon is a result of a project investigating semantic ‘equivalence’ of verb senses and their valency behavior in parallel Czech-English language resources, i.e., relating verb meanings with ...
This item contains 2 files (8.91 MB).
Publicly Available

corpus

Description:
COSTRA 1.0 is a dataset of Czech complex sentence transformations. The dataset is intended for the study of sentence-level embeddings beyond simple word alternations or standard paraphrasing.
The dataset consists of 4,262 ...
This item contains 1 file (116.75 KB).
Publicly Available


Most Viewed Items
Top Last Week
corpus

Description:
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and ...
This item contains 3 files (417.2 MB).
Publicly Available


toolService

Description:
Tokenizer, POS Tagger, Lemmatizer and Parser models for 90 treebanks of 60 languages from the Universal Dependencies 2.4 treebanks, created solely using UD 2.4 data (http://hdl.handle.net/11234/1-2988). The model documentation ...
This item contains 92 files (2.39 GB).
Publicly Available

languageDescription

Description:
Automatic segmentation, tokenization and morphological and syntactic annotations of raw texts in 45 languages, generated by UDPipe (http://ufal.mff.cuni.cz/udpipe), together with word embeddings of dimension 100 computed ...
This item contains 47 files (629.67 GB).
Publicly Available



