Recently added
corpus

Description:
We present a large corpus of Czech parliament plenary sessions. The corpus
consists of approximately 444 hours of speech data and corresponding text
transcriptions. The whole corpus has been segmented into short audio ...
This item contains 1 file (36.88 GB).
Publicly Available


lexicalConceptualResource

Description:
The SynSemClass synonym verb lexicon is a result of a project investigating semantic ‘equivalence’ of verb senses and their valency behavior in parallel Czech-English language resources, i.e., relating verb meanings with ...
This item contains 2 files (8.91 MB).
Publicly Available




corpus

Description:
COSTRA 1.0 is a dataset of Czech complex sentence transformations. The dataset is intended for the study of sentence-level embeddings beyond simple word alternations or standard paraphrasing.
The dataset consists of 4,262 ...
This item contains 1 file (116.75 KB).
Publicly Available


Most viewed items
In the last week
corpus

Description:
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and ...
This item contains 3 files (417.2 MB).
Publicly Available


toolService

Description:
Tokenizer, POS Tagger, Lemmatizer and Parser models for 90 treebanks of 60 languages of Universal Dependencies 2.4 Treebanks, created solely using UD 2.4 data (http://hdl.handle.net/11234/1-2988). The model documentation ...
This item contains 92 files (2.39 GB).
Publicly Available




languageDescription

Description:
Automatic segmentation, tokenization and morphological and syntactic annotations of raw texts in 45 languages, generated by UDPipe (http://ufal.mff.cuni.cz/udpipe), together with word embeddings of dimension 100 computed ...
This item contains 47 files (629.67 GB).
Publicly Available



