The Czech Etymological Lexicon, version 1.0, contains 10,502 Czech words, each annotated with a sequence of ISO 639-3 language codes representing its etymological origin. The dataset is provided in a simple tab-separated format, with the first column containing the lemma and the second listing the language codes separated by commas.
Example entry (columns separated by tabs):
architekt	deu,lat,ell	loan
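The word architekt originated in Greek (ell) and came to Czech through Latin (lat) and German (deu). In this entry, the codes run from the most recent source language back to the oldest.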
The third column indicates whether the word is a loanword (marked as "loan") or a native word (marked as "native"). Here "native" means an inherited word, as opposed to a borrowed one.
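The format described above can be read with a few lines of Python. This is only a minimal sketch, assuming every line has exactly three tab-separated columns; the names Entry and parse_line are illustrative and not part of the dataset or of any accompanying tool.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """One lexicon line (hypothetical container, not part of the dataset)."""
    lemma: str
    languages: list[str]  # ISO 639-3 codes, kept in the order given in the file
    status: str           # "loan" or "native"


def parse_line(line: str) -> Entry:
    """Parse one tab-separated line into an Entry.

    Assumes exactly three columns; raises ValueError otherwise.
    """
    lemma, codes, status = line.rstrip("\n").split("\t")
    # Assumption: an empty code column means no languages are listed.
    languages = [code for code in codes.split(",") if code]
    return Entry(lemma, languages, status)


entry = parse_line("architekt\tdeu,lat,ell\tloan")
print(entry.lemma, entry.languages, entry.status)
```

To process the whole file, parse_line can be applied to each line of the file opened with UTF-8 encoding.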
The language sequences were extracted from the printed dictionary REJZEK, Jiří. Český etymologický slovník [Czech etymological dictionary]. LEDA, 2015. The extraction from the dictionary entries was fully automated and may therefore contain errors. Consult the original dictionary when precise information is required.
DeriNet is a lexical network modeling derivational and compositional relations in Czech. The nodes of the network represent Czech lexemes, while the edges capture word-formational relations between derived words and their base word(s). The current version, DeriNet 2.3, introduces several key improvements over version 2.2:
(a) the set of 1,040,126 lexemes is aligned with the latest version of MorfFlex CZ (version 2.1),
(b) 5,781 derivational trees containing loanwords are enriched with etymological information specifying their origins, adopted from the Czech Etymological Lexicon,
(c) 8,867 new derivational and 1,262 new compound relations have been identified, resulting in a total of 791,771 derivational and 7,598 compound relations, and
(d) the morphological segmentation and classification of morphs have been significantly enhanced.