DeriNet is a lexical network that models derivational and compositional relations in the lexicon of Czech. Nodes of the network correspond to Czech lexemes, while edges represent word-formation relations between a derived word and its base word or words.
The present version, DeriNet 2.2, contains:
- 1,040,127 lexemes (sampled from the MorfFlex CZ 2.0 dictionary), connected by
- 782,904 derivational,
- 50,511 orthographic variant,
- 6,336 compounding,
- 288 univerbation, and
- 135 conversion relations.
Compared to the previous version, version 2.2 contains an overhaul of the compounding annotation scheme, 4,384 extra compounds, 83 more affixoid lexemes serving as bases for compounding, more parts of speech serving as bases for compounding (adverbs, pronouns, numerals), and several minor corrections of derivational relations.
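The network structure described above (lexemes as nodes, typed word-formation relations as edges, with one base for derivation and several for compounding) can be illustrated with a minimal Python sketch. The names `Lexeme`, `Relation`, `link` and `count_by_kind` are hypothetical and do not reflect the DeriNet file format or any official API. The example words are chosen for illustration only and are not quoted from the data. The last part simply sums the relation counts listed above.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Lexeme:
    lemma: str
    pos: str
    # Relations pointing from this lexeme to its base word(s).
    parent_relations: list = field(default_factory=list, repr=False, compare=False)


@dataclass
class Relation:
    kind: str        # e.g. "derivation" or "compounding" (illustrative labels)
    bases: tuple     # one base for derivation, two or more for compounding
    derived: Lexeme


def link(kind, bases, derived):
    """Create a typed edge from the base lexeme(s) to the derived lexeme."""
    rel = Relation(kind, tuple(bases), derived)
    derived.parent_relations.append(rel)
    return rel


def count_by_kind(lexemes):
    """Count relations per type over a collection of lexemes."""
    return Counter(rel.kind for lex in lexemes for rel in lex.parent_relations)


# Toy network (illustrative words, not taken from the DeriNet data).
ucit = Lexeme("učit", "V")
ucitel = Lexeme("učitel", "N")
ucitelka = Lexeme("učitelka", "N")
cerny = Lexeme("černý", "A")
bily = Lexeme("bílý", "A")
cernobily = Lexeme("černobílý", "A")

link("derivation", [ucit], ucitel)
link("derivation", [ucitel], ucitelka)
link("compounding", [cerny, bily], cernobily)

toy_counts = count_by_kind([ucit, ucitel, ucitelka, cerny, bily, cernobily])

# Relation counts as reported in the list above.
reported = {
    "derivational": 782_904,
    "orthographic variant": 50_511,
    "compounding": 6_336,
    "univerbation": 288,
    "conversion": 135,
}
total_relations = sum(reported.values())
```

Storing each relation on the derived lexeme keeps a compound's several bases together in one edge, which is one simple way to accommodate both single-base and multi-base relations in the same structure.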
The article examines the two basic word-formation relations, semantic motivation and formal foundation, and the tension between them. It presents three cases of word-formation analogy, i.e. the imitation of a productive word-formation model by copying the morphemic structure of derivatives, regardless of whether the derivational series is complete or incomplete. The article draws on experience gained during work on the Dictionary of Affixes Used in Czech.
The item contains a list of 2,058 noun/verb conversion pairs along with related formations (word-formation paradigms), provided with linguistic features, including semantic categories that characterize the semantic relation between the noun and the verb in each conversion pair. The semantic categories were assigned manually by two human annotators on the basis of a set of sentences containing the noun and the verb of each conversion pair. In addition to the list of paradigms, the item contains a set of 739 files (a separate file for each conversion pair) annotated by both annotators in parallel and a set of 2,058 files containing the final annotation, which is included in the list of paradigms.