
DeriNet 2.0

Please use the following text to cite this item:
Vidra, Jonáš; Žabokrtský, Zdeněk; Kyjánek, Lukáš; Ševčíková, Magda and Dohnalová, Šárka, 2019, DeriNet 2.0, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-2995.
Date issued
2019-05-30
Size
1027665 entries,
1024922 words
Language(s)
Czech
Description
DeriNet is a lexical network that models derivational relations in the lexicon of Czech. Nodes of the network correspond to Czech lexemes, while edges represent derivational or compositional relations between a derived word and its base word or words. The present version, DeriNet 2.0, contains 1,027,665 lexemes (sampled from the MorfFlex dictionary) connected by 808,682 derivational and 600 compositional links. Compared to previous versions, version 2.0 uses a new format and contains new types of annotations: compounding, annotation of several morphological and other categories of lexemes, identification of root morphs of 244,198 lexemes, semantic labelling of 151,005 relations using five labels, and identification of 13 fictitious lexemes.
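The network model described above can be pictured with a small in-memory sketch. This is only an illustration of the idea (lexemes as nodes, each pointing to its base lexeme or lexemes); the class name `Lexeme`, its fields, and the three sample lemmas are assumptions chosen for the example and are not taken from the DeriNet file or its official tooling.

```python
from dataclasses import dataclass, field


@dataclass
class Lexeme:
    """A node of the network (hypothetical structure, for illustration only)."""
    lemma: str
    # Base lexemes: one for a derivation, several for a compound,
    # and an empty list for a lexeme with no recorded base.
    bases: list = field(default_factory=list)

    def root_path(self):
        """Follow the first base repeatedly and return the lemmas visited."""
        path, node = [self.lemma], self
        while node.bases:
            node = node.bases[0]
            path.append(node.lemma)
        return path


# Illustrative lemmas only; they are not read from derinet-2-0.tsv.
ucit = Lexeme("učit")
ucitel = Lexeme("učitel", [ucit])
ucitelka = Lexeme("učitelka", [ucitel])

print(" <- ".join(reversed(ucitelka.root_path())))
```

Storing the bases on the derived word mirrors the description's direction of the relation (from a derived word to its base word or words), so a compound simply carries more than one base.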
Acknowledgement
This item is Publicly Available
and licensed under:
 Files in this item
Name
derinet-2-0.tsv
Size
146.21 MB
Format
application/octet-stream
Description
A tab-separated-values version of DeriNet 2.0, encoded as UTF-8, with Unix line endings. See the project homepage for documentation of the columns.
MD5
1ea9bc62699c96b00f52a25198f6d4ed
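
After downloading, the file can be checked against the MD5 listed above and then read as plain tab-separated text. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file`, `is_intact` and `read_rows` are made up for this example, and rows are returned as raw lists of strings because the column meanings are documented on the project homepage, not here. Skipping empty lines is a defensive assumption, not a statement about the file's layout.

```python
import csv
import hashlib

# Checksum as listed for derinet-2-0.tsv in this record.
EXPECTED_MD5 = "1ea9bc62699c96b00f52a25198f6d4ed"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected=EXPECTED_MD5):
    """True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


def read_rows(path):
    """Yield each non-empty line of a UTF-8 TSV file as a list of strings."""
    with open(path, encoding="utf-8", newline="") as handle:
        # QUOTE_NONE keeps quote characters as ordinary text.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if row:  # assumption: empty lines carry no fields of interest
                yield row
```

A typical use would be to call `is_intact("derinet-2-0.tsv")` once after download and, if it returns `True`, iterate over `read_rows("derinet-2-0.tsv")`. Reading in chunks and yielding rows lazily keeps memory use modest for a file of this size.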