
MorfFlex CZ 161115

Please use the following text to cite this item:
Hajič, Jan and Hlaváčová, Jaroslava, 2016, MorfFlex CZ 161115, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-1834.
Date issued
2016-11-15
Size
125478706 lexicalTypes
Language(s)
Czech
Description
Czech morphological dictionary, originally developed by Jan Hajič as a spelling checker and lemmatization dictionary. It currently contains full morphological information for each covered wordform, as well as some derivational, semantic and named-entity information.
Files in this item
Name
morfflex-cz.2016-11-15.utf8.lemmaID_suff-tag-form.tab.csv.xz
Size
200.5 MB
Format
application/x-xz
Description
Full (morphologically analyzed) wordlist for the Czech language, with lemma (which includes the sense suffix (-<number>) and semantic/syntactic suffixes and comments in PDT format), full positional tag in PDT format, and form (3 fields). Fields are tab-separated and always filled with a non-empty string, lines end with a linefeed only, and the encoding is UTF-8.
MD5
b71172a9310463ee8d87efaf177b68d0
Preview
  File Preview
    • morfflex-cz.2016-11-15.utf8.lemmaID_suff-tag-form.tab.csv (6 GB)
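The three-field layout described for this file can be read as a stream without unpacking the roughly 6 GB archive first. The following is a minimal sketch, not an official reader: the field order (lemma, tag, form) is an assumption taken from the file name and the description, and `Entry`, `parse_entries` and `read_entries` are hypothetical names introduced only for illustration.

```python
import lzma
from typing import Iterable, Iterator, NamedTuple


class Entry(NamedTuple):
    lemma: str  # lemma including sense and semantic/syntactic suffixes
    tag: str    # full positional tag in PDT format
    form: str   # wordform


def parse_entries(lines: Iterable[str]) -> Iterator[Entry]:
    """Parse tab-separated lines; field order lemma, tag, form is assumed."""
    for line_no, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        # The description promises exactly three non-empty fields per line.
        if len(fields) != 3 or not all(fields):
            raise ValueError(
                f"line {line_no}: expected 3 non-empty fields, got {fields!r}"
            )
        yield Entry(*fields)


def read_entries(path: str) -> Iterator[Entry]:
    """Stream entries straight from the .xz archive, one line at a time."""
    # newline="\n" keeps the linefeed-only line endings untranslated.
    with lzma.open(path, mode="rt", encoding="utf-8", newline="\n") as handle:
        yield from parse_entries(handle)
```

For example, `for entry in read_entries("morfflex-cz.2016-11-15.utf8.lemmaID_suff-tag-form.tab.csv.xz"): ...` iterates over the wordlist while holding only one line in memory at a time.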
Name
morfflex-cz.2016-11-15.utf8.conll09.tab.csv.xz
Size
284.74 MB
Format
application/x-xz
Description
Full (morphologically analyzed) wordlist for the Czech language, with form, lemma (without the sense suffix and without semantic/syntactic suffixes), major POS in CoNLL-2009 Shared Task format, and Word Features in CoNLL-2009 Shared Task format. Fields are tab-separated and always filled with a non-empty string, lines end with a linefeed only, and the encoding is UTF-8.
MD5
bf1b65d2af0f0671fdbf8493acb466ee
Preview
  File Preview
    • morfflex-cz.2016-11-15.utf8.conll09.tab.csv (8 GB)
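A line of the CoNLL-2009 variant can be split into its four fields in a similar way. This is only a sketch under stated assumptions: the field order (form, lemma, POS, features) follows the description above, and the feature string is assumed to use the usual CoNLL-2009 convention of `Name=Value` pairs joined by `|`, with `_` standing for an empty feature set. `ConllEntry`, `parse_feats` and `parse_conll_line` are hypothetical names used for illustration.

```python
from typing import Dict, NamedTuple


class ConllEntry(NamedTuple):
    form: str              # wordform
    lemma: str             # lemma without sense or semantic/syntactic suffixes
    pos: str               # major part of speech
    feats: Dict[str, str]  # word features as a name -> value mapping


def parse_feats(raw: str) -> Dict[str, str]:
    """Split an assumed 'Name=Value|Name=Value' string into a dict."""
    if raw == "_":
        return {}
    feats: Dict[str, str] = {}
    for item in raw.split("|"):
        name, sep, value = item.partition("=")
        # A bare name without '=' is kept with an empty value.
        feats[name] = value if sep else ""
    return feats


def parse_conll_line(line: str) -> ConllEntry:
    """Parse one tab-separated line: form, lemma, POS, features (assumed order)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 4 or not all(fields):
        raise ValueError(f"expected 4 non-empty fields, got {fields!r}")
    form, lemma, pos, raw_feats = fields
    return ConllEntry(form, lemma, pos, parse_feats(raw_feats))
```

Keeping the features as a dictionary makes it straightforward to filter entries by a single feature, for example by case or number, once the actual feature names in the file have been checked.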