Czech OOV Inflection Dataset

Please use the following text to cite this item:
Sourada, Tomáš, 2024, Czech OOV Inflection Dataset, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-5471.
Date issued
2024
Size
6270880 entries
Language(s)
Czech
Description
Czech OOV Inflection Dataset is a Czech inflection dataset of nouns, focused on evaluation in out-of-vocabulary (OOV) conditions. It consists of two parts: a standard lemma-disjoint train-dev-test split of a subset of noun paradigms from the existing morphological dictionary Czech MorfFlex 2.0 (files train, dev and test-MorfFlex); and a small set of neologisms from Čeština 2.0, annotated for inflected forms (file test-neologisms).
Files in this item
Name
CzechOOVInflectionDataset.tar.xz
Size
17.08 MB
Format
application/x-xz
Description
xz Archive
MD5
f768e0166d0e81535e8afb2555d3eca3
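The MD5 value listed for the archive can be used to check a download before unpacking it. The following is a minimal sketch, assuming the archive was saved under its listed name in the current working directory; the helper names `md5_of_file` and `matches_expected` are illustrative and not part of the dataset.

```python
import hashlib
from pathlib import Path

# MD5 checksum as listed in this record for CzechOOVInflectionDataset.tar.xz.
EXPECTED_MD5 = "f768e0166d0e81535e8afb2555d3eca3"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so the whole file is never held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumption: the archive sits in the current working directory.
    archive = Path("CzechOOVInflectionDataset.tar.xz")
    if archive.exists():
        print("checksum OK" if matches_expected(archive) else "checksum mismatch")
    else:
        print(f"{archive} not found; download it first")
```

A mismatch usually indicates an incomplete or corrupted download, in which case fetching the file again is the simplest remedy.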