This is a new version of the repository. Do let us know (lindat-help at ufal.mff.cuni.cz) if you encounter any issues.

Slovak Dependency Treebank

Please use the following text to cite this item or export to a predefined format:
Gajdošová, Katarína; Šimková, Mária and et al., 2016, Slovak Dependency Treebank, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-1822.
Date issued
2016-11-07
Size
10604 sentences,
106043 tokens
Language(s)
Description
Slovak Dependency Treebank (Slovenský závislostný korpus) was created as part of the Slovak National Corpus at the Ľ. Štúr Institute of the Slovak Academy of Sciences. The annotation follows the guidelines of the Prague Dependency Treebank (Czech), slightly modified in the spirit of Slovak grammatical tradition. Morphological tags, lemmas and dependency relations have been assigned manually to every word. The present dataset is a subset of the original treebank. We automatically selected the sentences where the two human annotators 100% agreed on the analysis. This increases the quality and trustworthiness of the data but it also results in selecting short sentences most of the time. An extended version may be published in the future when manually merged and checked annotation is available. The selected sentences have been converted to the CoNLL-X file format (original token IDs are preserved in the FEATS column). This PDT-style annotation will serve as the source for the first Slovak dataset in the Universal Dependencies (to be published separately).
Acknowledgement
 Files in this item
Name
stb.conll.gz
Size
1.16 MB
Format
application/x-gzip
Description
Slovak Dependency Treebank (subset, CoNLL-X format)
MD5
28d92d0c401d858700cefe797e466d93
Preview
  File Preview
    • stb.conll5 MB