Slovak Dependency Treebank
Please use the following text to cite this item:
Gajdošová, Katarína; Šimková, Mária; et al., 2016,
Slovak Dependency Treebank, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL),
http://hdl.handle.net/11234/1-1822.
Authors
Gajdošová, Katarína; Šimková, Mária; et al.
Item identifier
http://hdl.handle.net/11234/1-1822
Date issued
2016-11-07
Size
10,604 sentences, 106,043 tokens
Language(s)
Slovak
Description
Slovak Dependency Treebank (Slovenský závislostný korpus) was created as part of the Slovak National Corpus at the Ľ. Štúr Institute of the Slovak Academy of Sciences. The annotation follows the guidelines of the Prague Dependency Treebank (Czech), slightly modified in the spirit of Slovak grammatical tradition. Morphological tags, lemmas and dependency relations have been assigned manually to every word.
The present dataset is a subset of the original treebank. We automatically selected the sentences on which the two human annotators agreed 100%. This increases the quality and trustworthiness of the data, but it also biases the selection toward short sentences: the released subset averages about 10 tokens per sentence. An extended version may be published in the future, once a manually merged and checked annotation is available.
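The agreement filter described above can be sketched as follows. This is a minimal illustration, not the procedure actually used to build the release: it assumes each annotator's output is available as an aligned list of sentences, and it represents a token as a hypothetical (form, lemma, tag, head, deprel) tuple with placeholder tags. A sentence is kept only if both annotations are identical token for token, so a single disagreement anywhere disqualifies it. Because longer sentences offer more chances for disagreement, this is consistent with the note above that mostly short sentences were selected.

```python
from typing import List, Sequence, Tuple

# Hypothetical token representation: (form, lemma, tag, head, deprel).
# The real annotation files are not described here; this is an assumption.
Token = Tuple[str, str, str, int, str]
Sentence = List[Token]


def select_agreed(
    annotator_a: Sequence[Sentence], annotator_b: Sequence[Sentence]
) -> List[int]:
    """Return the indices of sentences on which both annotators fully agree.

    Assumes the two sequences are aligned: item i of each is the same sentence.
    """
    if len(annotator_a) != len(annotator_b):
        raise ValueError("both annotations must cover the same sentences")
    return [
        i
        for i, (sent_a, sent_b) in enumerate(zip(annotator_a, annotator_b))
        # 100% agreement: identical token lists, i.e. every lemma, tag,
        # head and relation matches.
        if sent_a == sent_b
    ]


if __name__ == "__main__":
    # Hypothetical toy data with placeholder tags.
    rain = [("Prší", "pršať", "V", 0, "Pred"), (".", ".", "Z", 0, "AuxK")]
    dog_a = [("Pes", "pes", "N", 2, "Sb"), ("šteká", "štekať", "V", 0, "Pred")]
    dog_b = [("Pes", "pes", "N", 2, "Obj"), ("šteká", "štekať", "V", 0, "Pred")]
    print(select_agreed([rain, dog_a], [rain, dog_b]))  # [0]
```

Here the second sentence is dropped because the annotators disagree on a single relation (Sb vs. Obj).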
The selected sentences have been converted to the CoNLL-X file format (the original token IDs are preserved in the FEATS column). This PDT-style annotation will serve as the source for the first Slovak dataset in Universal Dependencies, to be published separately.
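A minimal reader for the CoNLL-X format can make the column layout concrete. The sketch below assumes the standard ten tab-separated CoNLL-X columns with a blank line after each sentence. It keeps every value as a raw string, including FEATS, because the exact way the original token IDs are encoded there is not described on this page. The function names, the sample tokens and the file name in the comment are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# The ten standard CoNLL-X columns, in file order.
CONLLX_COLUMNS = (
    "ID", "FORM", "LEMMA", "CPOSTAG", "POSTAG",
    "FEATS", "HEAD", "DEPREL", "PHEAD", "PDEPREL",
)

Token = Dict[str, str]


def read_conllx(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield each sentence as a list of {column name: raw value} dicts."""
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        fields = line.split("\t")
        if len(fields) != len(CONLLX_COLUMNS):
            raise ValueError(f"expected 10 columns, got {len(fields)}: {line!r}")
        # FEATS is kept verbatim; how original token IDs are encoded in it
        # is not specified here.
        sentence.append(dict(zip(CONLLX_COLUMNS, fields)))
    # The last sentence may not be followed by a blank line.
    if sentence:
        yield sentence


def count_sentences_and_tokens(lines: Iterable[str]) -> Tuple[int, int]:
    """Count sentences and tokens (one token per non-blank line)."""
    n_sentences = n_tokens = 0
    for sentence in read_conllx(lines):
        n_sentences += 1
        n_tokens += len(sentence)
    return n_sentences, n_tokens


if __name__ == "__main__":
    # Hypothetical two-token sentence; "_" stands in for real FEATS values.
    sample = [
        "\t".join(["1", "Prší", "pršať", "V", "V", "_", "0", "Pred", "_", "_"]),
        "\t".join(["2", ".", ".", "Z", "Z", "_", "0", "AuxK", "_", "_"]),
        "",
    ]
    print(count_sentences_and_tokens(sample))  # (1, 2)
    # For a real file (hypothetical name):
    # with open("treebank.conll", encoding="utf-8") as f:
    #     print(count_sentences_and_tokens(f))
```

If the release files contain exactly the selected sentences, summing these counts over all of them would be expected to match the Size figures; this is an expectation, not a verified result.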
Acknowledgement
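Ministerstvo školstva SR (Ministry of Education of the Slovak Republic)
Project code: 2003SP200280307
Project name: Komplexné spracovanie slovenského jazyka a jeho elektronizácia na účely jazykovedného výskumu (Comprehensive processing of the Slovak language and its digitization for linguistic research)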
Grantová agentura České republiky (Czech Science Foundation)
Project code: 15-10472S
Project name: Morphologically and Syntactically Annotated Corpora of Many Languages
This item is publicly available and licensed under:


