dc.contributor.author Gajdošová, Katarína
dc.contributor.author Šimková, Mária
dc.contributor.author et al.
dc.date.accessioned 2016-11-07T14:50:58Z
dc.date.available 2016-11-07T14:50:58Z
dc.date.issued 2016-11-07
dc.identifier.uri http://hdl.handle.net/11234/1-1822
dc.description The Slovak Dependency Treebank (Slovenský závislostný korpus) was created as part of the Slovak National Corpus at the Ľ. Štúr Institute of the Slovak Academy of Sciences. The annotation follows the guidelines of the (Czech) Prague Dependency Treebank, slightly modified to reflect the Slovak grammatical tradition. Morphological tags, lemmas and dependency relations were assigned manually to every word. The present dataset is a subset of the original treebank: we automatically selected the sentences on whose analysis the two human annotators agreed completely. This increases the quality and trustworthiness of the data, but it also means that most of the selected sentences are short. An extended version may be published in the future, once manually merged and checked annotation is available. The selected sentences have been converted to the CoNLL-X file format (original token IDs are preserved in the FEATS column). This PDT-style annotation will serve as the source for the first Slovak dataset in Universal Dependencies (to be published separately).
dc.language.iso slk
dc.publisher Jazykovedný ústav Ľ. Štúra Slovenskej akadémie vied
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri http://creativecommons.org/licenses/by-sa/4.0/
dc.source.uri http://korpus.juls.savba.sk/
dc.subject dependency
dc.subject treebank
dc.subject syntax
dc.subject morphology
dc.title Slovak Dependency Treebank
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
dc.rights.label PUB
has.files yes
branding LINDAT / CLARIAH-CZ
contact.person Daniel Zeman <zeman@ufal.mff.cuni.cz>, Charles University, Faculty of Mathematics and Physics, ÚFAL
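sponsor Ministerstvo školstva SR (Ministry of Education of the Slovak Republic); project 2003SP200280307: Komplexné spracovanie slovenského jazyka a jeho elektronizácia na účely jazykovedného výskumu (Comprehensive processing of the Slovak language and its digitization for the purposes of linguistic research); nationalFunds
sponsor Grantová agentura České republiky (Grant Agency of the Czech Republic); project 15-10472S: Morphologically and Syntactically Annotated Corpora of Many Languages; nationalFunds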
size.info 10604 sentences
size.info 106043 tokens
files.size 1219869
files.count 1
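
The description states that the data are distributed in the CoNLL-X format, and the record gives sentence and token counts. The following is a minimal sketch, not an official loader, of how such a file might be read and counted in Python. It assumes the standard CoNLL-X layout (ten tab-separated columns per token, a blank line between sentences) and UTF-8 encoding; the names read_conllx and count_corpus are hypothetical helpers introduced here for illustration. The FEATS value is kept as a raw string, because the record does not say how the original token IDs are encoded in it.

```python
import gzip
from typing import Dict, Iterable, Iterator, List, Tuple

# The ten columns defined by the CoNLL-X format, in order.
CONLLX_COLUMNS = (
    "id", "form", "lemma", "cpostag", "postag",
    "feats", "head", "deprel", "phead", "pdeprel",
)


def read_conllx(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield sentences as lists of token dicts (hypothetical helper)."""
    sentence: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # A blank line closes the current sentence, if any.
            if sentence:
                yield sentence
                sentence = []
            continue
        fields = line.split("\t")
        if len(fields) != len(CONLLX_COLUMNS):
            raise ValueError(f"expected 10 columns, got {len(fields)}: {line!r}")
        sentence.append(dict(zip(CONLLX_COLUMNS, fields)))
    if sentence:
        # The last sentence may not be followed by a blank line.
        yield sentence


def count_corpus(path: str) -> Tuple[int, int]:
    """Return (sentence count, token count) for a gzipped CoNLL-X file.

    UTF-8 encoding is an assumption; the record does not state it.
    """
    sentences = tokens = 0
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for sentence in read_conllx(handle):
            sentences += 1
            tokens += len(sentence)
    return sentences, tokens
```

Calling count_corpus("stb.conll.gz") on a local copy of the file gives counts that can be compared with the size.info fields above. Each token dict exposes the FEATS column under the key "feats".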


Files in this item

This item is publicly available and licensed under Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
Name: stb.conll.gz
Size: 1.16 MB
Format: application/x-gzip
Description: Slovak Dependency Treebank (subset, CoNLL-X format)
MD5: 28d92d0c401d858700cefe797e466d93
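
The MD5 value listed for the file can be used to check that a download is intact. The sketch below shows one way to do this with Python's standard hashlib module. It assumes the archive has been saved locally as stb.conll.gz; the function names md5_of_file and matches_md5 are hypothetical and introduced here only for illustration.

```python
import hashlib

# MD5 checksum published in this record for stb.conll.gz.
EXPECTED_MD5 = "28d92d0c401d858700cefe797e466d93"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Compare a file's MD5 digest with the expected value, ignoring case."""
    return md5_of_file(path) == expected.strip().lower()
```

For example, matches_md5("stb.conll.gz") returns True only if the local file hashes to the value listed above. MD5 is adequate for detecting accidental corruption during transfer, but it is not a defense against deliberate tampering.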
