dc.contributor.author | Gajdošová, Katarína |
dc.contributor.author | Šimková, Mária |
dc.contributor.author | et al. |
dc.date.accessioned | 2016-11-07T14:50:58Z |
dc.date.available | 2016-11-07T14:50:58Z |
dc.date.issued | 2016-11-07 |
dc.identifier.uri | http://hdl.handle.net/11234/1-1822 |
dc.description | Slovak Dependency Treebank (Slovenský závislostný korpus) was created as part of the Slovak National Corpus at the Ľ. Štúr Institute of Linguistics of the Slovak Academy of Sciences. The annotation follows the guidelines of the Prague Dependency Treebank (Czech), slightly modified in the spirit of the Slovak grammatical tradition. Morphological tags, lemmas and dependency relations have been assigned manually to every word. The present dataset is a subset of the original treebank. We automatically selected the sentences on whose analysis the two human annotators agreed completely. This increases the quality and trustworthiness of the data, but it also means that most of the selected sentences are short. An extended version may be published in the future, once manually merged and checked annotation is available. The selected sentences have been converted to the CoNLL-X file format (original token IDs are preserved in the FEATS column). This PDT-style annotation will serve as the source for the first Slovak dataset in Universal Dependencies (to be published separately). |
dc.language.iso | slk |
dc.publisher | Jazykovedný ústav Ľ. Štúra Slovenskej akadémie vied |
dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
dc.rights.uri | http://creativecommons.org/licenses/by-sa/4.0/ |
dc.source.uri | http://korpus.juls.savba.sk/ |
dc.subject | dependency |
dc.subject | treebank |
dc.subject | syntax |
dc.subject | morphology |
dc.title | Slovak Dependency Treebank |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
dc.rights.label | PUB |
has.files | yes |
branding | LINDAT / CLARIAH-CZ |
contact.person | Daniel Zeman; zeman@ufal.mff.cuni.cz; Charles University, Faculty of Mathematics and Physics, ÚFAL |
sponsor | Ministerstvo školstva SR; 2003SP200280307; Komplexné spracovanie slovenského jazyka a jeho elektronizácia na účely jazykovedného výskumu; nationalFunds |
sponsor | Grantová agentura České republiky; 15-10472S; Morphologically and Syntactically Annotated Corpora of Many Languages; nationalFunds |
size.info | 10604 sentences |
size.info | 106043 tokens |
files.size | 1219869 |
files.count | 1 |
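The description states that the sentences are distributed in the CoNLL-X format. A minimal sketch of reading such a file follows, assuming the standard ten tab-separated CoNLL-X columns, blank lines between sentences, and UTF-8 encoding (the record does not state the encoding). The names `Token`, `read_conllx` and `count_sentences_and_tokens` are hypothetical helpers introduced for illustration, and the sketch does not try to interpret how the original token IDs are encoded inside FEATS, since the record does not specify that.

```python
import gzip
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class Token(NamedTuple):
    """One token line, in the column order of the CoNLL-X format."""
    id: str
    form: str
    lemma: str
    cpostag: str
    postag: str
    feats: str      # per the record, original token IDs are kept here
    head: str
    deprel: str
    phead: str
    pdeprel: str


def read_conllx(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield sentences (lists of Token) from an iterable of CoNLL-X lines."""
    sentence: List[Token] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        fields = line.split("\t")
        if len(fields) != 10:
            raise ValueError(f"expected 10 columns, got {len(fields)}: {line!r}")
        sentence.append(Token(*fields))
    if sentence:
        # The last sentence may not be followed by a blank line.
        yield sentence


def count_sentences_and_tokens(path: str) -> Tuple[int, int]:
    """Count sentences and tokens in a gzip-compressed CoNLL-X file."""
    n_sentences = n_tokens = 0
    # Assumption: the file is UTF-8 encoded.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for sentence in read_conllx(handle):
            n_sentences += 1
            n_tokens += len(sentence)
    return n_sentences, n_tokens
```

Calling `count_sentences_and_tokens("stb.conll.gz")` on the downloaded file gives counts that can be compared against the two `size.info` fields above.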
Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Name | stb.conll.gz |
Size | 1.16 MB |
Format | application/x-gzip |
Description | Slovak Dependency Treebank (subset, CoNLL-X format) |
MD5 | 28d92d0c401d858700cefe797e466d93 |
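The MD5 value listed for the file can be used to check a download for corruption. The sketch below is one way to do that with Python's standard `hashlib`; the function names `md5_of_file` and `matches_record` are hypothetical, and the local path passed in is an assumption about where the file was saved. MD5 is suitable here only as an integrity check, not as a security guarantee.

```python
import hashlib

# Checksum as listed in this record for stb.conll.gz.
EXPECTED_MD5 = "28d92d0c401d858700cefe797e466d93"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Compare a local file's MD5 digest with the value from the record."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_record("stb.conll.gz")` returns `True` only if the local copy has the digest listed above.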