dc.contributor.author |
Novák, Michal |
dc.contributor.author |
Nedoluzhko, Anna |
dc.contributor.author |
Schwarz (Khoroshkina), Anna |
dc.date.accessioned |
2016-11-22T08:42:25Z |
dc.date.available |
2016-11-22T08:42:25Z |
dc.date.issued |
2016-09-30 |
dc.identifier.uri |
http://hdl.handle.net/11234/1-1791 |
dc.description |
Prague Czech-English Dependency Treebank - Russian translation (PCEDT-R) is a project that translates a subset of the Prague Czech-English Dependency Treebank 2.0 (PCEDT 2.0) into Russian and linguistically annotates the Russian translations, with emphasis on coreference and cross-lingual alignment of coreferential expressions. Development of the corpus is currently driven by cross-lingual comparison of the means of expressing coreference.
The current version, 0.5, is preliminary and contains the following (+ denotes new features):
* complete PCEDT 2.0 documents "wsj_1900"-"wsj_1949"
* Czech-English word alignment of coreferential expressions, annotated manually, mainly on the t-layer
+ Russian translations of the original English sentences
+ automatic tokenization, part-of-speech tagging and morphological analysis for Russian
+ automatic word alignment between all Czech and Russian words
+ manual alignment of possessive pronouns between Russian and the other two languages |
dc.language.iso |
eng |
dc.language.iso |
ces |
dc.language.iso |
rus |
dc.publisher |
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) |
dc.rights |
CC-BY-NC-SA + LDC99T42 |
dc.rights.uri |
https://lindat.mff.cuni.cz/repository/xmlui/page/license-pcedt2 |
dc.subject |
multilingual |
dc.subject |
coreference |
dc.title |
Prague Czech-English Dependency Treebank 2.0 - Russian translation |
dc.type |
corpus |
metashare.ResourceInfo#ContentInfo.mediaType |
text |
dc.rights.label |
RES |
has.files |
yes |
branding |
LINDAT / CLARIAH-CZ |
contact.person |
Michal Novák mnovak@ufal.mff.cuni.cz ÚFAL MFF UK |
sponsor |
Czech Science Foundation GA 16-05394S Structure of coreferential chains in parallel language data nationalFunds |
sponsor |
The Charles University Grant Agency GAUK 338915 Cross-lingual approaches to coreference resolution nationalFunds |
sponsor |
Ministry of Education, Youth and Sports of the Czech Republic LH14011 Multilingual corpus annotation nationalFunds |
sponsor |
Ministry of Education, Youth and Sports of the Czech Republic LM2015071 LINDAT/CLARIN: Institute for Analysis, Processing and Distribution of Linguistic Data nationalFunds |
sponsor |
Charles University (other than GAUK) SVV 260 333 Specific University Research nationalFunds |
size.info |
1127 sentences |
files.size |
7212088 |
files.count |
1 |