dc.contributor.author Novák, Michal
dc.contributor.author Nedoluzhko, Anna
dc.contributor.author Schwarz (Khoroshkina), Anna
dc.date.accessioned 2016-11-22T08:42:25Z
dc.date.available 2016-11-22T08:42:25Z
dc.date.issued 2016-09-30
dc.identifier.uri http://hdl.handle.net/11234/1-1791
dc.description Prague Czech-English Dependency Treebank - Russian translation (PCEDT-R) is a project of translating a subset of Prague Czech-English Dependency Treebank 2.0 (PCEDT 2.0) to Russian and linguistically annotating the Russian translations with emphasis on coreference and cross-lingual alignment of coreferential expressions. Cross-lingual comparison of coreference means is currently the purpose that drives development of this corpus. The current version 0.5 is a preliminary version, which contains (+ denotes new features):
* complete PCEDT 2.0 documents "wsj_1900"-"wsj_1949"
* Czech-English word alignment of coreferential expressions annotated manually mainly on the t-layer
+ Russian translations of the original English sentences
+ automatic tokenization, part-of-speech tagging and morphological analysis for Russian
+ automatic word alignment between all Czech and Russian words
+ manual alignment between Russian and the other two languages on possessive pronouns
dc.language.iso eng
dc.language.iso ces
dc.language.iso rus
dc.publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
dc.rights CC-BY-NC-SA + LDC99T42
dc.rights.uri https://lindat.mff.cuni.cz/repository/xmlui/page/license-pcedt2
dc.subject multilingual
dc.subject coreference
dc.title Prague Czech-English Dependency Treebank 2.0 - Russian translation
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
dc.rights.label RES
has.files yes
branding LINDAT / CLARIAH-CZ
contact.person Michal Novák mnovak@ufal.mff.cuni.cz ÚFAL MFF UK
sponsor Czech Science Foundation GA 16-05394S Structure of coreferential chains in parallel language data nationalFunds
sponsor The Charles University Grant Agency GAUK 338915 Cross-lingual approaches to coreference resolution nationalFunds
sponsor Ministry of Education, Youth and Sports of the Czech Republic LH14011 Multilingual corpus annotation nationalFunds
sponsor Ministry of Education, Youth and Sports of the Czech Republic LM2015071 LINDAT/CLARIN: Institute for Analysis, Processing and Distribution of Linguistic Data nationalFunds
sponsor Charles University (other than GAUK) SVV 260 333 Specific university research nationalFunds
size.info 1127 sentences
files.size 7212088
files.count 1


 Files in this item

License category:
Restricted Use

License: CC-BY-NC-SA + LDC99T42
Distributed under Creative Commons: Attribution Required, Noncommercial, Share Alike
Name: pcedt-r.zip
Size: 6.88 MB
Format: application/zip
Description: data
MD5: 6022c87f5ecd29e457341438fae73166
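
The byte count (files.size) and MD5 checksum listed in this record can be used to check that a downloaded copy of pcedt-r.zip is intact. The following is only a minimal sketch, not part of the repository's tooling: the helper names md5_of_file and verify_download are hypothetical, and it assumes the archive has already been saved locally.

```python
import hashlib
from pathlib import Path

# Values copied from this record (files.size and MD5 of pcedt-r.zip).
EXPECTED_SIZE = 7212088
EXPECTED_MD5 = "6022c87f5ecd29e457341438fae73166"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """Hypothetical helper: True if both size and MD5 match the expected values."""
    path = Path(path)
    if path.stat().st_size != expected_size:
        return False
    return md5_of_file(path) == expected_md5
```

Calling verify_download("pcedt-r.zip") on the local copy returns True only when both the size and the checksum agree with the values above. MD5 is suitable here for detecting corrupted or truncated downloads, not as a security guarantee.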