Prague Czech-English Dependency Treebank 2.0 - Russian translation
Please use the following text to cite this item:
Novák, Michal; Nedoluzhko, Anna and Schwarz (Khoroshkina), Anna, 2016,
Prague Czech-English Dependency Treebank 2.0 - Russian translation, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL),
http://hdl.handle.net/11234/1-1791.
Date issued
2016-09-30
Size
1127 sentences
Description
Prague Czech-English Dependency Treebank - Russian translation (PCEDT-R) is a project that translates a subset of the Prague Czech-English Dependency Treebank 2.0 (PCEDT 2.0) into Russian and linguistically annotates the Russian translations, with emphasis on coreference and on cross-lingual alignment of coreferential expressions. Development of the corpus is currently driven by cross-lingual comparison of the means of expressing coreference.
The current version, 0.5, is preliminary. It contains the following (+ denotes new features):
* complete PCEDT 2.0 documents "wsj_1900"-"wsj_1949"
* Czech-English word alignment of coreferential expressions annotated manually mainly on the t-layer
+ Russian translations of the original English sentences
+ automatic tokenization, part-of-speech tagging and morphological analysis for Russian
+ automatic word alignment between all Czech and Russian words
+ manual alignment between Russian and the other two languages on possessive pronouns
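The document range listed above ("wsj_1900"-"wsj_1949") can be checked against a downloaded copy of the archive. The following is a minimal sketch, not part of the distribution: the function name `missing_documents` is hypothetical, and it assumes each document is stored in the archive under a name ending in `wsj_NNNN.treex.gz`, as the file listing on this page suggests.

```python
import re
import zipfile

# The 50 document names implied by the range "wsj_1900"-"wsj_1949".
EXPECTED_DOCS = [f"wsj_{number}" for number in range(1900, 1950)]


def missing_documents(archive_path, expected=EXPECTED_DOCS):
    """Return the expected document names that have no file in the archive.

    Assumption: each document appears as an archive member whose name ends
    in "<document>.treex.gz", regardless of the directory it sits in.
    """
    with zipfile.ZipFile(archive_path) as archive:
        names = archive.namelist()

    present = set()
    for name in names:
        match = re.search(r"(wsj_\d{4})\.treex\.gz$", name)
        if match:
            present.add(match.group(1))

    return [doc for doc in expected if doc not in present]
```

Calling `missing_documents("pcedt-r.zip")` on a complete download should return an empty list under these assumptions.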
Acknowledgement
Czech Science Foundation
Project code: GA 16-05394S
Project name: Structure of coreferential chains in parallel language data
The Charles University Grant Agency
Project code: GAUK 338915
Project name: Cross-lingual approaches to coreference resolution
Ministry of Education, Youth and Sports of the Czech Republic
Project code: LH14011
Project name: Multilingual corpus annotation
Ministry of Education, Youth and Sports of the Czech Republic
Project code: LM2015071
Project name: LINDAT/CLARIN: Institute for Analysis, Processing and Distribution of Linguistic Data
Charles University (outside GAUK)
Project code: SVV 260 333
Project name: Specific university research
Files in this item
- Name: pcedt-r.zip
- Size: 6.88 MB
- Format: application/zip
- Description: Zip
- MD5: 6022c87f5ecd29e457341438fae73166
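The MD5 value above can be used to confirm that a download is intact. This is a minimal sketch using Python's standard `hashlib`. The function names and the local file name `pcedt-r.zip` are illustrative assumptions; only the checksum itself comes from this page.

```python
import hashlib

# Checksum published on this item page for pcedt-r.zip.
EXPECTED_MD5 = "6022c87f5ecd29e457341438fae73166"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_archive("pcedt-r.zip")` returns `True` only when the local file matches the published checksum. MD5 is suitable here for detecting corrupted downloads, not as a security guarantee.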

- pcedt-r
  - DOCUMENTATION (10 kB)
  - README (3 kB)
  - data
    - wsj_1921.treex.gz (31 kB)
    - wsj_1918.treex.gz (48 kB)
    - wsj_1926.treex.gz (103 kB)
    - wsj_1934.treex.gz (139 kB)
    - wsj_1942.treex.gz (27 kB)
    - wsj_1939.treex.gz (292 kB)
    - wsj_1947.treex.gz (107 kB)
    - wsj_1901.treex.gz (60 kB)
    - wsj_1906.treex.gz (56 kB)
    - wsj_1914.treex.gz (45 kB)
    - wsj_1922.treex.gz (144 kB)
    - wsj_1930.treex.gz (161 kB)
    - wsj_1919.treex.gz (67 kB)
    - wsj_1927.treex.gz (284 kB)
    - wsj_1935.treex.gz (286 kB)
    - wsj_1943.treex.gz (95 kB)
    - wsj_1948.treex.gz (114 kB)
    - wsj_1902.treex.gz (43 kB)
    - wsj_1910.treex.gz (47 kB)
    - wsj_1907.treex.gz (46 kB)
    - wsj_1915.treex.gz (926 kB)
    - wsj_1923.treex.gz (72 kB)
    - wsj_1931.treex.gz (184 kB)
    - wsj_1928.treex.gz (315 kB)
    - wsj_1936.treex.gz (388 kB)
    - wsj_1944.treex.gz (176 kB)
    - wsj_1949.treex.gz (106 kB)
    - wsj_1903.treex.gz (281 kB)
    - wsj_1911.treex.gz (58 kB)
    - wsj_1908.treex.gz (34 kB)
    - wsj_1916.treex.gz (257 kB)
    - wsj_1924.treex.gz (217 kB)
    - wsj_1932.treex.gz (333 kB)
    - wsj_1929.treex.gz (287 kB)
    - wsj_1940.treex.gz (40 kB)
    - wsj_1937.treex.gz (294 kB)
    - wsj_1945.treex.gz (15 kB)
    - wsj_1904.treex.gz (42 kB)
    - wsj_1912.treex.gz (65 kB)
    - wsj_1909.treex.gz (41 kB)
    - wsj_1920.treex.gz (75 kB)
    - wsj_1917.treex.gz (54 kB)
    - wsj_1925.treex.gz (27 kB)
    - wsj_1933.treex.gz (20 kB)
    - wsj_1941.treex.gz (21 kB)
    - wsj_1938.treex.gz (59 kB)
    - wsj_1946.treex.gz (274 kB)
    - wsj_1900.treex.gz (48 kB)
    - wsj_1905.treex.gz (49 kB)
    - wsj_1913.treex.gz (52 kB)
  - resources
    - treex_subschema_t_layer.xml (26 kB)
    - treex_subschema_bbn.xml (3 kB)
    - treex_subschema_w_layer.xml (1 kB)
    - treex_schema.xml (4 kB)
    - treex_subschema_langcodes.xml (10 kB)
    - treex_subschema_n_layer.xml (1 kB)
    - treex_subschema_interset.xml (3 kB)
    - treex_subschema_a_layer.xml (11 kB)
    - treex_subschema_p_layer.xml (5 kB)
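
For a first look inside one of the data files listed above, the sketch below streams a `.treex.gz` file and counts its XML element names. It assumes the files are gzip-compressed XML, which the `.gz` extension and the accompanying XML schema files suggest but this page does not state. It makes no assumption about specific element names, and the function name `count_element_tags` is hypothetical.

```python
import gzip
import xml.etree.ElementTree as ET
from collections import Counter


def count_element_tags(path):
    """Count XML elements by local tag name in a gzip-compressed XML file.

    Assumption: the file is gzip-compressed, well-formed XML. Elements are
    cleared after they are counted to limit memory use on large documents.
    """
    counts = Counter()
    with gzip.open(path, "rb") as stream:
        for _event, element in ET.iterparse(stream, events=("end",)):
            # Drop the "{namespace}" prefix that ElementTree adds to tags.
            local_name = element.tag.rsplit("}", 1)[-1]
            counts[local_name] += 1
            element.clear()
    return counts
```

Calling `count_element_tags("wsj_1900.treex.gz").most_common(10)` on an extracted data file gives a quick overview of which elements dominate a document. The DOCUMENTATION and README files in the archive remain the authoritative description of the format.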

