UMC 0.1: Czech-Russian-English Multilingual Corpus
Please use the following text to cite this item:
Klyueva, Natalia and Bojar, Ondřej, 2008,
UMC 0.1: Czech-Russian-English Multilingual Corpus, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL),
http://hdl.handle.net/11858/00-097C-0000-0001-4909-7.
Authors
Item identifier
Project URL
Date issued
2008-10-02
Size
1,800,000 words
Language(s)
Description
UMC 0.1 Czech-English-Russian is a multilingual parallel corpus of Czech, Russian and English texts with automatic pairwise sentence alignments. The primary aim of UMC is to extend the set of languages covered by the CzEng corpus, mainly for the purposes of machine translation.
All the texts were downloaded from a single source, Project Syndicate (Copyright: Project Syndicate 1995-2008), which contains a large collection of high-quality news articles and commentaries. We were given permission to use the texts for research and non-commercial purposes.
Acknowledgement
European Union
Project code: FP6-IST-5-034291-STP
Project name: Euromatrix
Subject(s)
Collections
This item is Publicly Available
and licensed under:


