Show simple item record

 
dc.contributor.author Klyueva, Natalia
dc.contributor.author Bojar, Ondřej
dc.date.accessioned 2011-06-28T10:42:32Z
dc.date.available 2008-10-02T00:00:00Z
dc.date.issued 2008-10-02
dc.identifier.uri http://hdl.handle.net/11858/00-097C-0000-0001-4909-7
dc.description UMC 0.1 Czech-English-Russian is a multilingual parallel corpus of Czech, Russian, and English texts with automatic pairwise sentence alignments. The primary aim of UMC is to extend the set of languages covered by the CzEng corpus, mainly for the purposes of machine translation. All texts were downloaded from a single source, Project Syndicate (Copyright: Project Syndicate 1995-2008), which hosts a large collection of high-quality news articles and commentaries. We were given permission to use the texts for research and non-commercial purposes.
dc.description.sponsorship FP6-IST-5-034291-STP (EuroMatrix)
dc.language.iso ces
dc.publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
dc.rights Attribution-NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0)
dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/3.0/
dc.source.uri http://ufal.mff.cuni.cz/umc/cer
dc.subject multi-language corpus
dc.title UMC 0.1: Czech-Russian-English Multilingual Corpus
dc.type corpus
metashare.ResourceInfo#ContactInfo#PersonInfo.surname Straňák
metashare.ResourceInfo#ContactInfo#PersonInfo.givenName Pavel
metashare.ResourceInfo#ContactInfo#PersonInfo#OrganizationInfo.organizationName Charles University in Prague, UFAL
metashare.ResourceInfo#DistributionInfo.availability unrestrictedUse
metashare.ResourceInfo#DistributionInfo#LicenseInfo.restrictionsOfUse academicUse/nonCommercialUse
metashare.ResourceInfo#DistributionInfo#LicenseInfo.distributionAccessMedium download
metashare.ResourceInfo#ValidationInfo.validated True
metashare.ResourceInfo#ResourceCreationInfo#FundingInfo#ProjectInfo.projectName EuroMatrix
metashare.ResourceInfo#ResourceCreationInfo#FundingInfo#ProjectInfo.fundingType EU
metashare.ResourceInfo#ContentInfo.mediaType text
metashare.ResourceInfo#TextInfo#LanguageInfo.languageCoding ces
metashare.ResourceInfo#TextInfo#SizeInfo.size 1800000
metashare.ResourceInfo#TextInfo#SizeInfo.sizeUnit words
metashare.ResourceInfo#ContactInfo#PersonInfo#OrganizationInfo#CommunicationInfo.email stranak@ufal.mff.cuni.cz
dc.rights.label PUB
has.files yes
branding LINDAT / CLARIAH-CZ
sponsor European Union FP6-IST-5-034291-STP Euromatrix euFunds
size.info 1800000 words
files.size 25537868
files.count 1
featuredService.kontext Czech-Russian|http://lindat.mff.cuni.cz/services/kontext/run.cgi/first_form?corpname=umc_01_cs_m
featuredService.kontext English-Russian|http://lindat.mff.cuni.cz/services/kontext/run.cgi/first_form?corpname=umc_01_enru_en_m
featuredService.kontext Russian-Czech|http://lindat.mff.cuni.cz/services/kontext/run.cgi/first_form?corpname=umc_01_ru_m
featuredService.kontext Russian-English|http://lindat.mff.cuni.cz/services/kontext/run.cgi/first_form?corpname=umc_01_enru_ru_m


 Files in this item

This item is Publicly Available and licensed under: Attribution-NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0)
Name Czech-Russian-tagged.gz
Size 24.35 MB
Format application/x-gzip
Description Tokenized, lemmatized, and morphologically tagged data in Czech and Russian (88,093 sentences aligned one-to-one)
MD5 38d599d84181408bdadcc31c2c147140
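The MD5 value listed for the file can be used to check that a download arrived intact. The following is a minimal sketch in Python, not part of the original record: the function names `md5_of_file` and `verify_download` are hypothetical, and the local path in the usage note assumes the archive was saved under its listed name in the current directory.

```python
import hashlib

# MD5 checksum as listed in this record for Czech-Russian-tagged.gz
EXPECTED_MD5 = "38d599d84181408bdadcc31c2c147140"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a ~24 MB archive is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_download("Czech-Russian-tagged.gz")` should return `True` for an intact copy. A mismatch usually indicates a truncated or corrupted download rather than a problem with the corpus itself.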
