dc.contributor.author | Klyueva, Natalia |
dc.contributor.author | Bojar, Ondřej |
dc.date.accessioned | 2011-06-28T10:42:32Z |
dc.date.available | 2008-10-02T00:00:00Z |
dc.date.issued | 2008-10-02 |
dc.identifier.uri | http://hdl.handle.net/11858/00-097C-0000-0001-4909-7 |
dc.description | UMC 0.1 Czech-English-Russian is a multilingual parallel corpus of Czech, Russian and English texts with automatic pairwise sentence alignments. The primary aim of UMC is to extend the set of languages covered by the CzEng corpus, mainly for the purposes of machine translation. All texts were downloaded from a single source, Project Syndicate (Copyright: Project Syndicate 1995-2008), which contains a large collection of high-quality news articles and commentaries. We were granted permission to use the texts for research and non-commercial purposes. |
dc.description.sponsorship | FP6-IST-5-034291-STP (EuroMatrix) |
dc.language.iso | ces |
dc.publisher | Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) |
dc.rights | Attribution-NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0) |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/ |
dc.source.uri | http://ufal.mff.cuni.cz/umc/cer |
dc.subject | multi-language corpus |
dc.title | UMC 0.1: Czech-Russian-English Multilingual Corpus |
dc.type | corpus |
metashare.ResourceInfo#ContactInfo#PersonInfo.surname | Straňák |
metashare.ResourceInfo#ContactInfo#PersonInfo.givenName | Pavel |
metashare.ResourceInfo#ContactInfo#PersonInfo#OrganizationInfo.organizationName | Charles University in Prague, UFAL |
metashare.ResourceInfo#DistributionInfo.availability | unrestrictedUse |
metashare.ResourceInfo#DistributionInfo#LicenseInfo.restrictionsOfUse | academicUse/nonCommercialUse |
metashare.ResourceInfo#DistributionInfo#LicenseInfo.distributionAccessMedium | download |
metashare.ResourceInfo#ValidationInfo.validated | True |
metashare.ResourceInfo#ResourceCreationInfo#FundingInfo#ProjectInfo.projectName | EuroMatrix |
metashare.ResourceInfo#ResourceCreationInfo#FundingInfo#ProjectInfo.fundingType | EU |
metashare.ResourceInfo#ContentInfo.mediaType | text |
metashare.ResourceInfo#TextInfo#LanguageInfo.languageCoding | ces |
metashare.ResourceInfo#TextInfo#SizeInfo.size | 1800000 |
metashare.ResourceInfo#TextInfo#SizeInfo.sizeUnit | words |
metashare.ResourceInfo#ContactInfo#PersonInfo#OrganizationInfo#CommunicationInfo.email | stranak@ufal.mff.cuni.cz |
dc.rights.label | PUB |
has.files | yes |
branding | LINDAT / CLARIAH-CZ |
sponsor | European Union FP6-IST-5-034291-STP Euromatrix euFunds |
size.info | 1800000 words |
files.size | 25537868 |
files.count | 1 |
featuredService.kontext | Czech-Russian|http://lindat.mff.cuni.cz/services/kontext/run.cgi/first_form?corpname=umc_01_cs_m |
featuredService.kontext | English-Russian|http://lindat.mff.cuni.cz/services/kontext/run.cgi/first_form?corpname=umc_01_enru_en_m |
featuredService.kontext | Russian-Czech|http://lindat.mff.cuni.cz/services/kontext/run.cgi/first_form?corpname=umc_01_ru_m |
featuredService.kontext | Russian-English|http://lindat.mff.cuni.cz/services/kontext/run.cgi/first_form?corpname=umc_01_enru_ru_m |
Files in this item
This item is Publicly Available and licensed under: Attribution-NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0)
- Name: Czech-Russian-tagged.gz
- Size: 24.35 MB
- Format: application/x-gzip
- Description: Tokenized, lemmatized and morphologically tagged data in Czech and Russian (88,093 sentences aligned one-to-one)
- MD5: 38d599d84181408bdadcc31c2c147140
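The MD5 value listed for the file can be used to check a download for corruption. The following is a minimal sketch, assuming the archive has been saved locally under its listed name; the helper names `md5_of_file` and `matches_expected` are illustrative and not part of any LINDAT tooling.

```python
import hashlib

# Checksum as listed in this record for Czech-Russian-tagged.gz.
EXPECTED_MD5 = "38d599d84181408bdadcc31c2c147140"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Compare the file's digest with the expected one, ignoring letter case."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_expected("Czech-Russian-tagged.gz")` should return `True` for an intact copy, assuming the local file name matches the one listed above. MD5 is adequate for detecting accidental corruption but not for verifying authenticity.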