dc.contributor.author Bojar, Ondřej
dc.contributor.author Macháček, Matouš
dc.contributor.author Tamchyna, Aleš
dc.contributor.author Zeman, Daniel
dc.date.accessioned 2013-12-10T13:41:44Z
dc.date.available 2013-12-10T13:41:44Z
dc.date.issued 2013-09-01
dc.identifier.uri http://hdl.handle.net/11858/00-097C-0000-0023-10B2-F
dc.description This dataset contains a very large set of Czech reference translations for 50 English source sentences taken from the WMT11 test set (http://www.statmt.org/wmt11). In total, there are 15431447 Czech sentences, i.e. roughly 300k reference translations per English source sentence on average, although the exact number varies greatly across sentences. More details can be found in the included README file. If you use this dataset, please cite the following paper, which describes the technique used to construct the Czech translations: Bojar Ondřej, Macháček Matouš, Tamchyna Aleš, Zeman Daniel: Scratching the Surface of Possible Translations. Lecture Notes in Computer Science, Vol. 8082, Text, Speech and Dialogue: 16th International Conference, TSD 2013. Proceedings, Copyright © Springer Verlag, Berlin / Heidelberg, ISBN 978-3-642-40584-6, ISSN 0302-9743, pp. 465-474, 2013, DOI: 10.1007/978-3-642-40585-3_59
dc.description.sponsorship P406/11/1499 of the Grant Agency of the Czech Republic, FP7-ICT-2011-7-288487 (MosesCore) of the European Union and 1356213 of the Grant Agency of the Charles University
dc.language.iso ces
dc.publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
dc.relation info:eu-repo/grantAgreement/EC/FP7/288487
dc.rights Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)
dc.rights.uri http://creativecommons.org/licenses/by-sa/3.0/
dc.subject machine translation
dc.subject automatic machine translation evaluation
dc.subject reference translation
dc.title Many Czech References for 50 Sentences Selected from WMT11 Data
dc.type corpus
metashare.ResourceInfo#ContactInfo#PersonInfo.surname Macháček
metashare.ResourceInfo#ContactInfo#PersonInfo.givenName Matouš
metashare.ResourceInfo#ContactInfo#PersonInfo#OrganizationInfo.organizationName Charles University in Prague, UFAL
metashare.ResourceInfo#DistributionInfo.availability restrictedUse
metashare.ResourceInfo#DistributionInfo#LicenseInfo.restrictionsOfUse attribution
metashare.ResourceInfo#DistributionInfo#LicenseInfo.restrictionsOfUse shareAlike
metashare.ResourceInfo#DistributionInfo#LicenseInfo.distributionAccessMedium downloadable
metashare.ResourceInfo#ContentInfo.mediaType text
metashare.ResourceInfo#TextInfo#SizeInfo.size 15431447
metashare.ResourceInfo#TextInfo#SizeInfo.sizeUnit sentences
metashare.ResourceInfo#ContactInfo#PersonInfo#OrganizationInfo#CommunicationInfo.email machacekmatous@gmail.com
dc.rights.label PUB
has.files yes
branding LINDAT / CLARIAH-CZ
sponsor Grantová agentura České republiky GAP406/11/1499 Čeština ve věku strojového překladu nationalFunds
sponsor European Union FP7-ICT-2011-7-288487 MosesCore euFunds info:eu-repo/grantAgreement/EC/FP7/288487
sponsor Grantová agentura Univerzity Karlovy v Praze GAUK 13562/2013 Využití mnohonásobných referencí ve strojovém překladu nationalFunds
size.info 15431447 sentences
files.size 122537300
files.count 2
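
The figures in this record can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the dataset: the helper names are hypothetical, and it assumes that the "MB" shown for the files means 1024 × 1024 bytes, which is consistent with the 116.86 MB reported for the download.

```python
# Values copied from the record above.
TOTAL_SENTENCES = 15431447   # size.info
SOURCE_SENTENCES = 50        # number of English source sentences (dc.description)
TOTAL_BYTES = 122537300      # files.size


def average_references(total_sentences: int, source_sentences: int) -> float:
    """Mean number of Czech references per English source sentence."""
    return total_sentences / source_sentences


def bytes_to_mib(size_in_bytes: int) -> float:
    """Convert bytes to units of 1024 * 1024 bytes (assumed meaning of 'MB')."""
    return size_in_bytes / (1024 * 1024)


if __name__ == "__main__":
    print(round(average_references(TOTAL_SENTENCES, SOURCE_SENTENCES)))
    print(f"{bytes_to_mib(TOTAL_BYTES):.2f}")
```

The average of about 308629 references per source sentence matches the "300k" quoted in the description; as the description notes, the actual count per sentence varies greatly.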


 Files in this item

This item is publicly available and licensed under Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0): attribution required, share alike.

Name: many-czech-references.zip
Size: 116.86 MB
Format: application/zip
Description: Zip archive containing many references, English source sentences, official WMT11 Czech reference translations and README
MD5: f26e17e4d25333948cea1fa0cbb79e96

Name: README
Size: 1.78 KB
Format: Unknown
Description: Copy of the README file from the main archive
MD5: c46fd2011b3dac0df3bee31044b02914
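
The MD5 values listed for the files can be used to check that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module; the function names and the assumption that the files sit in the current directory under their listed names are illustrative and not part of the record.

```python
import hashlib

# Checksums copied from the file listing in this record.
EXPECTED_MD5 = {
    "many-czech-references.zip": "f26e17e4d25333948cea1fa0cbb79e96",
    "README": "c46fd2011b3dac0df3bee31044b02914",
}


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so the large archive never has to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_md5: str) -> bool:
    """True if the file's MD5 digest equals the expected hex string."""
    return md5_of_file(path) == expected_md5.lower()


if __name__ == "__main__":
    # Assumes the files were saved in the current directory (hypothetical location).
    for name, expected in EXPECTED_MD5.items():
        try:
            status = "OK" if matches_expected(name, expected) else "MISMATCH"
        except FileNotFoundError:
            status = "not found"
        print(f"{name}: {status}")
```

A mismatch usually indicates an incomplete or corrupted download. MD5 is adequate for that purpose, but it is not a safeguard against deliberate tampering.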
