dc.contributor.author | Bojar, Ondřej |
dc.contributor.author | Macháček, Matouš |
dc.contributor.author | Tamchyna, Aleš |
dc.contributor.author | Zeman, Daniel |
dc.date.accessioned | 2013-12-10T13:41:44Z |
dc.date.available | 2013-12-10T13:41:44Z |
dc.date.issued | 2013-09-01 |
dc.identifier.uri | http://hdl.handle.net/11858/00-097C-0000-0023-10B2-F |
dc.description | This dataset contains the full, very large set of Czech translations for 50 English source sentences taken from the WMT11 test set (http://www.statmt.org/wmt11). In total, there are 15431447 Czech sentences, i.e. roughly 300k reference translations per English source sentence on average, although the exact number varies greatly across sentences. More details can be found in the included README file. If you use this dataset, please cite the following paper, which describes the technique used to construct the Czech translations: Bojar Ondřej, Macháček Matouš, Tamchyna Aleš, Zeman Daniel: Scratching the Surface of Possible Translations. Lecture Notes in Computer Science, Vol. 8082, Text, Speech and Dialogue: 16th International Conference, TSD 2013. Proceedings, Copyright © Springer Verlag, Berlin / Heidelberg, ISBN 978-3-642-40584-6, ISSN 0302-9743, pp. 465-474, 2013, DOI: 10.1007/978-3-642-40585-3_59 |
dc.description.sponsorship | P406/11/1499 of the Grant Agency of the Czech Republic, FP7-ICT-2011-7-288487 (MosesCore) of the European Union and 1356213 of the Grant Agency of the Charles University |
dc.language.iso | ces |
dc.publisher | Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) |
dc.relation | info:eu-repo/grantAgreement/EC/FP7/288487 |
dc.rights | Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) |
dc.rights.uri | http://creativecommons.org/licenses/by-sa/3.0/ |
dc.subject | machine translation |
dc.subject | automatic machine translation evaluation |
dc.subject | reference translation |
dc.title | Many Czech References for 50 Sentences Selected from WMT11 Data |
dc.type | corpus |
metashare.ResourceInfo#ContactInfo#PersonInfo.surname | Macháček |
metashare.ResourceInfo#ContactInfo#PersonInfo.givenName | Matouš |
metashare.ResourceInfo#ContactInfo#PersonInfo#OrganizationInfo.organizationName | Charles University in Prague, UFAL |
metashare.ResourceInfo#DistributionInfo.availability | restrictedUse |
metashare.ResourceInfo#DistributionInfo#LicenseInfo.restrictionsOfUse | attribution |
metashare.ResourceInfo#DistributionInfo#LicenseInfo.restrictionsOfUse | shareAlike |
metashare.ResourceInfo#DistributionInfo#LicenseInfo.distributionAccessMedium | downloadable |
metashare.ResourceInfo#ContentInfo.mediaType | text |
metashare.ResourceInfo#TextInfo#SizeInfo.size | 15431447 |
metashare.ResourceInfo#TextInfo#SizeInfo.sizeUnit | sentences |
metashare.ResourceInfo#ContactInfo#PersonInfo#OrganizationInfo#CommunicationInfo.email | machacekmatous@gmail.com |
dc.rights.label | PUB |
has.files | yes |
branding | LINDAT / CLARIAH-CZ |
sponsor | Grantová agentura České republiky GAP406/11/1499 Čeština ve věku strojového překladu nationalFunds |
sponsor | European Union FP7-ICT-2011-7-288487 MosesCore euFunds info:eu-repo/grantAgreement/EC/FP7/288487 |
sponsor | Grantová agentura Univerzity Karlovy v Praze GAUK 13562/2013 Využití mnohonásobných referencí ve strojovém překladu nationalFunds |
size.info | 15431447 sentences |
files.size | 122537300 |
files.count | 2 |
Files in this item
This item is Publicly Available and licensed under: Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)
- Name: many-czech-references.zip
- Size: 116.86 MB
- Format: application/zip
- Description: zip archive containing many references, English source sentences, official WMT11 Czech reference translations and README
- MD5: f26e17e4d25333948cea1fa0cbb79e96
- many-czech-references
  - cs.txt (6 kB)
  - many-references-datafile (2 GB)
  - en.txt (6 kB)
  - README (1 kB)
- Name: README
- Size: 1.78 KB
- Format: Unknown
- Description: copy of the README file from the main archive
- MD5: c46fd2011b3dac0df3bee31044b02914