WMT17 Quality Estimation Shared Task Training and Development Data

Name: WMT17 Quality Estimation Shared Task Training and Development Data
License: https://lindat.mff.cuni.cz/repository/xmlui/page/licence-TAUS_QT21

Specia, Lucia; Logacheva, Varvara


dc.contributor.author	Specia, Lucia
dc.contributor.author	Logacheva, Varvara
dc.date.accessioned	2017-02-27T17:42:20Z
dc.date.available	2017-02-27T17:42:20Z
dc.date.issued	2017-02-27
dc.identifier.uri	http://hdl.handle.net/11372/LRT-1974
dc.description	Training and development data for the WMT17 QE task. Test data will be published as a separate item. This shared task builds on its five previous editions to further examine automatic methods for estimating the quality of machine translation output at run-time, without relying on reference translations. We include word-level, phrase-level and sentence-level estimation. All tasks will make use of a large dataset produced from post-edits by professional translators. The data will be domain-specific (IT and Pharmaceutical domains) and substantially larger than in previous years. In addition to advancing the state of the art at all prediction levels, our goals include: - To test the effectiveness of larger (domain-specific and professionally annotated) datasets. We will do so by increasing the size of one of last year's training sets. - To study the effect of language direction and domain. We will do so by providing two datasets created in similar ways, but for different domains and language directions. - To investigate the utility of detailed information logged during post-editing. We will do so by providing post-editing time, keystrokes, and actual edits. This year's shared task provides new training and test datasets for all tasks, and allows participants to explore any additional data and resources deemed relevant. An in-house MT system was used to produce translations for all tasks. MT system-dependent information can be made available on request. The data is publicly available, but since it has been provided by our industry partners, it is subject to specific terms and conditions. However, these have no practical implications for the use of this data for research purposes.
dc.language.iso	eng
dc.language.iso	deu
dc.publisher	University of Sheffield
dc.relation	info:eu-repo/grantAgreement/EC/H2020/645452
dc.relation.replaces	http://hdl.handle.net/11372/LRT-1646
dc.relation.isreplacedby	http://hdl.handle.net/11372/LRT-2619
dc.rights	AGREEMENT ON THE USE OF DATA IN QT21
dc.rights.uri	https://lindat.mff.cuni.cz/repository/xmlui/page/licence-TAUS_QT21
dc.source.uri	http://www.statmt.org/wmt17/quality-estimation-task.html
dc.subject	machine translation
dc.subject	quality estimation
dc.subject	machine learning
dc.title	WMT17 Quality Estimation Shared Task Training and Development Data
dc.type	corpus
metashare.ResourceInfo#ContentInfo.mediaType	text
dc.rights.label	PUB
hidden	false
hasMetadata	false
has.files	yes
branding	LRT + Open Submissions
contact.person	Lucia Specia l.specia@sheffield.ac.uk University of Sheffield
sponsor	European Union H2020-ICT-2014-1-645452 QT21: Quality Translation 21 euFunds info:eu-repo/grantAgreement/EC/H2020/645452
files.size	94632740
files.count	8