dc.contributor.author |
Specia, Lucia |
dc.contributor.author |
Logacheva, Varvara |
dc.contributor.author |
Scarton, Carolina |
dc.date.accessioned |
2016-02-29T13:37:23Z |
dc.date.available |
2016-02-29T13:37:23Z |
dc.date.issued |
2016-02-29 |
dc.identifier.uri |
http://hdl.handle.net/11372/LRT-1646 |
dc.description |
Training and development data for the WMT16 QE task. Test data will be published as a separate item.
This shared task will build on its four previous editions to further examine automatic methods for estimating the quality of machine translation output at run time, without relying on reference translations. We include word-level, sentence-level and document-level estimation. The sentence-level and word-level tasks will explore a large dataset produced from post-editions by professional translators (as opposed to the crowdsourced translations used in the previous year). For the first time, the data will be domain-specific (IT domain). The document-level task will use, for the first time, entire documents, which have been human-annotated for quality indirectly in two ways: through reading comprehension tests and through a two-stage post-editing exercise. Our tasks have the following goals:
- To advance work on sentence-level and word-level quality estimation by providing larger, domain-specific, professionally annotated datasets.
- To study the utility of detailed information logged during post-editing (time, keystrokes, actual edits) for different levels of prediction.
- To analyse the effectiveness of different types of quality labels provided by humans for longer texts in document-level prediction.
This year's shared task provides new training and test datasets for all tasks, and allows participants to explore any additional data and resources deemed relevant. An in-house MT system was used to produce translations for the sentence-level and word-level tasks, and multiple MT systems were used to produce translations for the document-level task. Therefore, MT system-dependent information will be made available where possible. |
dc.language.iso |
eng |
dc.language.iso |
deu |
dc.publisher |
University of Sheffield |
dc.relation |
info:eu-repo/grantAgreement/EC/H2020/645452 |
dc.relation.replaces |
http://hdl.handle.net/11372/LRT-1631 |
dc.relation.isreplacedby |
http://hdl.handle.net/11372/LRT-1974 |
dc.rights |
AGREEMENT ON THE USE OF DATA IN QT21 |
dc.rights.uri |
https://lindat.mff.cuni.cz/repository/xmlui/page/licence-TAUS_QT21 |
dc.source.uri |
http://www.statmt.org/wmt16/quality-estimation-task.html |
dc.subject |
machine translation |
dc.subject |
quality estimation |
dc.subject |
machine learning |
dc.title |
WMT16 Quality Estimation Shared Task Training and Development Data |
dc.type |
corpus |
metashare.ResourceInfo#ContentInfo.mediaType |
text |
dc.rights.label |
PUB |
hidden |
false |
hasMetadata |
false |
has.files |
yes |
branding |
LRT + Open Submissions |
contact.person |
Lucia Specia l.specia@sheffield.ac.uk University of Sheffield |
sponsor |
European Union H2020-ICT-2014-1-645452 QT21: Quality Translation 21 euFunds info:eu-repo/grantAgreement/EC/H2020/645452 |
size.info |
3.59 mb |
files.size |
4949533 |
files.count |
7 |