WMT16 Quality Estimation Shared Task Training and Development Data

Please use the following text to cite this item:
Specia, Lucia; Logacheva, Varvara and Scarton, Carolina, 2016, WMT16 Quality Estimation Shared Task Training and Development Data, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11372/LRT-1646.
Date issued
2016-02-29
Size
3.59 MB
Language(s)
Description
Training and development data for the WMT16 QE task. Test data will be published as a separate item.

This shared task will build on its previous four editions to further examine automatic methods for estimating the quality of machine translation output at run-time, without relying on reference translations. We include word-level, sentence-level and document-level estimation. The sentence-level and word-level tasks will explore a large dataset produced from post-editions by professional translators (as opposed to crowdsourced translations as in the previous year). For the first time, the data will be domain-specific (IT domain). The document-level task will use, for the first time, entire documents, which have been human annotated for quality indirectly in two ways: through reading comprehension tests and through a two-stage post-editing exercise.

Our tasks have the following goals:
- To advance work on sentence-level and word-level quality estimation by providing domain-specific, larger and professionally annotated datasets.
- To study the utility of detailed information logged during post-editing (time, keystrokes, actual edits) for different levels of prediction.
- To analyse the effectiveness of different types of quality labels provided by humans for longer texts in document-level prediction.

This year's shared task provides new training and test datasets for all tasks, and allows participants to explore any additional data and resources deemed relevant. An in-house MT system was used to produce translations for the sentence-level and word-level tasks, and multiple MT systems were used to produce translations for the document-level task. Therefore, MT system-dependent information will be made available where possible.
Acknowledgement

Version History

Version, Date, Summary
2018-02-19 00:00:00
2017-02-27 00:00:00
2*
2016-02-29 00:00:00
* Selected version
This item is Publicly Available
and licensed under:
 Files in this item
Name
task2_en-de_dev.tar.gz
Size
151.85 KB
Format
application/x-gzip
Description
gzip Archive
MD5
0cd094f32230270a574c699a159be511
Name
task2_en-de_training.tar.gz
Size
1.41 MB
Format
application/x-gzip
Description
gzip Archive
MD5
dce4f0caaabc23238b6c130124354e05
Name
task2p_en-de_dev.tar.gz
Size
155.47 KB
Format
application/x-gzip
Description
gzip Archive
MD5
21eb5d1cd5d502ec26d45f3598d7979e
Name
task2p_en-de_training.tar.gz
Size
1.16 MB
Format
application/x-gzip
Description
gzip Archive
MD5
cac2a8ac2af8530f2e8b652a25747c90
Name
task1_en-de_training.tar.gz
Size
1.18 MB
Format
application/x-gzip
Description
gzip Archive
MD5
3bc5ce125cb0707e997dfb1054609b26
Name
task1_en-de_dev.tar.gz
Size
105.49 KB
Format
application/x-gzip
Description
gzip Archive
MD5
78db4b9bc1b104ccbfbd73c3b4a4f7cd
Name
task1_en-de_training-dev-test-additional.tar.gz
Size
575.8 KB
Format
application/x-gzip
Description
gzip Archive
MD5
124345a8f6783e47b498d18393faf065
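
The MD5 values listed for each archive can be used to check that a download arrived intact. The following is a minimal sketch, assuming the archives were saved under their listed names in a single local directory. The helper names md5_of_file and verify_downloads are illustrative and are not part of the dataset or the repository.

```python
import hashlib
from pathlib import Path

# MD5 digests copied from the file listing of this item.
EXPECTED_MD5 = {
    "task2_en-de_dev.tar.gz": "0cd094f32230270a574c699a159be511",
    "task2_en-de_training.tar.gz": "dce4f0caaabc23238b6c130124354e05",
    "task2p_en-de_dev.tar.gz": "21eb5d1cd5d502ec26d45f3598d7979e",
    "task2p_en-de_training.tar.gz": "cac2a8ac2af8530f2e8b652a25747c90",
    "task1_en-de_training.tar.gz": "3bc5ce125cb0707e997dfb1054609b26",
    "task1_en-de_dev.tar.gz": "78db4b9bc1b104ccbfbd73c3b4a4f7cd",
    "task1_en-de_training-dev-test-additional.tar.gz": "124345a8f6783e47b498d18393faf065",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each listed file name to True (match), False (mismatch) or None (absent)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if path.is_file():
            results[name] = (md5_of_file(path) == expected)
        else:
            results[name] = None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "missing"}
    for name, status in verify_downloads().items():
        print(f"{name}: {labels[status]}")
```

Run from the download directory, the script prints one status line per archive. A MISMATCH usually indicates an incomplete or corrupted download, in which case the file should be fetched again.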