This is not the latest version of this item. The latest version can be found here.
WMT16 Quality Estimation Shared Task Training and Development Data
Please use the following text to cite this item:
Specia, Lucia; Logacheva, Varvara and Scarton, Carolina, 2016,
WMT16 Quality Estimation Shared Task Training and Development Data, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL),
http://hdl.handle.net/11372/LRT-1646.
Date issued
2016-02-29
Size
3.59 MB
Description
Training and development data for the WMT16 QE task. Test data will be published as a separate item.
This shared task will build on its previous four editions to further examine automatic methods for estimating the quality of machine translation output at run-time, without relying on reference translations. We include word-level, sentence-level and document-level estimation. The sentence- and word-level tasks will explore a large dataset produced from post-editions by professional translators (as opposed to crowdsourced translations as in the previous year). For the first time, the data will be domain-specific (IT domain). The document-level task will use, for the first time, entire documents, which have been human-annotated for quality indirectly in two ways: through reading comprehension tests and through a two-stage post-editing exercise. Our tasks have the following goals:
- To advance work on sentence- and word-level quality estimation by providing domain-specific, larger and professionally annotated datasets.
- To study the utility of detailed information logged during post-editing (time, keystrokes, actual edits) for different levels of prediction.
- To analyse the effectiveness of different types of quality labels provided by humans for longer texts in document-level prediction.
This year's shared task provides new training and test datasets for all tasks, and allows participants to explore any additional data and resources deemed relevant. An in-house MT system was used to produce translations for the sentence- and word-level tasks, and multiple MT systems were used to produce translations for the document-level task. Therefore, MT system-dependent information will be made available where possible.
Acknowledgement
European Union
Project code: H2020-ICT-2014-1-645452
Project name: QT21: Quality Translation 21
Files in this item
| Name | Size | Format | Description | MD5 |
| --- | --- | --- | --- | --- |
| task2_en-de_dev.tar.gz | 151.85 KB | application/x-gzip | gzip Archive | 0cd094f32230270a574c699a159be511 |
| task2_en-de_training.tar.gz | 1.41 MB | application/x-gzip | gzip Archive | dce4f0caaabc23238b6c130124354e05 |
| task2p_en-de_dev.tar.gz | 155.47 KB | application/x-gzip | gzip Archive | 21eb5d1cd5d502ec26d45f3598d7979e |
| task2p_en-de_training.tar.gz | 1.16 MB | application/x-gzip | gzip Archive | cac2a8ac2af8530f2e8b652a25747c90 |
| task1_en-de_training.tar.gz | 1.18 MB | application/x-gzip | gzip Archive | 3bc5ce125cb0707e997dfb1054609b26 |
| task1_en-de_dev.tar.gz | 105.49 KB | application/x-gzip | gzip Archive | 78db4b9bc1b104ccbfbd73c3b4a4f7cd |
| task1_en-de_training-dev-test-additional.tar.gz | 575.8 KB | application/x-gzip | gzip Archive | 124345a8f6783e47b498d18393faf065 |
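The MD5 values listed for each archive can be used to check that a download is intact. The following is a minimal sketch, assuming the archives have been saved under their listed names in a single local directory; the helper names `md5_of_file` and `verify_downloads` are hypothetical and not part of any tooling shipped with this item.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file listing on this page.
EXPECTED_MD5 = {
    "task2_en-de_dev.tar.gz": "0cd094f32230270a574c699a159be511",
    "task2_en-de_training.tar.gz": "dce4f0caaabc23238b6c130124354e05",
    "task2p_en-de_dev.tar.gz": "21eb5d1cd5d502ec26d45f3598d7979e",
    "task2p_en-de_training.tar.gz": "cac2a8ac2af8530f2e8b652a25747c90",
    "task1_en-de_training.tar.gz": "3bc5ce125cb0707e997dfb1054609b26",
    "task1_en-de_dev.tar.gz": "78db4b9bc1b104ccbfbd73c3b4a4f7cd",
    "task1_en-de_training-dev-test-additional.tar.gz": "124345a8f6783e47b498d18393faf065",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory):
    """Map each listed file name to True (checksum matches),
    False (checksum differs) or None (file not found in directory)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if path.is_file():
            results[name] = md5_of_file(path) == expected
        else:
            results[name] = None
    return results
```

Calling `verify_downloads("path/to/downloads")` (the directory is a placeholder) returns one entry per listed archive, so a `False` value flags a corrupted or incomplete download and `None` flags a file that has not been fetched yet.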

