This is not the latest version of this item. The latest version can be found here.
WMT17 Quality Estimation Shared Test Data
Please use the following text to cite this item or export to a predefined format:
Specia, Lucia and Logacheva, Varvara, 2017,
WMT17 Quality Estimation Shared Test Data, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL),
http://hdl.handle.net/11372/LRT-2135.
Date issued
2017-04-13
Description
Test data for the WMT17 QE task. Training data can be downloaded from http://hdl.handle.net/11372/LRT-1974
This shared task will build on its previous five editions to further examine automatic methods for estimating the quality of machine translation output at run-time, without relying on reference translations. We include word-level, phrase-level and sentence-level estimation. All tasks will make use of a large dataset produced from post-editions by professional translators. The data will be domain-specific (IT and Pharmaceutical domains) and substantially larger than in previous years. In addition to advancing the state of the art at all prediction levels, our goals include:
- To test the effectiveness of larger (domain-specific and professionally annotated) datasets. We will do so by increasing the size of one of last year's training sets.
- To study the effect of language direction and domain. We will do so by providing two datasets created in similar ways, but for different domains and language directions.
- To investigate the utility of detailed information logged during post-editing. We will do so by providing post-editing time, keystrokes, and actual edits.
This year's shared task provides new training and test datasets for all tasks, and allows participants to explore any additional data and resources deemed relevant. An in-house MT system was used to produce translations for all tasks. MT system-dependent information can be made available upon request. The data is publicly available, but since it has been provided by our industry partners it is subject to specific terms and conditions. However, these have no practical implications for the use of this data for research purposes.
Acknowledgement
European Union
Project code: H2020-ICT-2014-1-645452
Project name: QT21: Quality Translation 21
Files in this item
- Name: task3_en-de_test.tar.gz
- Size: 2.82 MB
- Format: application/x-gzip
- Description: gzip Archive
- MD5: 9340af4cbf4c310126390cee58ce9046

- Name: task1_de-en_test.tar.gz
- Size: 229.7 KB
- Format: application/x-gzip
- Description: gzip Archive
- MD5: 2e46586526ff3f7c615d00b1ab135c82

- Name: task1_en-de_test.tar.gz
- Size: 380 KB
- Format: application/x-gzip
- Description: gzip Archive
- MD5: d2b19fe5cbfb3edf87280c7c15852973

- Name: task2_en-de_test.tar.gz
- Size: 2.68 MB
- Format: application/x-gzip
- Description: gzip Archive
- MD5: 8478c5bb3d128be592784cb5063812e3

- Name: task2_de-en_test.tar.gz
- Size: 1.45 MB
- Format: application/x-gzip
- Description: gzip Archive
- MD5: 1bdcb31d3a0add4efa060ae0d7d27eb7

- Name: task3_de-en_test.tar.gz
- Size: 1.34 MB
- Format: application/x-gzip
- Description: gzip Archive
- MD5: b34ad071b091de8637a0b38083cc3ecc

- Name: task3b_de-en_test.tar.gz
- Size: 232.66 KB
- Format: application/x-gzip
- Description: gzip Archive
- MD5: 484e0a52a5a75be822655e881ffe7012

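The MD5 values listed for the files in this item can be used to check that each archive was downloaded intact. The following is a minimal sketch of such a check, not an official tool of this repository. It assumes the archives were saved under their listed names in a single local directory; the helper names `md5_of_file` and `verify_downloads` are hypothetical and chosen for illustration only.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file list of this item.
EXPECTED_MD5 = {
    "task1_en-de_test.tar.gz": "d2b19fe5cbfb3edf87280c7c15852973",
    "task1_de-en_test.tar.gz": "2e46586526ff3f7c615d00b1ab135c82",
    "task2_en-de_test.tar.gz": "8478c5bb3d128be592784cb5063812e3",
    "task2_de-en_test.tar.gz": "1bdcb31d3a0add4efa060ae0d7d27eb7",
    "task3_en-de_test.tar.gz": "9340af4cbf4c310126390cee58ce9046",
    "task3_de-en_test.tar.gz": "b34ad071b091de8637a0b38083cc3ecc",
    "task3b_de-en_test.tar.gz": "484e0a52a5a75be822655e881ffe7012",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each listed file name to 'ok', 'mismatch', or 'missing'."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == expected:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # Assumption: the archives sit in the current working directory.
    for name, status in verify_downloads().items():
        print(f"{status:8s} {name}")
```

A file reported as "mismatch" is best downloaded again before it is unpacked.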

