
 
dc.contributor.author Specia, Lucia
dc.contributor.author Logacheva, Varvara
dc.date.accessioned 2017-02-27T17:42:20Z
dc.date.available 2017-02-27T17:42:20Z
dc.date.issued 2017-02-27
dc.identifier.uri http://hdl.handle.net/11372/LRT-1974
dc.description Training and development data for the WMT17 QE task. Test data will be published as a separate item. This shared task will build on its previous five editions to further examine automatic methods for estimating the quality of machine translation output at run-time, without relying on reference translations. We include word-level, phrase-level and sentence-level estimation. All tasks will make use of a large dataset produced from post-editions by professional translators. The data will be domain-specific (IT and Pharmaceutical domains) and substantially larger than in previous years. In addition to advancing the state of the art at all prediction levels, our goals include: - To test the effectiveness of larger (domain-specific and professionally annotated) datasets. We will do so by increasing the size of one of last year's training sets. - To study the effect of language direction and domain. We will do so by providing two datasets created in similar ways, but for different domains and language directions. - To investigate the utility of detailed information logged during post-editing. We will do so by providing post-editing time, keystrokes, and actual edits. This year's shared task provides new training and test datasets for all tasks, and allows participants to explore any additional data and resources deemed relevant. An in-house MT system was used to produce translations for all tasks. MT system-dependent information can be made available on request. The data is publicly available, but since it has been provided by our industry partners, it is subject to specific terms and conditions. However, these have no practical implications for the use of this data for research purposes.
dc.language.iso eng
dc.language.iso deu
dc.publisher University of Sheffield
dc.relation info:eu-repo/grantAgreement/EC/H2020/645452
dc.relation.replaces http://hdl.handle.net/11372/LRT-1646
dc.relation.isreplacedby http://hdl.handle.net/11372/LRT-2619
dc.rights AGREEMENT ON THE USE OF DATA IN QT21
dc.rights.uri https://lindat.mff.cuni.cz/repository/xmlui/page/licence-TAUS_QT21
dc.source.uri http://www.statmt.org/wmt17/quality-estimation-task.html
dc.subject machine translation
dc.subject quality estimation
dc.subject machine learning
dc.title WMT17 Quality Estimation Shared Task Training and Development Data
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
dc.rights.label PUB
hidden false
hasMetadata false
has.files yes
branding LRT + Open Submissions
contact.person Lucia Specia l.specia@sheffield.ac.uk University of Sheffield
sponsor European Union H2020-ICT-2014-1-645452 QT21: Quality Translation 21 euFunds info:eu-repo/grantAgreement/EC/H2020/645452
files.size 94632740
files.count 8


Files in this item

Total download size of all files: 90.25 MB
Licence category: Publicly Available

Licence: AGREEMENT ON THE USE OF DATA IN QT21
Name: task1_en-de_training-dev.tar.gz
Size: 3.12 MB
Format: application/x-gzip
Description: task1 en-de train-dev
MD5: 55b93574e0b049a21ad0b0ee291480d6

Name: task1_de-en_training-dev_corrected_version.tar.gz
Size: 3.79 MB
Format: application/x-gzip
Description: Corrected version (10/04/2017) of task1 de-en training-dev
MD5: 702eedc063fe2dfdcb7ef45244e52c12

Name: broken_task1_de-en_training-dev.tar.gz
Size: 3.76 MB
Format: application/x-gzip
Description: Incorrect task1 de-en train-dev
MD5: b8e1d96c9ed1110ae94124e62efa3b2f

Name: task2_en-de_training-dev.tar.gz
Size: 17.41 MB
Format: application/x-gzip
Description: task2 en-de train-dev
MD5: fc421313fc4e6331e9cc99e5dc4901f1

Name: task2_de-en_training-dev.tar.gz
Size: 22.41 MB
Format: application/x-gzip
Description: task2 de-en train-dev
MD5: 6778bccb466c68909d58c701d65f10dd

Name: task3_en-de_training-dev.tar.gz
Size: 18.58 MB
Format: application/x-gzip
Description: task3 en-de train-dev
MD5: ccd87805caa2c5d3ef391386bd833417

Name: task3_de-en_training-dev.tar.gz
Size: 18.31 MB
Format: application/x-gzip
Description: task3 de-en train-dev
MD5: dfec4d0f045c5461ccd32f4c163ab922

Name: task3b_de-en_training-dev.tar.gz
Size: 2.86 MB
Format: application/x-gzip
Description: task3b de-en training-dev
MD5: 5d09379f7f1271ed409495653ebe0b5f
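
The MD5 values listed for each archive can be used to check that a download arrived intact. The following is a minimal sketch, not part of the original record: it assumes the archives were saved under their listed names in a single local directory, and the names `EXPECTED_MD5`, `md5_of_file` and `verify_downloads` are hypothetical helpers introduced only for illustration. The archive described as incorrect is left out of the table on purpose.

```python
import hashlib
from pathlib import Path

# MD5 digests copied from the file list of this record.
# The archive described as "Incorrect task1 de-en train-dev" is omitted.
EXPECTED_MD5 = {
    "task1_en-de_training-dev.tar.gz": "55b93574e0b049a21ad0b0ee291480d6",
    "task1_de-en_training-dev_corrected_version.tar.gz": "702eedc063fe2dfdcb7ef45244e52c12",
    "task2_en-de_training-dev.tar.gz": "fc421313fc4e6331e9cc99e5dc4901f1",
    "task2_de-en_training-dev.tar.gz": "6778bccb466c68909d58c701d65f10dd",
    "task3_en-de_training-dev.tar.gz": "ccd87805caa2c5d3ef391386bd833417",
    "task3_de-en_training-dev.tar.gz": "dfec4d0f045c5461ccd32f4c163ab922",
    "task3b_de-en_training-dev.tar.gz": "5d09379f7f1271ed409495653ebe0b5f",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory):
    """Map each expected file name to 'ok', 'mismatch' or 'missing'."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == expected:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # Assumes the archives were downloaded into the current directory.
    for name, status in verify_downloads(".").items():
        print(f"{status:8} {name}")
```

Reading in chunks keeps memory use flat for the larger archives. A "mismatch" status suggests re-downloading the file before extracting it.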
