
 
dc.contributor.author Specia, Lucia
dc.contributor.author Logacheva, Varvara
dc.contributor.author Scarton, Carolina
dc.date.accessioned 2016-02-29T13:37:23Z
dc.date.available 2016-02-29T13:37:23Z
dc.date.issued 2016-02-29
dc.identifier.uri http://hdl.handle.net/11372/LRT-1646
dc.description Training and development data for the WMT16 QE task. Test data will be published as a separate item. This shared task will build on its previous four editions to further examine automatic methods for estimating the quality of machine translation output at run-time, without relying on reference translations. We include word-level, sentence-level and document-level estimation. The sentence- and word-level tasks will explore a large dataset produced from post-editions by professional translators (as opposed to crowdsourced translations as in the previous year). For the first time, the data will be domain-specific (IT domain). The document-level task will use, for the first time, entire documents, which have been human-annotated for quality indirectly in two ways: through reading comprehension tests and through a two-stage post-editing exercise. Our tasks have the following goals: - To advance work on sentence- and word-level quality estimation by providing domain-specific, larger and professionally annotated datasets. - To study the utility of detailed information logged during post-editing (time, keystrokes, actual edits) for different levels of prediction. - To analyse the effectiveness of different types of quality labels provided by humans for longer texts in document-level prediction. This year's shared task provides new training and test datasets for all tasks, and allows participants to explore any additional data and resources deemed relevant. An in-house MT system was used to produce translations for the sentence- and word-level tasks, and multiple MT systems were used to produce translations for the document-level task. Therefore, MT system-dependent information will be made available where possible.
dc.language.iso eng
dc.language.iso deu
dc.publisher University of Sheffield
dc.relation info:eu-repo/grantAgreement/EC/H2020/645452
dc.relation.replaces http://hdl.handle.net/11372/LRT-1631
dc.relation.isreplacedby http://hdl.handle.net/11372/LRT-1974
dc.rights AGREEMENT ON THE USE OF DATA IN QT21
dc.rights.uri https://lindat.mff.cuni.cz/repository/xmlui/page/licence-TAUS_QT21
dc.source.uri http://www.statmt.org/wmt16/quality-estimation-task.html
dc.subject machine translation
dc.subject quality estimation
dc.subject machine learning
dc.title WMT16 Quality Estimation Shared Task Training and Development Data
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
dc.rights.label PUB
hidden false
hasMetadata false
has.files yes
branding LRT + Open Submissions
contact.person Lucia Specia l.specia@sheffield.ac.uk University of Sheffield
sponsor European Union H2020-ICT-2014-1-645452 QT21: Quality Translation 21 euFunds info:eu-repo/grantAgreement/EC/H2020/645452
size.info 3.59 mb
files.size 4949533
files.count 7


 Files in this item

 Download all files in this item (4.72 MB)
License category:
Publicly Available

Licence: AGREEMENT ON THE USE OF DATA IN QT21
Name: task1_en-de_training.tar.gz
Size: 1.18 MB
Format: application/x-gzip
Description: Task 1 en-de training
MD5: 3bc5ce125cb0707e997dfb1054609b26

Name: task1_en-de_dev.tar.gz
Size: 105.49 KB
Format: application/x-gzip
Description: Task 1 en-de development
MD5: 78db4b9bc1b104ccbfbd73c3b4a4f7cd

Name: task2_en-de_training.tar.gz
Size: 1.41 MB
Format: application/x-gzip
Description: Task 2 en-de training
MD5: dce4f0caaabc23238b6c130124354e05

Name: task2_en-de_dev.tar.gz
Size: 151.85 KB
Format: application/x-gzip
Description: Task 2 en-de development
MD5: 0cd094f32230270a574c699a159be511

Name: task2p_en-de_training.tar.gz
Size: 1.16 MB
Format: application/x-gzip
Description: Task 2p en-de training
MD5: cac2a8ac2af8530f2e8b652a25747c90

Name: task2p_en-de_dev.tar.gz
Size: 155.47 KB
Format: application/x-gzip
Description: Task 2p en-de development
MD5: 21eb5d1cd5d502ec26d45f3598d7979e

Name: task1_en-de_training-dev-test-additional.tar.gz
Size: 575.8 KB
Format: application/x-gzip
Description: Additional annotation (post-editing time) and reference translations
MD5: 124345a8f6783e47b498d18393faf065
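
Since each file in this item is published with an MD5 digest, a downloaded copy can be checked against the listing before use. The following is a minimal sketch, not part of the original record: the file names and digests are copied from the listing above, while the function names `md5_of_file` and `verify_downloads` and the assumption that all archives sit in one local directory are illustrative only.

```python
import hashlib
from pathlib import Path

# MD5 digests copied from the file listing of this record.
EXPECTED_MD5 = {
    "task1_en-de_training.tar.gz": "3bc5ce125cb0707e997dfb1054609b26",
    "task1_en-de_dev.tar.gz": "78db4b9bc1b104ccbfbd73c3b4a4f7cd",
    "task2_en-de_training.tar.gz": "dce4f0caaabc23238b6c130124354e05",
    "task2_en-de_dev.tar.gz": "0cd094f32230270a574c699a159be511",
    "task2p_en-de_training.tar.gz": "cac2a8ac2af8530f2e8b652a25747c90",
    "task2p_en-de_dev.tar.gz": "21eb5d1cd5d502ec26d45f3598d7979e",
    "task1_en-de_training-dev-test-additional.tar.gz": "124345a8f6783e47b498d18393faf065",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory):
    """Map each listed file name to 'ok', 'mismatch', or 'missing'.

    Hypothetical helper: assumes the archives were saved under their
    original names in a single local directory.
    """
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == expected:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results
```

Calling `verify_downloads(".")` in the download directory returns one status per listed archive; anything other than `ok` suggests the file should be fetched again. MD5 is used here only because it is the digest the repository publishes; it detects corrupted transfers but is not a defence against deliberate tampering.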
