WMT16 Tuning Shared Task Models (Czech-to-English)
Please use the following text to cite this item:
Kamran, Amir; Jawaid, Bushra; Bojar, Ondřej and Stanojevic, Milos, 2016,
WMT16 Tuning Shared Task Models (Czech-to-English), LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL),
http://hdl.handle.net/11372/LRT-1671.
Date issued
2016-03-21
Description
The item contains models to tune for the WMT16 Tuning shared task for Czech-to-English.
The translation models are trained on the CzEng 1.6pre corpus (http://ufal.mff.cuni.cz/czeng/czeng16pre). Before training, the data is tokenized with the Moses tokenizer and lowercased, and sentences longer than 60 words or shorter than 4 words are removed. Word alignment is done with fast_align (https://github.com/clab/fast_align), and the standard Moses pipeline is used for training.
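The lowercasing and length filtering described above can be illustrated with a minimal sketch. It is not the script used to build the released models: it assumes the input is already tokenized (one token per whitespace-separated unit) and that the length limits apply to both sides of a sentence pair, which the description does not state. The names `keep_pair` and `preprocess` are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

MIN_WORDS = 4   # sentences shorter than this are removed
MAX_WORDS = 60  # sentences longer than this are removed


def keep_pair(src: str, tgt: str) -> bool:
    """Return True if both sides fall within the length limits.

    Assumption: the limits are checked on both the Czech and the
    English side; the item description does not say which side is used.
    """
    return all(MIN_WORDS <= len(side.split()) <= MAX_WORDS for side in (src, tgt))


def preprocess(pairs: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Lowercase already-tokenized sentence pairs and drop out-of-range ones."""
    for src, tgt in pairs:
        src, tgt = src.strip().lower(), tgt.strip().lower()
        if keep_pair(src, tgt):
            yield src, tgt
```

For example, `list(preprocess([("To je malý test .", "This is a small test .")]))` keeps the pair in lowercased form, while a two-token pair such as `("Ahoj !", "Hello !")` is dropped.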
Two 5-gram language models are trained using KenLM: one uses only the CzEng English data, and the other uses all English monolingual data available for WMT except Common Crawl.
Also included are two lexicalized bidirectional reordering models, one word-based and one hierarchical, both using MSD (monotone, swap, discontinuous) orientations conditioned on both the source and target sides of the processed CzEng data.
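To make the orientation classes concrete, the following is a simplified sketch of how a word-based MSD orientation can be determined for a phrase pair relative to the preceding target position. It is not the Moses implementation and covers only one of the two directions of a bidirectional model; the hierarchical variant, which looks at neighbouring phrase blocks instead of single alignment points, is not shown. The function name, the 0-based inclusive spans, and the treatment of the sentence start as a virtual alignment point at (-1, -1) are assumptions of this sketch.

```python
from typing import Set, Tuple

# A word alignment as a set of (source_index, target_index) points, 0-based.
Alignment = Set[Tuple[int, int]]


def orientation(alignment: Alignment, src_start: int, src_end: int, tgt_start: int) -> str:
    """Classify a phrase pair relative to the previous target word.

    src_start and src_end are the inclusive source span of the phrase;
    tgt_start is the first target position of the phrase.
    """
    # Assumption: the sentence start counts as aligned at (-1, -1).
    points = alignment | {(-1, -1)}
    prev_tgt = tgt_start - 1
    if (src_start - 1, prev_tgt) in points:
        return "monotone"        # previous target word aligns just before the source span
    if (src_end + 1, prev_tgt) in points:
        return "swap"            # previous target word aligns just after the source span
    return "discontinuous"       # neither neighbour is aligned
```

With the alignment `{(0, 0), (1, 2), (2, 1)}`, the phrase pair covering source word 1 and target word 2 is classified as a swap, because the previous target word (position 1) is aligned to source position 2, directly after the phrase's source span.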
Acknowledgement
European Union
Project code: H2020-ICT-2014-1-645452
Project name: QT21: Quality Translation 21
Technology Foundation STW
Project code: 12271
Project name: Data-Powered Domain-Specific Translation Services On Demand (DatAptor)
This item is Publicly Available
and licensed under:
Files in this item
- Name: wmt16.czeng.blm.en.tgz
- Size: 7.79 GB
- Format: application/x-gzip
- Description: gzip Archive
- MD5: dd910814d89f3bb41261ead0a95930dc

- Name: cs2en_model.tgz
- Size: 36.56 GB
- Format: application/x-gzip
- Description: gzip Archive
- MD5: 9f97c40bab9bbc8844b362437ead3c71

- Name: wmt16.mono.blm.en.tgz
- Size: 60.04 GB
- Format: application/x-gzip
- Description: gzip Archive
- MD5: a57e4fd4f43c05f826cda33cfe257eed

- Name: Makefile
- Size: 16.96 KB
- Format: application/octet-stream
- Description: Unknown
- MD5: 5f56434491ccb9591c35d8fe20fb8aa9

- Name: moses.ini
- Size: 1.29 KB
- Format: application/octet-stream
- Description: Unknown
- MD5: 52551ca476c84dbaf9409ca3083b02c2
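Because the archives are large, it may be worth checking each download against the MD5 checksums listed above before unpacking. The following is a minimal sketch using only the Python standard library; the file names and checksums are taken from the list above, while the function names and the directory argument are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict

# File names and MD5 checksums as listed for this item.
EXPECTED_MD5 = {
    "wmt16.czeng.blm.en.tgz": "dd910814d89f3bb41261ead0a95930dc",
    "cs2en_model.tgz": "9f97c40bab9bbc8844b362437ead3c71",
    "wmt16.mono.blm.en.tgz": "a57e4fd4f43c05f826cda33cfe257eed",
    "Makefile": "5f56434491ccb9591c35d8fe20fb8aa9",
    "moses.ini": "52551ca476c84dbaf9409ca3083b02c2",
}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory: str) -> Dict[str, bool]:
    """Return {file name: checksum matches} for each listed file found in directory."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if path.exists():
            results[name] = md5_of(path) == expected
    return results
```

Calling `verify("downloads")` on a hypothetical download directory returns a dictionary that maps each file found there to `True` or `False`; files that have not been downloaded yet are simply omitted. Reading in chunks keeps memory use constant even for the 60 GB archive.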

