WMT16 Tuning Shared Task Models (Czech-to-English)

Please use the following text to cite this item:
Kamran, Amir; Jawaid, Bushra; Bojar, Ondřej and Stanojevic, Milos, 2016, WMT16 Tuning Shared Task Models (Czech-to-English), LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11372/LRT-1671.
Date issued
2016-03-21
Description
This item contains the models to be tuned in the WMT16 Tuning Shared Task for Czech-to-English. The translation models are trained on the CzEng 1.6pre corpus (http://ufal.mff.cuni.cz/czeng/czeng16pre). The data is tokenized with the Moses tokenizer and lowercased, and sentences longer than 60 words or shorter than 4 words are removed before training. Word alignment is done with fast_align (https://github.com/clab/fast_align), and the standard Moses pipeline is used for training. Two 5-gram language models are trained with KenLM: one on the CzEng English data only, the other on all English monolingual data available for WMT except Common Crawl. Also included are two lexicalized bidirectional reordering models, word-based and hierarchical, with msd orientations conditioned on both the source and target sides of the processed CzEng data.
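The lowercasing and length filtering described above can be sketched as follows. This is a minimal illustration, not the script used to build the models: the function name preprocess_pair is hypothetical, the input is assumed to be already tokenized (for example by the Moses tokenizer), length is assumed to be counted in whitespace-separated tokens, the bounds are assumed to be inclusive (sentences of exactly 4 or 60 words are kept), and a pair is assumed to be dropped when either side falls outside the range.

```python
def preprocess_pair(src, tgt, min_len=4, max_len=60):
    """Lowercase an already tokenized sentence pair and apply the length filter.

    Returns the lowercased (src, tgt) pair, or None if either side has fewer
    than min_len or more than max_len whitespace-separated tokens.
    Applying the limit to both sides is an assumption of this sketch.
    """
    src_tokens = src.lower().split()
    tgt_tokens = tgt.lower().split()
    for tokens in (src_tokens, tgt_tokens):
        if len(tokens) < min_len or len(tokens) > max_len:
            return None
    return " ".join(src_tokens), " ".join(tgt_tokens)


if __name__ == "__main__":
    # Toy, made-up sentence pairs; the real input would be the CzEng corpus.
    pairs = [
        ("Toto je krátká testovací věta .", "This is a short test sentence ."),
        ("Ano .", "Yes ."),  # too short on both sides, so it is dropped
    ]
    for src, tgt in pairs:
        print(preprocess_pair(src, tgt))
```

Returning None for rejected pairs keeps the source and target sides in step, so a caller can skip a pair without the two files drifting out of alignment.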
Files in this item

Name                    Size      Format                    Description   MD5
wmt16.czeng.blm.en.tgz  7.79 GB   application/x-gzip        gzip Archive  dd910814d89f3bb41261ead0a95930dc
cs2en_model.tgz         36.56 GB  application/x-gzip        gzip Archive  9f97c40bab9bbc8844b362437ead3c71
wmt16.mono.blm.en.tgz   60.04 GB  application/x-gzip        gzip Archive  a57e4fd4f43c05f826cda33cfe257eed
Makefile                16.96 KB  application/octet-stream  Unknown       5f56434491ccb9591c35d8fe20fb8aa9
moses.ini               1.29 KB   application/octet-stream  Unknown       52551ca476c84dbaf9409ca3083b02c2
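
Because the archives are large, it may be worth checking each download against the MD5 values listed for this item. The following is a minimal sketch using only the Python standard library; the helper names md5_of_file and check_downloads are hypothetical, and the files are assumed to sit in one local directory under the names shown in the listing.

```python
import hashlib
from pathlib import Path

# File names and MD5 checksums copied from the file listing of this item.
EXPECTED_MD5 = {
    "wmt16.czeng.blm.en.tgz": "dd910814d89f3bb41261ead0a95930dc",
    "cs2en_model.tgz": "9f97c40bab9bbc8844b362437ead3c71",
    "wmt16.mono.blm.en.tgz": "a57e4fd4f43c05f826cda33cfe257eed",
    "Makefile": "5f56434491ccb9591c35d8fe20fb8aa9",
    "moses.ini": "52551ca476c84dbaf9409ca3083b02c2",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks so that
    multi-gigabyte archives are never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_downloads(directory, expected=EXPECTED_MD5):
    """Map each expected file name to 'ok', 'mismatch', or 'missing'."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == checksum:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # Assumes the downloads are in the current working directory.
    for name, status in check_downloads(".").items():
        print(f"{status:8} {name}")
```

A file reported as "mismatch" is most likely a truncated or corrupted download and should be fetched again before it is unpacked.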