dc.contributor.author Tamchyna, Aleš
dc.contributor.author Bojar, Ondřej
dc.date.accessioned 2015-11-26T11:22:41Z
dc.date.available 2015-11-26T11:22:41Z
dc.date.issued 2015
dc.identifier.uri http://hdl.handle.net/11234/1-1581
dc.description AMALACH project component TMODS:ENG-CZE; machine translation of queries from Czech to English. This archive contains models for the Moses decoder (binarized and pruned to allow real-time translation) and configuration files for the MTMonkey toolkit. The aim of this package is to provide a full service for Czech->English translation that can easily be used as a component in a larger software solution. (The required tools are freely available and an installation guide is included in the package.) The translation models were trained on the CzEng 1.0 corpus and Europarl. The monolingual data for language model (LM) estimation additionally contain WMT news crawls up to 2013.
dc.language.iso ces
dc.language.iso eng
dc.publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
dc.rights Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.rights.uri http://creativecommons.org/licenses/by-nc-sa/4.0/
dc.source.uri http://ufal.mff.cuni.cz/grants/amalach
dc.subject machine translation
dc.subject query translation
dc.title TMODS:ENG-CZE -- query translation
dc.type toolService
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent true
metashare.ResourceInfo#ContentInfo.detailedType suiteOfTools
dc.rights.label PUB
has.files yes
branding LINDAT / CLARIAH-CZ
contact.person Aleš Tamchyna tamchyna@ufal.mff.cuni.cz Charles University in Prague, UFAL
sponsor Ministerstvo kultury České republiky DF12P01OVV022 Zpřístupnění rozsáhlého video archivu kulturního dědictví pomocí metod automatického rozpoznávání mluvené řeči a strojového překladu. (AMALACH) nationalFunds
files.size 2942964173
files.count 2


 Files in this item

Name: TMODS-ENG-CZE-query.tar.gz
Size: 2.74 GB
Format: application/x-gzip
Description: Model files for TMODS:ENG-CZE query translation
MD5: 41246e0b67c36c6b872c8d4bc9281e36
File preview:
  • TMODS-ENG-CZE-query
    • README.txt (1 kB)
    • model
      • biglm.trie (1 GB)
      • worker.cfg (175 B)
      • moses.ini (844 B)
      • appserver.cfg (120 B)
      • ttable.lemmas.minphr (933 MB)
      • smalllm.trie (250 MB)
      • ttable.minphr (975 MB)
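
Because the archive is large, it may be worth checking the download against the MD5 listed above before unpacking it. The following is a minimal sketch, not part of the package: `verify_md5` is a hypothetical helper name, and the script assumes the `md5sum` tool from GNU coreutils is available (on other systems the equivalent command may differ).

```shell
#!/bin/sh
# Sketch: compare a file's MD5 checksum with an expected value.
# verify_md5 is a hypothetical helper, not shipped with the package.
# Assumes md5sum (GNU coreutils) is installed.
verify_md5() {
    file=$1
    expected=$2
    # md5sum prints "<hash>  <filename>"; keep only the hash field
    actual=$(md5sum "$file" | cut -d ' ' -f 1)
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $actual)" >&2
        return 1
    fi
}

# Usage, assuming the archive was saved in the current directory:
#   verify_md5 TMODS-ENG-CZE-query.tar.gz 41246e0b67c36c6b872c8d4bc9281e36 \
#     && tar -xzf TMODS-ENG-CZE-query.tar.gz
```

Chaining the extraction with `&&` means the archive is only unpacked when the checksum matches.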

Name: README.txt
Size: 1.28 KB
Format: Text file
MD5: 92d655ca32f3f8ec76d77d3c3d3f27b0
File preview:

TMODS:ENG-CZE -- query translation component
============================================

Installation:

1) Download and compile the Moses decoder:

  (Requires Boost, libXML and CMPH.)

  git clone https://github.com/moses-smt/mosesdecoder.git
  cd mosesdecoder
  ./bjam --with-xmlrpc=<path-to-libxml> --max-kenlm-order=12 --with-cmph=<path-to-cmph-library>
  cd ..

2) Download and configure MTMonkey:
  
  (See MTMonkey documentation for installation requirements.)

  git clone https://github.com/ufal/mtmonkey
  cd mtmonkey
  git checkout chimera_preprocessing
  # the tested commit ID is d0e3175ee112a3fdd4790ccd1b0ff4e5d90c8d04
  cd ..

3) Install MorphoDiTa and its Python bindings.

  Follow the steps described here:

  http://ufal.mff.cuni.cz/morphodita/install#python_installation

4) Download the MorphoDiTa model for Czech:

  http://hdl.handle.net/11858/00-097C-0000-0023-68D8-1

  Copy the model into the MTMonkey directory:

  cp czech-morfflex-pdt-131112.tagger-fast mtmonkey

5) Start t . . .