 

Czech image captioning, machine translation, and sentiment analysis (Neural Monkey models)

Please use the following text to cite this item:
Libovický, Jindřich; Rosa, Rudolf; Helcl, Jindřich and Popel, Martin, 2018, Czech image captioning, machine translation, and sentiment analysis (Neural Monkey models), LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-2839.
Date issued
2018-07-13
Description
This submission contains trained end-to-end models for the Neural Monkey toolkit for Czech and English, covering three NLP tasks: machine translation, image captioning, and sentiment analysis. The models are trained on standard datasets and achieve state-of-the-art or near-state-of-the-art performance on these tasks. They are described in the accompanying paper, and the same models can also be invoked via the online demo: https://ufal.mff.cuni.cz/grants/lsd

There are several separate ZIP archives here, each containing one model that solves one of the tasks for one language. To use a model, first install Neural Monkey: https://github.com/ufal/neuralmonkey To ensure that the model functions correctly, use the exact version of Neural Monkey specified by the commit hash stored in the 'git_commit' file in the model directory.

Each model directory contains a 'run.ini' Neural Monkey configuration file, which is used to run the model; see the Neural Monkey documentation to learn how to do that (you may need to update some paths to match your filesystem layout). The 'experiment.ini' file that was used to train the model is also included, along with the files containing the model itself, the input and output vocabularies, etc.

For the sentiment analyzers, you should tokenize your input data with the Moses tokenizer: https://pypi.org/project/mosestokenizer/ For machine translation, you do not need to tokenize the data, as the model does this itself.
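The installation and run steps above can be sketched as a shell session. The paths below are illustrative placeholders, and the run script may take an additional dataset configuration; consult the Neural Monkey documentation for the exact invocation:

```shell
# Install Neural Monkey, pinned to the commit the model was trained with
# (the hash is stored in the model's 'git_commit' file).
git clone https://github.com/ufal/neuralmonkey
cd neuralmonkey
git checkout "$(cat /path/to/model/git_commit)"
pip install --user -r requirements.txt

# Run the model using its shipped configuration (after adjusting the
# paths inside run.ini to match your filesystem).
bin/neuralmonkey-run /path/to/model/run.ini
```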
For image captioning, you need to:
- download a trained ResNet: http://download.tensorflow.org/models/resnet_v2_50_2017_04_14.tar.gz
- clone the git repository with TensorFlow models: https://github.com/tensorflow/models
- preprocess the input images with the Neural Monkey 'scripts/imagenet_features.py' script (https://github.com/ufal/neuralmonkey/blob/master/scripts/imagenet_features.py); you need to pass the paths to the ResNet checkpoint and to the TensorFlow models repository to this script

Feel free to contact the authors of this submission in case you run into problems!
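A minimal sketch of these preprocessing steps as shell commands, run from the Neural Monkey checkout. The feature-extraction script's option names are not listed on this page, so the sketch only shows how to inspect them:

```shell
# Download and unpack the trained ResNet checkpoint.
wget http://download.tensorflow.org/models/resnet_v2_50_2017_04_14.tar.gz
tar xzf resnet_v2_50_2017_04_14.tar.gz

# Clone the TensorFlow models repository, which provides the ResNet code.
git clone https://github.com/tensorflow/models

# Preprocess the input images; the script needs the paths to the ResNet
# checkpoint and to the TensorFlow models clone (see its --help output
# for the exact option names).
python scripts/imagenet_features.py --help
```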
Version History

Version  Date        Summary
2        2020-01-07
1 *      2018-07-13

* Selected version

Files in this item
Name: captioning_cs_bigger.zip
Size: 366.38 MB
Format: application/zip
Description: Zip
MD5: c76a42de20d9ba675f25f5d8f2f82627
File Preview:
  • captioning_cs_bigger
    • experiment.log (72 kB)
    • vocab.cs (60 kB)
    • run.ini (1 kB)
    • original.ini (2 kB)
    • variables.data.avg-0.data-00000-of-00001 (395 MB)
    • variables.data.avg-0.meta (268 kB)
    • git_diff (0 B)
    • variables.data.best (21 B)
    • args (53 B)
    • experiment.ini (2 kB)
    • git_commit (41 B)
    • checkpoint (97 B)
    • variables.data.avg-0.index (2 kB)
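After downloading an archive, its integrity can be checked against the MD5 listed above; a minimal sketch using the standard md5sum utility (the filename is taken from the listing):

```shell
# Compare the local checksum with the one published on this page.
if [ -f captioning_cs_bigger.zip ]; then
    md5sum captioning_cs_bigger.zip
    # should print: c76a42de20d9ba675f25f5d8f2f82627
fi
```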
Name: resnet.zip
Size: 83.65 MB
Format: application/zip
Description: Zip
MD5: 8d67aaecdf30b75d08bd6babf70d5237
File Preview:
  • resnet
    • variables.data.index (10 kB)
    • experiment.log (19 kB)
    • run.ini (621 B)
    • original.ini (1 kB)
    • variables.data.meta (1 MB)
    • git_diff (2 kB)
    • variables.data.best (15 B)
    • args (83 B)
    • experiment.ini (826 B)
    • variables.data.data-00000-of-00001 (89 MB)
    • git_commit (41 B)
    • checkpoint (85 B)
Name: sentiment_cs_csfd_rnn_san.zip
Size: 184.5 MB
Format: application/zip
Description: Zip
MD5: c94f110b9330f3e30c5fb1f4b49e81d6
File Preview:
  • sentiment_cs_csfd_rnn_san
    • variables.data.index (1 kB)
    • experiment.log (113 kB)
    • run.ini (1 kB)
    • original.ini (2 kB)
    • variables.data.meta (654 kB)
    • classes.txt (26 B)
    • git_diff (0 B)
    • variables.data.best (14 B)
    • args (68 B)
    • experiment.ini (1 kB)
    • variables.data.data-00000-of-00001 (197 MB)
    • vocabulary50k.txt (588 kB)
    • git_commit (41 B)
    • checkpoint (181 B)
Name: sentiment_en_yelp_rnn_san.zip
Size: 119.3 MB
Format: application/zip
Description: Zip
MD5: 13c60ad183ef745c0cb516f87beaacf4
File Preview:
  • sentiment_en_yelp_rnn_san
    • variables.data.index (1 kB)
    • experiment.log (1 MB)
    • run.ini (1 kB)
    • original.ini (2 kB)
    • variables.data.meta (775 kB)
    • classes.txt (10 B)
    • git_diff (0 B)
    • variables.data.best (14 B)
    • args (68 B)
    • experiment.ini (1 kB)
    • variables.data.data-00000-of-00001 (128 MB)
    • git_commit (41 B)
    • checkpoint (181 B)
    • vocabulary30k.txt (363 kB)
Name: translation_encs_transformer.zip
Size: 1.76 GB
Format: application/zip
Description: Zip
MD5: 3057c5e1a17ca03e533a20b525140dc7
File Preview:
  • translation_encs_transformer
    • variables.data.meta (1 GB)
    • checkpoint (85 B)
    • variables.data.best (15 B)
    • experiment.ini (1 kB)
    • vocab (296 kB)
    • variables.data.data-00000-of-00001 (800 MB)
    • preprocess.ini (281 B)
    • variables.data.index (10 kB)
Name: captioning_en_multiref_bigger.zip
Size: 399.45 MB
Format: application/zip
Description: Zip
MD5: 1baa582d527ea5f758ac5f8748c172c4
File Preview:
  • captioning_en_multiref_bigger
    • experiment.log (3 MB)
    • run.ini (1 kB)
    • original.ini (2 kB)
    • variables.data.avg-0.data-00000-of-00001 (431 MB)
    • variables.data.avg-0.meta (268 kB)
    • git_diff (0 B)
    • variables.data.best (21 B)
    • args (53 B)
    • experiment.ini (2 kB)
    • git_commit (41 B)
    • checkpoint (97 B)
    • variables.data.avg-0.index (2 kB)
    • en.vocab (77 kB)