dc.contributor.author Libovický, Jindřich
dc.contributor.author Rosa, Rudolf
dc.contributor.author Helcl, Jindřich
dc.contributor.author Popel, Martin
dc.date.accessioned 2020-01-10T09:43:29Z
dc.date.available 2020-01-10T09:43:29Z
dc.date.issued 2020-01-07
dc.identifier.uri http://hdl.handle.net/11234/1-3145
dc.description This submission contains trained end-to-end models for the Neural Monkey toolkit for Czech and English, solving four NLP tasks: machine translation, image captioning, sentiment analysis, and summarization. The models are trained on standard datasets and achieve state-of-the-art or near state-of-the-art performance in the tasks. The models are described in the accompanying paper. The same models can also be invoked via the online demo: https://ufal.mff.cuni.cz/grants/lsd
In addition to the models presented in the referenced paper (developed and published in 2018), we include models for automatic news summarization for Czech and English developed in 2019. The Czech models were trained using the SumeCzech dataset (https://www.aclweb.org/anthology/L18-1551.pdf), the English models were trained using the CNN-Daily Mail corpus (https://arxiv.org/pdf/1704.04368.pdf) using the standard recurrent sequence-to-sequence architecture.
There are several separate ZIP archives here, each containing one model solving one of the tasks for one language.
To use a model, you first need to install Neural Monkey: https://github.com/ufal/neuralmonkey To ensure correct functioning of the model, please use the exact version of Neural Monkey specified by the commit hash stored in the 'git_commit' file in the model directory.
Each model directory contains a 'run.ini' Neural Monkey configuration file, to be used to run the model. See the Neural Monkey documentation to learn how to do that (you may need to update some paths to correspond to your filesystem organization). The 'experiment.ini' file, which was used to train the model, is also included. Then there are files containing the model itself, files containing the input and output vocabularies, etc.
For the sentiment analyzers, you should tokenize your input data using the Moses tokenizer: https://pypi.org/project/mosestokenizer/
For the machine translation, you do not need to tokenize the data, as this is done by the model.
For image captioning, you need to:
- download a trained ResNet: http://download.tensorflow.org/models/resnet_v2_50_2017_04_14.tar.gz
- clone the git repository with TensorFlow models: https://github.com/tensorflow/models
- preprocess the input images with the Neural Monkey 'scripts/imagenet_features.py' script (https://github.com/ufal/neuralmonkey/blob/master/scripts/imagenet_features.py) -- you need to specify the path to ResNet and to the TensorFlow models to this script
The summarization models require input that is tokenized with Moses Tokenizer (https://github.com/alvations/sacremoses) and lower-cased.
Feel free to contact the authors of this submission in case you run into problems!
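The input preparation for the summarization models (Moses tokenization followed by lower-casing) could look roughly like the sketch below. It is an illustration, not the authors' preprocessing script: the helper names `make_moses_tokenizer` and `preprocess_line` are hypothetical, the order "tokenize, then lower-case" is an assumption, and the third-party `sacremoses` package must be installed for the Moses part.

```python
from typing import Callable


def make_moses_tokenizer(lang: str) -> Callable[[str], str]:
    """Return a function that tokenizes one line with sacremoses."""
    # Imported lazily so the rest of the sketch works without the package.
    from sacremoses import MosesTokenizer

    tokenizer = MosesTokenizer(lang=lang)
    # return_str=True yields a single space-separated string of tokens.
    return lambda line: tokenizer.tokenize(line, return_str=True)


def preprocess_line(line: str, tokenize: Callable[[str], str]) -> str:
    """Tokenize one input line, then lower-case it.

    Assumption: tokenization happens before lower-casing; the record
    only states that the input must be tokenized and lower-cased.
    """
    return tokenize(line.strip()).lower()
```

For English input one would call `preprocess_line(text, make_moses_tokenizer("en"))`, and use `"cs"` for Czech. The tokenizer is passed in as a function so that a different Moses wrapper can be substituted without changing the rest of the code.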
dc.language.iso ces
dc.language.iso eng
dc.publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
dc.relation.isreferencedby http://ceur-ws.org/Vol-2203/138.pdf
dc.relation.replaces http://hdl.handle.net/11234/1-2839
dc.rights Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.rights.uri http://creativecommons.org/licenses/by-nc-sa/4.0/
dc.source.uri https://ufal.mff.cuni.cz/grants/lsd
dc.subject sentiment analysis
dc.subject machine translation
dc.subject image captioning
dc.subject neural networks
dc.subject transformer
dc.subject Neural Monkey
dc.subject summarization
dc.title Czech image captioning, machine translation, sentiment analysis and summarization (Neural Monkey models)
dc.type toolService
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent true
metashare.ResourceInfo#ContentInfo.detailedType suiteOfTools
dc.rights.label PUB
has.files yes
branding LINDAT / CLARIAH-CZ
demo.uri https://ufal.mff.cuni.cz/grants/lsd
contact.person Jindřich Libovický libovicky@ufal.mff.cuni.cz Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
contact.person Rudolf Rosa rosa@ufal.mff.cuni.cz Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
sponsor GA ČR 18-02196S Reprezentace lingvistické struktury v neuronových sítích nationalFunds
sponsor Ministerstvo školství, mládeže a tělovýchovy České republiky LM2015071 LINDAT/CLARIN: Institut pro analýzu, zpracování a distribuci lingvistických dat nationalFunds
sponsor Ministerstvo školství, mládeže a tělovýchovy České republiky CZ.02.1.01/0.0/0.0/16_013/0001781 LINDAT/CLARIN - Výzkumná infrastruktura pro jazykové technologie - rozšíření repozitáře a výpočetní kapacity nationalFunds
sponsor Univerzita Karlova (mimo GAUK) SVV 260 453 Specifický vysokoškolský výzkum nationalFunds
sponsor GAUK 976518 Využití lingvistické informace v neuronovém strojovém překladu ownFunds
files.size 4328681659
files.count 8


 Files in this item

Name sentiment_en_yelp_rnn_san.zip
Size 119.3 MB
Format application/zip
Description English sentiment analysis (Yelp)
MD5 13c60ad183ef745c0cb516f87beaacf4
File Preview
  • sentiment_en_yelp_rnn_san
    • variables.data.index (1 kB)
    • experiment.log (1 MB)
    • run.ini (1 kB)
    • original.ini (2 kB)
    • variables.data.meta (775 kB)
    • classes.txt (10 B)
    • git_diff (0 B)
    • variables.data.best (14 B)
    • args (68 B)
    • experiment.ini (1 kB)
    • variables.data.data-00000-of-00001 (128 MB)
    • git_commit (41 B)
    • checkpoint (181 B)
    • vocabulary30k.txt (363 kB)

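The 'git_commit' file in a model directory is listed at 41 B, which is consistent with a 40-character commit hash plus a newline. A minimal sketch of reading that hash and building the command that pins a local Neural Monkey clone to it is shown below. The function names and the assumption that the file holds nothing but the hash are illustrative, not taken from the submission.

```python
import re
from pathlib import Path
from typing import List


def read_pinned_commit(model_dir: str) -> str:
    """Return the commit hash stored in <model_dir>/git_commit."""
    commit = Path(model_dir, "git_commit").read_text(encoding="ascii").strip()
    # Assumption: the file contains only a full 40-character hex hash.
    if not re.fullmatch(r"[0-9a-f]{40}", commit):
        raise ValueError(f"unexpected git_commit content: {commit!r}")
    return commit


def checkout_command(neuralmonkey_repo: str, commit: str) -> List[str]:
    """Build (but do not run) the git command that checks out the commit."""
    return ["git", "-C", neuralmonkey_repo, "checkout", commit]
```

The returned list can be passed to `subprocess.run(..., check=True)` once the Neural Monkey repository has been cloned to `neuralmonkey_repo`.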
Name sentiment_cs_csfd_rnn_san.zip
Size 184.5 MB
Format application/zip
Description Czech sentiment analysis (ČSFD)
MD5 c94f110b9330f3e30c5fb1f4b49e81d6
File Preview
  • sentiment_cs_csfd_rnn_san
    • variables.data.index (1 kB)
    • experiment.log (113 kB)
    • run.ini (1 kB)
    • original.ini (2 kB)
    • variables.data.meta (654 kB)
    • classes.txt (26 B)
    • git_diff (0 B)
    • variables.data.best (14 B)
    • args (68 B)
    • variables.data.data-00000-of-00001 (197 MB)
    • experiment.ini (1 kB)
    • git_commit (41 B)
    • vocabulary50k.txt (588 kB)
    • checkpoint (181 B)

Name translation_encs_transformer.zip
Size 1.76 GB
Format application/zip
Description English-to-Czech machine translation
MD5 3057c5e1a17ca03e533a20b525140dc7
File Preview
  • translation_encs_transformer
    • checkpoint (85 B)
    • variables.data.meta (1 GB)
    • variables.data.best (15 B)
    • experiment.ini (1 kB)
    • vocab (296 kB)
    • variables.data.data-00000-of-00001 (800 MB)
    • variables.data.index (10 kB)
    • preprocess.ini (281 B)

Name captioning_cs_bigger.zip
Size 366.38 MB
Format application/zip
Description Czech image captioning
MD5 c76a42de20d9ba675f25f5d8f2f82627
File Preview
  • captioning_cs_bigger
    • vocab.cs (60 kB)
    • experiment.log (72 kB)
    • run.ini (1 kB)
    • original.ini (2 kB)
    • variables.data.avg-0.data-00000-of-00001 (395 MB)
    • variables.data.avg-0.meta (268 kB)
    • git_diff (0 B)
    • args (53 B)
    • variables.data.best (21 B)
    • experiment.ini (2 kB)
    • git_commit (41 B)
    • checkpoint (97 B)
    • variables.data.avg-0.index (2 kB)

Name captioning_en_multiref_bigger.zip
Size 399.45 MB
Format application/zip
Description English image captioning
MD5 1baa582d527ea5f758ac5f8748c172c4
File Preview
  • captioning_en_multiref_bigger
    • experiment.log (3 MB)
    • run.ini (1 kB)
    • original.ini (2 kB)
    • variables.data.avg-0.data-00000-of-00001 (431 MB)
    • variables.data.avg-0.meta (268 kB)
    • git_diff (0 B)
    • variables.data.best (21 B)
    • args (53 B)
    • experiment.ini (2 kB)
    • git_commit (41 B)
    • checkpoint (97 B)
    • variables.data.avg-0.index (2 kB)
    • en.vocab (77 kB)

Name resnet.zip
Size 83.65 MB
Format application/zip
Description ResNet
MD5 8d67aaecdf30b75d08bd6babf70d5237
File Preview
  • resnet
    • variables.data.index (10 kB)
    • experiment.log (19 kB)
    • run.ini (621 B)
    • original.ini (1 kB)
    • variables.data.meta (1 MB)
    • git_diff (2 kB)
    • variables.data.best (15 B)
    • args (83 B)
    • variables.data.data-00000-of-00001 (89 MB)
    • experiment.ini (826 B)
    • git_commit (41 B)
    • checkpoint (85 B)

Name cnn-daily-mail-rnn-rnn.zip
Size 586.55 MB
Format application/zip
MD5 59ce9b2cd3d7b7f0e1e3cfbd5706a446
File Preview
  • cnn-daily-mail-rnn-rnn
    • variables.data.index (4 kB)
    • experiment.log (76 MB)
    • original.ini (2 kB)
    • variables.data.meta (1 MB)
    • git_diff (977 B)
    • variables.data.best (14 B)
    • args (59 B)
    • variables.data.data-00000-of-00001 (813 MB)
    • experiment.ini (2 kB)
    • git_commit (41 B)
    • checkpoint (211 B)

Name sumeczech-rnn-rnn.zip
Size 588.76 MB
Format application/zip
MD5 b6ccd233449a6509004c742b4f923eea
File Preview
  • sumeczech-rnn-rnn
    • variables.data.index (4 kB)
    • experiment.log (63 MB)
    • original.ini (2 kB)
    • variables.data.meta (1 MB)
    • git_diff (977 B)
    • variables.data.best (14 B)
    • args (54 B)
    • variables.data.data-00000-of-00001 (813 MB)
    • experiment.ini (1 kB)
    • git_commit (41 B)
    • checkpoint (201 B)
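
The MD5 values listed for the archives can be used to check that a download is complete. The sketch below shows one way to do that with the Python standard library. The function names are illustrative, and the file is read in chunks because the largest archive is 1.76 GB.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path: str, listed_md5: str) -> bool:
    """Compare a file's digest with a value copied from the file listing."""
    return md5_of_file(path) == listed_md5.strip().lower()
```

For example, `matches_listed_md5("resnet.zip", "8d67aaecdf30b75d08bd6babf70d5237")` should return `True` for an intact copy of that archive, assuming it was saved under that name in the current directory.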
