Rights: Apache License 2.0 - LINDAT/CLARIAH-CZ Catalog Search Results

Creator:: Libovický, Jindřich
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: service and toolService
Subject:: keyword extraction
Language:: Czech and English
Description:: KER is a keyword extractor designed for scanned texts in Czech and English. It is based on the standard tf-idf algorithm, with the idf tables trained on texts from Wikipedia. To deal with data sparsity, texts are preprocessed by MorphoDiTa, a morphological dictionary and tagger.
Rights:: Apache License 2.0, http://opensource.org/licenses/Apache-2.0, and PUB
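
The tf-idf scoring mentioned in the KER description can be illustrated with a minimal sketch. This is not KER's actual code: the function names `train_idf` and `extract_keywords` are hypothetical, the tiny background corpus stands in for the Wikipedia-trained idf tables, and tokens are assumed to be already lemmatized (the step KER delegates to the morphological preprocessing).

```python
import math
from collections import Counter


def train_idf(corpus):
    """Build an idf table from a background corpus (a list of token lists)."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for tokens in corpus:
        # Count each term once per document.
        doc_freq.update(set(tokens))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def extract_keywords(tokens, idf, top_k=3, default_idf=0.0):
    """Rank the terms of one document by tf * idf and return the top_k."""
    tf = Counter(tokens)
    total = len(tokens)
    scores = {
        term: (count / total) * idf.get(term, default_idf)
        for term, count in tf.items()
    }
    # Sort by descending score; ties are broken alphabetically.
    return sorted(scores, key=lambda term: (-scores[term], term))[:top_k]


# Toy background corpus standing in for the Wikipedia-trained tables.
background = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "bird", "sang"],
]
idf = train_idf(background)
document = ["the", "cat", "cat", "the", "dog"]
print(extract_keywords(document, idf, top_k=2))
```

Because "the" occurs in every background document, its idf is zero and it never ranks as a keyword, however frequent it is in the input text.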

Creator:: Tamchyna, Aleš, Dušek, Ondřej, and Rosa, Rudolf
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: toolService and infrastructure
Subject:: machine translation, distributed computing, web service, and infrastructure
Description:: MTMonkey is a web service which handles and distributes JSON-encoded HTTP requests for machine translation (MT) among multiple machines running an MT system, including text pre- and post-processing. It consists of an application server and remote workers which handle text processing and communicate translation requests to MT systems. The communication between the application server and the workers is based on the XML-RPC protocol. The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 257528 (KHRESMOI). This work has been using language resources developed and/or stored and/or distributed by the LINDAT-Clarin project of the Ministry of Education of the Czech Republic (project LM2010013). This work has been supported by the AMALACH grant (DF12P01OVV02) of the Ministry of Culture of the Czech Republic.
Rights:: Apache License 2.0, http://opensource.org/licenses/Apache-2.0, and PUB
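
The hand-off described for MTMonkey, a JSON request arriving at the application server and being forwarded to a worker over XML-RPC, might look roughly like the sketch below. It uses only Python's standard `json` and `xmlrpc.client` modules and performs no networking. The field names (`text`, `source`, `target`), the method name `process_task`, and both helper functions are assumptions made for illustration, not MTMonkey's actual API.

```python
import json
import xmlrpc.client


def json_request_to_xmlrpc(json_body, method="process_task"):
    """Application-server side: re-encode a JSON request as an XML-RPC call."""
    task = json.loads(json_body)
    return xmlrpc.client.dumps((task,), methodname=method)


def worker_handle(xml_call, translate):
    """Worker side: decode the call, translate, encode an XML-RPC response."""
    (task,), _method = xmlrpc.client.loads(xml_call)
    result = {"translation": translate(task["text"])}
    return xmlrpc.client.dumps((result,), methodresponse=True)


# Hypothetical request; str.upper stands in for a real MT system.
request = '{"text": "ahoj", "source": "cs", "target": "en"}'
xml_call = json_request_to_xmlrpc(request)
xml_response = worker_handle(xml_call, str.upper)
(result,), _ = xmlrpc.client.loads(xml_response)
print(json.dumps(result))
```

In a real deployment the encoded call would travel over HTTP between machines; here the two functions are simply chained to show the encoding steps.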

Creator:: Hajič, Jan Jr
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: tool and toolService
Subject:: image annotation, Python, and music notation
Description:: MUSCIMarker is an open-source tool for annotating visual objects and their relationships in binary images. It is implemented in Python, known to run on Windows, Linux and OS X, and supports working offline. MUSCIMarker is being used for creating a dataset of musical notation symbols, but can support any object set. The user documentation online is currently (12.2016) incomplete, as it is continually changing to reflect annotators' comments and incorporate new features. This version of the software is *not* the final one, and it is under continuous development (we're currently working on adding grayscale image support with auto-binarization, and Android support for touch-based annotation). However, the current version (1.1) has already been used to annotate more than 100 pages of sheet music, across all the major desktop OSes, and I believe it is already in a state where it can be useful beyond my immediate music notation data gathering use case.
Rights:: Apache License 2.0, http://opensource.org/licenses/Apache-2.0, and PUB

Creator:: Variš, Dušan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: tool and toolService
Subject:: machine translation, neural machine translation, tensor2tensor, and docker
Description:: This submission contains a Dockerfile for creating a Docker image with a compiled Tensor2Tensor backend and compatible (TensorFlow Serving) models available in the Lindat Translation service (https://lindat.mff.cuni.cz/services/transformer/). Additionally, the submission contains a web frontend for simple in-browser access to the dockerized backend service. Tensor2Tensor (https://github.com/tensorflow/tensor2tensor) is a library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Rights:: Apache License 2.0, http://opensource.org/licenses/Apache-2.0, and PUB

Creator:: Korvas, Matěj, Plátek, Ondřej, Dušek, Ondřej, Žilka, Lukáš, and Jurčíček, Filip
Publisher:: Charles University, Faculty of Mathematics and Physics
Type:: toolService and tool
Subject:: ASR, HTK, Kaldi, and acoustic model
Language:: English and Czech
Description:: Vystadial 2013 is a dataset of telephone conversations in English and Czech, developed for training acoustic models for automatic speech recognition in spoken dialogue systems. It ships in three parts: Czech data, English data, and scripts. The data comprise over 41 hours of speech in English and over 15 hours in Czech, plus orthographic transcriptions. The scripts implement data pre-processing and building acoustic models using the HTK and Kaldi toolkits. This is the scripts part of the dataset. This research was funded by the Ministry of Education, Youth and Sports of the Czech Republic under the grant agreement LK11221.
Rights:: Apache License 2.0, http://opensource.org/licenses/Apache-2.0, and PUB
