The system Česílko (language data and software tools) was first developed in response to a growing need for translation and localisation from one source language into many target languages. The original system belonged to the Shallow-Parse, Shallow-Transfer Rule-Based Machine Translation (RBMT) paradigm and was designed primarily for translation between related languages. The latest implementation adds a stochastic ranker, so technically the system belongs to the hybrid machine translation paradigm, combining stochastic methods with the traditional Shallow-Transfer RBMT approach. The system has been stripped of the accompanying language resources due to copyright restrictions; the data that is included serves demonstration purposes only.
Chared is a software tool which can detect the character encoding of a text document, provided the language of the document is known. The language has to be specified as an input parameter so that the corresponding language model can be used. The package contains models for a wide range of languages (currently 57, covering all major languages). Furthermore, it provides a training script for learning models for additional languages from a set of user-supplied sample HTML pages in the given language. The detection algorithm is based on determining the similarity of byte trigram vectors. In general, chared should be more accurate than character encoding detection tools that operate with no language constraints. This is an important advantage, allowing the precise character decoding needed for building large textual corpora. The tool has been used for building corpora in American Spanish, Arabic, Czech, French, Japanese, Russian, Tajik, and six Turkic languages, comprising 70 billion tokens altogether. Chared is open-source software, licensed under the New BSD License and available for download (including the source code) at http://code.google.com/p/chared/. The research leading to this piece of software was published in: Pomikálek, Jan and Vít Suchomel: chared: Character Encoding Detection with a Known Language. In: Aleš Horák, Pavel Rychlý (eds.), RASLAN 2011. Brno, Czech Republic: Tribun EU, 2011, pp. 125-129. ISBN 978-80-263-0077-9.
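As an illustration of the byte-trigram approach described above (not chared's actual implementation), the following Python sketch picks the encoding whose trigram frequency vector is most similar to the document's; the toy per-encoding models and the dot-product similarity are assumptions made for demonstration only.

```python
# Illustrative sketch of byte-trigram encoding detection (not chared's code).
from collections import Counter

def trigram_vector(data: bytes) -> Counter:
    """Count byte trigrams in a byte string."""
    return Counter(data[i:i + 3] for i in range(len(data) - 2))

def detect_encoding(document: bytes, models: dict[str, Counter]) -> str:
    """Return the encoding whose trigram vector is most similar to the
    document's, using a simple dot product over the document's trigrams."""
    doc_vec = trigram_vector(document)
    return max(models, key=lambda enc: sum(doc_vec[t] * models[enc][t]
                                           for t in doc_vec))

# Hypothetical usage: real models would be trained from sample pages
# in the given language, one model per candidate encoding.
sample_models = {
    "utf-8": trigram_vector("příliš žluťoučký kůň".encode("utf-8")),
    "cp1250": trigram_vector("příliš žluťoučký kůň".encode("cp1250")),
}
print(detect_encoding("žluťoučký".encode("utf-8"), sample_models))
```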
The `corpipe23-corefud1.1-231206` is an `mT5-large`-based multilingual model for coreference resolution usable in CorPipe 23 (https://github.com/ufal/crac2023-corpipe). It is released under the CC BY-NC-SA 4.0 license.
The model is language agnostic (no _corpus id_ on input), so it can be used to predict coreference in any `mT5` language (for zero-shot evaluation, see the paper). However, note that the empty nodes must already be present on the input; they are not predicted by the model (the same setting as in the CRAC 2023 shared task).
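Because the model expects empty nodes on its input, it may help to verify that a CorefUD/CoNLL-U file actually contains them before running prediction. A minimal sketch, relying only on the CoNLL-U convention that empty nodes carry decimal IDs such as `1.1`:

```python
# Minimal sketch: check whether a CoNLL-U file already contains empty nodes,
# which this model expects on input. In CoNLL-U, empty nodes are exactly the
# token lines whose ID has the decimal form N.M (e.g. "1.1").
import sys

def has_empty_nodes(conllu_path: str) -> bool:
    with open(conllu_path, encoding="utf-8") as f:
        for line in f:
            if line.strip() and not line.startswith("#"):
                token_id = line.split("\t", 1)[0]
                if "." in token_id:  # empty-node IDs look like "1.1"
                    return True
    return False

if __name__ == "__main__":
    path = sys.argv[1]  # path to your CorefUD/CoNLL-U file
    status = "present" if has_empty_nodes(path) else "missing"
    print(f"{path}: empty nodes {status}")
```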
The `corpipe23-corefud1.2-240906` is an `mT5-large`-based multilingual model for coreference resolution usable in CorPipe 23 (https://github.com/ufal/crac2023-corpipe). It is released under the CC BY-NC-SA 4.0 license.
The model is language agnostic (no corpus id on input), so it can in theory be used to predict coreference in any `mT5` language. However, the model expects the empty nodes to be already present on the input, predicted by the baseline model at https://www.kaggle.com/models/ufal-mff/crac2024_zero_nodes_baseline/.
This model was presented in the CorPipe 24 paper as an alternative to the single-stage approach, where the empty nodes are predicted jointly with coreference resolution (via http://hdl.handle.net/11234/1-5672), an approach circa twice as fast but of slightly worse quality.
The `corpipe24-corefud1.2-240906` is an `mT5-large`-based multilingual model for coreference resolution usable in CorPipe 24 (https://github.com/ufal/crac2024-corpipe). It is released under the CC BY-NC-SA 4.0 license.
The model is language agnostic (no corpus id on input), so it can in theory be used to predict coreference in any `mT5` language.
This model also jointly predicts the empty nodes needed for zero coreference. The paper introducing this model also presents an alternative two-stage approach, first predicting the empty nodes (via https://www.kaggle.com/models/ufal-mff/crac2024_zero_nodes_baseline/) and then performing coreference resolution (via http://hdl.handle.net/11234/1-5673), which is circa twice as slow but slightly better.
Software for corpus linguists and text/data mining enthusiasts. The CorpusExplorer combines over 45 interactive visualizations under a user-friendly interface. Routine tasks such as text acquisition, cleaning, and tagging are completely automated. The simple interface supports use in university teaching and leads users/students to fast and substantial results. The CorpusExplorer supports many standard formats (XML, CSV, JSON, R, etc.) and also offers its own software development kit (SDK).
Source code available at https://github.com/notesjor/corpusexplorer2.0
CUBBITT En-Cs translation models, exported via TensorFlow Serving, available in the Lindat translation service (https://lindat.mff.cuni.cz/services/translation/).
Models are compatible with Tensor2tensor version 1.6.6.
For details about the model training (data, model hyper-parameters), please contact the archive maintainer.
Evaluation on newstest2014 (BLEU):
en->cs: 27.6
cs->en: 34.4
(Evaluated using multeval: https://github.com/jhclark/multeval)
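Since these CUBBITT models are exported via TensorFlow Serving, a served instance can in principle be queried through the standard TF Serving REST API. The sketch below shows the generic `:predict` call; the host, port, and model name are hypothetical, and Tensor2tensor exports typically expect serialized `tf.Example` inputs rather than raw strings, so the exact payload format should be confirmed with the Tensor2tensor serving documentation or the archive maintainer.

```python
# Hedged sketch of the generic TensorFlow Serving REST "predict" call.
# SERVER and MODEL are hypothetical; the payload format is an assumption.
import base64
import json
import urllib.request

SERVER = "http://localhost:8501"  # assumed TF Serving REST port
MODEL = "cubbitt-en-cs"           # hypothetical model name

def predict(serialized_example: bytes) -> dict:
    """Send one serialized tf.Example to the TF Serving REST API;
    binary inputs are passed base64-encoded under the "b64" key."""
    payload = {"instances": [
        {"b64": base64.b64encode(serialized_example).decode()}
    ]}
    req = urllib.request.Request(
        f"{SERVER}/v1/models/{MODEL}:predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```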
CUBBITT En-Fr translation models, exported via TensorFlow Serving, available in the Lindat translation service (https://lindat.mff.cuni.cz/services/translation/).
Models are compatible with Tensor2tensor version 1.6.6.
For details about the model training (data, model hyper-parameters), please contact the archive maintainer.
Evaluation on newstest2014 (BLEU):
en->fr: 38.2
fr->en: 36.7
(Evaluated using multeval: https://github.com/jhclark/multeval)
CUBBITT En-Pl translation models, exported via TensorFlow Serving, available in the Lindat translation service (https://lindat.mff.cuni.cz/services/translation/).
Models are compatible with Tensor2tensor version 1.6.6.
For details about the model training (data, model hyper-parameters), please contact the archive maintainer.
Evaluation on newstest2020 (BLEU):
en->pl: 12.3
pl->en: 20.0
(Evaluated using multeval: https://github.com/jhclark/multeval)
Tokenizer, POS Tagger, Lemmatizer, and Parser model based on the PDT-C 1.0 treebank (https://hdl.handle.net/11234/1-3185). The model documentation including performance can be found at https://ufal.mff.cuni.cz/udpipe/2/models#czech_pdtc1.0_model . To use these models, you need UDPipe version 2.1, which you can download from https://ufal.mff.cuni.cz/udpipe/2 .
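For illustration, such a model can also be used through the UDPipe 2 REST service (https://lindat.mff.cuni.cz/services/udpipe/) once deployed there. A minimal sketch follows; the model identifier used below is an assumption and should be checked against the service's list of available models.

```python
# Hedged sketch: calling the UDPipe 2 REST "process" endpoint.
# The model name is hypothetical; verify it via the service's model list.
import json
import urllib.parse
import urllib.request

URL = "https://lindat.mff.cuni.cz/services/udpipe/api/process"
params = {
    "model": "czech-pdtc1.0",  # hypothetical identifier, verify first
    "tokenizer": "",           # run the tokenizer...
    "tagger": "",              # ...the tagger/lemmatizer...
    "parser": "",              # ...and the parser
    "data": "Petr má rád knihy.",
}
req = urllib.request.Request(URL, data=urllib.parse.urlencode(params).encode())
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["result"])  # CoNLL-U output
```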