Czech models for NameTag, providing recognition of named entities.
The models are trained on Czech Named Entity Corpus 2.0 and 1.1. and This work has been using language resources developed and/or stored and/or distributed by the LINDAT/CLARIN project of the Ministry of Education of the Czech Republic (project LM2010013).
Czech models are trained on Czech Named Entity Corpus, which was created by Magda Ševčíková, Zdeněk Žabokrtský, Jana Straková and Milan Straka.
The recognizer research was supported by the projects MSM0021620838 and LC536 of Ministry of Education, Youth and Sports of the Czech Republic, 1ET101120503 of Academy of Sciences of the Czech Republic, LINDAT/CLARIN project of the Ministry of Education of the Czech Republic (project LM2010013), and partially by SVV project number 267 314. The research was performed by Jana Straková, Zdeněk Žabokrtský and Milan Straka.
Czech models use MorphoDiTa as a tagger and lemmatizer, therefore MorphoDiTa Acknowledgements (http://ufal.mff.cuni.cz/morphodita#morphodita_acknowledgements) and Czech MorphoDiTa Model Acknowledgements (http://ufal.mff.cuni.cz/morphodita/users-manual#czech-morfflex-pdt_acknowledgements) apply.
Czech models for MorphoDiTa, providing morphological analysis, morphological generation and part-of-speech tagging.
The morphological dictionary is created from MorfFlex CZ and the PoS tagger is trained on PDT (Prague Dependency Treebank). and This work has been using language resources developed and/or stored and/or distributed by the LINDAT/CLARIN project of the Ministry of Education of the Czech Republic (project LM2010013).
The Czech morphologic system was devised by Jan Hajič.
The MorfFlex CZ dictionary was created by Jan Hajič and Jaroslava Hlaváčová.
The morphologic guesser research was supported by the projects 1ET101120503 and 1ET101120413 of Academy of Sciences of the Czech Republic and 100008/2008 of Charles University Grant Agency. The research was performed by Jan Hajič, Jaroslava Hlaváčová and David Kolovratník.
The tagger algorithm and feature set research was supported by the projects MSM0021620838 and LC536 of Ministry of Education, Youth and Sports of the Czech Republic, GA405/09/0278 of the Grant Agency of the Czech Republic and 1ET101120503 of Academy of Sciences of the Czech Republic. The research was performed by Drahomíra "johanka" Spoustová, Jan Hajič, Jan Raab and Miroslav Spousta.
The tagger is trained on morphological layer of Prague Dependency Treebank PDT 2.5, which was supported by the projects LM2010013, LC536, LN00A063 and MSM0021620838 of Ministry of Education, Youth and Sports of the Czech Republic, and developed by Martin Buben, Jan Hajič, Jiří Hana, Hana Hanová, Barbora Hladká, Emil Jeřábek, Lenka Kebortová, Kristýna Kupková, Pavel Květoň, Jiří Mírovský, Andrea Pfimpfrová, Jan Štěpánek and Daniel Zeman.
Czech models for MorphoDiTa, providing morphological analysis, morphological generation and part-of-speech tagging.
The morphological dictionary is created from MorfFlex CZ 160310 and the PoS tagger is trained on Prague Dependency Treebank 3.0 (PDT). and This work has been using language resources developed and/or stored and/or distributed by the LINDAT/CLARIN project of the Ministry of Education of the Czech Republic (project LM2010013).
The Czech morphologic system was devised by Jan Hajič.
The MorfFlex CZ dictionary was created by Jan Hajič and Jaroslava Hlaváčová.
The morphologic guesser research was supported by the projects 1ET101120503 and 1ET101120413 of Academy of Sciences of the Czech Republic and 100008/2008 of Charles University Grant Agency. The research was performed by Jan Hajič, Jaroslava Hlaváčová and David Kolovratník.
The tagger algorithm and feature set research was supported by the projects MSM0021620838 and LC536 of Ministry of Education, Youth and Sports of the Czech Republic, GA405/09/0278 of the Grant Agency of the Czech Republic and 1ET101120503 of Academy of Sciences of the Czech Republic. The research was performed by Drahomíra "johanka" Spoustová, Jan Hajič, Jan Raab and Miroslav Spousta.
The tagger is trained on morphological layer of Prague Dependency Treebank PDT 2.5, which was supported by the projects LM2010013, LC536, LN00A063 and MSM0021620838 of Ministry of Education, Youth and Sports of the Czech Republic, and developed by Martin Buben, Jan Hajič, Jiří Hana, Hana Hanová, Barbora Hladká, Emil Jeřábek, Lenka Kebortová, Kristýna Kupková, Pavel Květoň, Jiří Mírovský, Andrea Pfimpfrová, Jan Štěpánek and Daniel Zeman.
Czech models for MorphoDiTa, providing morphological analysis, morphological generation and part-of-speech tagging.
The morphological dictionary is created from MorfFlex CZ 161115 and DeriNet 1.2 and the PoS tagger is trained on Prague Dependency Treebank 3.0 (PDT). and This work has been using language resources developed and/or stored and/or distributed by the LINDAT/CLARIN project of the Ministry of Education of the Czech Republic (project LM2010013).
The Czech morphologic system was devised by Jan Hajič.
The MorfFlex CZ dictionary was created by Jan Hajič and Jaroslava Hlaváčová.
The morphologic guesser research was supported by the projects 1ET101120503 and 1ET101120413 of Academy of Sciences of the Czech Republic and 100008/2008 of Charles University Grant Agency. The research was performed by Jan Hajič, Jaroslava Hlaváčová and David Kolovratník.
The tagger algorithm and feature set research was supported by the projects MSM0021620838 and LC536 of Ministry of Education, Youth and Sports of the Czech Republic, GA405/09/0278 of the Grant Agency of the Czech Republic and 1ET101120503 of Academy of Sciences of the Czech Republic. The research was performed by Drahomíra "johanka" Spoustová, Jan Hajič, Jan Raab and Miroslav Spousta.
The tagger is trained on morphological layer of Prague Dependency Treebank PDT 2.5, which was supported by the projects LM2010013, LC536, LN00A063 and MSM0021620838 of Ministry of Education, Youth and Sports of the Czech Republic, and developed by Martin Buben, Jan Hajič, Jiří Hana, Hana Hanová, Barbora Hladká, Emil Jeřábek, Lenka Kebortová, Kristýna Kupková, Pavel Květoň, Jiří Mírovský, Andrea Pfimpfrová, Jan Štěpánek and Daniel Zeman.
Czech Named Entity Corpus 1.1 fixes some issues of the Czech Named Entity Corpus 1.0: misannotated entities are fixed, all formats contain the same data, tmt format is replaced with treex format, all formats contain splitting into training, development and testing portion of the data. and SVV 267 314 (Teoretické základy informatiky a výpočetní lingvistiky), LM2010013 (LINDAT-CLARIN: Institut pro analýzu, zpracování a distribuci lingvistických dat), GPP406/12/P175 (Vybrané derivační vztahy pro automatické zpracování češtiny), PRVOUK (PRVOUK)
Czech Named Entity Corpus 2.0 is a corpus of 8993 Czech sentences with manually annotated 35220 Czech named entities, classified according to a two-level hierarchy of 46 named entities. and SVV 267 314 (Teoretické základy informatiky a výpočetní lingvistiky), LM2010013 (LINDAT-CLARIN: Institut pro analýzu, zpracování a distribuci lingvistických dat), GPP406/12/P175 (Vybrané derivační vztahy pro automatické zpracování češtiny), PRVOUK (PRVOUK)
A slightly modified version of the Czech Wordnet. This is the version used to annotate "The Lexico-Semantic Annotation of PDT using Czech WordNet": http://hdl.handle.net/11858/00-097C-0000-0001-487A-4
The Czech WordNet was developed by the Centre of Natural Language Processing at the Faculty of Informatics, Masaryk University, Czech Republic.
The Czech WordNet captures nouns, verbs, adjectives, and partly adverbs, and contains 23,094 word senses (synsets). 203 of these were created or modified by UFAL during correction of annotations. This version of WordNet was used to annotate word senses in PDT: http://hdl.handle.net/11858/00-097C-0000-0001-487A-4
A more recent version of Czech WordNet is distributed by ELRA: http://catalog.elra.info/product_info.php?products_id=1089 and 1ET201120505, LM2010013
CzEngVallex is a bilingual valency lexicon of corresponding Czech and English verbs. It connects 20835 aligned valency frame pairs (verb senses) which are translations of each other, aligning their arguments as well. The CzEngVallex serves as a powerful, real-text-based database of frame-to-frame and subsequently argument-to-argument pairs and can be used for example for machine translation applications. It uses the data from the Prague Czech-English Dependency Treebank project (PCEDT 2.0, http://hdl.handle.net/11858/00-097C-0000-0015-8DAF-4) and it also takes advantage of two existing valency lexicons: PDT-Vallex for Czech and EngVallex for English, using the same view of valency (based on the Functional Generative Description theory). The CzEngVallex is available in an XML format in the LINDAT/CLARIN repository, and also in a searchable form (see the “More Apps” tab) interlinked with PDT-Vallex (http://hdl.handle.net/11858/00-097C-0000-0023-4338-F),EngVallex (http://hdl.handle.net/11858/00-097C-0000-0023-4337-2) and with examples from the PCEDT.
DeriNet is a lexical network which contains derivational relations in Czech modeled as an oriented graph. Nodes correspond to Czech lexemes (a lexeme is a single lemma, possibly with only a subset of its senses – homonyms may have different derivations and are thus represented by several lexemes) and edges represent derivations between them. DeriNet 1.0 contains 968,967 lexemes with 965,535 unique lemmas; connected by 715,729 derivational links. Lexemes in DeriNet 1.0 are sampled from the MorfFlex dictionary.
English model for NameTag, a named entity recognition tool. The model is trained on CoNLL-2003 training data. Recognizes PER, ORG, LOC and MISC named entities. Achieves F-measure 84.73 on CoNLL-2003 test data.
Ministerstvo školství, mládeže a tělovýchovy České republiky@@LM2010013@@LINDAT/CLARIN: Institut pro analýzu, zpracování a distribuci lingvistických dat@@nationalFunds@@✖[remove]37