Segment from Československý zvukový týdeník Aktualita (Czechoslovak Aktualita Sound Newsreel) 1941, issue no. 52, reports on a meeting of the Southeast European Economic Society and German Economic Society in Bohemia and Moravia in the Spanish Hall of Prague Castle on 17 December 1941. The gathering is attended by Acting Reich Protector Reinhard Heydrich, State President Emil Hácha, Reich Secretary Karl Hermann Frank, and Prime Minister of the Protectorate Government Jaroslav Krejčí. Speeches are given by Acting Reich Protector Reinhard Heydrich, Hitlerjugend leader Baldur von Schirach, and Reich Minister of Economic Affairs Walter Funk (silent). The latter points out the need to break with the Anglo-Saxon model of colonial economic policy in Eastern Europe. The meeting concludes with paying tribute to the Führer.
The ILRB was created by two cooperating teams: the team of the Institute of Czech Language, Czech Academy of Sciences, and the team of the NLP Centre at the Faculty of Informatics, Masaryk University (2004-2008).
The tool consists of two sections: a wordlist section and a reference (explanatory) section. Comments and remarks are welcome and should be sent to poradna@ujc.cas.cz.
1. Wordlist section
It contains more than 60 000 dictionary entries and is based on the glossary of the School Rules of Czech Orthography and the Dictionary of Literary Czech, together with selected entries from the New Dictionary of Words of Foreign Origin and the Dictionary of Neologisms. The entries typically include the information that users ask about most frequently. Inflectional forms of each word are also presented in tables, generated by the morphological analyzer ajka created at the Faculty of Informatics, MU. The wordlist section is linked to the reference section through hypertext links.
2. Reference section
It comprises explanations of linguistic phenomena described in the Rules of Czech Orthography and contemporary Czech grammars, covering the questions that users repeatedly bring to the Linguistic Advisory Line of the Institute of Czech Language. The explanations address typical spelling problems and include appropriate recommendations. The ILRB is regularly updated and completed; new expressions are added and existing ones are made more precise. and Academy of Sciences of the Czech Republic in project 1ET200610406 and Ministry of Education, Youth and Sports in projects LM2010013, LC536 and 2C06009.
The THEaiTRobot 1.0 tool allows the user to interactively generate scripts for individual theatre play scenes.
The tool is based on the GPT-2 XL generative language model, used without any fine-tuning: we found that when the prompt is formatted as part of a theatre play script, the model usually generates a continuation that retains the format.
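The idea of formatting the prompt as a script fragment can be illustrated with a minimal sketch. The layout below (title, character list, "SPEAKER: line" turns) and the function names `format_script_prompt` and `next_speaker_cue` are assumptions made for illustration; the record does not state the exact prompt format the tool uses.

```python
def format_script_prompt(title, characters, lines):
    """Build a plain-text prompt laid out like a theatre script.

    The layout is a hypothetical one chosen for this sketch, not the
    tool's documented format.
    """
    header = [title, "", "Characters: " + ", ".join(characters), ""]
    body = [f"{speaker}: {text}" for speaker, text in lines]
    return "\n".join(header + body) + "\n"


def next_speaker_cue(prompt, speaker):
    """Append a speaker label so a language model continues with that
    character's line instead of choosing a speaker itself."""
    return f"{prompt}{speaker}:"


prompt = format_script_prompt(
    "A Play",
    ["Robot", "Human"],
    [("Robot", "Hello."), ("Human", "Who are you?")],
)
print(next_speaker_cue(prompt, "Robot"))
```

The resulting string would be passed as the prompt to the language model, which is expected (but not guaranteed) to continue in the same "SPEAKER: line" format.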
We encountered numerous problems when generating the script in this way. We managed to tackle some of the problems with various adjustments, but some of them remain to be solved in a future version.
THEaiTRobot 1.0 was used to generate the first THEaiTRE play, "AI: Když robot píše hru" ("AI: When a robot writes a play").
The THEaiTRobot 2.0 tool allows the user to interactively generate scripts for individual theatre play scenes.
The previous version of the tool (http://hdl.handle.net/11234/1-3507) was based on the GPT-2 XL generative language model, used without any fine-tuning: we found that when the prompt is formatted as part of a theatre play script, the model usually generates a continuation that retains the format.
The current version also uses vanilla GPT-2 by default, but can instead use a GPT-2 medium model fine-tuned on theatre play scripts (as well as film and TV series scripts). Apart from the basic "flat" generation using a theatrical starting prompt and the script model, the tool also features a second, hierarchical variant. In its first step, a play synopsis is generated from the play's title using a synopsis model (GPT-2 medium fine-tuned on synopses of theatre plays, as well as film, TV series and book synopses). The synopsis is then used as input for the second step, which uses the script model.
The models are selected by setting the MODEL variable in start_server.sh and start_syn_server.sh.
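The difference between the flat and the hierarchical variant can be sketched as follows. This is only an outline under stated assumptions: the models are represented by arbitrary callables that map a prompt to a continuation, and the names `generate_flat` and `generate_hierarchical`, as well as the way the title and synopsis are joined into the script prompt, are hypothetical and not taken from the tool.

```python
from typing import Callable, Tuple

# A "model" here is any function from a prompt to generated text.
# In practice it would wrap a language model; the stubs below are placeholders.
Generator = Callable[[str], str]


def generate_flat(start_prompt: str, script_model: Generator) -> str:
    """Flat variant: continue a theatrical starting prompt directly."""
    return start_prompt + script_model(start_prompt)


def generate_hierarchical(
    title: str, synopsis_model: Generator, script_model: Generator
) -> Tuple[str, str]:
    """Hierarchical variant: title -> synopsis -> script."""
    # Step 1: generate a synopsis from the title alone.
    synopsis = synopsis_model(title)
    # Step 2: feed the synopsis to the script model.
    # How the two are combined is an assumption of this sketch.
    script_prompt = f"{title}\n\n{synopsis}\n\n"
    script = script_model(script_prompt)
    return synopsis, script


# Stub models standing in for the synopsis and script models.
def stub_synopsis_model(title: str) -> str:
    return f"Synopsis of {title}."


def stub_script_model(prompt: str) -> str:
    return "ROBOT: Hello."


synopsis, script = generate_hierarchical(
    "Permeation", stub_synopsis_model, stub_script_model
)
print(synopsis)
print(script)
```

Keeping the models behind a plain callable interface makes it straightforward to swap the vanilla model for a fine-tuned one without touching the two-step control flow.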
THEaiTRobot 2.0 was used to generate the second THEaiTRE play, "Permeation/Prostoupení".
AMALACH project component TMODS:ENG-CZE; machine translation of queries from Czech to English. This archive contains models for the Moses decoder (binarized, pruned to allow for real-time translation) and configuration files for the MTMonkey toolkit. The aim of this package is to provide a full service for Czech->English translation which can be easily utilized as a component in a larger software solution. (The required tools are freely available and an installation guide is included in the package.)
The translation models were trained on the CzEng 1.0 corpus and Europarl. Monolingual data for LM estimation additionally contains WMT news crawls until 2013.
Segment from Český zvukový týdeník Aktualita (Czech Aktualita Sound Newsreel) issue no. 23A from 1943 was shot on 30 May during the official opening of a training camp organised by the Board of Trustees for the Education of Youth at Protivín Chateau. The ceremony was held to mark the first anniversary of the Board. The importance of the event was highlighted by the presence of Prime Minister Jaroslav Krejčí. Minister of Agriculture and Forestry Adolf Hrubý and Minister of Education and People's Enlightenment and Chairman of the Board Emanuel Moravec spoke to the participants in the chateau courtyard. In the afternoon, the course participants put on a collective sports performance in the park adjoining the chateau.
Segment from Český zvukový týdeník Aktualita (Czech Aktualita Sound Newsreel) issue no. 39A from 1944 was shot during a training course for the regional leaders of the Board of Trustees for the Education of Youth, which was held at the Čeperka Guest House near Unhošť in connection with changes in the conditions of mandatory youth service resulting from the declaration of forced labour (Totaleinsatz) in August 1944. In addition to lectures, the programme included sports activities to improve the physical fitness of the participants.
This release contains the datasets described in Droganova, Kira, and Daniel Zeman. "Towards a Unified Taxonomy of Deep Syntactic Relations." Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). 2024.
Four languages are included in this release. English PropBank is omitted due to its license terms.
Treex::Web is a web frontend for running Treex applications from your browser.
Treex (formerly TectoMT) is a highly modular NLP framework implemented in the Perl programming language. It is primarily aimed at machine translation and makes use of the ideas and technology created during the Prague Dependency Treebank project.