Data from a questionnaire survey, conducted from 2022-08-25 to 2022-11-15, exploring the use of machine translation by Ukrainian refugees in the Czech Republic. The presented spreadsheet contains minimally processed data exported from two questionnaires created in Google Forms, one in Ukrainian and one in Russian. The links to these questionnaires were distributed in three ways: by direct email to refugees whose contact details the authors obtained while volunteering; through a non-profit organisation helping refugees (the Vesna women’s education institution); and on social networks, by posting links to the survey in groups that bring together the Ukrainian community across Czech regions and towns.
Since we asked potential respondents to spread the questionnaire further, we could not prevent it from reaching Ukrainians who had arrived in Czechia earlier or who had received temporary protection in other countries. For this reason, the textual answers to question 1.5 "Which country are you in right now?" were replaced in the dataset by numbers (1 for the Czech Republic, 2 for other countries), so that we could separate out the data of respondents not located in the Czech Republic, which were irrelevant to our survey. Likewise, in this version of the dataset, the textual answers to question 1.6 "How many months have you been to this country?" were replaced by numbers, so that we could separate the data of respondents who arrived in the Czech Republic in February 2022 or later from the other data (0 for those staying in Czechia before February 2022, 1 for those staying in Czechia since February 2022 or later, 2 for those staying in other countries).
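The numeric coding described above can be used to narrow the spreadsheet down to the target group. The following is a minimal sketch in Python, assuming the rows have already been loaded as a list of dictionaries; the column names `q1_5_country` and `q1_6_arrival` and the sample rows are hypothetical and are not taken from the dataset.

```python
# Hypothetical column names; the real spreadsheet headers may differ.
COUNTRY_COL = "q1_5_country"  # 1 = Czech Republic, 2 = other countries
ARRIVAL_COL = "q1_6_arrival"  # 0 = in Czechia before Feb 2022,
                              # 1 = in Czechia since Feb 2022 or later,
                              # 2 = staying in other countries


def select_target_respondents(rows):
    """Keep respondents located in the Czech Republic who arrived in February 2022 or later."""
    return [
        row for row in rows
        if int(row[COUNTRY_COL]) == 1 and int(row[ARRIVAL_COL]) == 1
    ]


# Invented example rows, for illustration only.
sample_rows = [
    {"id": "a", COUNTRY_COL: 1, ARRIVAL_COL: 1},
    {"id": "b", COUNTRY_COL: 1, ARRIVAL_COL: 0},
    {"id": "c", COUNTRY_COL: 2, ARRIVAL_COL: 2},
]

target = select_target_respondents(sample_rows)
print([row["id"] for row in target])
```

Only the row coded 1 for both questions is retained; rows for earlier arrivals and for respondents in other countries are dropped.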
WALS is a large database of structural (phonological, grammatical, lexical) properties of languages gathered from descriptive materials (such as reference grammars) by a team of more than 40 authors (many of them the leading authorities on the subject).
The THEaiTRobot 1.0 tool allows the user to interactively generate scripts for individual theatre play scenes.
The tool is based on the GPT-2 XL generative language model, used without any fine-tuning, as we found that with a prompt formatted as part of a theatre play script, the model usually generates a continuation that retains the format.
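To illustrate what a prompt "formatted as part of a theatre play script" might look like, here is a small Python sketch. The layout it produces (a scene-setting line, a blank line, then one `NAME: utterance` line per speech) is an assumption made for illustration; the description does not specify the format the tool actually uses, and `build_script_prompt` is a hypothetical helper, not part of the tool.

```python
def build_script_prompt(scene_setting, opening_lines, next_speaker=None):
    """Format a scene setting and some opening lines as a fragment of a play script.

    The layout is an assumed convention for illustration only,
    not the tool's documented prompt format.
    """
    parts = [scene_setting.strip(), ""]
    for speaker, utterance in opening_lines:
        parts.append(f"{speaker.upper()}: {utterance.strip()}")
    if next_speaker is not None:
        # Ending on a bare speaker label is intended to invite a
        # continuation in the form of that character's next line.
        parts.append(f"{next_speaker.upper()}:")
    return "\n".join(parts)


prompt = build_script_prompt(
    "A kitchen at night. A robot and a man sit at the table.",
    [("Robot", "I cannot sleep."), ("Man", "Robots do not sleep.")],
    next_speaker="Robot",
)
print(prompt)
```

The resulting string would then be passed to the language model as its input text, and the generated continuation appended to the script.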
We encountered numerous problems when generating the script in this way. We managed to tackle some of the problems with various adjustments, but some of them remain to be solved in a future version.
THEaiTRobot 1.0 was used to generate the first THEaiTRE play, "AI: Když robot píše hru" ("AI: When a robot writes a play").
The THEaiTRobot 2.0 tool allows the user to interactively generate scripts for individual theatre play scenes.
The previous version of the tool (http://hdl.handle.net/11234/1-3507) was based on the GPT-2 XL generative language model, used without any fine-tuning, as we found that with a prompt formatted as part of a theatre play script, the model usually generates a continuation that retains the format.
The current version also uses vanilla GPT-2 by default, but can instead use a GPT-2 medium model fine-tuned on theatre play scripts (as well as film and TV series scripts). Apart from the basic "flat" generation, which uses a theatrical starting prompt and the script model, the tool also features a second, hierarchical variant: in the first step, a play synopsis is generated from its title using a synopsis model (GPT-2 medium fine-tuned on synopses of theatre plays, as well as film, TV series and book synopses). The synopsis is then used as input for the second step, which uses the script model.
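The difference between the flat and the hierarchical variant can be sketched as follows. This is only an outline of the control flow in Python, under the assumption that each model can be treated as a function from input text to generated text; the function names and the stand-in models are hypothetical and do not reflect the tool's actual code or API.

```python
from typing import Callable, Tuple

# Assumption: a model is treated as a function from input text to generated text.
TextModel = Callable[[str], str]


def generate_flat(script_model: TextModel, starting_prompt: str) -> str:
    """Flat variant: the script model continues a theatrical starting prompt."""
    return script_model(starting_prompt)


def generate_hierarchical(
    synopsis_model: TextModel, script_model: TextModel, title: str
) -> Tuple[str, str]:
    """Hierarchical variant: title -> synopsis -> script."""
    synopsis = synopsis_model(title)   # first step: synopsis from the title
    script = script_model(synopsis)    # second step: script from the synopsis
    return synopsis, script


# Stand-in models for demonstration; real use would wrap the language models.
def fake_synopsis_model(title: str) -> str:
    return f"Synopsis of '{title}'."


def fake_script_model(text: str) -> str:
    return f"[script generated from: {text}]"


synopsis, script = generate_hierarchical(
    fake_synopsis_model, fake_script_model, "The Lonely Robot"
)
print(synopsis)
print(script)
```

Keeping the two steps separate means the intermediate synopsis is available for inspection before the script is generated from it.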
The models to use are selected by setting the MODEL variable in start_server.sh and start_syn_server.sh.
THEaiTRobot 2.0 was used to generate the second THEaiTRE play, "Permeation/Prostoupení".
Actor Theodor Pištěk with his colleagues Alfréd Schleisnger and Marie (Máňa) Ženíšková in Takový je život (Such Is Life, dir. Carl Junghans, 1929). Theodor Pištěk in Cikáni (Gypsies, dir. Karel Anton, 1921). Pištěk putting on make-up. Pištěk with his portrait carved in glass in a segment from Československý zvukový týdeník Aktualita (Czechoslovak Aktualita Sound Newsreel) 1942, issue no. 28. Pištěk with his daughter-in-law Věra Filipová Pištěková on Bohumil Veselý's balcony.
The Thesaurus linguae Latinae is the first comprehensive dictionary of ancient Latin:
• it is compiled on the basis of all Latin texts surviving from antiquity (until AD 600), both literary and non-literary
• for less common words it cites every attestation, for the rest (those marked with an asterisk) an instructive and representative sample
• it records all meanings (including technical usages) and all constructions
• it documents peculiarities of inflection, spelling, and prosody
• it supplies information about the etymology of the Latin words and their survival in the Romance languages, contributed by recognised authorities in the fields of Indo-European and Romance studies
• it collects the comments of ancient sources on the word in question
The Thesaurus therefore offers for every Latin word a comprehensive, richly documented picture of its possibilities and history – not only for Latin scholars, but also for scholars of the various branches of ancient studies and for related disciplines.
An elegantly simple and robust machine-learning method that combines ideas from a number of MBL implementations, resulting in a useful tool for NLP research.