Among the results of Russian influence on Czech in the 19th century was the emergence of an active past participle in -(v)ší. Although not welcomed by all grammarians, this participle has survived in Czech to the present day, becoming mainly a device of archaic and bookish style. The present work studies the occurrence of the active past participle in -(v)ší in the largest subcorpus of the Czech National Corpus containing journalistic texts. A main result of the study is that, apart from a large number of examples from different verbs which show the active past participle in -(v)ší only once or twice in the studied corpus and where it is indeed a device of archaic and bookish style, sometimes even of irony and humor, there is a small group of (mainly intransitive) verbs for which this participle functions with considerable frequency in stylistically more neutral contexts of written Standard Czech as the only participle (sometimes as a stylistically more marked variant of a more frequent active past participle in -l). In these cases, it remains overwhelmingly a syntactically unextended direct attribute of a noun. Such an active past participle in -(v)ší is found most often in sports coverage, where it is formed from a set of verbs with terminological function.
Experimental materials, data and R scripts used in the paper "Garden-path sentences and the diversity of their
(mis)representations" (Ceháková - Chromý, 2023).
RobeCzech is a monolingual RoBERTa language representation model trained on Czech data. RoBERTa is a robustly optimized Transformer-based pretraining approach. We show that RobeCzech considerably outperforms equally-sized multilingual and Czech-trained contextualized language representation models, surpasses current state of the art in all five evaluated NLP tasks and reaches state-of-the-art results in four of them. The RobeCzech model is released publicly at https://hdl.handle.net/11234/1-3691 and https://huggingface.co/ufal/robeczech-base, both for PyTorch and TensorFlow.
The item contains a list of 2,058 noun/verb conversion pairs along with related formations (word-formation paradigms) provided with linguistic features, including semantic categories that characterize semantic relations between the noun and the verb in each conversion pair. Semantic categories were assigned manually by two human annotators based on a set of sentences containing the noun and the verb from individual conversion pairs. In addition to the list of paradigms, the item contains a set of 739 files (a separate file for each conversion pair) annotated by the annotators in parallel and a set of 2,058 files containing the final annotation, which is included in the list of paradigms.
Simple question answering database version 2.1 (SQAD_v2.1) created from Czech Wikipedia. Each record of SQAD consists of four files (in vertical form provided with lemmatization and POS tagging) and two metadata files.
Simple question answering database version 3 (SQAD v3) created from Czech Wikipedia. The new version consists of 13,477 records. Each record of SQAD consists of multiple files: question, answer extraction, answer selection, URL, question metadata and, in some cases, answer context.
Simple question answering database (SQAD) created from Czech Wikipedia. Each record of SQAD consists of four files (in vertical form provided with lemmatization and POS tagging) and two metadata files.
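As a rough illustration of how a file "in vertical form" might be consumed, the following Python sketch parses one-token-per-line text into (word, lemma, tag) triples. The tab-separated three-column layout, the column order, the treatment of angle-bracket lines as structural markup, the names Token and parse_vertical, and the sample sentence with its tags are all assumptions made for illustration, not the documented SQAD layout; they should be checked against the actual files.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    """One line of a vertical file (assumed column order: word, lemma, tag)."""
    word: str
    lemma: str
    tag: str


def parse_vertical(text: str) -> List[Token]:
    """Parse vertical text into tokens, skipping blank and markup lines."""
    tokens = []
    for line in text.splitlines():
        stripped = line.strip()
        # Assumption: lines such as <s> or </s> are structural markup, not tokens.
        if not stripped or (stripped.startswith("<") and stripped.endswith(">")):
            continue
        parts = line.split("\t")
        if len(parts) < 3:
            raise ValueError(f"expected at least 3 tab-separated columns: {line!r}")
        tokens.append(Token(parts[0], parts[1], parts[2]))
    return tokens


# Hypothetical sample; the tags are placeholders, not the real SQAD tagset.
sample = (
    "<s>\n"
    "Kdo\tkdo\tPRON\n"
    "napsal\tnapsat\tVERB\n"
    "Máj\tMáj\tNOUN\n"
    "?\t?\tPUNCT\n"
    "</s>\n"
)

tokens = parse_vertical(sample)
lemmas = [t.lemma for t in tokens]
```

Keeping each token as a named tuple lets downstream code refer to columns by name, so only parse_vertical needs to change if the real column order differs.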
This was the Opening Address at "Fateful Eights in Czech History: Historical Anniversaries of 2008 and Their Significance for the Czech Republic Today", an international conference organized by the Czech Embassy in Washington, held at the George Washington University, Washington, D.C., on 23-24 October 2008. In this essay the author provides a basic overview of twentieth-century Czech history, weighing the gains and losses, the victories and defeats, the ups and downs of the Czechs, the Czech nation, Czech society, on the way from gaining independence in a democratic state to losing it and the German occupation, to the renewal of Czechoslovak independence and the destruction of democracy under the Communist regime, to the failed attempt at the reform of that regime, and the victory of the democratic revolution - all marked by the historical milestones of the years 1918, 1938/39, 1945-48, 1968, and 1989 - as well as the author’s reflections on the long-term changes in the mentality of the country.
The Valency Lexicon of Czech Verbs, Version 2.5 (VALLEX 2.5), is a collection of linguistically annotated data and documentation, resulting from an attempt at formal description of valency frames of Czech verbs. VALLEX 2.5 has been developed at the Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University, Prague.
VALLEX 2.5 provides information on the valency structure (combinatorial potential) of verbs in their particular senses - there are roughly 2,730 lexeme entries containing together around 6,460 lexical units ("senses"). and LC 536 - Center for Computational Linguistics, 1ET100300517 and 1ET101120503.
VALLEX 3.0 provides information on the valency structure (combinatorial potential) of verbs in their particular senses, which are characterized by glosses and examples. VALLEX 3.0 describes almost 4,600 Czech verbs in more than 10,800 lexical units, i.e., given verbs in the given senses.
VALLEX 3.0 is a collection of linguistically annotated data and documentation, resulting from an attempt at formal description of valency frames of Czech verbs. In order to satisfy different needs of different potential users, the lexicon is distributed (i) in an HTML version (which allows easy and fast navigation through the lexicon) and (ii) in a machine-tractable form as a single XML file, so that the VALLEX data can be used in NLP applications.