Subject: korpus - LINDAT/CLARIAH-CZ Catalog Search Results


1. Automatická slovnědruhová desambiguace slova to v ustálených větných výrazech

Creator:: Hnátková, Milena
Format:: bez média and svazek
Type:: model:article and TEXT
Subject:: corpus, automatic morphological disambiguation, automatic identification of collocations, sentential phrases, word form to, korpus, automatická morfologická analýza, vyhledávání ustálených slovních spojení, větné frazémy, and slovní tvar toto
Language:: Czech
Description:: This paper deals with automatic part-of-speech disambiguation of Czech texts containing the word to (English: it) in fixed collocations used especially in spoken Czech and, in addition, with case identification for the pronominal reading of this word. The word to is ambiguous: automatic morphological analysis yields either the pronominal lemma ten (it) as a nominative/accusative singular neuter, or the particle lemma to. It is very difficult to distinguish the non-prepositional nominative and accusative automatically in Czech texts. Therefore, the paper primarily focuses on to as a particle. The software module that identifies collocations in Czech corpus texts is part of the rule-based automatic morphological disambiguation used for tagging texts of synchronic Czech in the corpora of the SYN series; it deals mainly with the disambiguation of nongrammatical collocations and phrases. The paper focuses on fixed expressions listed in the Dictionary of Czech Phraseology and Idiomatics and is based on a description of the automatic identification and classification of collocations comprising the word to in the SYN2010 corpus. Examples (primarily idioms) are also presented where automatic disambiguation using general grammatical rules yields unreliable results.
Rights:: http://creativecommons.org/publicdomain/mark/1.0/ and policy:public

2. Indikativische tempora in inhaltssätzen im Deutschen

Creator:: Koutová, Marta
Format:: bez média and svazek
Type:: model:article and TEXT
Subject:: indicative, relative tenses, subordinate content clause, indirect speech, corpus, indikativ, relativní časy, vedlejší věta obsahová, nepřímá řeč, and korpus
Language:: Czech
Description:: The article deals with the development of analytic approaches to the use of indicative tenses in German subordinate content clauses (more narrowly, in indirect speech), as presented in German linguistics. The author then presents the results of her own short study based on examples of subordinate content clauses found in the Mannheim IDS corpus of German texts. According to the latest German scholarship, two principles govern the distribution of indicative tenses in subordinate content clauses introduced by a verb in the past tense: 1. the perspective of speaker 1, i.e. the point of view of the characters in the story; 2. the perspective of speaker 2, i.e. the point of view of the narrator. The first principle is comparable to the principle governing the use of tenses in subordinate content clauses in Slavic languages; the second is comparable to the sequence of tenses used in English and other Germanic languages. The first principle is found more in spoken or non-standard discourse, while the second is typical of standard German. The paper focuses on sentences consisting of a past-tense main clause and one embedded content clause that allows alternation between the present tense and the preterite (...dass sie schwanger ist vs. ...dass sie schwanger war), as attested in the Mannheim corpus. The analysis essentially confirms the existing approaches and theories, but it also brings new findings that call for adjusting current views and that raise new questions for more comprehensive corpus-based research.
Rights:: http://creativecommons.org/publicdomain/mark/1.0/ and policy:public

3. Jednoznačnost a kontext: kvantitativní studie

Creator:: Cvrček, Václav and Václavík, Jiří
Format:: bez média and svazek
Type:: model:article and TEXT
Subject:: context disambiguation, corpus, lemma, word-form, morphology, kontextová disambiguace, korpus, slovní tvar, and morfologie
Language:: Czech
Description:: The general consensus in linguistics is that language context (or "co-text") plays a crucial role in describing the linguistic properties of language items. As a corollary, isolated units are inherently ambiguous (polysemous and/or polyfunctional). In this paper we describe the most influential forces leading to the disambiguation of language units, specifically the effect of n-gram length on ambiguity.
Rights:: http://creativecommons.org/publicdomain/mark/1.0/ and policy:public

4. Neuhochdeutsche Lehnwörter im Tschechischen: eine Frequenzuntersuchung zwecks der Tendenzermittlung

Creator:: Tikhonov, Aleksej
Format:: bez média and svazek
Type:: model:article and TEXT
Subject:: germanisms, synchrony, diachrony, corpus, Czech, German, frequency, loanwords, germanizmy, synchronie, diachronie, korpus, čeština, němčina, frekvence, and přejatá slova
Language:: Czech
Description:: This article deals with germanisms in Czech. The frequencies of 26 New High German loanwords and of the Czech synonyms competing with them were analyzed in the Czech National Corpus. On the basis of these frequency comparisons, the article asks whether native speakers of Czech prefer the germanisms or their Czech equivalents. New High German loanwords were selected deliberately in order to verify the topicality of the subject. Most of the study, however, takes a diachronic view: it shows not only the current situation but, in most cases, the frequency of the selected loanwords throughout their existence. Average frequencies are calculated for each century (since 1650) and for the most recent modern period (from 1947 to 2008).
Rights:: http://creativecommons.org/publicdomain/mark/1.0/ and policy:public

5. Případ Kmetiněves: leave the language alone, nebo axiologicky ukotvený průzkum jazyka?

Creator:: Šimandl, Josef
Format:: bez média and svazek
Type:: model:article and TEXT
Subject:: language use, corpora, laissez-faire linguistics, axiology (as evaluation of qualities, not only quantities), language culture, úzus, korpus, lingvistika laissez-faire, axiologie (jako hodnocení kvalit, ne jen kvantit), and jazyková kultura
Language:: Czech
Description:: In this article, a case study grows into a kind of manifesto, or a challenge to discuss the principles of axiology in a corpus-based grammar. Part 1 (introduction) presents some facts about a group of Czech village names. One of them was used frequently in the media last year, not always in accordance with language handbooks; Part 2 records this phenomenon. Part 3 sketches how this phenomenon would be treated in the spirit of laissez-faire linguistics. Part 4 starts with a reminder that corpora contain not only language phenomena but errors as well. Axiology is then presented as the observation of values (a) in the national language, (b) in texts, (c) in language description. The article argues for, and demonstrates, description within a badly needed axiological frame, in which language phenomena are evaluated not merely by frequency but also by the quality of the source texts. Part 5 outlines a broader frame and some parallels in other disciplines where the description of human practice differs from theoretical postulates. Part 6 specifies the role this journal hopes to play in further discussions: about the use of corpora in grammar research, about criteria for marking language phenomena, about distinguishing innovations from errors, and about the values of individual language phenomena.
Rights:: http://creativecommons.org/publicdomain/mark/1.0/ and policy:public

6. Proměny prózy v letech 1992 až 2018

Creator:: Poukarová, Petra and Cvrček, Václav
Format:: bez média and svazek
Type:: model:article and TEXT
Subject:: beletrie, próza, registr, multidimenzionální analýza, korpus, překlad, fiction, prose, register, multidimensional analysis, corpus, and translation
Language:: Czech
Description:: This study summarizes a corpus-based analysis of tendencies in register variation in Czech-written fiction texts in the period from 1992 to 2018. The analysis projects the results from a large sample of Czech prose texts (1070 texts, 12.7 million words) onto a general register model established by previous research using multidimensional analysis. The major tendencies found in the material are a decrease in the level of cohesion, addressee coding and retrospective narration, and an increase in polythematicity/lexical richness. These findings are supplemented by additional analyses of how the results are affected by translation, by the position of a text excerpt in the original text (beginning, middle or end) and by the type of text.
Rights:: http://creativecommons.org/licenses/by-nc-sa/4.0/ and policy:public

7. Some current problems of corpus and computational linguistics, or Fifteen commandments and general truths

Creator:: Čermák, František
Format:: bez média and svazek
Type:: model:article and TEXT
Subject:: corpus, corpus linguistics, computational linguistics, methodology, type of data, type of information, representativeness of corpora, systems of tagging, lemmatizers, ir/regularity in language, collocations, meaning, aligners, korpus, korpusová lingvistika, komputační lingvistika, metodologie, typy dat, typy informace, reprezentativnost korpusu, systémy taggování, lemmatizátory, ne/pravidelnost v jazyce, kolokace, význam, and alignery
Language:: Czech
Description:: In a brief, succinct and almost aphoristic way, this contribution critically presents a number of problems of today’s corpus and computational linguistics, together with their unsatisfactory solutions, and at the same time tries to do away with a number of myths and simplified opinions in the field.
Rights:: http://creativecommons.org/publicdomain/mark/1.0/ and policy:public

8. Srovnání žánrů v korpusu na základě syntaktických funkcí substantiv

Creator:: Jelínek, Tomáš
Format:: bez média and svazek
Type:: model:article and TEXT
Subject:: syntax, syntaktická funkce, korpus, žánr, reprezentativnost, syntactic function, corpus, genre, and representativeness
Language:: Czech
Description:: The large synchronic textual corpora of the Czech National Corpus are built to be representative: they contain a balanced quantity of texts of various styles, divided into three genre subcorpora: fiction, technical/scientific literature and journalism. Comparisons of these genres have been performed at the phonological and morphological levels; in this paper, I deal with differences between genres at the surface-syntactic level. I use an automatic syntactic annotation of the SYN2005 corpus in the formalism of the analytical layer of the Prague Dependency Treebank. I compare the frequencies of syntactic functions of nouns in the three genres represented by the corresponding subcorpora of SYN2005. I also present a more detailed analysis of four syntactic phenomena: subtypes of the function of attribute in the non-prepositional genitive; frequencies of groups of the type pan Novák (Mr. Novák); frequencies of the function of agent in passive constructions expressed by nouns in the non-prepositional instrumental; and the ratio of nominative to instrumental in expressing the nominal part of a verbal-nominal predicate. The significant differences found between genres in all the syntactic phenomena analyzed show that, when comparing corpora, one should carefully monitor their genre composition.
Rights:: http://creativecommons.org/publicdomain/mark/1.0/ and policy:public

9. Syntaktická adverbializace typu jaktěživo neměl názor (zrušení shody se subjektem v rodě a čísle)

Creator:: Štěpán, Josef
Format:: bez média and svazek
Type:: model:article and TEXT
Subject:: agreement and disagreement with the subject in gender and number, adverbialisation, frequency, noun, pronoun, adverbial (frozen) expression, negative clause, corpus, shoda a neshoda se subjektem v rodě a čísle, adverbializace, frekvence, substantivum, zájmeno, adverbiální (ustrnulý) výraz, záporná věta, and korpus
Language:: Czech
Description:: On the basis of material from the SYN corpus, the article first describes the morphologically frozen expressions jakživ, jaktěživ, which have the adverbial meaning "never" in negative clauses and which, owing to their ending, agree syntactically in gender and number with the grammatical subject. This agreement in positive clauses, where the frozen expressions mean "ever (in one’s life)", is also briefly mentioned. However, the principal aim of the article is to show that the syntactic adverbialisation of these expressions in negative clauses disrupts this agreement, cf. jaktěživo neměl názor "never in his life did he have an opinion". This adverbialisation has two possible results: the neuter forms jaktěživo, jakživo are more common in Bohemia, while the masculine forms jaktěživ, jakživ are used rather in Moravia. The author interprets the frequency of both concordant and non-concordant (frozen) expressions, ordered by descending frequency in SYN.
Rights:: http://creativecommons.org/publicdomain/mark/1.0/ and policy:public

10. Wörter gibt's, die gibt‘s gar nicht!: ein Exkurs ins grammatische Raritätenkabinett

Creator:: Strecker, Bruno
Format:: bez média and svazek
Type:: model:article and TEXT
Subject:: corpora, frequency of the grammatical phenomena, correctness, realisations of syntactic constructions, korpus, frekvence gramatických jevů, správnost, and realizace syntaktických konstrukcí
Language:: Czech
Description:: The possibility of electronically searching very large corpora of texts has opened up ways in which we can truly evaluate the rules through which grammarians have tried, and continue to try, to simulate natural languages. However, the ability to handle incredibly large amounts of text might lead to problems with the assessment of certain phenomena that are hardly ever represented in those corpora and yet have always been regarded as grammatically correct elements of a given language. In German, typical phenomena of this kind are forms like betrögest or erwögest, i.e. the second person singular of the so-called strong verbs in the preterite subjunctive. Should we see them merely as grammarians’ inventions? Before doing so, we should reconsider the nature of these phenomena. They may appear to be isolated word forms but are, in fact, compact realizations of syntactic constructions, and it is the frequency of these constructions that should be evaluated, not the frequency of their specific realizations.
Rights:: http://creativecommons.org/publicdomain/mark/1.0/ and policy:public
