Original context has metadata only: false / Subject: corpus - LINDAT/CLARIAH-CZ Catalog Search Results

21. HWC2023 – Hamburg.de Website Corpus 2023

Creator:: Rüdiger, Jan Oliver
Publisher:: Leibniz-Institut für Deutsche Sprache
Type:: text and corpus
Subject:: corpus, Web corpus, web corpora, Germanistik, German, websites, crawling corpus, and CorpusExplorer
Language:: German
Description:: A petition for a referendum ("Schluss mit Gendersprache in Verwaltung und Bildung", in English: "Abolition of gender language in administration and education") was launched in Hamburg in February 2023. The project "Empirical Gender Linguistics" at the Leibniz Institute for the German Language took this as an opportunity to scrape the entire "https://www.hamburg.de" website (except the list of ships in the Port of Hamburg and the yellow pages). The Hamburg.de website is the central digital contact point for citizens. The scraped texts were cleaned, processed and annotated using http://www.CorpusExplorer.de (TreeTagger - POS/Lemma information). We use the corpus to analyze the use of words with gender signs.
Rights:: Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0), PUB, and http://creativecommons.org/licenses/by-nc-sa/3.0/

22. Indikativische tempora in inhaltssätzen im Deutschen

Creator:: Koutová, Marta
Format:: unmediated and volume
Type:: model:article and TEXT
Subject:: indicative, relative tenses, subordinate content clause, indirect speech, corpus, indikativ, relativní časy, vedlejší věta obsahová, nepřímá řeč, and korpus
Language:: Czech
Description:: The article deals with the development of analytic approaches to the use of specific tenses in the indicative mood in German subordinate content clauses, as presented in German linguistics. The author presents the results of her own research based on examples of subordinate content clauses found in the Mannheim corpus of German texts. According to the latest German scholarship, there are two principles governing the tense distribution in the indicative mood in subordinate content clauses introduced by a verb in the past tense: 1. the perspective of speaker 1, i.e. the point of view of the characters in the story, 2. the perspective of speaker 2, i.e. the point of view of the narrator. The first principle is comparable to the principle governing the use of tenses in subordinate content clauses in Slavic languages; the second is comparable to the sequence of tenses used in English and other Germanic languages. The first principle is used more in spoken or non-standard discourse; the second is typical of standard German. The present paper focuses on sentences consisting of a past-tense main clause and one embedded content clause that allows alternation between present tense and preterite (...dass sie schwanger ist vs. ...dass sie schwanger war), as attested in the Mannheim corpus. The analysis essentially confirms the existing approaches and theories, but it also brings new findings that call for adjusting the current views and that pose new questions for more comprehensive corpus-based research.
Rights:: http://creativecommons.org/publicdomain/mark/1.0/ and policy:public

23. Individual Textual Profiles of Hillary Clinton and Donald Trump

Creator:: Kvítková, Alena
Publisher:: Charles University, Faculty of Arts, Department of English Language and ELT Methodology
Type:: text and corpus
Subject:: idiolect, individual textual profile, Clinton, Trump, corpus, presidential debates, American president, candidates, Democrats, and Republicans
Language:: English
Description:: This corpus consists of full transcriptions of both the Democratic and the Republican 2016 presidential candidate debates, with a special focus on the idiolects of Hillary Clinton and Donald Trump against the background of the speeches of other candidates for the post of president of the United States. The transcriptions are sourced from the American Presidency Project at the University of California, Santa Barbara. Any use of the material requires prior, explicit written permission from the project administrator (contact policy@ucsb.edu). This corpus material is shared with their kind permission.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

24. Indonesian web corpus (idWac)

Creator:: Medveď, Marek and Suchomel, Vít
Publisher:: Natural Language Processing Centre, Faculty of Informatics, Masaryk University
Type:: text and corpus
Subject:: corpus, lemmatization, and PoS tagging
Language:: Indonesian
Description:: Indonesian text corpus from the web. Crawled by SpiderLing in 2017. Filtered by JusText and Onion (see http://corpus.tools/ for details). Tagged and lemmatized by MorphInd (http://septinalarasati.com/morphind/).
Rights:: NLP Centre Web Corpus License, https://lindat.mff.cuni.cz/repository/xmlui/page/license-NLPC-WeC, and ACA

25. Investigating nepřizpůsobivý as a key word in critical analysis of Czech press reports on Roma

Creator:: Slavíčková, Tess
Format:: unmediated and volume
Type:: model:article and TEXT
Subject:: corpus, CDA, media, otherization, Roma, nepřizpůsobivý, quantitative, qualitative research, frequency, collocation, attribution, kritická analýza diskurzu, Romové, kvantitativní a kvalitativní výzkum, vyčleňování z většinové společnosti, minority, and kolokační analýza
Language:: Czech
Description:: This paper works with data provided by the Czech National Corpus to consider the use of nepřizpůsobivý (inadaptable) by the Czech mainstream print media as a code word that is widely understood to signify a Roma citizen. The study shows that nepřizpůsobivý is used far more frequently in journalism than in other text genres and that its use has increased over the past decade. Examination of collocations reveals that nepřizpůsobivý is typically associated with negative reports on housing, residency and crime. This paper can also be seen as a case study illustrating the usefulness of corpus data to critical discourse analysis and the role of the corpus in providing quantitative support to qualitative research in general.
Rights:: http://creativecommons.org/publicdomain/mark/1.0/ and policy:public

26. Jednoznačnost a kontext: kvantitativní studie

Creator:: Cvrček, Václav and Václavík, Jiří
Format:: unmediated and volume
Type:: model:article and TEXT
Subject:: context disambiguation, corpus, lemma, word-form, morphology, kontextová disambiguace, korpus, slovní tvar, and morfologie
Language:: Czech
Description:: The general consensus in linguistics is that language context (or "co-text") plays a crucial role in describing the linguistic properties of language items. Isolated units are, as a corollary to this statement, inherently ambiguous (polysemous and/or polyfunctional). In this paper we describe the most influential forces leading to the disambiguation of language units, specifically the effect of n-gram length on ambiguity.
Rights:: http://creativecommons.org/publicdomain/mark/1.0/ and policy:public

27. KAMOKO-Digitalizer

Creator:: Rüdiger, Jan Oliver
Publisher:: Rüdiger, Jan Oliver
Type:: tool and toolService
Subject:: learner corpus, corpus, and annotation
Language:: German
Description:: This editor was developed especially for the needs of the KAMOKO project (https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-3261). The editor allows the quick entry of example sentences and sentence variants as well as the corresponding speaker ratings.
Rights:: Affero General Public License 3 (AGPL-3.0), http://opensource.org/licenses/AGPL-3.0, and PUB

28. KAMOKO: KAsseler MOrgenstern KOrpus

Creator:: Schrott, Angela, Wieders-Lohéac, Aline, and Rüdiger, Jan Oliver
Publisher:: Universität Kassel - Institut für Romanistik
Type:: text and corpus
Subject:: corpus, annotated corpus, French, learner corpus, XML, CorpusExplorer, TXM, and WeblichtXML
Language:: French
Description:: KAMOKO is a structured and commented French learner corpus. It addresses the central structures of the French language from a linguistic perspective (18 different courses). The text examples in this corpus are annotated by native speakers. This makes the corpus a valuable resource for (1) advanced language practice/teaching and (2) linguistics research. The KAMOKO corpus can be used free of charge. Information on the structure of the corpus and instructions on how to use it are presented in detail in the KAMOKO Handbook and a video tutorial (both in German). In addition to the raw XML data, we also offer various export formats (see ZIP files; supported file formats: CorpusExplorer, TXM, WebLicht, TreeTagger, CoNLL, SPEEDy, CorpusWorkbench and TXT).
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

29. KAMOKO: KAsseler MOrgenstern KOrpus (2021-02-09)

Creator:: Schrott, Angela, Wieders-Lohéac, Aline, and Rüdiger, Jan Oliver
Publisher:: Universität Kassel - Institut für Romanistik
Type:: text and corpus
Subject:: corpus, annotated corpus, French, learner corpus, XML, CorpusExplorer, TXM, and WeblichtXML
Language:: French
Description:: KAMOKO is a structured and commented French learner corpus. It addresses the central structures of the French language from a linguistic perspective (18 different courses). The text examples in this corpus are annotated by native speakers. This makes the corpus a valuable resource for (1) advanced language practice/teaching and (2) linguistics research. The KAMOKO corpus can be used free of charge. Information on the structure of the corpus and instructions on how to use it are presented in detail in the KAMOKO Handbook and a video tutorial (both in German). In addition to the raw XML data, we also offer various export formats (see ZIP files; supported file formats: CorpusExplorer, TXM, WebLicht, TreeTagger, CoNLL, SPEEDy, CorpusWorkbench and TXT).
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

30. Khresmoi Query Translation Test Data 1.0

Creator:: Pecina, Pavel, Dušek, Ondřej, Hajič, Jan, and Urešová, Zdeňka
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: corpus, test data, medical, health, machine translation, Czech, French, German, and English
Language:: English, French, German, and Czech
Description:: This package contains data sets for development and testing of machine translation of short medical search queries between Czech, English, French, and German. The queries come from the general public and medical experts. This work was supported by the EU FP7 project Khresmoi (European Commission contract No. 257528). The language resources are distributed by the LINDAT/Clarin project of the Ministry of Education, Youth and Sports of the Czech Republic (project no. LM2010013). We thank Health on the Net Foundation for granting the license for the English general public queries, TRIP database for granting the license for the English medical expert queries, and three anonymous translators and three medical experts for translating and revising the data.
Rights:: Attribution-NonCommercial 3.0 Unported (CC BY-NC 3.0), http://creativecommons.org/licenses/by-nc/3.0/, and PUB
