Languages in Migration

Name: Languages in Migration
License: https://lindat.mff.cuni.cz/repository/xmlui/page/license-cnc

Bučková, Aneta; Nekula, Marek; Lukeš, David; Woźniak, Michał; Wastl, Michael; Polowy, Louisa

dc.contributor.author	Bučková, Aneta
dc.contributor.author	Nekula, Marek
dc.contributor.author	Lukeš, David
dc.contributor.author	Woźniak, Michał
dc.contributor.author	Wastl, Michael
dc.contributor.author	Polowy, Louisa
dc.date.accessioned	2023-02-24T17:10:50Z
dc.date.available	2023-02-24T17:10:50Z
dc.date.issued	2023-01-04
dc.identifier.uri	http://hdl.handle.net/11372/LRT-4777
dc.description	LANGUAGES IN MIGRATION is designed as a representation of authentic spoken Czech and German as used in informal speech (private environment, spontaneity, unpreparedness, etc.) by Czech-German bilingual speakers who were born in Czechoslovakia around 1955 and left for Germany after the age of 12. The corpus is composed of interviews on language biographies, conducted between 2018 and 2020 with 20 speakers and narrated in Czech and German, respectively. Ten interviews were recorded with late (German) repatriates and ten with Czech migrants. The corpus includes transcripts of ca. 14 hours of Czech recordings and ca. 13.5 hours of German recordings. It contains 217 650 orthographic words (i.e. a total of 286 533 tokens including punctuation). The metadata of LANGUAGES IN MIGRATION include basic sociolinguistically relevant speaker categories (gender, year of birth and of migration, level of education, and region of childhood and of present residence). The transcription of LANGUAGES IN MIGRATION is linked to the corresponding audio track. The transcription was carried out on the orthographic tier and supplemented by an additional metalanguage tier. The corpus is lemmatized and morphologically tagged, in different formats for Czech and for German (Stuttgart-Tübingen-Tagset). Deviations from the norm of the spoken Czech and German of the homeland, which are understood as the result of language contact and language isolation, are tagged on a further tier in both the Czech and the German subcorpora of LANGUAGES IN MIGRATION. The (anonymized) corpus is provided in the form of transcripts in EAF format, which can be viewed in the freely available ELAN program, and in a (semi-XML) vertical format used as input to the Manatee query engine. The data thus correspond to the corpus available via the KonText query engine to registered users of the CNC at http://www.korpus.cz
dc.language.iso	deu
dc.language.iso	ces
dc.publisher	Faculty of Arts, Institute of the Czech National Corpus, Charles University in Prague
dc.publisher	Universität Regensburg
dc.relation.isreferencedby	https://bruecken.ff.cuni.cz/magazin/2-28-2021/
dc.rights	Czech National Corpus (Shuffled Corpus Data)
dc.rights.uri	https://lindat.mff.cuni.cz/repository/xmlui/page/license-cnc
dc.subject	spoken language
dc.subject	bilingual
dc.subject	syntactic annotation
dc.subject	migrant language
dc.subject	narrative interviews
dc.subject	language biography
dc.title	Languages in Migration
dc.type	corpus
metashare.ResourceInfo#ContentInfo.mediaType	text
dc.rights.label	ACA
has.files	yes
branding	LRT + Open Submissions
demo.uri	https://www.korpus.cz/kontext/query?corpname=jazyky_v_migraci
contact.person	Marek Nekula marek.nekula@ur.de Universität Regensburg
sponsor	Deutsche Forschungsgemeinschaft HA 2659/9-1 Language across generations: contact induced change in morphosyntax in German-Polish bilingual speech nationalFunds
files.size	5240851
files.count	6