ORAL2006: Corpus of informal spoken Czech
Please use the following text to cite this item:
Kopřivová, Marie and Waclawičová, Martina, 2006,
ORAL2006: Corpus of informal spoken Czech, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL),
http://hdl.handle.net/11858/00-097C-0000-0023-119C-C.
Authors
Kopřivová, Marie; Waclawičová, Martina
Item identifier
http://hdl.handle.net/11858/00-097C-0000-0023-119C-C
Project URL
Date issued
2006
Size
1,000,000 words
Language(s)
Czech
Description
Corpus of informal spoken Czech comprising 1 million words (1 MW). It contains transcriptions of 221 recordings made in 2002–2006 throughout Bohemia. All recordings were made in informal situations to capture prototypically spontaneous spoken language: a private environment, physically present speakers who know each other, unscripted speech, and no topic given in advance. The corpus includes 754 speakers in total, and the metadata include sociolinguistic information about them.
The corpus is provided in a (semi-XML) vertical format used as input to the Manatee query engine. The data thus correspond exactly to the corpus available through the query interface to registered users of the Czech National Corpus (CNC).
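For readers who want to process the vertical file directly, the following is a minimal sketch of a reader, not the official tooling. It assumes the usual vertical layout: one token per line with tab-separated attributes, and XML-like structure tags on lines of their own. The structure and attribute names in the sample (`doc`, `sp`, `id`, `num`) are hypothetical placeholders; the actual names used in ORAL2006 should be checked against the distributed data.

```python
import re

# A line consisting solely of an XML-like tag is treated as a structure marker.
TAG_RE = re.compile(r"^<(/?)([A-Za-z_][\w.-]*)((?:\s+[^<>]*?)?)(/?)>$")
ATTR_RE = re.compile(r'([\w.-]+)="([^"]*)"')


def parse_vertical(lines):
    """Yield (kind, value, attrs) events from vertical-format lines.

    kind is "open" or "close" for structure tags (value = tag name),
    or "token" for token lines (value = list of tab-separated columns).
    """
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        match = TAG_RE.match(line.strip())
        if match:
            closing, name, attr_text, _self_closing = match.groups()
            if closing:
                yield ("close", name, {})
            else:
                yield ("open", name, dict(ATTR_RE.findall(attr_text)))
        else:
            yield ("token", line.split("\t"), {})


# Hypothetical sample; tag and attribute names are assumptions for illustration.
sample = [
    '<doc id="hypothetical-01">\n',
    '<sp num="1">\n',
    "no\n",
    "dobrý\n",
    "</sp>\n",
    "</doc>\n",
]

events = list(parse_vertical(sample))
token_count = sum(1 for kind, _value, _attrs in events if kind == "token")
```

To read a real file, the same function can be given an open file handle, for example `parse_vertical(open(path, encoding="utf-8"))`, where `path` and the UTF-8 encoding are likewise assumptions about the download.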
Acknowledgement
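Ministerstvo školství, mládeže a tělovýchovy České republiky (Ministry of Education, Youth and Sports of the Czech Republic)
Project code: MSM 0021620823
Project name: Český národní korpus a korpusy dalších jazyků (Czech National Corpus and Corpora of Other Languages)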
Subject(s)
Collections
This item is Publicly Available
and licensed under:


