ORAL2008: Balanced corpus of informal spoken Czech

Please use the following text to cite this item or export to a predefined format:
Waclawičová, Martina; Kopřivová, Marie; Křen, Michal and Válková, Lucie, 2008, ORAL2008: Balanced corpus of informal spoken Czech, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11858/00-097C-0000-0023-119D-A.
Date issued
2008
Size
1,000,000 words
Language(s)
Czech
Description
Balanced corpus of informal spoken Czech comprising 1 million words (1 MW). It contains transcriptions of 297 recordings made between 2002 and 2007 throughout Bohemia. All recordings were made in informal situations to capture prototypically spontaneous spoken language: a private environment, physically present speakers who know each other, unscripted speech, and no topic set in advance. There are 995 speakers in total, and the corpus is balanced across their main sociolinguistic categories (gender, age group, education, region of childhood residence). The corpus is provided in the (semi-XML) vertical format used as input to the Manatee query engine. The data therefore correspond exactly to the corpus available through the query interface to registered users of the CNC (Czech National Corpus).
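As a rough illustration of how a vertical file of this kind might be read, the sketch below treats each non-empty line either as a structural (XML-like) tag or as a token line whose positional attributes are separated by tabs. This is the general convention for Manatee input; the UTF-8 encoding, the function names, and the tag and attribute names in the usage note are assumptions, not details confirmed by this record.

```python
import gzip


def iter_vertical(lines):
    """Yield ("tag", text) for structural lines and ("token", [attrs]) otherwise.

    Heuristic: a line that starts with "<" and ends with ">" is treated as a
    structural tag; every other non-empty line is one token whose positional
    attributes are separated by tabs.
    """
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("<") and line.endswith(">"):
            yield ("tag", line)
        else:
            yield ("token", line.split("\t"))


def count_tokens(path, encoding="utf-8"):
    """Count token lines in a gzip-compressed vertical file.

    The default encoding is an assumption; adjust it if decoding fails.
    """
    with gzip.open(path, "rt", encoding=encoding) as fh:
        return sum(1 for kind, _ in iter_vertical(fh) if kind == "token")
```

For example, `list(iter_vertical(['<doc id="demo">\n', 'no\n', '</doc>\n']))` yields one tag, one token, and one closing tag; the `doc` structure and its `id` attribute here are hypothetical and may not match the structures actually used in the corpus.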
Acknowledgement
This item is Publicly Available
and licensed under:
 Files in this item
Name
oral2008.gz
Size
2.58 MB
Format
application/x-gzip
Description
corpus data
MD5
4558c08d313f18a6a4c460acee3e1e4e
Preview
  File Preview
    • oral2008 (17 MB)
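The MD5 checksum listed above can be used to check that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module; the function names are illustrative, and the local path in the usage note is an assumption about where the file was saved.

```python
import hashlib

# Checksum as listed for oral2008.gz in this record.
EXPECTED_MD5 = "4558c08d313f18a6a4c460acee3e1e4e"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify("oral2008.gz")` on the downloaded archive should return `True` if the file matches the listed checksum. MD5 is adequate here for detecting accidental corruption, but it is not a safeguard against deliberate tampering.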