This is not the latest version of this item. The latest version can be found here.
ORTOFON v1: balanced corpus of informal spoken Czech with multi-tier transcription (transcriptions & audio)
Please use the following text to cite this item:
Kopřivová, Marie; Komrsková, Zuzana; Lukeš, David; Poukarová, Petra and Škarpová, Marie, 2017,
ORTOFON v1: balanced corpus of informal spoken Czech with multi-tier transcription (transcriptions & audio), LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL),
http://hdl.handle.net/11234/1-2579.
Date issued
2017-12-28
Size
1000000 words
Description
ORTOFON v1 is designed to represent authentic spoken Czech as used in informal situations (private settings, spontaneous and unprepared speech, etc.) across the whole Czech Republic. The corpus comprises 332 recordings made in 2012–2017 and contains 1 014 786 orthographic words (1 236 508 tokens in total, including punctuation); 624 different speakers appear in the recordings. ORTOFON v1 is fully balanced with respect to the basic sociolinguistic speaker categories (gender, age group, level of education and region of childhood residence).
The transcription is linked to the corresponding audio track. Unlike the ORAL-series corpora, ORTOFON v1 is transcribed on two main tiers, orthographic and phonetic, supplemented by an additional metalanguage tier. The corpus is lemmatized and morphologically tagged. The (anonymized) transcriptions are provided in the XML-based ELAN Annotation Format; the audio (with corresponding anonymization beeps) is provided as uncompressed 16-bit PCM WAV, mono, sampled at 16 kHz.
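As a rough illustration of how time-aligned tiers in an ELAN Annotation Format file can be read, the sketch below uses only the Python standard library. The element and attribute names (TIME_SLOT, TIER, ALIGNABLE_ANNOTATION, ANNOTATION_VALUE) come from the general ELAN format, not from this item's documentation. The actual tier IDs used in ORTOFON v1 are not stated here, so the function name `read_eaf_tiers` is hypothetical and the code simply returns whatever tiers it finds. Reference annotations that are not directly time-aligned are skipped.

```python
import xml.etree.ElementTree as ET


def read_eaf_tiers(path):
    """Map each tier ID to a list of (start_ms, end_ms, text) annotations."""
    root = ET.parse(path).getroot()

    # TIME_SLOT elements map slot IDs to millisecond offsets into the audio.
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
        if slot.get("TIME_VALUE") is not None
    }

    tiers = {}
    for tier in root.iter("TIER"):
        rows = []
        # Only time-aligned annotations are collected in this sketch.
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            text = ann.findtext("ANNOTATION_VALUE", default="")
            rows.append((start, end, text))
        tiers[tier.get("TIER_ID")] = rows
    return tiers
```

Calling `read_eaf_tiers` on one transcription file and printing the keys of the result is a quick way to see which tiers (orthographic, phonetic, metalanguage) a given file actually contains.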
The transcriptions are also available in another format under the less restrictive CC BY-NC-SA license at http://hdl.handle.net/11234/1-2580
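The audio parameters stated in the description (mono, 16-bit PCM, 16 kHz) can be checked on an extracted file with Python's standard `wave` module. This is a minimal sketch. The helper names `wav_params` and `matches_description` are illustrative and are not part of any corpus tooling.

```python
import wave

# Expected audio parameters, taken from the item description.
EXPECTED = {"channels": 1, "sample_width_bytes": 2, "sample_rate_hz": 16000}


def wav_params(path):
    """Return the channel count, sample width and sample rate of a WAV file."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": wav.getframerate(),
        }


def matches_description(path):
    """True if the file is mono, 16-bit, 16 kHz, as the description states."""
    return wav_params(path) == EXPECTED
```

A sample width of 2 bytes corresponds to 16-bit samples. Tools that assume a different sampling rate would need to resample the audio first.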
Acknowledgement
Ministry of Education, Youth and Sports of the Czech Republic (Ministerstvo školství, mládeže a tělovýchovy)
Project code: LM2015044
Project name: Český národní korpus (Czech National Corpus)
Files in this item
- Name: ortofon_v1.tar.gz
- Size: 9.6 GB
- Format: application/x-gzip
- Description: gzip Archive
- MD5: 51e4f525fe2a478e7f5f73ef7af61c42
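Because the archive is several gigabytes, it can be worth comparing the downloaded file against the MD5 checksum listed above before unpacking it. The sketch below uses Python's standard `hashlib` and reads the file in chunks so it does not need to fit in memory. The function names and the chunk size are illustrative choices.

```python
import hashlib

# MD5 checksum listed for ortofon_v1.tar.gz in this item's file list.
EXPECTED_MD5 = "51e4f525fe2a478e7f5f73ef7af61c42"


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected=EXPECTED_MD5):
    """True if the file's MD5 digest equals the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `archive_is_intact("ortofon_v1.tar.gz")` should return `True` for a complete, uncorrupted download, assuming the file sits in the current directory.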


