Search Results
2. ORTOFON v1: balanced corpus of informal spoken Czech with multi-tier transcription (transcriptions & audio)
- Creator:
- Kopřivová, Marie, Komrsková, Zuzana, Lukeš, David, Poukarová, Petra, and Škarpová, Marie
- Publisher:
- Charles University, Faculty of Arts, Institute of the Czech National Corpus
- Type:
- audio and corpus
- Subject:
- balanced corpus, spoken language, informal language, and Czech
- Language:
- Czech
- Description:
- ORTOFON v1 is designed to represent authentic spoken Czech as used in informal situations (private environment, spontaneity, unpreparedness, etc.) across the whole Czech Republic. The corpus is composed of 332 recordings from 2012–2017 and contains 1 014 786 orthographic words (i.e. a total of 1 236 508 tokens including punctuation); a total of 624 different speakers appear in the probes. ORTOFON v1 is fully balanced with regard to the basic sociolinguistic speaker categories (gender, age group, level of education and region of childhood residence). The transcription is linked to the corresponding audio track. Unlike the ORAL-series corpora, the transcription was carried out on two main tiers, orthographic and phonetic, supplemented by an additional metalanguage tier. ORTOFON v1 is lemmatized and morphologically tagged. The (anonymized) transcriptions are provided in the XML Elan Annotation format; the audio (with corresponding anonymization beeps) is in uncompressed 16-bit PCM WAV, mono, 16 kHz format. The transcriptions are also available in another format under the less restrictive CC BY-NC-SA license at http://hdl.handle.net/11234/1-2580
- Rights:
- License Agreement for Czech National Corpus Data, https://lindat.mff.cuni.cz/repository/xmlui/page/license-cnc-data, and ACA
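The audio format stated in the description (uncompressed 16-bit PCM WAV, mono, 16 kHz) can be verified on a downloaded file. The following is a minimal sketch using Python's standard `wave` module; the function name `check_wav_format` and the file path passed to it are illustrative assumptions, not part of the corpus distribution.

```python
import wave


def check_wav_format(path):
    """Compare a WAV file against the format stated in the description.

    Returns a list of human-readable mismatches (empty if the file matches).
    The function name is hypothetical, chosen for this sketch only.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"expected mono, got {wav.getnchannels()} channels")
        if wav.getsampwidth() != 2:  # 2 bytes per sample = 16-bit
            problems.append(f"expected 16-bit, got {8 * wav.getsampwidth()}-bit")
        if wav.getframerate() != 16000:
            problems.append(f"expected 16000 Hz, got {wav.getframerate()} Hz")
        if wav.getcomptype() != "NONE":  # "NONE" means uncompressed PCM
            problems.append(f"expected uncompressed PCM, got {wav.getcomptype()}")
    return problems
```

An empty list means the file matches the stated parameters; any other result lists what differs.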
3. ORTOFON v1: balanced corpus of informal spoken Czech with multi-tier transcription (transcriptions)
- Creator:
- Kopřivová, Marie, Komrsková, Zuzana, Lukeš, David, Poukarová, Petra, and Škarpová, Marie
- Publisher:
- Charles University, Faculty of Arts, Institute of the Czech National Corpus
- Type:
- text and corpus
- Subject:
- balanced corpus, spoken language, informal language, and Czech
- Language:
- Czech
- Description:
- ORTOFON v1 is designed to represent authentic spoken Czech as used in informal situations (private environment, spontaneity, unpreparedness, etc.) across the whole Czech Republic. The corpus is composed of 332 recordings from 2012–2017 and contains 1 014 786 orthographic words (i.e. a total of 1 236 508 tokens including punctuation); a total of 624 different speakers appear in the probes. ORTOFON v1 is fully balanced with regard to the basic sociolinguistic speaker categories (gender, age group, level of education and region of childhood residence). The transcription is linked to the corresponding audio track. Unlike the ORAL-series corpora, the transcription was carried out on two main tiers, orthographic and phonetic, supplemented by an additional metalanguage tier. ORTOFON v1 is lemmatized and morphologically tagged. The (anonymized) corpus is provided in a (semi-XML) vertical format used as input to the Manatee query engine. The data thus correspond to the corpus available via the KonText query engine to registered users of the CNC at http://www.korpus.cz. Please note: this item includes only the transcriptions; the audio (together with the transcripts in their original format) is available under a more restrictive non-CC license at http://hdl.handle.net/11234/1-2579
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
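The (semi-XML) vertical format mentioned in the description generally puts one token per line, with tab-separated attributes, and marks structures with XML-like tag lines. The sketch below separates tokens from structural tags under that general assumption. The column layout (word, lemma, tag), the structure names and the sample lines are hypothetical and are not taken from the ORTOFON data; the actual attributes should be checked against the corpus documentation.

```python
def read_vertical(lines):
    """Collect token rows from vertical-format lines.

    Lines that look like XML tags (start with '<' and end with '>') are
    treated as structural markup and skipped; every other non-empty line
    is split on tabs into its positional attributes. This is a simple
    heuristic, not a full parser.
    """
    tokens = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("<") and line.endswith(">"):
            continue  # structural tag such as <doc ...> or </sp>
        tokens.append(line.split("\t"))
    return tokens


# Hypothetical sample: structure names and the three columns are assumptions.
SAMPLE = [
    '<doc id="example">',
    '<sp num="1">',
    "no\tno\tX",
    "jo\tjo\tX",
    "</sp>",
    "</doc>",
]

rows = read_vertical(SAMPLE)
print(len(rows), "tokens; first row:", rows[0])
```

For a real file, the same function can be fed an open file handle, since it only iterates over lines.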
4. ORTOFON v3: corpus of informal spoken Czech with multi-tier transcription (transcriptions & audio)
- Creator:
- Lukeš, David, Kopřivová, Marie, Laubeová, Zuzana, Poukarová, Petra, Horký, Václav, Jelínek, Tomáš, Křivan, Jan, Waclawičová, Martina, Benešová, Lucie, and Škarpová, Marie
- Publisher:
- Charles University, Faculty of Arts, Institute of the Czech National Corpus
- Type:
- audio and corpus
- Subject:
- spoken language and informal language
- Language:
- Czech
- Description:
- ORTOFON v3 is a corpus of authentic spoken Czech as used in informal situations (private environment, spontaneity, unpreparedness, etc.), covering the whole Czech Republic. The corpus is composed of 697 recordings from 2012–2020 and contains 2 445 793 orthographic words (i.e. a total of 2 976 742 tokens including punctuation); a total of 1 121 different speakers appear in the probes. ORTOFON v3 is partially balanced with regard to the basic sociolinguistic speaker categories (gender, age group, level of education and region of childhood residence). The transcription is linked to the corresponding audio track. Unlike the ORAL-series corpora, the transcription was carried out on two main tiers, orthographic and phonetic, supplemented by an additional metalanguage tier. The (anonymized) transcriptions are provided in the XML Elan Annotation format; the audio (with corresponding anonymization beeps) is in uncompressed 16-bit PCM WAV, mono, 16 kHz format. The transcriptions are also available in another format under the less restrictive CC BY-NC-SA license at http://hdl.handle.net/11234/1-5687
- Rights:
- License Agreement for Czech National Corpus Data, https://lindat.mff.cuni.cz/repository/xmlui/page/license-cnc-data, and ACA
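Since the transcriptions are distributed in the XML Elan Annotation format with several tiers linked to the audio timeline, a short sketch of reading time-aligned tiers may help. It relies on the general ELAN (EAF) element names `TIME_SLOT`, `TIER`, `ALIGNABLE_ANNOTATION` and `ANNOTATION_VALUE`; the tier IDs `ort` and `fon` and the sample content are hypothetical and may not match the identifiers used in the ORTOFON files.

```python
import xml.etree.ElementTree as ET


def tiers_from_eaf(xml_text):
    """Map each tier ID to a list of (start_ms, end_ms, text) tuples.

    Only time-aligned annotations are read; reference annotations
    (REF_ANNOTATION) are ignored in this sketch.
    """
    root = ET.fromstring(xml_text)
    # Resolve time slot IDs to millisecond offsets.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }
    tiers = {}
    for tier in root.iter("TIER"):
        rows = []
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            text = ann.findtext("ANNOTATION_VALUE", default="")
            rows.append((start, end, text))
        tiers[tier.get("TIER_ID")] = rows
    return tiers


# Hypothetical two-tier sample; tier IDs and content are assumptions.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1200"/>
  </TIME_ORDER>
  <TIER TIER_ID="ort" LINGUISTIC_TYPE_REF="default">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>example</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="fon" LINGUISTIC_TYPE_REF="default"/>
</ANNOTATION_DOCUMENT>"""

print(tiers_from_eaf(SAMPLE_EAF))
```

The millisecond offsets returned for each annotation can be used to locate the matching stretch in the corresponding WAV file.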
5. ORTOFON v3: corpus of informal spoken Czech with multi-tier transcription (transcriptions)
- Creator:
- Lukeš, David, Kopřivová, Marie, Laubeová, Zuzana, Poukarová, Petra, Horký, Václav, Jelínek, Tomáš, Křivan, Jan, Waclawičová, Martina, Benešová, Lucie, and Škarpová, Marie
- Publisher:
- Charles University, Faculty of Arts, Institute of the Czech National Corpus
- Type:
- text and corpus
- Subject:
- spoken language and informal language
- Language:
- Czech
- Description:
- ORTOFON v3 is a corpus of authentic spoken Czech as used in informal situations (private environment, spontaneity, unpreparedness, etc.), covering the whole Czech Republic. The corpus is composed of 697 recordings from 2012–2020 and contains 2 445 793 orthographic words (i.e. a total of 2 976 742 tokens including punctuation); a total of 1 121 different speakers appear in the probes. ORTOFON v3 is partially balanced with regard to the basic sociolinguistic speaker categories (gender, age group, level of education and region of childhood residence). The transcription is linked to the corresponding audio track. Unlike the ORAL-series corpora, the transcription was carried out on two main tiers, orthographic and phonetic, supplemented by an additional metalanguage tier. ORTOFON v3 is lemmatized and morphologically tagged according to the SYN2020 standard. The annotation was performed with special attention to the specifics of informal spoken Czech, and the training data also included spoken material. The (anonymized) corpus is provided in a (semi-XML) vertical format used as input to the Manatee query engine. The data thus correspond to the corpus available via the KonText query engine to registered users of the CNC at http://www.korpus.cz. Please note: this item includes only the transcriptions; the audio (together with the transcripts in their original format) is available under a more restrictive non-CC license at http://hdl.handle.net/11234/1-5686
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
6. Přínos i úskalí edice exulantské hymnografie [Benefits and pitfalls of an edition of exile hymnography]
- Creator:
- Škarpová, Marie
- Format:
- Type:
- model:internalpart and TEXT
- Language:
- Czech
- Rights:
- http://creativecommons.org/licenses/by-nc-sa/4.0/ and policy:public