dc.contributor.author Kopřivová, Marie
dc.contributor.author Komrsková, Zuzana
dc.contributor.author Lukeš, David
dc.contributor.author Poukarová, Petra
dc.contributor.author Škarpová, Marie
dc.date.accessioned 2018-01-02T12:24:15Z
dc.date.available 2018-01-02T12:24:15Z
dc.date.issued 2017-12-28
dc.identifier.uri http://hdl.handle.net/11234/1-2579
dc.description ORTOFON v1 is designed as a representation of authentic spoken Czech used in informal situations (private environment, spontaneity, unpreparedness, etc.) throughout the Czech Republic. The corpus is composed of 332 recordings from 2012–2017 and contains 1 014 786 orthographic words (i.e. a total of 1 236 508 tokens including punctuation); a total of 624 different speakers appear in the recordings. ORTOFON v1 is fully balanced with regard to the basic sociolinguistic speaker categories (gender, age group, level of education and region of childhood residence). The transcription is linked to the corresponding audio track. Unlike in the ORAL-series corpora, the transcription was carried out on two main tiers, orthographic and phonetic, supplemented by an additional metalanguage tier. ORTOFON v1 is lemmatized and morphologically tagged. The (anonymized) transcriptions are provided in the XML-based ELAN Annotation Format; the audio (with corresponding anonymization beeps) is uncompressed 16-bit PCM WAV, mono, 16 kHz. The transcriptions are also available in another format under the less restrictive CC BY-NC-SA license at http://hdl.handle.net/11234/1-2580
dc.language.iso ces
dc.publisher Charles University, Faculty of Arts, Institute of the Czech National Corpus
dc.rights License Agreement for Czech National Corpus Data
dc.rights.uri https://lindat.mff.cuni.cz/repository/xmlui/page/license-cnc-data
dc.source.uri http://wiki.korpus.cz/doku.php/en:cnk:ortofon
dc.subject balanced corpus
dc.subject spoken language
dc.subject informal language
dc.subject Czech
dc.title ORTOFON v1: balanced corpus of informal spoken Czech with multi-tier transcription (transcriptions & audio)
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType audio
dc.rights.label ACA
has.files yes
branding LINDAT / CLARIN
contact.person David Lukeš david.lukes@ff.cuni.cz Charles University, Faculty of Arts, Institute of the Czech National Corpus
sponsor Ministerstvo školství, mládeže a tělovýchovy LM2015044 Český národní korpus nationalFunds
size.info 1000000 words
files.size 10309980762
files.count 1


 Files in this item

This item is for Academic Use and is licensed under:
License Agreement for Czech National Corpus Data (Attribution Required, Noncommercial)
Name: ortofon_v1.tar.gz
Size: 9.6 GB
Format: application/x-gzip
Description: Unknown
MD5: 51e4f525fe2a478e7f5f73ef7af61c42
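
Because the archive is about 9.6 GB, it may be worth checking a downloaded copy against the MD5 value listed in this record before unpacking it. The following is a minimal sketch of such a check, not a procedure prescribed by the repository; the function names `md5_of_file` and `matches_expected` are hypothetical, and the local file path is an assumption.

```python
import hashlib

# MD5 value published in this record for ortofon_v1.tar.gz.
EXPECTED_MD5 = "51e4f525fe2a478e7f5f73ef7af61c42"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`.

    The file is read in chunks so that a multi-gigabyte archive
    does not have to fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Compare the computed digest with the published one, ignoring case."""
    return md5_of_file(path).lower() == expected.lower()
```

Calling `matches_expected("ortofon_v1.tar.gz")` on a copy saved under that name (an assumed location) returns `True` only if the computed digest equals the listed value; a mismatch most likely indicates an incomplete or corrupted download.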