Prague Dependency Treebank of Spoken Czech 2.0 (PDTSC 2.0)
- Title:
- Prague Dependency Treebank of Spoken Czech 2.0 (PDTSC 2.0)
- Creator:
- Mikulová, Marie, Bémová, Alevtina, Hajič, Jan, Hajičová, Eva, Ircing, Pavel, Kolářová, Veronika, Lopatková, Markéta, Mareček, David, Mírovský, Jiří, Nedoluzhko, Anna, Pajas, Petr, Panevová, Jarmila, Peterek, Nino, Romportl, Jan, Sgall, Petr, Ševčíková, Magda, Štěpánek, Jan, Urešová, Zdeňka, and Žabokrtský, Zdeněk
- Contributor:
- Ministry of Education, Youth and Sports of the Czech Republic, grant LM2015071, "LINDAT/CLARIN: Institute for the Analysis, Processing and Distribution of Linguistic Data" (national funds)
- Czech Science Foundation, grant GA17-12624S, "Subcategorization of adverbial meanings based on corpus data" (national funds)
- Czech Science Foundation, grant GA16-05394S, "Structure of coreferential chains in parallel language data" (national funds)
- Czech Science Foundation, grant GA16-18177S, "An Integrated Approach to Derivational and Inflectional Morphology of Czech" (national funds)
- Czech Science Foundation, grant GA17-07313S, "Contextually-based synonymy and valency of verbs in a bilingual setting" (national funds)
- European Union, grant FP6-IST-5-034434-IP, "Companions IP" (EU funds)
- National Science Foundation (USA), grant NSF IIS-9732388, "Data preparation for Workshop 1998, JHU, Baltimore, MD, USA" (other)
- National Science Foundation (USA), grant 0122466, "MALACH: Multilingual Access to Large Spoken Archives" (other)
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Identifier:
- http://hdl.handle.net/11234/1-3189
- Subject:
- spoken corpus, speech reconstruction, speech recognition, syntax, semantics, coreference, and audio
- Type:
- text and corpus
- Description:
- The Prague Dependency Treebank of Spoken Czech 2.0 (PDTSC 2.0) is a corpus of spoken language comprising 742,316 tokens and 73,835 sentences, drawn from 7,324 minutes (over 120 hours) of spontaneous dialogs. The dialogs were recorded, transcribed, and edited in several interlinked layers: audio recordings, automatic and manual transcripts, and manually reconstructed text. These layers were already part of the first version of the corpus (PDTSC 1.0). Version 2.0 adds automatic dependency parsing at the analytical layer and manual annotation of “deep” syntax at the tectogrammatical layer, which covers semantic roles and relations as well as coreference.
- Language:
- Czech
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
- http://creativecommons.org/licenses/by-nc-sa/4.0/
- PUB
- Relation:
- http://hdl.handle.net/11234/1-2375
- Source:
- https://ufal.mff.cuni.cz/pdtsc2.0
- Harvested from:
- LINDAT/CLARIAH-CZ repository
- Metadata only:
- false
- Date:
- 2017