The Prague Dependency Treebank of Spoken Czech 2.0 (PDTSC 2.0) is a corpus of spoken language consisting of 742,316 tokens and 73,835 sentences, representing 7,324 minutes (over 120 hours) of spontaneous dialogs. The dialogs have been recorded, transcribed and edited in several interlinked layers: audio recordings, automatic and manual transcripts, and manually reconstructed text. These layers were part of the first version of the corpus (PDTSC 1.0). Version 2.0 extends them with automatic dependency parsing at the analytical layer and with manual annotation of “deep” syntax at the tectogrammatical layer, which contains semantic roles and relations as well as annotation of coreference.
Czech nouns of communication can mostly be modified by three participants: Speaker, Information and Addressee. These participants can be expressed by various forms, but only some of these forms can be combined with each other. We examine the frequencies of selected combinations of participants modifying several types of nouns of communication in subcorpora of the Czech National Corpus. We compare them with the frequencies of similar combinations of participants modifying a sample of nouns of exchange and provide a quantitative analysis. The two semantic classes differ considerably in the frequencies of combinations that include Agent (Speaker for nouns of communication, Possessor 1 for nouns of exchange). While Agent is deleted in almost all occurrences of nouns of exchange, it is about as frequent as Information in occurrences of some types of nouns of communication. This confirms our hypothesis that Agent plays an important role in the valency behaviour of nouns of communication.