ParCzech PS7 2.0 is a corpus (collection) of stenographic protocols that record the meetings of the Chamber of Deputies (PS) held during its 7th term (2013-2017). The audio recordings are available as well. The corpus is automatically enriched with morphological and named-entity annotations produced by the tools UDPipe 2 and NameTag 2, respectively.
To make the corpus accessible in a more user-friendly way than the form in which the Parliament publishes the protocols, we use the web-based platform TEITOK, which lets users (1) browse the corpus (see Browse in the menu on the left) and (2) search it using the CQL and KonText tools (see CQL Search and Search in KonText, respectively). The corpus is downloadable from LINDAT/CLARIAH-CZ.
A blue menu border indicates the latest version of the corpus, while a gray border indicates a previous version (see ParCzech PS7 1.0). In the future, we will use one more color, red, to distinguish live corpora from stable corpora.
Differences from the previous version
- document pagination is identical to the one in the original sources
- it links to the original sources
- it includes URL links to voting results (e.g. search for the link Hlasování pořadové číslo 25 in this document)
- it includes URL links to "parliamentary prints", i.e. documents submitted to the parliament for discussion and vote, etc.
- it includes links to personal web sites at vlada.cz or psp.cz
- it is morphologically annotated by a different tool (1.0 by MorphoDiTa, 2.0 by UDPipe 2)
- it is syntactically annotated by UDPipe 2 (not displayed in TEITOK yet)
- the original audio files are available (in 1.0, a single audio file contains all the speeches made in a particular sitting)
The following terms of parliamentary procedure are relevant for browsing. During a term (volební období), there are meetings (schůze); each meeting is a group of sittings (projednávání) and typically spans more than one day. Each meeting has its own agenda, and an agenda item (bod schůze) is discussed in speeches (promluvy) that can be made at more than one sitting.
The documents (= protocols) are labeled in a way that reflects the hierarchy of terms, meetings, sittings, agenda item ids within a given sitting, and agenda items. Meetings are numbered from 001 onwards within each term, sittings from 01 onwards within each meeting, agenda item ids from 001 onwards within each sitting, and agenda items from 001 onwards within each meeting. For illustration, eight different agenda items were discussed at the sitting on 24 October 2014, in the order 000, 060, 019, etc.
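The zero-padded numbering can be sketched in Python as follows. The field widths (three digits for meetings, agenda item ids and agenda items, two digits for sittings) follow the description above; the function name, the separator, the order in which the fields are joined, and the sample values are assumptions made for illustration and do not reproduce the actual ParCzech document identifiers.

```python
def format_label(meeting: int, sitting: int, item_id: int,
                 agenda_item: int, sep: str = "-") -> str:
    """Join zero-padded counters into a hypothetical document label.

    Widths follow the corpus description; the separator and the
    field order are assumptions, not the real identifier format.
    """
    parts = [
        f"{meeting:03d}",      # meeting number within the term
        f"{sitting:02d}",      # sitting number within the meeting
        f"{item_id:03d}",      # agenda item id within the sitting
        f"{agenda_item:03d}",  # agenda item number within the meeting
    ]
    return sep.join(parts)


# Hypothetical values, chosen only to show the padding.
label = format_label(meeting=19, sitting=3, item_id=2, agenda_item=60)
print(label)  # 019-03-002-060
```

Because every field has a fixed width, labels built this way sort lexicographically in the same order as their numeric hierarchy.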
TEITOK uses the Corpus Query Processor to query corpora in the CQP query language.
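As a rough illustration of what a CQP query looks like, the sketch below assembles a query string in Python. The CQP syntax used (one bracketed expression per token, with attribute="value" conditions joined by &) is standard, but the attribute names lemma and upos, the example values, and the helper functions are assumptions; the attributes actually indexed for this corpus are not listed here.

```python
def token(**attrs: str) -> str:
    """Build one CQP token expression from attribute/value pairs.

    Values are inserted verbatim, so they are treated by CQP as
    regular expressions; quotes inside values are not escaped.
    """
    conds = " & ".join(f'{name}="{value}"' for name, value in attrs.items())
    return f"[{conds}]"


def sequence(*tokens: str) -> str:
    """Join token expressions into a query for consecutive tokens."""
    return " ".join(tokens)


# Hypothetical query: an adjective followed by the noun lemma "zákon".
query = sequence(token(upos="ADJ"), token(lemma="zákon", upos="NOUN"))
print(query)  # [upos="ADJ"] [lemma="zákon" & upos="NOUN"]
```

A string built this way could be pasted into the CQL Search form, provided the assumed attribute names match those defined for the corpus.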
Search in KonText
tt- data for TEITOK
ann- annotated TEI file
This work has been using language resources and tools developed and/or stored and/or distributed by the LINDAT/CLARIAH-CZ project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2018101).