ParCzech PS7 1.0 is a corpus (collection) of stenographic protocols that record the meetings of the Chamber of Deputies (PS) held in its 7th term, from 2013 to 2017. The audio recordings are available as well. The corpus is automatically enriched with morphological and named-entity annotations using the tools MorphoDiTa and NameTag, respectively.
To make the corpus more accessible than the protocols as published by the Parliament, we use the web-based platform TEITOK, which enables users to (1) browse the corpus (see Browse in the menu on the left) and (2) search it using the CQL and KonText tools (see CQL Search and Search in KonText, respectively). The corpus is downloadable from the LINDAT/CLARIAH-CZ repository.
The color of the menu border indicates the corpus version: gray marks an older version, while blue marks the latest version (see ParCzech PS7 2.0). In the future, we will use a third color, red, to distinguish live corpora from stable corpora.
The following terms from parliamentary procedure are relevant for browsing. During a term (volební období), there are meetings (schůze); each meeting is a group of sittings (projednávání) and typically takes place over more than one day. Each meeting has its own agenda, and an agenda item (bod schůze) is discussed in speeches (promluvy) that can be made at more than one sitting.
The documents (= protocols) are labeled in a way that reflects the hierarchy of terms, meetings, sittings, and agenda items. Meetings are numbered from 001 onwards within each term, sittings from 01 onwards within each meeting, and agenda items from 001 onwards within each meeting. This means that one document in the corpus corresponds to one agenda item. For illustration, the document 2013-001-01-005 is the protocol of speeches on the fifth agenda item (005) made in the first sitting (01) of the first meeting (001) of the term that started in 2013 (2013). The document 2013-001-01-003b.u is the protocol of speeches on the third agenda item, which was discussed in multiple parts: b stands for the second part, and the suffix u stands for an unauthorized version. Users can browse the documents by five different criteria (sitting date, meeting, term, agenda item, authorized). For example, when browsing by sitting date, users can see that four items of the fifth meeting in the term 2013–2017 were on the agenda on 21 January 2014: 2013-005-01-003, 2013-005-01-002, 2013-005-01-001, 2013-005-01-000.
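The labeling scheme described above can be split into its components programmatically. The following Python snippet is a minimal sketch, not part of ParCzech or TEITOK: the pattern is inferred only from the two example labels given here, and the names DOC_ID, DocId, and parse_doc_id are hypothetical. It assumes that a part is marked by a single lowercase letter and that a label without the .u suffix denotes an authorized version.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the example labels "2013-001-01-005" and
# "2013-001-01-003b.u"; the real corpus may contain other variants.
DOC_ID = re.compile(
    r"^(?P<term>\d{4})-(?P<meeting>\d{3})-(?P<sitting>\d{2})-(?P<item>\d{3})"
    r"(?P<part>[a-z])?(?P<unauthorized>\.u)?$"
)


class DocId(NamedTuple):
    term: int             # year in which the term started, e.g. 2013
    meeting: int          # meeting number within the term
    sitting: int          # sitting number within the meeting
    item: int             # agenda item number within the meeting
    part: Optional[str]   # part letter ("b" = second part), None if absent
    authorized: bool      # assumption: no ".u" suffix means authorized


def parse_doc_id(doc_id: str) -> DocId:
    """Split a document label into its hierarchical components."""
    match = DOC_ID.match(doc_id)
    if match is None:
        raise ValueError(f"unrecognized document label: {doc_id!r}")
    return DocId(
        term=int(match["term"]),
        meeting=int(match["meeting"]),
        sitting=int(match["sitting"]),
        item=int(match["item"]),
        part=match["part"],
        authorized=match["unauthorized"] is None,
    )


if __name__ == "__main__":
    for label in ("2013-001-01-005", "2013-001-01-003b.u"):
        print(label, "->", parse_doc_id(label))
```

Labels that do not fit the assumed pattern raise a ValueError rather than being guessed at, so any variant not covered by the two examples is reported instead of silently misparsed.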
TEITOK uses the Corpus Query Processor to query corpora in the CQP query language.
Search in KonText
This work uses language resources and tools developed, stored, and/or distributed by the LINDAT/CLARIAH-CZ project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2018101).