Annotation of extended textual coreference and bridging relations is a project related to the Prague Dependency Treebank 2.0 (PDT). It represents a new layer of manual annotation, above the existing layers of the PDT (morphology, surface syntax and underlying syntax) and it portrays linguistic phenomena from the perspective of the text structure. The annotation is a continuation of the annotation of grammatical and pronominal textual coreference that was completed for PDT 2.0 in 2003. The present project reflects two phenomena:
In accordance with the pronominal textual coreference annotation in PDT, the annotation has been performed directly on the syntactic trees.
Detailed information about the annotation can be found in the technical report.
The data consist of 49,431 manually annotated sentences from Czech newspapers (3,165 documents); detailed information about the original PDT data can be found in the PDT Guide. 90% of the data have been annotated by one annotator only; 10% of the data have been annotated by two annotators in parallel, with discrepancies resolved by a third annotator (directory dtest). The data are divided into ten directories (train-1 ... train-8, dtest, etest). The annotation of each document is captured in four interlinked files, one per layer of annotation: word layer (files *.w.gz), morphological layer (*.m.gz), analytical layer (*.a.gz), and tectogrammatical layer (*.t.gz); the annotation of extended textual coreference and bridging relations is part of the *.t.gz files.
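The file layout described above can be illustrated with a minimal Python sketch that groups the four layer files belonging to each document and reports documents with a missing layer. The layer suffixes and directory names come from the description above; the function names and the example file names (doc_001, doc_002) are hypothetical and serve only as an illustration, not as part of the PDT distribution.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Layer suffixes as listed in the data description.
LAYER_SUFFIXES = {
    ".w.gz": "word",
    ".m.gz": "morphological",
    ".a.gz": "analytical",
    ".t.gz": "tectogrammatical",
}


def group_by_document(paths):
    """Map (directory name, document stem) -> {layer name: path}."""
    documents = defaultdict(dict)
    for path in paths:
        p = PurePosixPath(path)
        for suffix, layer in LAYER_SUFFIXES.items():
            if p.name.endswith(suffix):
                stem = p.name[: -len(suffix)]
                documents[(p.parent.name, stem)][layer] = path
                break
    return dict(documents)


def incomplete_documents(documents):
    """Return keys of documents that lack at least one of the four layers."""
    return sorted(
        key
        for key, layers in documents.items()
        if len(layers) < len(LAYER_SUFFIXES)
    )


# Hypothetical file names, for illustration only.
example = [
    "train-1/doc_001.w.gz",
    "train-1/doc_001.m.gz",
    "train-1/doc_001.a.gz",
    "train-1/doc_001.t.gz",
    "dtest/doc_002.t.gz",
]
docs = group_by_document(example)
```

With real data, the list of paths could be collected from the unpacked data directory instead of being written out by hand; the coreference and bridging annotation would then be found in the file stored under the "tectogrammatical" key of each document.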
The tree editor TrEd is used to open and browse the data. The editor can be downloaded for various platforms from its home page. Please follow the installation instructions given on that page.
After the installation, a few extensions need to be installed:
t.gz
.
The Development of Extended Textual Coreference and Bridging Relations in PDT 2.0 was supported by the following organizations and projects:
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
© 2011 Institute of Formal and Applied Linguistics, Charles University in Prague.