
Prague Dependency Treebank 3.5

Please use the following text to cite this item:
Hajič, Jan; et al., 2018, Prague Dependency Treebank 3.5, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-2621.
Date issued
2018-02-19
Size
115,844 sentences
1,956,693 tokens
Language(s)
Czech
Description
The Prague Dependency Treebank 3.5 is the 2018 edition of the core Prague Dependency Treebank (PDT). It contains all PDT annotation of the original texts produced at the Institute of Formal and Applied Linguistics under various projects between 1996 and 2018, i.e., all annotation from PDT 1.0, PDT 2.0, PDT 2.5, PDT 3.0, PDiT 1.0 and PDiT 2.0, plus corrections, a new structure for the basic documentation and a new list of authors covering all previous editions. PDT 3.5 contains the same texts as every previous version since 2.0: 49,431 sentences (832,823 words) are annotated on all layers, from the tectogrammatical layer through syntax down to morphology. Additional sentences are annotated only at the lower layers, so the totals are 87,913 sentences (1,502,976 words) at the analytical layer (surface dependency syntax) and 115,844 sentences (1,956,693 words) at the morphological layer. These totals include the sentences that are also annotated at the higher layers. Closely linked to the tectogrammatical layer is the annotation of sentence information structure, multiword expressions, coreference, bridging relations and discourse relations.
Acknowledgement
 Files in this item
Name
PDT3.5.tgz
Size
158.69 MB
Format
application/x-gzip
Description
PDT 3.5 full data, documentation and offline copy of web pages. After unpacking, start in the index.html file.
MD5
5d54596e7c84bfb29d599183a1dd3aee
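The MD5 value above lets you check that PDT3.5.tgz arrived intact before unpacking it. The following is a minimal Python sketch, assuming the archive was saved under its published name; the helpers `md5_of_file` and `verify_and_unpack` are hypothetical names for illustration, not part of any PDT tooling.

```python
import hashlib
import tarfile
from pathlib import Path

# Checksum published on this item page for PDT3.5.tgz.
EXPECTED_MD5 = "5d54596e7c84bfb29d599183a1dd3aee"


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hypothetical helper: return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        # Read fixed-size chunks so a large archive never has to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_unpack(archive: Path, dest: Path, expected: str = EXPECTED_MD5) -> None:
    """Hypothetical helper: compare the archive's MD5 with the expected value,
    and extract the gzip-compressed tarball into dest only if they match."""
    actual = md5_of_file(archive)
    if actual != expected.lower():
        raise ValueError(f"MD5 mismatch for {archive}: got {actual}, expected {expected}")
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
```

For example, `verify_and_unpack(Path("PDT3.5.tgz"), Path("."))` raises `ValueError` on a corrupted download and otherwise unpacks the archive, after which you can open index.html as described above. MD5 only detects accidental corruption; it is not a defence against deliberate tampering.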