Czech RST Discourse Treebank 1.0

Please use the following text to cite this item:
Poláková, Lucie; Zikánová, Šárka; Mírovský, Jiří and Hajičová, Eva, 2023, Czech RST Discourse Treebank 1.0, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-5174.
Date issued
2023-06-30
Size
54 articles,
901 sentences,
14514 tokens
Language(s)
Czech
Description
The Czech RST Discourse Treebank 1.0 (CzRST-DT 1.0) is a dataset of 54 Czech journalistic texts manually annotated using Rhetorical Structure Theory (RST). Each text document in the treebank is represented as a single tree-like structure whose nodes (discourse units) are interconnected through hierarchical rhetorical relations. The dataset also contains the parallel annotations of five double-annotated documents. The original texts are part of the data annotated in the Prague Dependency Treebank, although the two projects are independent.
 Files in this item
Name
README.TXT
Size
3.9 KB
Format
text/plain
Description
Text
MD5
04ec03e6206fb2ba96141b2c1967eabc
Preview
    ===============================================
    Czech RST Discourse Treebank 1.0 (CzRST-DT 1.0)
    ===============================================
    
    
    Authors
    =======
    Lucie Poláková (Charles University, Faculty of Mathematics and Physics),
    Šárka Zikánová (Charles University, Faculty of Mathematics and Physics),
    Jiří Mírovský (Charles University, Faculty of Mathematics and Physics),
    Eva Hajičová (Charles University, Faculty of Mathematics and Physics)
    
    
    Introduction
    ============
    
    The Czech RST Discourse Treebank 1.0 (CzRST-DT 1.0, Poláková et al., 2023)
    is a dataset of 54 Czech journalistic texts manually annotated using
    the Rhetorical Structure Theory (RST; Mann and Thompson, 1988).
    Each text document in the treebank is represented as a single tree-like
    structure whose nodes (discourse units) are interconnected through
    hierarchical rhetorical relations.
    
    The dataset also contains concurrent annotations of five double-annotated
    documents.
    
    The original texts are a part of the data annotated in the Prague Dependency
    Treebank (Hajič et al., 2020), although the two projects are independent.
    
    Please visit https://ufal.mff.cuni.cz/czrst-dt1.0 for detailed and
    updated information about the corpus.
    
    
    Data Format
    ===========
    
    The data are located in the directory "data", in the
    following subdirectories:
    
    TXT - original texts
    RS3 - RST annotations of the texts in RS3 format
    IAA - double-annotated documents in two versions:
          - pre-curated (note: the curated version is in directory RS3)
          - pre-curated and modified to form one tree (for IAA measurement)
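
The layout above can be walked with a short script. The following is a minimal Python sketch, not part of the distribution: the function name `pair_documents` is hypothetical, and it assumes that each annotation in RS3 has a source text in TXT with the same base name, as the archive listing suggests.

```python
from pathlib import Path


def pair_documents(data_dir):
    """Map each document ID to a (txt_path, rs3_path) tuple.

    Assumes the layout described in the README: data_dir/TXT/*.txt and
    data_dir/RS3/*.rs3, with matching base names (an assumption).
    """
    data_dir = Path(data_dir)
    pairs = {}
    for rs3_path in sorted((data_dir / "RS3").glob("*.rs3")):
        doc_id = rs3_path.stem
        txt_path = data_dir / "TXT" / f"{doc_id}.txt"
        # Store None when no matching text file is present.
        pairs[doc_id] = (txt_path if txt_path.exists() else None, rs3_path)
    return pairs
```

After unpacking the archive, a call such as `pair_documents("CzRST-DT_1.0/data")` would return one entry per annotated document, which makes it easy to spot any annotation without a source text.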
    
    
    How to get and browse the data
    ==============================
    
    The data can be downloaded from the LINDAT/CLARIAH-CZ
    repository at http://hdl.handle.net/11234/1-5174
    (see the licence below).
    
    The data can be opened using the RSTWeb annotation
    tool (Gessler et al., 2019):
    https://gucorpling.org/rstweb/info/
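
Besides RSTWeb, the annotations can also be inspected programmatically. The sketch below is an illustration only: it assumes the files follow the common RS3 XML convention (a `body` element containing `segment` and `group` elements with `id`, `parent` and `relname` attributes), which the README itself does not spell out, and the function name `read_rs3` is hypothetical.

```python
import xml.etree.ElementTree as ET


def read_rs3(source):
    """Return (segments, groups) read from an RS3 file path or file object.

    Element and attribute names follow the common RS3 convention;
    this is an assumption, not something confirmed by the README.
    """
    root = ET.parse(source).getroot()
    body = root.find("body")
    segments, groups = [], []
    if body is None:
        return segments, groups
    for node in body:
        record = {
            "id": node.get("id"),
            "parent": node.get("parent"),    # None when the attribute is absent
            "relname": node.get("relname"),
        }
        if node.tag == "segment":
            # Segments carry the text of an elementary discourse unit.
            record["text"] = (node.text or "").strip()
            segments.append(record)
        elif node.tag == "group":
            # Groups are internal nodes (e.g. spans or multinuclear nodes).
            record["type"] = node.get("type")
            groups.append(record)
    return segments, groups
```

Counting the returned segments per file gives a quick sanity check of the number of discourse units in each document.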
    
    
    Citation
    ========
    
    Please cite CzRST-DT 1.0 when using the corpus for your research:
    
    Lucie Poláková, Šárka Zikánová, Jiří M . . .
Name
CzRST-DT_1.0.zip
Size
203.38 KB
Format
application/zip
Description
Zip
MD5
93b2a2beab1ff13f7dd652fa5de74bfb
Preview
  • CzRST-DT_1.0
    • README.TXT (3 kB)
    • data
      • IAA
        • edited_to_one_tree
          • ANNOT2
            • ln94203_145_one_tree.rs3 (7 kB)
            • mf930713_055_one_tree.rs3 (6 kB)
            • ln94203_43_one_tree.rs3 (4 kB)
            • ln94202_135_one_tree.rs3 (4 kB)
            • cmpr9415_032_one_tree.rs3 (4 kB)
          • ANNOT1
            • ln94203_145_one_tree.rs3 (7 kB)
            • mf930713_055_one_tree.rs3 (6 kB)
            • ln94203_43_one_tree.rs3 (4 kB)
            • ln94202_135_one_tree.rs3 (4 kB)
            • cmpr9415_032_one_tree.rs3 (4 kB)
        • original_annotations
          • ANNOT2
            • mf930713_055.rs3 (6 kB)
            • cmpr9415_032.rs3 (4 kB)
            • ln94203_43.rs3 (4 kB)
            • ln94203_145.rs3 (7 kB)
            • ln94202_135.rs3 (4 kB)
          • ANNOT1
            • mf930713_055.rs3 (6 kB)
            • cmpr9415_032.rs3 (4 kB)
            • ln94203_43.rs3 (4 kB)
            • ln94203_145.rs3 (7 kB)
            • ln94202_135.rs3 (4 kB)
      • RS3
        • ln94207_39.rs3 (3 kB)
        • mf920925_021.rs3 (4 kB)
        • lnd94103_003.rs3 (2 kB)
        • cmpr9413_017.rs3 (7 kB)
        • lnd94103_063.rs3 (11 kB)
        • ln95049_086.rs3 (5 kB)
        • ln95048_056.rs3 (8 kB)
        • ln94202_49.rs3 (3 kB)
        • ln94200_8.rs3 (2 kB)
        • mf930713_099.rs3 (5 kB)
        • ln94207_83.rs3 (11 kB)
        • mf930713_055.rs3 (6 kB)
        • ln95047_134.rs3 (5 kB)
        • ln94200_112.rs3 (5 kB)
        • ln95048_140.rs3 (4 kB)
        • ln94203_145.rs3 (7 kB)
        • ln94202_135.rs3 (4 kB)
        • mf920922_138.rs3 (3 kB)
        • cmpr9415_032.rs3 (4 kB)
        • cmpr9410_047.rs3 (12 kB)
        • ln94200_84.rs3 (3 kB)
        • ln95048_055.rs3 (4 kB)
        • ln94207_54.rs3 (11 kB)
        • cmpr9413_004.rs3 (4 kB)
        • lnd94103_129.rs3 (3 kB)
        • ln94200_167.rs3 (3 kB)
        • mf920925_087.rs3 (4 kB)
        • mf930713_110.rs3 (11 kB)
        • lnd94103_013.rs3 (4 kB)
        • lnd94103_145.rs3 (6 kB)
        • lnd94103_033.rs3 (3 kB)
        • ln95048_058.rs3 (5 kB)
        • mf930709_087.rs3 (8 kB)
        • mf920925_018.rs3 (5 kB)
        • mf920922_105.rs3 (9 kB)
        • ln94203_100.rs3 (5 kB)
        • lnd94103_053.rs3 (3 kB)
        • ln95049_100.rs3 (8 kB)
        • ln94210_147.rs3 (7 kB)
        • mf930709_083.rs3 (3 kB)
        • mf920925_114.rs3 (4 kB)
        • ln95049_019.rs3 (3 kB)
        • ln95048_122.rs3 (3 kB)
        • mf930713_013.rs3 (4 kB)
        • mf920922_133.rs3 (3 kB)
        • ln94209_45.rs3 (9 kB)
        • mf930709_058.rs3 (4 kB)
        • ln94207_16.rs3 (8 kB)
        • ln94203_43.rs3 (4 kB)
        • ln94200_170.rs3 (11 kB)
        • cmpr9413_026.rs3 (3 kB)
        • ln94208_143.rs3 (3 kB)
        • cmpr9413_034.rs3 (8 kB)
        • ln94206_47.rs3 (7 kB)
      • TXT
        • lnd94103_013.txt (1 kB)
        • lnd94103_145.txt (1 kB)
        • lnd94103_033.txt (733 B)
        • mf930709_087.txt (2 kB)
        • ln95048_058.txt (1 kB)
        • ln94203_100.txt (1 kB)
        • mf920925_018.txt (1 kB)
        • mf920922_105.txt (2 kB)
        • lnd94103_053.txt (1008 B)
        • ln95049_100.txt (1 kB)
        • ln94210_147.txt (2 kB)
        • mf930709_083.txt (944 B)
        • mf920925_114.txt (997 B)
        • ln95048_122.txt (622 B)
        • ln95049_019.txt (821 B)
        • mf920922_133.txt (595 B)
        • mf930713_013.txt (1 kB)
        • ln94209_45.txt (3 kB)
        • mf930709_058.txt (883 B)
        • ln94207_16.txt (2 kB)
        • ln94203_43.txt (1 kB)
        • ln94200_170.txt (3 kB)
        • cmpr9413_026.txt (709 B)
        • ln94208_143.txt (1 kB)
        • cmpr9413_034.txt (2 kB)
        • ln94206_47.txt (1 kB)
        • ln94207_39.txt (940 B)
        • lnd94103_003.txt (684 B)
        • mf920925_021.txt (1 kB)
        • lnd94103_063.txt (4 kB)
        • cmpr9413_017.txt (2 kB)
        • ln95049_086.txt (1 kB)
        • ln94202_49.txt (909 B)
        • ln95048_056.txt (2 kB)
        • ln94200_8.txt (735 B)
        • mf930713_099.txt (967 B)
        • ln94207_83.txt (3 kB)
        • mf930713_055.txt (2 kB)
        • ln95047_134.txt (1 kB)
        • ln94200_112.txt (1 kB)
        • ln95048_140.txt (1 kB)
        • ln94203_145.txt (2 kB)
        • ln94202_135.txt (1 kB)
        • mf920922_138.txt (801 B)
        • cmpr9415_032.txt (1 kB)
        • cmpr9410_047.txt (4 kB)
        • ln94200_84.txt (1 kB)
        • ln94207_54.txt (3 kB)
        • ln95048_055.txt (960 B)
        • cmpr9413_004.txt (1 kB)
        • lnd94103_129.txt (568 B)
        • mf920925_087.txt (1 kB)
        • ln94200_167.txt (1 kB)
        • mf930713_110.txt (3 kB)
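
The MD5 checksums listed for the two files above can be used to verify a download. The following is a minimal Python sketch using only the standard library; the function name `md5_of_file` and the local file path in the usage note are assumptions for illustration.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at path."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, if the archive was saved as `CzRST-DT_1.0.zip` in the current directory, `md5_of_file("CzRST-DT_1.0.zip")` should equal `93b2a2beab1ff13f7dd652fa5de74bfb` when the download is intact.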