Enriched Discourse Annotation of PDiT Subset 1.0 (PDiT-EDA 1.0)

Please use the following text to cite this item:
Zikánová, Šárka; Synková, Pavlína and Mírovský, Jiří, 2018, Enriched Discourse Annotation of PDiT Subset 1.0 (PDiT-EDA 1.0), LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-2906.
Date issued
2018-12-20
Size
2592 sentences
Language(s)
Description
Enriched discourse annotation of a subset of the Prague Discourse Treebank, adding implicit relations, entity-based relations, question-answer relations and other discourse-structuring phenomena.
Acknowledgement
 Files in this item
Name
PDiT-EDA1.0.zip
Size
3.44 MB
Format
application/zip
Description
PDiT-EDA 1.0 data
MD5
3142fb6569639d6ce82bf5c949822b7f
Preview
  File Preview
  • PDiT-EDA1.0
    • PDiT-EDA1.0-Introduction.html (7 kB)
    • data
      • sport
        • mf920922_076.m.gz (2 kB)
        • ln94206_117.a.gz (8 kB)
        • mf920925_047.t.gz (2 kB)
        • ln94206_117.w.gz (4 kB)
        • mf920922_076.a.gz (2 kB)
        • mf920922_076.w.gz (1 kB)
        • mf920925_047.m.gz (1 kB)
        • mf920925_047.a.gz (1 kB)
        • mf920925_047.w.gz (894 B)
        • cmpr9413_004.t.gz (9 kB)
        • ln94202_126.t.gz (21 kB)
        • cmpr9413_004.m.gz (4 kB)
        • mf930709_083.t.gz (7 kB)
        • cmpr9413_004.a.gz (4 kB)
        • ln94202_126.m.gz (9 kB)
        • mf930709_075.t.gz (22 kB)
        • cmpr9413_004.w.gz (2 kB)
        • mf930709_087.t.gz (14 kB)
        • ln94202_126.a.gz (8 kB)
        • mf930709_083.m.gz (3 kB)
        • ln94207_54.t.gz (21 kB)
        • ln94202_126.w.gz (4 kB)
        • mf930709_075.m.gz (10 kB)
        • mf930709_087.m.gz (7 kB)
        • mf930709_083.a.gz (3 kB)
        • mf930709_083.w.gz (1 kB)
        • mf930709_075.a.gz (9 kB)
        • ln94206_117.t.gz (19 kB)
        • ln94207_54.m.gz (10 kB)
        • mf930709_087.a.gz (6 kB)
        • mf930709_075.w.gz (4 kB)
        • mf930709_087.w.gz (3 kB)
        • mf920922_076.t.gz (4 kB)
        • ln94207_54.a.gz (9 kB)
        • ln94207_54.w.gz (4 kB)
        • ln94206_117.m.gz (9 kB)
      • essay
        • ln94202_19.a.gz (13 kB)
        • lnd94103_063.a.gz (11 kB)
        • ln94207_83.t.gz (21 kB)
        • ln94202_19.t.gz (33 kB)
        • lnd94103_063.t.gz (26 kB)
        • ln94207_83.w.gz (5 kB)
        • ln94202_19.w.gz (7 kB)
        • lnd94103_063.w.gz (5 kB)
        • ln94200_170.m.gz (9 kB)
        • ln94200_170.a.gz (8 kB)
        • ln94200_170.t.gz (20 kB)
        • ln94200_170.w.gz (4 kB)
        • ln94207_83.m.gz (10 kB)
        • ln94202_19.m.gz (15 kB)
        • lnd94103_063.m.gz (12 kB)
        • ln94207_83.a.gz (9 kB)
      • letter
        • cmpr9413_011.t.gz (30 kB)
        • ln94203_101.m.gz (8 kB)
        • mf920925_091.a.gz (8 kB)
        • ln94203_101.t.gz (17 kB)
        • mf920925_061.m.gz (8 kB)
        • ln94205_109.a.gz (6 kB)
        • ln95048_057.w.gz (4 kB)
        • cmpr9413_011.a.gz (14 kB)
        • mf920925_061.t.gz (18 kB)
        • ln94203_101.a.gz (7 kB)
        • mf920925_061.a.gz (7 kB)
        • mf920925_091.w.gz (4 kB)
        • ln94205_109.w.gz (3 kB)
        • cmpr9413_011.w.gz (6 kB)
        • ln94203_101.w.gz (3 kB)
        • mf920925_061.w.gz (3 kB)
        • ln95048_057.m.gz (10 kB)
        • ln95048_057.t.gz (21 kB)
        • ln95048_057.a.gz (9 kB)
        • mf920925_091.m.gz (8 kB)
        • ln94205_109.m.gz (7 kB)
        • mf920925_091.t.gz (17 kB)
        • cmpr9413_011.m.gz (15 kB)
        • ln94205_109.t.gz (14 kB)
      • advice
        • ln95048_140.t.gz (8 kB)
        • ln94208_1.a.gz (1 kB)
        • cmpr9413_017.m.gz (7 kB)
        • ln94208_1.w.gz (840 B)
        • cmpr9413_007.m.gz (8 kB)
        • ln95048_002.m.gz (10 kB)
        • cmpr9413_027.m.gz (1 kB)
        • cmpr9413_017.a.gz (6 kB)
        • ln95048_002.a.gz (9 kB)
        • ln95048_140.m.gz (3 kB)
        • cmpr9413_007.a.gz (8 kB)
        • cmpr9413_027.a.gz (2 kB)
        • cmpr9413_017.w.gz (3 kB)
        • ln95048_002.w.gz (4 kB)
        • cmpr9413_007.w.gz (4 kB)
        • cmpr9413_027.w.gz (1 kB)
        • ln94203_145.t.gz (16 kB)
        • ln95048_140.a.gz (3 kB)
        • ln95048_140.w.gz (1 kB)
        • ln94203_145.m.gz (7 kB)
        • cmpr9410_009.t.gz (7 kB)
        • ln94203_145.a.gz (6 kB)
        • ln94203_145.w.gz (3 kB)
        • cmpr9410_009.m.gz (3 kB)
        • cmpr9410_009.a.gz (3 kB)
        • ln94203_59.t.gz (31 kB)
        • cmpr9410_009.w.gz (1 kB)
        • ln94203_59.m.gz (14 kB)
        • ln94200_36.t.gz (23 kB)
        • ln94203_59.a.gz (13 kB)
        • ln94203_59.w.gz (6 kB)
        • ln94208_1.t.gz (3 kB)
        • ln94200_36.m.gz (10 kB)
        • cmpr9413_017.t.gz (15 kB)
        • cmpr9413_007.t.gz (16 kB)
        • ln95048_002.t.gz (22 kB)
        • cmpr9413_027.t.gz (4 kB)
        • ln94200_36.a.gz (10 kB)
        • ln94208_1.m.gz (1 kB)
        • ln94200_36.w.gz (4 kB)
      • weather
        • lnd94103_129.m.gz (2 kB)
        • lnd94103_153.m.gz (2 kB)
        • lnd94103_153.t.gz (4 kB)
        • lnd94103_129.t.gz (4 kB)
        • mf930713_099.w.gz (1 kB)
        • mf920925_027.w.gz (1 kB)
        • lnd94103_129.a.gz (2 kB)
        • lnd94103_153.a.gz (1 kB)
        • mf930709_066.w.gz (3 kB)
        • lnd94103_129.w.gz (1 kB)
        • lnd94103_153.w.gz (1 kB)
        • mf920925_027.m.gz (2 kB)
        • mf930713_099.m.gz (3 kB)
        • mf930713_099.t.gz (6 kB)
        • mf920925_027.t.gz (5 kB)
        • mf930709_066.m.gz (6 kB)
        • mf930709_066.t.gz (13 kB)
        • mf920925_027.a.gz (2 kB)
        • mf930713_099.a.gz (3 kB)
        • mf930709_066.a.gz (6 kB)
      • topic_interv
        • cmpr9410_019.m.gz (20 kB)
        • mf920922_057.w.gz (5 kB)
        • mf930713_058.m.gz (9 kB)
        • cmpr9413_020.m.gz (14 kB)
        • mf920922_085.m.gz (3 kB)
        • cmpr9410_019.t.gz (43 kB)
        • mf930713_058.t.gz (20 kB)
        • mf920922_085.t.gz (8 kB)
        • cmpr9413_020.t.gz (29 kB)
        • mf920922_009.m.gz (2 kB)
        • ln94209_49.m.gz (10 kB)
        • mf920922_009.t.gz (4 kB)
        • ln94209_49.t.gz (22 kB)
        • mf930713_058.a.gz (8 kB)
        • cmpr9410_019.a.gz (18 kB)
        • cmpr9413_020.a.gz (13 kB)
        • mf920922_085.a.gz (3 kB)
        • mf920922_009.a.gz (1 kB)
        • ln94209_49.a.gz (9 kB)
        • mf930713_058.w.gz (4 kB)
        • cmpr9410_019.w.gz (9 kB)
        • mf920922_085.w.gz (1 kB)
        • cmpr9413_020.w.gz (6 kB)
        • mf920922_057.m.gz (10 kB)
        • mf920922_057.t.gz (24 kB)
        • mf920922_009.w.gz (1 kB)
        • ln94209_49.w.gz (4 kB)
        • mf920922_057.a.gz (10 kB)
      • comment
        • mf930713_110.t.gz (22 kB)
        • ln94202_20.t.gz (25 kB)
        • ln95048_056.w.gz (3 kB)
        • ln94209_45.a.gz (8 kB)
        • ln94202_20.a.gz (11 kB)
        • mf930713_110.a.gz (9 kB)
        • cmpr9413_034.w.gz (3 kB)
        • ln94207_16.m.gz (7 kB)
        • ln94207_16.t.gz (14 kB)
        • ln94209_45.w.gz (4 kB)
        • ln94207_16.a.gz (6 kB)
        • mf930713_110.w.gz (4 kB)
        • ln94202_20.w.gz (5 kB)
        • ln95048_056.m.gz (6 kB)
        • ln95048_056.t.gz (14 kB)
        • ln94207_16.w.gz (3 kB)
        • cmpr9413_034.m.gz (8 kB)
        • cmpr9413_034.t.gz (18 kB)
        • ln95048_056.a.gz (6 kB)
        • ln94209_45.m.gz (9 kB)
        • ln94209_45.t.gz (18 kB)
        • cmpr9413_034.a.gz (7 kB)
        • ln94202_20.m.gz (12 kB)
        • mf930713_110.m.gz (9 kB)
      • collection
        • ln95049_067.w.gz (3 kB)
        • ln95048_060.m.gz (9 kB)
        • ln95048_060.a.gz (8 kB)
        • ln95048_060.t.gz (20 kB)
        • ln94204_143.m.gz (13 kB)
        • ln95048_060.w.gz (4 kB)
        • ln94203_125.m.gz (15 kB)
        • ln94204_143.a.gz (11 kB)
        • ln95049_067.m.gz (8 kB)
        • ln94203_125.a.gz (14 kB)
        • ln94204_143.t.gz (28 kB)
        • ln94203_125.t.gz (34 kB)
        • ln95049_067.a.gz (7 kB)
        • ln95049_067.t.gz (16 kB)
        • ln94204_143.w.gz (6 kB)
        • ln94203_125.w.gz (7 kB)
      • survey
        • ln95049_100.m.gz (6 kB)
        • ln95049_100.t.gz (14 kB)
        • ln95046_089.a.gz (4 kB)
        • ln94200_84.m.gz (3 kB)
        • mf930713_066.a.gz (3 kB)
        • ln94200_84.t.gz (7 kB)
        • cmpr9410_029.m.gz (4 kB)
        • ln95049_100.a.gz (6 kB)
        • ln94203_43.m.gz (4 kB)
        • cmpr9410_029.t.gz (9 kB)
        • ln94203_43.t.gz (9 kB)
        • ln94200_84.a.gz (3 kB)
        • ln95046_089.w.gz (2 kB)
        • cmpr9410_029.a.gz (4 kB)
        • mf930713_066.w.gz (1 kB)
        • ln94203_43.a.gz (3 kB)
        • ln95048_124.m.gz (7 kB)
        • ln95047_093.m.gz (8 kB)
        • ln95049_100.w.gz (2 kB)
        • ln95048_124.t.gz (16 kB)
        • ln95047_093.t.gz (16 kB)
        • ln94200_84.w.gz (1 kB)
        • cmpr9410_029.w.gz (2 kB)
        • ln95048_124.a.gz (6 kB)
        • ln94203_43.w.gz (2 kB)
        • ln95047_093.a.gz (7 kB)
        • ln95046_089.m.gz (5 kB)
        • mf930713_066.m.gz (3 kB)
        • ln95046_089.t.gz (11 kB)
        • ln95048_124.w.gz (3 kB)
        • ln95047_093.w.gz (3 kB)
        • mf930713_066.t.gz (7 kB)
      • person_interv
        • ln95048_050.m.gz (17 kB)
        • ln95048_050.t.gz (38 kB)
        • ln94205_131.m.gz (5 kB)
        • ln94205_131.t.gz (13 kB)
        • ln95048_050.w.gz (8 kB)
        • ln95048_050.a.gz (15 kB)
        • ln94205_131.w.gz (2 kB)
        • ln94205_131.a.gz (5 kB)
      • invitation
        • ln94207_103.t.gz (17 kB)
        • mf930713_008.w.gz (1 kB)
        • ln95045_050.w.gz (3 kB)
        • ln94206_47.m.gz (6 kB)
        • ln94207_103.a.gz (7 kB)
        • ln94206_47.t.gz (12 kB)
        • ln94200_136.m.gz (9 kB)
        • ln94200_136.t.gz (19 kB)
        • ln94206_47.a.gz (5 kB)
        • ln94210_108.m.gz (7 kB)
        • ln94207_103.w.gz (3 kB)
        • ln94210_108.t.gz (16 kB)
        • mf920925_019.m.gz (9 kB)
        • ln94200_136.a.gz (8 kB)
        • mf920925_019.t.gz (20 kB)
        • mf930713_008.m.gz (3 kB)
        • ln95045_050.m.gz (7 kB)
        • mf930713_008.t.gz (6 kB)
        • ln95045_050.t.gz (14 kB)
        • ln94210_108.a.gz (7 kB)
        • ln94206_47.w.gz (3 kB)
        • mf920925_019.a.gz (8 kB)
        • mf930713_008.a.gz (3 kB)
        • ln95045_050.a.gz (6 kB)
        • ln94200_136.w.gz (4 kB)
        • ln94210_108.w.gz (3 kB)
        • ln94207_103.m.gz (8 kB)
        • mf920925_019.w.gz (4 kB)
      • description
        • cmpr9415_004.w.gz (4 kB)
        • cmpr9410_047.w.gz (5 kB)
        • cmpr9410_033.w.gz (5 kB)
        • ln94204_2.m.gz (14 kB)
        • cmpr9410_047.m.gz (11 kB)
        • cmpr9415_004.m.gz (10 kB)
        • ln94204_2.a.gz (14 kB)
        • ln94204_2.t.gz (32 kB)
        • cmpr9410_033.m.gz (11 kB)
        • cmpr9410_047.a.gz (10 kB)
        • cmpr9415_004.a.gz (8 kB)
        • cmpr9415_004.t.gz (20 kB)
        • cmpr9410_047.t.gz (23 kB)
        • cmpr9410_033.a.gz (11 kB)
        • cmpr9410_033.t.gz (27 kB)
        • ln94204_2.w.gz (6 kB)
      • news
        • cmpr9413_041.a.gz (2 kB)
        • ln94202_135.m.gz (4 kB)
        • cmpr9415_038.a.gz (11 kB)
        • cmpr9413_041.w.gz (1 kB)
        • cmpr9415_038.w.gz (5 kB)
        • lnd94103_084.m.gz (2 kB)
        • ln94200_112.m.gz (4 kB)
        • ln94202_135.a.gz (4 kB)
        • ln94202_135.w.gz (2 kB)
        • lnd94103_084.a.gz (2 kB)
        • ln94200_112.a.gz (4 kB)
        • lnd94103_084.w.gz (1 kB)
        • ln94200_112.w.gz (2 kB)
        • cmpr9410_045.t.gz (24 kB)
        • mf920922_133.t.gz (4 kB)
        • cmpr9410_045.m.gz (11 kB)
        • lnd94103_013.t.gz (8 kB)
        • lnd94103_043.t.gz (5 kB)
        • lnd94103_003.t.gz (4 kB)
        • lnd94103_033.t.gz (5 kB)
        • lnd94103_053.t.gz (7 kB)
        • cmpr9410_045.a.gz (10 kB)
        • cmpr9410_045.w.gz (5 kB)
        • mf920922_133.m.gz (2 kB)
        • mf920925_114.t.gz (6 kB)
        • mf930709_021.t.gz (13 kB)
        • lnd94103_013.m.gz (3 kB)
        • lnd94103_043.m.gz (2 kB)
        • lnd94103_003.m.gz (2 kB)
        • ln94200_167.t.gz (6 kB)
        • lnd94103_033.m.gz (2 kB)
        • lnd94103_053.m.gz (3 kB)
        • mf920922_133.a.gz (1 kB)
        • mf920922_133.w.gz (1 kB)
        • lnd94103_043.a.gz (2 kB)
        • lnd94103_013.a.gz (3 kB)
        • mf920925_114.m.gz (3 kB)
        • mf930709_021.m.gz (6 kB)
        • lnd94103_033.a.gz (2 kB)
        • lnd94103_003.a.gz (1 kB)
        • lnd94103_053.a.gz (3 kB)
        • lnd94103_043.w.gz (1 kB)
        • lnd94103_013.w.gz (1 kB)
        • lnd94103_003.w.gz (1 kB)
        • lnd94103_033.w.gz (1 kB)
        • lnd94103_053.w.gz (1 kB)
        • ln94200_167.m.gz (3 kB)
        • mf930709_021.a.gz (5 kB)
        • mf920925_114.a.gz (3 kB)
        • mf920925_114.w.gz (1 kB)
        • mf930709_021.w.gz (3 kB)
        • ln94200_167.a.gz (2 kB)
        • ln94200_167.w.gz (1 kB)
        • cmpr9413_041.t.gz (5 kB)
        • cmpr9415_038.t.gz (26 kB)
        • ln94202_135.t.gz (9 kB)
        • lnd94103_084.t.gz (5 kB)
        • cmpr9413_041.m.gz (2 kB)
        • ln94200_112.t.gz (10 kB)
        • cmpr9415_038.m.gz (11 kB)
      • review
        • mf920922_138.m.gz (2 kB)
        • mf920922_138.t.gz (5 kB)
        • mf930713_055.a.gz (5 kB)
        • mf930713_013.a.gz (4 kB)
        • ln95048_034.a.gz (11 kB)
        • ln95048_036.m.gz (15 kB)
        • mf920922_138.a.gz (2 kB)
        • ln95048_036.t.gz (32 kB)
        • mf920922_105.m.gz (7 kB)
        • mf930713_055.w.gz (3 kB)
        • mf920922_105.t.gz (15 kB)
        • mf930713_013.w.gz (2 kB)
        • mf920925_018.m.gz (4 kB)
        • mf920925_018.t.gz (9 kB)
        • ln95048_036.a.gz (14 kB)
        • ln95048_034.w.gz (6 kB)
        • mf920922_138.w.gz (1 kB)
        • mf920922_105.a.gz (6 kB)
        • mf920925_018.a.gz (4 kB)
        • ln95048_036.w.gz (7 kB)
        • mf920922_105.w.gz (3 kB)
        • mf920925_018.w.gz (2 kB)
        • mf930713_055.m.gz (6 kB)
        • mf930713_013.m.gz (4 kB)
        • mf930713_055.t.gz (14 kB)
        • mf930713_013.t.gz (9 kB)
        • ln95048_034.m.gz (12 kB)
        • ln95048_034.t.gz (27 kB)
      • overview
        • ln95047_134.m.gz (4 kB)
        • ln94209_38.a.gz (6 kB)
        • ln95047_134.t.gz (9 kB)
        • ln94210_147.w.gz (3 kB)
        • cmpr9413_016.w.gz (4 kB)
        • ln94206_85.w.gz (5 kB)
        • cmpr9415_029.m.gz (7 kB)
        • ln95047_134.a.gz (4 kB)
        • cmpr9415_029.t.gz (13 kB)
        • ln94209_38.w.gz (3 kB)
        • cmpr9415_029.a.gz (8 kB)
        • ln95047_134.w.gz (2 kB)
        • ln94210_147.m.gz (7 kB)
        • ln94210_147.t.gz (15 kB)
        • cmpr9413_016.m.gz (9 kB)
        • ln94206_85.m.gz (12 kB)
        • cmpr9413_016.t.gz (15 kB)
        • cmpr9415_029.w.gz (4 kB)
        • ln94206_85.t.gz (28 kB)
        • ln94210_147.a.gz (6 kB)
        • ln94209_38.m.gz (7 kB)
        • cmpr9413_016.a.gz (9 kB)
        • ln94206_85.a.gz (11 kB)
        • ln94209_38.t.gz (13 kB)
      • impl_all.fl (2 kB)
    • resources
      • wdata_30_schema.xml (2 kB)
      • adata_30_schema.xml (3 kB)
      • tdata_evald_discourse_schema.xml (13 kB)
      • mdata_30_schema.xml (2 kB)
Name
PDiT-EDA1.0-Introduction.html
Size
7.95 KB
Format
text/html
Description
README
MD5
d47ed0e6e4085e317f4079e3557c4e0e
Preview
  File Preview
    Enriched Discourse Annotation of PDiT Subset 1.0 (PDiT-EDA 1.0)

    Introduction

    PDiT-EDA 1.0 (Zikánová et al., 2018) is a treebank with rich annotation of discourse phenomena, developed in 2017–2018 within the project Implicitní vztahy v textové koherenci (Implicit relations in text coherence), project GA17-03461S of the Grant Agency of the Czech Republic.

    The corpus extends the annotation of discourse relations in a subset of the Prague Discourse Treebank 2.0 (Rysová et al., 2016), a large corpus manually annotated with explicit discourse relations. It newly adds implicit relations, entity-based relations, question-answer relations and other discourse-structuring phenomena.

    PDiT-EDA 1.0 was published on December 20, 2018 in the LINDAT/CLARIN repository.

    Data, License and Availability

    PDiT-EDA 1.0 can be downloaded as a single zip archive from the LINDAT/CLARIN repository. It is publicly available under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.
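
    The repository page lists an MD5 checksum for the archive, so a download can be checked before unzipping. The following is a minimal Python sketch, not part of the distribution; the function names and the assumed download location (PDiT-EDA1.0.zip in the current directory) are illustrative only.

```python
import hashlib
from pathlib import Path

# MD5 listed for PDiT-EDA1.0.zip on the repository page.
EXPECTED_MD5 = "3142fb6569639d6ce82bf5c949822b7f"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected=EXPECTED_MD5):
    """Compare the file's MD5 with the expected value (case-insensitive)."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("PDiT-EDA1.0.zip")  # assumed download location
    if archive.exists():
        print("checksum OK" if archive_is_intact(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found")
```

    A mismatch usually indicates an incomplete or corrupted download; MD5 is suitable for this kind of integrity check but not as a security guarantee.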

    After unzipping the downloaded archive, the data can be found in the directory data, divided into fifteen subdirectories, one per genre (advice column, collection, comment, critical review, description, invitation, letters from readers, news report, overview, personality-focused interview, readers’ survey, reflective essay, sports news, topical interview, weather forecast). The annotation of each document is stored in four interlinked files, one per annotation layer: word layer (files *.w.gz), morphological layer (*.m.gz), analytical layer (*.a.gz), and tectogrammatical layer (*.t.gz).
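
    The layout described above (genre subdirectories, four layer files per document) can be indexed with a short script. This is a sketch under the assumption that every data file is named <document id>.<layer letter>.gz, as in the archive listing; the function names are hypothetical.

```python
from collections import defaultdict
from pathlib import Path

# Layer letters: word, morphological, analytical, tectogrammatical.
LAYERS = ("w", "m", "a", "t")


def index_documents(data_dir):
    """Map (genre, document id) to {layer letter: path}.

    Assumes files are named <document id>.<layer letter>.gz and sit
    directly inside one subdirectory per genre.
    """
    documents = defaultdict(dict)
    for path in sorted(Path(data_dir).glob("*/*.gz")):
        parts = path.name.split(".")
        if len(parts) == 3 and parts[1] in LAYERS:
            documents[(path.parent.name, parts[0])][parts[1]] = path
    return dict(documents)


def incomplete_documents(documents):
    """Return the documents that lack at least one of the four layers."""
    return {
        key: sorted(set(LAYERS) - set(layers))
        for key, layers in documents.items()
        if set(layers) != set(LAYERS)
    }
```

    For example, index_documents("PDiT-EDA1.0/data") would return one entry per document, and incomplete_documents applied to that result would report any document with a missing layer file. Other files, such as impl_all.fl, are ignored because they do not match the naming pattern.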

    The data are stored in the Prague Markup Language (PML; Pajas and Štěpánek, 2008), an XML-based format for linguistic annotation (especially treebanks). For completeness, the PML schemata of the files are included in the directory resources; the schemata are XML files that describe the structure of the annotated files.
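
    Because the files are gzip-compressed XML, they can also be inspected without TrEd using a generic XML parser. The sketch below only counts element names and makes no assumption about the PML element inventory, which is defined by the schemata in resources; the function names are hypothetical.

```python
import gzip
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip a namespace prefix of the form '{uri}name' from a tag."""
    return tag.rsplit("}", 1)[-1]


def element_counts(path):
    """Count element names in a gzip-compressed XML file.

    The file is parsed incrementally, and each element is cleared once
    it has been counted, to limit memory use on larger files.
    """
    counts = Counter()
    with gzip.open(path, "rb") as handle:
        for _event, element in ET.iterparse(handle, events=("end",)):
            counts[local_name(element.tag)] += 1
            element.clear()
    return counts
```

    Calling element_counts("PDiT-EDA1.0/data/sport/mf920922_076.t.gz").most_common(10) would list the ten most frequent element names in that tectogrammatical file, which gives a quick first view of the structure before consulting the schema.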

    How to browse the data

    The tree editor TrEd (Pajas and Štěpánek, 2008) can be used to open, browse and modify the data. The editor can be downloaded for various platforms from its home page. Please follow the installation instructions given there for your operating system.

    After the installation, a few extensions need to be installed:

    1. Start TrEd.
    2. In the top menu, select Setup -> Manage Extensions...; a dialog window with a list of installed extensions appears.
    3. Click on the button "Get New Extensions"; a dialog window with a list of available (not yet installed) extensions appears.
    4. Make sure that at least the extensions "Discourse Annotation (discourse)" and "Prague Dependency Treebank 3.0 (pdt30)" are checked for installation (if they are not in the list, they may already be installed).
    5. Click on the button "Install Selected"; the selected extensions (and some dependencies) get installed.
    6. Close all TrEd windows including the main application window and start TrEd again.

    TrEd can now open the PDiT-EDA 1.0 data. To see the annotation of a document on the tectogrammatical layer, open the respective file with the extension .t.gz and switch Mode: (top right corner) to PML_T_Discourse.

    If you have trouble installing TrEd or browsing the data, please contact the authors (tred at ufal.mff.cuni.cz).

    How to cite

    If you use the corpus data, or wish to refer to them for any other reason, please cite the publication of the data:

    Šárka Zikánová, Pavlína Synková, Jiří Mírovský: Enriched Discourse Annotation of PDiT Subset 1.0 (PDiT-EDA 1.0). Data/software, Charles University, Prague, Czech Republic, http://hdl.handle.net/11234/1-2906, Dec 2018.

    More Information

    For documentation and more information about PDiT-EDA 1.0, please go to the PDiT-EDA 1.0 home page.

    References

    Petr Pajas and Jan Štěpánek: Recent Advances in a Feature-Rich Framework for Treebank Annotation. In: The 22nd International Conference on Computational Linguistics - Proceedings of the Conference, The Coling 2008 Organizing Committee, Manchester, UK, ISBN 978-1-905593-45-3, pp. 673-680, 2008.

    Magdaléna Rysová, Pavlína Synková, Jiří Mírovský, Eva Hajičová, Anna Nedoluzhko, Radek Ocelák, Jiří Pergler, Lucie Poláková, Veronika Pavlíková, Jana Zdeňková, Šárka Zikánová: Prague Discourse Treebank 2.0. Data/software, ÚFAL MFF UK, Prague, Czech Republic, http://hdl.handle.net/11234/1-1905, Dec 2016.

    Šárka Zikánová, Pavlína Synková, Jiří Mírovský: Enriched Discourse Annotation of PDiT Subset 1.0 (PDiT-EDA 1.0). Data/software, Charles University, Prague, Czech Republic, http://hdl.handle.net/11234/1-2906, Dec 2018.