dc.contributor.author | Lapshinova-Koltunski, Ekaterina |
dc.contributor.author | Hardmeier, Christian |
dc.contributor.author | Krielke, Pauline |
dc.date.accessioned | 2018-05-08T12:07:36Z |
dc.date.available | 2018-05-08T12:07:36Z |
dc.date.issued | 2018-05-08 |
dc.identifier.uri | http://hdl.handle.net/11372/LRT-2614 |
dc.description | ParCorFull is a parallel corpus annotated with full coreference chains that has been created to address an important problem that machine translation and other multilingual natural language processing (NLP) technologies face -- translation of coreference across languages. Our corpus contains parallel texts for the language pair English-German, two major European languages. Despite being typologically very close, these languages still have systemic differences in the realisation of coreference, and thus pose problems for multilingual coreference resolution and machine translation. Our parallel corpus covers the genres of planned speech (public lectures) and newswire. It is richly annotated for coreference in both languages, including annotation of both nominal coreference and reference to antecedents expressed as clauses, sentences and verb phrases. This resource supports research in the areas of natural language processing, contrastive linguistics and translation studies on the mechanisms involved in coreference translation in order to develop a better understanding of the phenomenon. |
dc.language.iso | eng |
dc.language.iso | deu |
dc.publisher | Universität des Saarlandes |
dc.publisher | Uppsala University |
dc.relation.isreferencedby | http://www.lrec-conf.org/proceedings/lrec2018/summaries/941.html |
dc.rights | Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
dc.source.uri | https://github.com/chardmeier/parcor-full |
dc.subject | parallel corpus |
dc.subject | annotated corpus |
dc.subject | coreference |
dc.subject | anaphora resolution |
dc.title | ParCorFull: A Parallel Corpus Annotated with Full Coreference |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
dc.rights.label | PUB |
has.files | yes |
branding | LRT + Open Submissions |
contact.person | Christian Hardmeier christian.hardmeier@lingfil.uu.se Uppsala University |
sponsor | European Association for Machine Translation (EAMT) 2017 EAMT Sponsorship of Activities Other |
sponsor | Swedish Research Council 2017-930 Neural Pronoun Models for Machine Translation nationalFunds |
size.info | 158919 tokens |
files.size | 2852575 |
files.count | 4 |
Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
- Name: Guidelines.pdf
- Size: 247.65 KB
- Description: Annotation guidelines
- MD5: e5e5f4620989037b523b6f0901211785
- Name: ParCorFull-1.0.tar.gz
- Size: 2.17 MB
- Format: application/x-gzip
- Description: Annotated corpus data
- MD5: 49cc8a0998b8ca9c235ce2435c7a9b5b
- ParCorFull-1.0
- TED
- DE
- common_paths.xml (763 B)
- 007_783.mmax (151 B)
- 010_837.mmax (151 B)
- Styles
- default_style.xsl (1 kB)
- just_text.xsl (1 kB)
- with_handles.xsl (1 kB)
- muc_style.xsl (5 kB)
- generic_nongui_style.xsl (690 B)
- 006_785.mmax (151 B)
- 008_824.mmax (151 B)
- Source
- 007_783.txt.de (6 kB)
- 010_837.txt.de (6 kB)
- 004_767.txt.de (25 kB)
- 001_769.txt.de (18 kB)
- 007_783.tok.de (6 kB)
- 010_837.tok.de (6 kB)
- 004_767.tok.de (26 kB)
- 005_790.txt.de (14 kB)
- 002_792.txt.de (8 kB)
- 001_769.tok.de (19 kB)
- 005_790.tok.de (15 kB)
- 002_792.tok.de (9 kB)
- 003_799.txt.de (21 kB)
- 009_805.txt.de (5 kB)
- 003_799.tok.de (22 kB)
- 009_805.tok.de (5 kB)
- README (144 B)
- 008_824.txt.de (17 kB)
- 008_824.tok.de (18 kB)
- 000_779.txt.de (19 kB)
- 006_785.txt.de (16 kB)
- 006_785.tok.de (17 kB)
- 000_779.tok.de (19 kB)
- 002_792.mmax (151 B)
- 000_779.mmax (151 B)
- 001_769.mmax (151 B)
- Schemes
- coref_scheme.xml (8 kB)
- sentence_scheme.xml (585 B)
- 005_790.mmax (151 B)
- Basedata
- 010_837_words.xml (37 kB)
- words.dtd (83 B)
- 007_783_words.xml (37 kB)
- 000_779_words.xml (116 kB)
- 003_799_words.xml (129 kB)
- 005_790_words.xml (94 kB)
- 006_785_words.xml (99 kB)
- 004_767_words.xml (156 kB)
- 008_824_words.xml (114 kB)
- 002_792_words.xml (56 kB)
- 001_769_words.xml (112 kB)
- 004_767.mmax (151 B)
- 003_799.mmax (151 B)
- Customizations
- sentence_customization.xml (73 B)
- coref_customization.xml (1 kB)
- Markables
- 007_783_sentence_level.xml (5 kB)
- 005_790_sentence_level.xml (19 kB)
- 008_824_coref_level.xml (81 kB)
- 005_790_coref_level.xml (68 kB)
- markables.dtd (69 B)
- 003_799_coref_level.xml (59 kB)
- 002_792_coref_level.xml (33 kB)
- 001_769_coref_level.xml (64 kB)
- 006_785_coref_level.xml (46 kB)
- 001_769_sentence_level.xml (13 kB)
- 010_837_sentence_level.xml (7 kB)
- 000_779_coref_level.xml (86 kB)
- 010_837_coref_level.xml (24 kB)
- 007_783_coref_level.xml (18 kB)
- 004_767_sentence_level.xml (25 kB)
- 004_767_coref_level.xml (91 kB)
- 002_792_sentence_level.xml (10 kB)
- 003_799_sentence_level.xml (23 kB)
- 008_824_sentence_level.xml (23 kB)
- 000_779_sentence_level.xml (17 kB)
- 006_785_sentence_level.xml (17 kB)
- EN
- common_paths.xml (763 B)
- 007_783.mmax (151 B)
- 010_837.mmax (151 B)
- Styles
- default_style.xsl (1 kB)
- just_text.xsl (1 kB)
- with_handles.xsl (1 kB)
- muc_style.xsl (5 kB)
- generic_nongui_style.xsl (690 B)
- 006_785.mmax (151 B)
- 008_824.mmax (151 B)
- Source
- 008_824.txt (15 kB)
- 004_767.txt (24 kB)
- 002_792.tok (7 kB)
- 006_785.txt (14 kB)
- 000_779.tok (17 kB)
- 008_824.tok (16 kB)
- 003_799.txt (20 kB)
- 006_785.tok (15 kB)
- 004_767.tok (24 kB)
- 007_783.txt (5 kB)
- 003_799.tok (20 kB)
- 001_769.txt (15 kB)
- 007_783.tok (5 kB)
- 001_769.tok (16 kB)
- README (143 B)
- 005_790.txt (12 kB)
- 010_837.txt (6 kB)
- 005_790.tok (13 kB)
- 010_837.tok (6 kB)
- 009_805.txt (4 kB)
- 002_792.txt (7 kB)
- 000_779.txt (16 kB)
- 009_805.tok (5 kB)
- 002_792.mmax (151 B)
- 000_779.mmax (151 B)
- 001_769.mmax (151 B)
- Schemes
- coref_scheme.xml (8 kB)
- sentence_scheme.xml (585 B)
- 005_790.mmax (151 B)
- Basedata
- 010_837_words.xml (41 kB)
- words.dtd (83 B)
- 007_783_words.xml (39 kB)
- 000_779_words.xml (113 kB)
- 003_799_words.xml (138 kB)
- 005_790_words.xml (94 kB)
- 006_785_words.xml (103 kB)
- 004_767_words.xml (170 kB)
- 008_824_words.xml (115 kB)
- 002_792_words.xml (55 kB)
- 001_769_words.xml (108 kB)
- 004_767.mmax (151 B)
- 003_799.mmax (151 B)
- Customizations
- sentence_customization.xml (73 B)
- coref_customization.xml (1 kB)
- Markables
- 007_783_sentence_level.xml (5 kB)
- 005_790_sentence_level.xml (18 kB)
- 008_824_coref_level.xml (77 kB)
- 005_790_coref_level.xml (65 kB)
- markables.dtd (69 B)
- 003_799_coref_level.xml (68 kB)
- 002_792_coref_level.xml (33 kB)
- 001_769_coref_level.xml (53 kB)
- 006_785_coref_level.xml (54 kB)
- 001_769_sentence_level.xml (12 kB)
- 010_837_sentence_level.xml (6 kB)
- 000_779_coref_level.xml (65 kB)
- 010_837_coref_level.xml (27 kB)
- 007_783_coref_level.xml (19 kB)
- 004_767_sentence_level.xml (22 kB)
- 004_767_coref_level.xml (94 kB)
- 002_792_sentence_level.xml (8 kB)
- 003_799_sentence_level.xml (21 kB)
- 008_824_sentence_level.xml (20 kB)
- 000_779_sentence_level.xml (16 kB)
- 006_785_sentence_level.xml (14 kB)
- DE
- LREC2018.pdf (314 kB)
- .README.txt.swp (12 kB)
- ._LREC2018.pdf (424 B)
- .!42723!Guidelines.pdf (9 B)
- Guidelines.pdf (247 kB)
- README.txt (3 kB)
- news
- DE
- 22.mmax (146 B)
- 05.mmax (146 B)
- Basedata
- 20_words.xml (13 kB)
- 19_words.xml (13 kB)
- 23_words.xml (21 kB)
- 05_words.xml (24 kB)
- 21_words.xml (27 kB)
- 24_words.xml (17 kB)
- 03_words.xml (12 kB)
- 22_words.xml (13 kB)
- 25_words.xml (14 kB)
- 13_words.xml (13 kB)
- 16_words.xml (16 kB)
- 07_words.xml (29 kB)
- words.dtd (83 B)
- 17_words.xml (21 kB)
- 08_words.xml (15 kB)
- 18_words.xml (22 kB)
- 10_words.xml (18 kB)
- 09_words.xml (15 kB)
- 01_words.xml (12 kB)
- 04_words.xml (17 kB)
- 16.mmax (146 B)
- 23.mmax (146 B)
- Schemes
- coref_scheme.xml (8 kB)
- sentence_scheme.xml (585 B)
- _scheme.xml (78 B)
- 24.mmax (146 B)
- 07.mmax (146 B)
- Styles
- default_style.xsl (1 kB)
- just_text.xsl (1 kB)
- with_handles.xsl (1 kB)
- muc_style.xsl (5 kB)
- generic_nongui_style.xsl (690 B)
- common_paths.xml (763 B)
- 10.mmax (146 B)
- 25.mmax (146 B)
- 08.mmax (146 B)
- 01.mmax (146 B)
- 09.mmax (146 B)
- Source
- 21.de.xml (5 kB)
- 10.de.xml (4 kB)
- 24.de.xml (3 kB)
- 09.de.xml (3 kB)
- 22.de.xml (3 kB)
- 04.de.xml (4 kB)
- 18.de.xml (4 kB)
- 13.de.xml (2 kB)
- 07.de.xml (6 kB)
- 16.de.xml (3 kB)
- 25.de.xml (3 kB)
- 20.de.xml (3 kB)
- 23.de.xml (4 kB)
- 05.de.xml (5 kB)
- 19.de.xml (2 kB)
- 03.de.xml (2 kB)
- 17.de.xml (4 kB)
- 08.de.xml (3 kB)
- 01.de.xml (2 kB)
- Customizations
- sentence_customization.xml (73 B)
- coref_customization.xml (1 kB)
- 17.mmax (146 B)
- 13.mmax (146 B)
- 20.mmax (146 B)
- 03.mmax (146 B)
- 18.mmax (146 B)
- Markables
- 18_coref_level.xml (14 kB)
- 19_sentence_level.xml (1 kB)
- 25_sentence_level.xml (1 kB)
- 10_sentence_level.xml (2 kB)
- 13_coref_level.xml (5 kB)
- 04_sentence_level.xml (2 kB)
- 01_coref_level.xml (8 kB)
- 25_coref_level.xml (7 kB)
- 20_coref_level.xml (9 kB)
- 18_sentence_level.xml (3 kB)
- 24_sentence_level.xml (1 kB)
- 03_sentence_level.xml (2 kB)
- 05_coref_level.xml (14 kB)
- 17_coref_level.xml (9 kB)
- 23_sentence_level.xml (2 kB)
- 17_sentence_level.xml (2 kB)
- 24_coref_level.xml (5 kB)
- 16_sentence_level.xml (3 kB)
- 22_sentence_level.xml (2 kB)
- 01_sentence_level.xml (1 kB)
- 09_coref_level.xml (13 kB)
- 04_coref_level.xml (11 kB)
- 16_coref_level.xml (4 kB)
- 21_sentence_level.xml (3 kB)
- 09_sentence_level.xml (2 kB)
- 23_coref_level.xml (10 kB)
- 08_coref_level.xml (5 kB)
- 08_sentence_level.xml (2 kB)
- 20_sentence_level.xml (2 kB)
- 03_coref_level.xml (6 kB)
- 10_coref_level.xml (12 kB)
- 22_coref_level.xml (8 kB)
- 13_sentence_level.xml (1 kB)
- 07_sentence_level.xml (4 kB)
- 07_coref_level.xml (12 kB)
- 19_coref_level.xml (4 kB)
- 05_sentence_level.xml (4 kB)
- markables.dtd (69 B)
- 21_coref_level.xml (12 kB)
- 21.mmax (146 B)
- 04.mmax (146 B)
- 19.mmax (146 B)
- EN
- 22.mmax (146 B)
- 05.mmax (146 B)
- Basedata
- 20_words.xml (13 kB)
- 19_words.xml (12 kB)
- 23_words.xml (21 kB)
- 05_words.xml (27 kB)
- 21_words.xml (24 kB)
- 24_words.xml (16 kB)
- 03_words.xml (14 kB)
- 22_words.xml (12 kB)
- 25_words.xml (14 kB)
- 13_words.xml (13 kB)
- 16_words.xml (19 kB)
- 07_words.xml (27 kB)
- words.dtd (83 B)
- 17_words.xml (19 kB)
- 08_words.xml (15 kB)
- 18_words.xml (21 kB)
- 10_words.xml (19 kB)
- 09_words.xml (15 kB)
- 01_words.xml (11 kB)
- 04_words.xml (19 kB)
- 16.mmax (146 B)
- 23.mmax (146 B)
- Schemes
- coref_scheme.xml (8 kB)
- sentence_scheme.xml (585 B)
- 24.mmax (146 B)
- 07.mmax (146 B)
- Styles
- default_style.xsl (633 B)
- just_text.xsl (1 kB)
- with_handles.xsl (1 kB)
- muc_style.xsl (5 kB)
- generic_nongui_style.xsl (690 B)
- common_paths.xml (763 B)
- 10.mmax (146 B)
- 25.mmax (146 B)
- 08.mmax (146 B)
- 01.mmax (146 B)
- 09.mmax (146 B)
- Source
- 25.en.xml (2 kB)
- 20.en.xml (2 kB)
- 19.en.xml (2 kB)
- 23.en.xml (4 kB)
- 05.en.xml (5 kB)
- common_paths.xml (763 B)
- 08.en.xml (2 kB)
- 03.en.xml (2 kB)
- 17.en.xml (3 kB)
- 01.en.xml (2 kB)
- 21.en.xml (4 kB)
- 10.en.xml (3 kB)
- 24.en.xml (3 kB)
- 18.en.xml (4 kB)
- 13.en.xml (2 kB)
- 09.en.xml (3 kB)
- 22.en.xml (2 kB)
- 04.en.xml (3 kB)
- 07.en.xml (5 kB)
- 16.en.xml (3 kB)
- Customizations
- sentence_customization.xml (73 B)
- coref_customization.xml (1 kB)
- 17.mmax (146 B)
- 13.mmax (146 B)
- 20.mmax (146 B)
- 03.mmax (146 B)
- 18.mmax (146 B)
- Markables
- 18_coref_level.xml (12 kB)
- 19_sentence_level.xml (1 kB)
- 25_sentence_level.xml (1 kB)
- 10_sentence_level.xml (2 kB)
- 13_coref_level.xml (4 kB)
- 04_sentence_level.xml (3 kB)
- 01_coref_level.xml (8 kB)
- 25_coref_level.xml (5 kB)
- 20_coref_level.xml (11 kB)
- 18_sentence_level.xml (3 kB)
- 24_sentence_level.xml (1 kB)
- 03_sentence_level.xml (2 kB)
- 05_coref_level.xml (17 kB)
- 17_coref_level.xml (6 kB)
- 23_sentence_level.xml (2 kB)
- 17_sentence_level.xml (2 kB)
- 24_coref_level.xml (5 kB)
- 16_sentence_level.xml (3 kB)
- 22_sentence_level.xml (2 kB)
- 01_sentence_level.xml (1 kB)
- 09_coref_level.xml (10 kB)
- 04_coref_level.xml (13 kB)
- 16_coref_level.xml (6 kB)
- 21_sentence_level.xml (3 kB)
- 09_sentence_level.xml (2 kB)
- 23_coref_level.xml (10 kB)
- 08_coref_level.xml (5 kB)
- 08_sentence_level.xml (2 kB)
- 20_sentence_level.xml (2 kB)
- 03_coref_level.xml (8 kB)
- 10_coref_level.xml (10 kB)
- 22_coref_level.xml (8 kB)
- 13_sentence_level.xml (1 kB)
- 07_sentence_level.xml (4 kB)
- 07_coref_level.xml (11 kB)
- 19_coref_level.xml (3 kB)
- 05_sentence_level.xml (4 kB)
- markables.dtd (69 B)
- 21_coref_level.xml (11 kB)
- 21.mmax (146 B)
- 04.mmax (146 B)
- 19.mmax (146 B)
- DE
- DiscoMT
- DE
- 005_1938.mmax (152 B)
- 007_1953.mmax (152 B)
- common_paths.xml (763 B)
- 000_1756.mmax (152 B)
- 009_2043.mmax (152 B)
- Styles
- default_style.xsl (1 kB)
- just_text.xsl (1 kB)
- with_handles.xsl (1 kB)
- muc_style.xsl (5 kB)
- generic_nongui_style.xsl (690 B)
- README (111 B)
- 001_1819.mmax (152 B)
- 006_1950.mmax (152 B)
- 002_1825.mmax (152 B)
- Schemes
- coref_scheme.xml (8 kB)
- sentence_scheme.xml (585 B)
- 011_2053.mmax (152 B)
- 003_1894.mmax (152 B)
- Basedata
- 002_1825_words.xml (82 kB)
- 001_1819_words.xml (90 kB)
- words.dtd (83 B)
- 007_1953_words.xml (133 kB)
- 010_205_words.xml (133 kB)
- 011_2053_words.xml (85 kB)
- 003_1894_words.xml (179 kB)
- 000_1756_words.xml (142 kB)
- 005_1938_words.xml (75 kB)
- 006_1950_words.xml (194 kB)
- 009_2043_words.xml (102 kB)
- Customizations
- sentence_customization.xml (73 B)
- coref_customization.xml (1 kB)
- Markables
- 005_1938_coref_level.xml (33 kB)
- 003_1894_sentence_level.xml (21 kB)
- markables.dtd (69 B)
- 006_1950_sentence_level.xml (22 kB)
- 006_1950_coref_level.xml (93 kB)
- 003_1894_coref_level.xml (88 kB)
- 000_1756_coref_level.xml (99 kB)
- 005_1938_sentence_level.xml (9 kB)
- 007_1953_coref_level.xml (59 kB)
- 001_1819_sentence_level.xml (13 kB)
- 007_1953_sentence_level.xml (21 kB)
- 010_205_coref_level.xml (68 kB)
- 000_1756_sentence_level.xml (17 kB)
- 009_2043_coref_level.xml (45 kB)
- 002_1825_coref_level.xml (38 kB)
- 011_2053_coref_level.xml (45 kB)
- 009_2043_sentence_level.xml (13 kB)
- 011_2053_sentence_level.xml (13 kB)
- 010_205_sentence_level.xml (17 kB)
- 002_1825_sentence_level.xml (11 kB)
- 001_1819_coref_level.xml (58 kB)
- 010_205.mmax (151 B)
- EN
- 005_1938.mmax (152 B)
- 007_1953.mmax (152 B)
- common_paths.xml (763 B)
- 000_1756.mmax (152 B)
- 009_2043.mmax (152 B)
- Styles
- default_style.xsl (1 kB)
- just_text.xsl (1 kB)
- with_handles.xsl (1 kB)
- muc_style.xsl (5 kB)
- generic_nongui_style.xsl (690 B)
- 001_1819.mmax (152 B)
- Source
- README (74 B)
- segment
- talk001953.de-en.en (20 kB)
- talk000205.de-en.en (17 kB)
- talk001819.de-en.en (14 kB)
- talk001825.de-en.en (13 kB)
- talk001950.de-en.de (32 kB)
- talk001950.de-en.en (28 kB)
- talk001894.de-en.de (29 kB)
- talk002043.de-en.de (16 kB)
- talk001894.de-en.en (25 kB)
- talk002043.de-en.en (15 kB)
- talk002053.de-en.de (15 kB)
- talk001938.de-en.de (14 kB)
- talk002053.de-en.en (13 kB)
- talk001938.de-en.en (13 kB)
- talk001756.de-en.de (23 kB)
- talk001953.de-en.de (21 kB)
- talk001756.de-en.en (20 kB)
- talk000205.de-en.de (20 kB)
- talk001819.de-en.de (15 kB)
- talk001825.de-en.de (14 kB)
- sentence
- talk001953.de-en.en (20 kB)
- talk000205.de-en.en (17 kB)
- talk001819.de-en.en (14 kB)
- talk001825.de-en.en (13 kB)
- talk001950.de-en.de (32 kB)
- talk001950.de-en.en (28 kB)
- talk001894.de-en.de (29 kB)
- talk002043.de-en.de (16 kB)
- talk001894.de-en.en (26 kB)
- talk002043.de-en.en (15 kB)
- talk002053.de-en.de (15 kB)
- talk001938.de-en.de (14 kB)
- talk002053.de-en.en (14 kB)
- talk001938.de-en.en (13 kB)
- talk001756.de-en.de (23 kB)
- talk001953.de-en.de (21 kB)
- talk001756.de-en.en (20 kB)
- talk000205.de-en.de (20 kB)
- talk001819.de-en.de (16 kB)
- talk001825.de-en.de (14 kB)
- 006_1950.mmax (152 B)
- 002_1825.mmax (152 B)
- Schemes
- coref_scheme.xml (8 kB)
- sentence_scheme.xml (585 B)
- 011_2053.mmax (152 B)
- 003_1894.mmax (152 B)
- Basedata
- 002_1825_words.xml (87 kB)
- 001_1819_words.xml (95 kB)
- words.dtd (83 B)
- 007_1953_words.xml (147 kB)
- 010_205_words.xml (132 kB)
- 011_2053_words.xml (91 kB)
- 003_1894_words.xml (185 kB)
- 000_1756_words.xml (138 kB)
- 005_1938_words.xml (82 kB)
- 006_1950_words.xml (195 kB)
- 009_2043_words.xml (108 kB)
- Customizations
- sentence_customization.xml (73 B)
- coref_customization.xml (1 kB)
- Markables
- 005_1938_coref_level.xml (40 kB)
- 003_1894_sentence_level.xml (21 kB)
- markables.dtd (69 B)
- 006_1950_sentence_level.xml (22 kB)
- 006_1950_coref_level.xml (105 kB)
- 003_1894_coref_level.xml (93 kB)
- 000_1756_coref_level.xml (83 kB)
- 005_1938_sentence_level.xml (9 kB)
- 007_1953_coref_level.xml (75 kB)
- 001_1819_sentence_level.xml (13 kB)
- 007_1953_sentence_level.xml (21 kB)
- 010_205_coref_level.xml (74 kB)
- 000_1756_sentence_level.xml (17 kB)
- 009_2043_coref_level.xml (52 kB)
- 002_1825_coref_level.xml (44 kB)
- 011_2053_coref_level.xml (50 kB)
- 009_2043_sentence_level.xml (13 kB)
- 011_2053_sentence_level.xml (13 kB)
- 010_205_sentence_level.xml (17 kB)
- 002_1825_sentence_level.xml (11 kB)
- 001_1819_coref_level.xml (65 kB)
- 010_205.mmax (151 B)
- DE
- ._Guidelines.pdf (176 B)
- TED
- Name: README.txt
- Size: 3.39 KB
- Format: Text file
- Description: Release overview
- MD5: 49e7bd1855ae4cb978ec94281bdf4417
ParCorFull -- a parallel corpus annotated for coreference
=========================================================

Release 1 (May 2018)
http://hdl.handle.net/11372/LRT-2614

Ekaterina Lapshinova-Koltunski, Christian Hardmeier and Pauline Krielke
-----------------------------------------------------------------------

This is the first release of ParCorFull, an English-German parallel corpus annotated for full coreference. If you use this corpus in published work, please cite the following paper:

    @InProceedings{Lapshinova-Koltunski:2018,
      author = {Ekaterina Lapshinova-Koltunski and Christian Hardmeier and Pauline Krielke},
      title = {{ParCorFull:} a Parallel Corpus Annotated with Full Coreference},
      booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
      pages = {423--428},
      year = {2018},
      month = {May},
      address = {Miyazaki, Japan},
      publisher = {European Language Resources Association (ELRA)}
    }

http://www.lrec . . .
- Name: LREC2018.pdf
- Size: 314.38 KB
- Description: Corpus description paper published at LREC 2018
- MD5: 859869dbfaec4f00fd9a799953f624ad
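
The MD5 values listed for each file in this item can be used to check a download for corruption. The following is a minimal Python sketch, not part of the corpus distribution. It assumes the four files were saved under their listed names in a single directory; the names `EXPECTED_MD5`, `md5_of_file` and `check_downloads` are illustrative choices made here.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file list of this item.
EXPECTED_MD5 = {
    "Guidelines.pdf": "e5e5f4620989037b523b6f0901211785",
    "ParCorFull-1.0.tar.gz": "49cc8a0998b8ca9c235ce2435c7a9b5b",
    "README.txt": "49e7bd1855ae4cb978ec94281bdf4417",
    "LREC2018.pdf": "859869dbfaec4f00fd9a799953f624ad",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_downloads(directory="."):
    """Map each listed file name to True (match), False (mismatch) or None (not found).

    Assumption: the files keep the names shown in the file list.
    """
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = md5_of_file(path) == expected if path.is_file() else None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "not found"}
    for name, status in check_downloads().items():
        print(f"{name}: {labels[status]}")
```

Run from the download directory, the script prints one status line per listed file. A mismatch suggests an incomplete or altered download; MD5 is suitable for this kind of integrity check but not as a security guarantee.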