dc.contributor.author Šimečková, Zuzana
dc.contributor.author Straka, Milan
dc.date.accessioned 2020-07-31T13:31:42Z
dc.date.available 2020-07-31T13:31:42Z
dc.date.issued 2020-07-30
dc.identifier.uri http://hdl.handle.net/11234/1-3265
dc.description CERED (Czech Relationship Dataset) is a family of datasets created via distant supervision on Czech Wikipedia and Wikidata. It was created as part of a thesis on Relationship Extraction (2020). CERED0 is the largest dataset; it lacks a negative relation and its relation inventory is huge. Each CERED*n* is a subset of CERED*n-1* that satisfies additional conditions. The methodology for curating the datasets is detailed in the thesis. The data are in JSON Lines (jsonL) format, and the tools used to generate the datasets are written in Python.
dc.language.iso ces
dc.publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
dc.rights Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.rights.uri http://creativecommons.org/licenses/by-nc-sa/4.0/
dc.subject entity relationship
dc.subject relationship extraction
dc.title Czech Relationship Extraction Dataset
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
dc.rights.label PUB
has.files yes
branding LINDAT / CLARIAH-CZ
contact.person Milan Straka straka@ufal.mff.cuni.cz Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
files.size 320019756
files.count 1


 Files in this item

Name: CERED_DATASETS.zip
Size: 305.19 MB
Format: application/zip
Description: CERED datasets and the code that generated them
MD5: 1339e52f4df30180f58f44dd3fc25e8f
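
The MD5 value listed for the archive can be used to check a download. The following is a minimal sketch, not part of the distributed code: the helper name `md5_of_file` and the local path `CERED_DATASETS.zip` are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# Checksum as listed in this record.
EXPECTED_MD5 = "1339e52f4df30180f58f44dd3fc25e8f"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large archive is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    archive = Path("CERED_DATASETS.zip")  # assumed download location
    if archive.exists():
        actual = md5_of_file(archive)
        print("checksum OK" if actual == EXPECTED_MD5 else f"checksum MISMATCH: {actual}")
    else:
        print(f"{archive} not found")
```

A mismatch usually indicates an incomplete or corrupted download, in which case the archive should be fetched again.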
 File Preview  
  • CERED
    • CERED0_LABELS (23 kB)
    • CERED3_LABELS (2 kB)
    • CERED0_TRAINSET (682 MB)
    • CERED1_LABELS (2 kB)
    • CERED4_TESTSET (77 kB)
    • CERED2_TESTSET (2 MB)
    • CERED0_TESTSET (12 MB)
    • CERED3_TRAINSET (68 MB)
    • CERED4_DEVSET (83 kB)
    • CERED2_DEVSET (2 MB)
    • CERED0_DEVSET (12 MB)
    • CERED1_TRAINSET (461 MB)
    • CERED4_LABELS (1 kB)
    • CERED2_LABELS (2 kB)
    • CERED3_TESTSET (1 MB)
    • CERED4_TRAINSET (4 MB)
    • CERED1_TESTSET (8 MB)
    • CERED3_DEVSET (1 MB)
    • CERED1_DEVSET (8 MB)
    • CERED2_TRAINSET (160 MB)
  • wiki_corpus_generator
    • wikidumppreprocess.py (5 kB)
    • test.py (332 B)
    • README.md (589 B)
    • logg_wrapper.py (2 kB)
    • viewer
      • viewer.py (2 kB)
      • viewer_stats.py (1 kB)
      • viewer_article.py (2 kB)
      • utils.py (3 kB)
      • viewer_config.py (453 B)
      • __pycache__
        • viewer_config.cpython-37.pyc (682 B)
        • Article.cpython-37.pyc (7 kB)
        • viewer_stats.cpython-37.pyc (1 kB)
        • utils.cpython-37.pyc (4 kB)
        • viewer_article.cpython-37.pyc (2 kB)
      • Article.py (8 kB)
    • CONFIG.py (7 kB)
    • TextEntity.py (4 kB)
    • wikitextparse2.py (14 kB)
    • TEST_DEV_articles.py (197 kB)
    • text_support.py (103 kB)
    • wikitextparse.py (18 kB)
    • CEREDstats.py (8 kB)
    • runparseQPQ.sh (906 B)
    • parseQPQ.py (7 kB)
    • Entity_QID_Names.py (10 kB)
    • matcher.py (3 kB)
    • requirements.txt (99 B)
    • reduceEntities.py (606 B)
    • HOWTOREPLICATE.md (3 kB)
    • runwikitextparse.sh (1 kB)
    • README.md (1 kB)
    • evaluator.py (681 B)
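
Since the record describes the data as jsonL (one JSON object per line), a split such as CERED4_DEVSET can presumably be inspected with a few lines of Python. This is a hedged sketch only: the function `read_jsonl` is a hypothetical helper, the path in the usage note is an assumed extraction location, and the record does not document the field names, so the code makes no assumption about them. Whether the `*_LABELS` files use the same format is also not stated here.

```python
import json


def read_jsonl(path, limit=None):
    """Read up to `limit` JSON objects from a JSON Lines file (all if limit is None)."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if limit is not None and len(records) >= limit:
                break
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records
```

For example, `read_jsonl("CERED/CERED4_DEVSET", limit=3)` would return the first three records of the smallest development split, assuming the archive was extracted into the current directory; printing the keys of one record is a quick way to discover the actual schema.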
