Czech Relationship Extraction Dataset
Please use the following text to cite this item:
Šimečková, Zuzana and Straka, Milan, 2020,
Czech Relationship Extraction Dataset, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL),
http://hdl.handle.net/11234/1-3265.
Authors
Šimečková, Zuzana; Straka, Milan
Item identifier
http://hdl.handle.net/11234/1-3265
Date issued
2020-07-30
Description
CERED (Czech Relationship Dataset) is a family of datasets created via distant supervision on Czech Wikipedia and Wikidata. It was created as part of a thesis on Relationship Extraction (2020).
CERED0 is the largest dataset; it lacks a negative relation, and its relation inventory is very large.
Each CERED*n* is a subset of CERED*n-1* that satisfies additional conditions. The methodology used to curate the datasets is detailed in the thesis.
The data are stored in JSON Lines (JSONL) format, and the tools used to generate the datasets are written in Python.
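Since the data are JSON Lines, each non-empty line of a dataset file should be one JSON document. The following is a minimal sketch of a reader, not code from the archive: the function names `read_jsonl` and `peek` are hypothetical, and no record fields are assumed because the record schema is not described here.

```python
import json
from itertools import islice


def read_jsonl(path):
    """Yield one parsed JSON value per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                # Tolerate blank lines, e.g. a trailing newline at end of file.
                continue
            yield json.loads(line)


def peek(path, count=3):
    """Return the first `count` records so the structure can be inspected."""
    return list(islice(read_jsonl(path), count))
```

Calling `peek` on one of the smaller extracted files first (for example a CERED4 split) is a cheap way to see which fields the records carry before streaming a large training set with `read_jsonl`. The exact path of each file inside the archive is an assumption to check after unpacking.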
This item is Publicly Available
and licensed under:
Files in this item
- Name: CERED_DATASETS.zip
- Size: 305.19 MB
- Format: application/zip
- Description: CERED datasets and the code that generated them
- MD5: 1339e52f4df30180f58f44dd3fc25e8f
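The MD5 checksum listed above can be used to confirm that the archive downloaded intact. A minimal sketch using Python's standard `hashlib` follows; the helper names `md5_of_file` and `matches_published_md5` are hypothetical, and the local file path is whatever the download was saved as.

```python
import hashlib

# Checksum published for CERED_DATASETS.zip in the file listing.
EXPECTED_MD5 = "1339e52f4df30180f58f44dd3fc25e8f"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reading keeps memory use flat for a ~300 MB archive.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the published checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_published_md5("CERED_DATASETS.zip")` should return `True` for an uncorrupted download saved under that name. MD5 is adequate here for detecting transfer errors, though it is not a defence against deliberate tampering.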

- CERED
- CERED0_LABELS (23 kB)
- CERED3_LABELS (2 kB)
- CERED0_TRAINSET (682 MB)
- CERED1_LABELS (2 kB)
- CERED4_TESTSET (77 kB)
- CERED2_TESTSET (2 MB)
- CERED0_TESTSET (12 MB)
- CERED3_TRAINSET (68 MB)
- CERED4_DEVSET (83 kB)
- CERED2_DEVSET (2 MB)
- CERED0_DEVSET (12 MB)
- CERED1_TRAINSET (461 MB)
- CERED4_LABELS (1 kB)
- CERED2_LABELS (2 kB)
- CERED3_TESTSET (1 MB)
- CERED4_TRAINSET (4 MB)
- CERED1_TESTSET (8 MB)
- CERED3_DEVSET (1 MB)
- CERED1_DEVSET (8 MB)
- CERED2_TRAINSET (160 MB)
- wiki_corpus_generator
- wikidumppreprocess.py (5 kB)
- test.py (332 B)
- README.md (589 B)
- logg_wrapper.py (2 kB)
- viewer
- viewer.py (2 kB)
- viewer_stats.py (1 kB)
- viewer_article.py (2 kB)
- utils.py (3 kB)
- viewer_config.py (453 B)
- __pycache__
- viewer_config.cpython-37.pyc (682 B)
- Article.cpython-37.pyc (7 kB)
- viewer_stats.cpython-37.pyc (1 kB)
- utils.cpython-37.pyc (4 kB)
- viewer_article.cpython-37.pyc (2 kB)
- Article.py (8 kB)
- CONFIG.py (7 kB)
- TextEntity.py (4 kB)
- TEST_DEV_articles.py (197 kB)
- wikitextparse2.py (14 kB)
- text_support.py (103 kB)
- wikitextparse.py (18 kB)
- CEREDstats.py (8 kB)
- runparseQPQ.sh (906 B)
- parseQPQ.py (7 kB)
- Entity_QID_Names.py (10 kB)
- matcher.py (3 kB)
- requirements.txt (99 B)
- reduceEntities.py (606 B)
- HOWTOREPLICATE.md (3 kB)
- runwikitextparse.sh (1 kB)
- README.md (1 kB)
- evaluator.py (681 B)

