1. Czech Relationship Extraction Dataset
- Creator:
- Šimečková, Zuzana and Straka, Milan
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- entity relationship and relationship extraction
- Language:
- Czech
- Description:
- CERED (Czech Relationship Dataset) is a family of datasets created via distant supervision on Czech Wikipedia and Wikidata. It was created as part of a thesis on Relationship Extraction (2020). CERED0 is the largest dataset, it lacks negative relation and its relation inventory is huge. CERED*n* is a subset of CERED*n-1* that satisfies some conditions. The methodology of curating the datasets is detailed in the thesis. The format of the data is jsonL and the tools used to generate the dataset is python.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB