Czech Relationship Extraction Dataset

Name: Czech Relationship Extraction Dataset
License: http://creativecommons.org/licenses/by-nc-sa/4.0/

dc.contributor.author	Šimečková, Zuzana
dc.contributor.author	Straka, Milan
dc.date.accessioned	2020-07-31T13:31:42Z
dc.date.available	2020-07-31T13:31:42Z
dc.date.issued	2020-07-30
dc.identifier.uri	http://hdl.handle.net/11234/1-3265
dc.description	CERED (Czech Relationship Dataset) is a family of datasets created via distant supervision on Czech Wikipedia and Wikidata. It was created as part of a thesis on relationship extraction (2020). CERED0 is the largest dataset; it has no negative relation, and its relation inventory is very large. Each CEREDn is a subset of CEREDn-1 that satisfies additional conditions. The methodology for curating the datasets is detailed in the thesis. The data are in JSON Lines (JSONL) format, and the tools used to generate the datasets are written in Python.
dc.language.iso	ces
dc.publisher	Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
dc.rights	Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/4.0/
dc.subject	entity relationship
dc.subject	relationship extraction
dc.title	Czech Relationship Extraction Dataset
dc.type	corpus
metashare.ResourceInfo#ContentInfo.mediaType	text
dc.rights.label	PUB
has.files	yes
branding	LINDAT / CLARIAH-CZ
contact.person	Milan Straka <straka@ufal.mff.cuni.cz>, Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
files.size	320019756
files.count	1
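
Because the description states that the data are distributed as JSONL, a reader might load them one record per line. The following is a minimal sketch under stated assumptions, not the authors' tooling: the helper name `read_jsonl` and the field names `tokens` and `label` are hypothetical placeholders, since this record does not document the actual schema.

```python
import io
import json
from typing import Iterator, TextIO


def read_jsonl(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of the stream."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            yield json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"line {line_number}: invalid JSON") from err


# Hypothetical two-record sample; the real CERED field names may differ.
sample = io.StringIO(
    '{"tokens": ["Praha", "je", "hlavní", "město"], "label": "REL_A"}\n'
    '\n'
    '{"tokens": ["Brno", "leží", "na", "Moravě"], "label": "REL_B"}\n'
)
records = list(read_jsonl(sample))
print(len(records))
```

To read an actual file, the same function can be given an open handle, for example `open(path, encoding="utf-8")`, where `path` points to the extracted dataset file. Streaming line by line avoids loading the whole archive (about 320 MB as listed) into memory at once.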

This item is publicly available.
