dc.contributor.author Szabó, Adam
dc.contributor.author Straka, Milan
dc.date.accessioned 2021-07-19T14:08:15Z
dc.date.available 2021-07-19T14:08:15Z
dc.date.issued 2021-07-22
dc.identifier.uri http://hdl.handle.net/11234/1-3731
dc.description The Czech Contracts dataset was created as part of the thesis Low-resource Text Classification (A. Szabó, 2021, MFF UK). The contracts were obtained from the Hlídač státu web portal. Labels in the training and development sets were assigned automatically with the keyword method described in the thesis Automatická klasifikace smluv pro portál HlidacSmluv.cz (J. Maroušek, 2020, MFF UK). Because these automatic labels contain a certain amount of noise, the goal of classification is not to achieve 100% on the development set. The test set is manually annotated. The dataset contains a total of 97493 contracts.
dc.language.iso ces
dc.publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
dc.rights Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.rights.uri http://creativecommons.org/licenses/by-nc-sa/4.0/
dc.subject Czech
dc.subject document classification
dc.subject contracts
dc.subject Hlídač státu
dc.title Czech HS Contracts Dataset (CHSC) 1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
dc.rights.label PUB
has.files yes
branding LINDAT / CLARIAH-CZ
contact.person Milan Straka straka@ufal.mff.cuni.cz Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
size.info 97000 texts
files.size 474430503
files.count 2


 Files in this item

 Download all files in item (452.45 MB)
Name CHSC-1.0.tar.xz
Size 452.31 MB
Format application/x-xz
Description Czech HS Contracts Dataset (CHSC) 1.0
MD5 ace2df821cafeef61984ebfa47b05d99
File preview
  • CHSC-1.0
    • categories.json (4 kB)
    • dev10.jsonl (26 MB)
    • test.jsonl (43 MB)
    • train.jsonl (2 GB)
    • dev.jsonl (268 MB)
    • README.md (7 kB)
    • LICENSE.txt (20 kB)

Name CHSC-1.0.DESCRIPTION.pdf
Size 144.33 KB
Format PDF
Description Description of Czech HS Contracts Dataset (CHSC) 1.0
MD5 fdf5e64f529af54d9e4f55e36305efe4
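
The MD5 values listed for the two files can be used to check that a download is intact. The following is a minimal sketch, assuming both files were saved under their listed names in the current directory; the function names `md5_of_file` and `verify_downloads` are illustrative and not part of the dataset.

```python
import hashlib
from pathlib import Path

# MD5 values copied from the file listing in this record.
EXPECTED_MD5 = {
    "CHSC-1.0.tar.xz": "ace2df821cafeef61984ebfa47b05d99",
    "CHSC-1.0.DESCRIPTION.pdf": "fdf5e64f529af54d9e4f55e36305efe4",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each listed file name to True/False (digest match) or None (file not found)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if path.is_file():
            results[name] = md5_of_file(path) == expected
        else:
            results[name] = None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "not found"}
    for name, ok in verify_downloads().items():
        print(f"{name}: {labels[ok]}")
```

Reading in chunks keeps memory use low for the roughly 450 MB archive. MD5 is suitable here only as an integrity check against transfer errors, not as protection against tampering.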
