dc.contributor.author | Szabó, Adam |
dc.contributor.author | Straka, Milan |
dc.date.accessioned | 2021-07-19T14:08:15Z |
dc.date.available | 2021-07-19T14:08:15Z |
dc.date.issued | 2021-07-22 |
dc.identifier.uri | http://hdl.handle.net/11234/1-3731 |
dc.description | The Czech Contracts dataset was created as part of the thesis Low-resource Text Classification (2021), A. Szabó, MFF UK. The contracts were obtained from the Hlídač Státu web portal. Labels in the training and development sets were assigned automatically using the keyword-based method from the thesis Automatická klasifikace smluv pro portál HlidacSmluv.cz (Automatic classification of contracts for the HlidacSmluv.cz portal), J. Maroušek (2020), MFF UK. Because these automatic labels contain a certain amount of noise, the goal of classification is not to reach a perfect score on the development set. The test set is manually annotated. The dataset contains a total of 97493 contracts. |
dc.language.iso | ces |
dc.publisher | Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) |
dc.rights | Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/4.0/ |
dc.subject | Czech |
dc.subject | document classification |
dc.subject | contracts |
dc.subject | Hlídač státu |
dc.title | Czech HS Contracts Dataset (CHSC) 1.0 |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
dc.rights.label | PUB |
has.files | yes |
branding | LINDAT / CLARIAH-CZ |
contact.person | Milan Straka straka@ufal.mff.cuni.cz Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) |
size.info | 97000 texts |
files.size | 474430503 |
files.count | 2 |
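The dc.description entry states that the training and development labels come from a keyword method. The sketch below is only a toy illustration of keyword-based labeling in general, not the rules from Maroušek (2020): the category names and Czech keyword stems in `KEYWORDS` are hypothetical, and matching stems as word prefixes is an assumed, crude stand-in for handling Czech inflection. It also hints at why such labels are noisy: a rule fires whenever a stem appears, regardless of context.

```python
import re
from typing import Dict, List, Set

# HYPOTHETICAL categories and keyword stems, for illustration only. The real
# category set is presumably the one in categories.json, and the real rules
# are described in Maroušek (2020).
KEYWORDS: Dict[str, List[str]] = {
    "construction": ["stavb", "stavebn", "rekonstrukc"],
    "IT": ["softw", "server", "licenc"],
}


def keyword_labels(text: str, keywords: Dict[str, List[str]] = KEYWORDS) -> Set[str]:
    """Return every category that has at least one stem occurring as a word prefix.

    Prefix matching is a crude, assumed way to cover Czech inflection
    ("rekonstrukce", "rekonstrukci", ...). Several categories may fire at once.
    """
    tokens = re.findall(r"\w+", text.lower())
    return {
        category
        for category, stems in keywords.items()
        if any(token.startswith(stem) for token in tokens for stem in stems)
    }
```

With these toy keywords, `keyword_labels("Smlouva o rekonstrukci budovy")` returns `{"construction"}`, while a text mentioning both a building and a server would receive both labels whether or not that is correct.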
Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
- Name: CHSC-1.0.tar.xz
- Size: 452.31 MB
- Format: application/x-xz
- Description: Czech HS Contracts Dataset (CHSC) 1.0
- MD5: ace2df821cafeef61984ebfa47b05d99
- Contents of CHSC-1.0:
  - categories.json (4 kB)
  - dev10.jsonl (26 MB)
  - test.jsonl (43 MB)
  - train.jsonl (2 GB)
  - dev.jsonl (268 MB)
  - README.md (7 kB)
  - LICENSE.txt (20 kB)

- Name: CHSC-1.0.DESCRIPTION.pdf
- Size: 144.33 KB
- Description: Description of Czech HS Contracts Dataset (CHSC) 1.0
- MD5: fdf5e64f529af54d9e4f55e36305efe4
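A minimal Python sketch for checking the download against the published MD5 and reading one of the JSON Lines files straight out of the archive, so the roughly 2 GB train.jsonl never has to be extracted to disk. It assumes each line of a .jsonl file holds one JSON object and matches archive members by base name, so the exact directory prefix inside the archive (assumed to be `CHSC-1.0/`) does not matter. No field names are assumed; README.md in the archive is the place to check them. The helper names `md5_of_file` and `iter_jsonl_from_tar` are hypothetical.

```python
import hashlib
import json
import posixpath
import tarfile
from typing import Iterator

# MD5 published for CHSC-1.0.tar.xz in the file listing.
ARCHIVE_MD5 = "ace2df821cafeef61984ebfa47b05d99"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def iter_jsonl_from_tar(archive_path: str, filename: str) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of the member named filename.

    Members are matched by base name (e.g. "test.jsonl"), and the member is
    decompressed on the fly instead of being extracted first.
    """
    with tarfile.open(archive_path, mode="r:xz") as tar:
        for member in tar:
            if member.isfile() and posixpath.basename(member.name) == filename:
                fileobj = tar.extractfile(member)
                for raw_line in fileobj:
                    raw_line = raw_line.strip()
                    if raw_line:  # skip blank lines, e.g. a trailing newline
                        yield json.loads(raw_line)
                return
    raise FileNotFoundError(f"{filename} not found in {archive_path}")
```

In practice one would first compare `md5_of_file("CHSC-1.0.tar.xz")` with `ARCHIVE_MD5` (the PDF can be checked the same way against its own MD5), then call `next(iter_jsonl_from_tar("CHSC-1.0.tar.xz", "test.jsonl"))` and inspect the keys of the first record.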