Natural Language Inference (NLI) datasets in Czech
Please use the following text to cite this item:
Javorský, Dávid and Popel, Martin, 2023,
Natural Language Inference (NLI) datasets in Czech, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL),
http://hdl.handle.net/11234/1-5227.
Authors
Javorský, Dávid; Popel, Martin
Item identifier
http://hdl.handle.net/11234/1-5227
Date issued
2023-10-01
Size
1047485 sentences
Description
The goal of the Natural Language Inference (NLI) task is to determine whether a "hypothesis" is true (entailment), false (contradiction), or undetermined (neutral) given a "premise".
This record contains three NLI datasets, namely snli (https://huggingface.co/datasets/snli), multi_nli (https://huggingface.co/datasets/multi_nli) and qnli (https://huggingface.co/datasets/glue/viewer/qnli/train). Each dataset is provided in two versions: the original English version and a Czech translation that we added, produced with CUBBITT, the Charles University Block-Backtranslation-Improved Transformer Translation model (https://lindat.mff.cuni.cz/services/translation/). The record also includes target labels for the Czech datasets; note, however, that some of them may no longer be correct for the Czech translation because of errors made by the translation model.
The licence of this record (CC BY-SA) applies to the translated part of the dataset. For the original English datasets, follow their respective licences.
Files in this item
- Name: multi_nli.zip
- Size: 58.34 MB
- Format: application/zip
- Description: multi_nli
- MD5: bf4c2805fdb2579c14d71fb4334ed163
- Contents:
  - README.txt (626 B)
  - source/
    - train.premise (42 MB)
    - train.hypothesis (21 MB)
    - train.label (766 kB)
    - validation_matched.premise (1 MB)
    - validation_matched.hypothesis (544 kB)
    - validation_matched.label (19 kB)
    - validation_mismatched.premise (1 MB)
    - validation_mismatched.hypothesis (605 kB)
    - validation_mismatched.label (19 kB)
  - translation/
    - train.premise (44 MB)
    - train.hypothesis (22 MB)
    - train.label (766 kB)
    - validation_matched.premise (1 MB)
    - validation_matched.hypothesis (564 kB)
    - validation_matched.label (19 kB)
    - validation_mismatched.premise (1 MB)
    - validation_mismatched.hypothesis (625 kB)
    - validation_mismatched.label (19 kB)
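
The MD5 values listed for each archive on this page can be used to check a download before unpacking it. The Python sketch below assumes the archives keep their published file names (multi_nli.zip, qnli.zip, snli.zip); the helpers md5_of_file and verify_archive are hypothetical names for illustration, not part of the release.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed on the repository page for each archive.
EXPECTED_MD5 = {
    "multi_nli.zip": "bf4c2805fdb2579c14d71fb4334ed163",
    "qnli.zip": "0611ed45d3bc9818d21662b415986c20",
    "snli.zip": "3a46a1a2083f2ba6014c64df8660ee6a",
}


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        # iter() with a sentinel stops once read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: Path) -> bool:
    """Compare a downloaded archive against the checksum listed for its name."""
    expected = EXPECTED_MD5.get(path.name)
    if expected is None:
        raise KeyError(f"no listed checksum for {path.name}")
    return md5_of_file(path) == expected
```

For example, verify_archive(Path("snli.zip")) should return True only if the local file is byte-identical to the published archive.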

- Name: qnli.zip
- Size: 21.33 MB
- Format: application/zip
- Description: qnli
- MD5: 0611ed45d3bc9818d21662b415986c20
- Contents:
  - README.txt (639 B)
  - source/
    - train.premise (5 MB)
    - train.hypothesis (16 MB)
    - train.label (204 kB)
    - valid.premise (319 kB)
    - valid.hypothesis (919 kB)
    - valid.label (10 kB)
    - test.premise (317 kB)
    - test.hypothesis (923 kB)
    - test.label (15 kB)
  - translation/
    - train.premise (6 MB)
    - train.hypothesis (17 MB)
    - train.label (204 kB)
    - valid.premise (320 kB)
    - valid.hypothesis (949 kB)
    - valid.label (10 kB)
    - test.premise (316 kB)
    - test.hypothesis (957 kB)
    - test.label (15 kB)
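
Each archive stores the English originals under source/ and the Czech translation under translation/, with one file per split and field. Assuming these are UTF-8 plain-text files with one example per line, aligned across .premise, .hypothesis and .label, a split could be loaded as in the following sketch. The function names and the NLIExample type are hypothetical; the label strings are returned exactly as stored, since their encoding is not documented on this page. Note that split names differ between archives (train, validation_matched and validation_mismatched for multi_nli; train, valid and test for qnli and snli).

```python
import zipfile
from typing import BinaryIO, List, NamedTuple, Union


class NLIExample(NamedTuple):
    premise: str
    hypothesis: str
    label: str  # raw line from the .label file; its encoding is an assumption


def _read_lines(zf: zipfile.ZipFile, suffix: str) -> List[str]:
    """Read the single archive member whose path is or ends with `suffix`."""
    matches = [n for n in zf.namelist() if n == suffix or n.endswith("/" + suffix)]
    if len(matches) != 1:
        raise FileNotFoundError(f"expected one member ending in {suffix!r}, found {matches}")
    # Split on "\n" only (not str.splitlines), so that Unicode separators
    # inside a sentence cannot break the line alignment between files.
    lines = zf.read(matches[0]).decode("utf-8").split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty string after a trailing newline
    return [line.rstrip("\r") for line in lines]


def load_split(zip_path: Union[str, BinaryIO], split: str,
               version: str = "translation") -> List[NLIExample]:
    """Load one split of the English ('source') or Czech ('translation')
    version as aligned premise/hypothesis/label triples."""
    with zipfile.ZipFile(zip_path) as zf:
        columns = [
            _read_lines(zf, f"{version}/{split}.{field}")
            for field in ("premise", "hypothesis", "label")
        ]
    if len({len(col) for col in columns}) != 1:
        raise ValueError(f"files for {version}/{split} are not line-aligned")
    return [NLIExample(prem, hyp, lab) for prem, hyp, lab in zip(*columns)]
```

For example, load_split("snli.zip", "valid", version="source") would return the English validation examples, and the same call with the default version="translation" would return their Czech counterparts.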

- Name: snli.zip
- Size: 21.22 MB
- Format: application/zip
- Description: snli
- MD5: 3a46a1a2083f2ba6014c64df8660ee6a
- Contents:
  - README.txt (621 B)
  - source/
    - train.premise (35 MB)
    - train.hypothesis (20 MB)
    - train.label (1 MB)
    - valid.premise (716 kB)
    - valid.hypothesis (380 kB)
    - valid.label (19 kB)
    - test.premise (713 kB)
    - test.hypothesis (378 kB)
    - test.label (19 kB)
  - translation/
    - train.premise (35 MB)
    - train.hypothesis (18 MB)
    - train.label (1 MB)
    - valid.premise (715 kB)
    - valid.hypothesis (358 kB)
    - valid.label (19 kB)
    - test.premise (710 kB)
    - test.hypothesis (355 kB)
    - test.label (19 kB)
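
In every archive the translation/ folder mirrors source/ file for file, and the listed sizes of the .label files are identical in both versions. A quick sanity check before training is therefore to confirm that each translated file has as many lines as its English counterpart and that the label files match. The sketch below assumes one example per line in UTF-8; compare_versions is a hypothetical helper, not part of the release.

```python
import zipfile
from typing import BinaryIO, Dict, List, Union


def _lines(zf: zipfile.ZipFile, name: str) -> List[str]:
    # Split on "\n" only, so sentence-internal Unicode separators
    # cannot shift the line alignment.
    lines = zf.read(name).decode("utf-8").split("\n")
    if lines and lines[-1] == "":
        lines.pop()
    return lines


def compare_versions(zip_path: Union[str, BinaryIO]) -> Dict[str, str]:
    """For every file under source/, check the same-named file under translation/.

    Returns a mapping from file name (e.g. 'train.label') to a status string."""
    report: Dict[str, str] = {}
    with zipfile.ZipFile(zip_path) as zf:
        names = set(zf.namelist())
        for name in sorted(names):
            head, sep, rel = name.rpartition("source/")
            # Keep only files whose path contains a folder named exactly "source".
            if not sep or not rel or name.endswith("/"):
                continue
            if head and not head.endswith("/"):
                continue
            twin = f"{head}translation/{rel}"
            if twin not in names:
                report[rel] = "missing in translation"
                continue
            src, tgt = _lines(zf, name), _lines(zf, twin)
            if len(src) != len(tgt):
                report[rel] = f"line count differs: {len(src)} vs {len(tgt)}"
            elif rel.endswith(".label") and src != tgt:
                report[rel] = "labels differ"
            else:
                report[rel] = "ok"
    return report
```

Running compare_versions("snli.zip") should report "ok" for every file if the two versions are aligned as the listing suggests.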

