
Czech Natural Language Inference Dataset with Explanations

Please cite this item as follows:
Víta, Martin and Nevěřilová, Zuzana, 2024, Czech Natural Language Inference Dataset with Explanations, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-5548.
Date issued: 2024
Size: 17210 entries
Language(s):
Description
The dataset has two parts. The first is the original Stanford Natural Language Inference (SNLI) dataset with automatic translations to Czech. The second covers a subset of the SNLI items and adds manual annotation of the Czech content together with explanations.

The Czech SNLI data contain premise-hypothesis pairs in both Czech and English. The original SNLI train/test/dev split is preserved:
- CZtrainSNLI.csv: 550152 pairs
- CZtestSNLI.csv: 10000 pairs
- CZdevSNLI.csv: 10000 pairs

The explanation dataset contains batches of premise-hypothesis pairs, and each batch contains 1499 pairs. Each pair contains:
- a reference to the original SNLI example
- the English premise and the English hypothesis
- the English gold label (one of Entailment, Contradiction, Neutral)
- the premise and hypothesis automatically translated to Czech
- the Czech gold label (one of entailment, contradiction, neutral, bad translation)
- explanations for the Czech label

Example record:
- CSNLI ID: 4857558207.jpg#4r1e
- English premise: A mother holds her newborn baby.
- English hypothesis: A person holding a child.
- English gold label: entailment
- Czech premise: Matka drží své novorozené dítě.
- Czech hypothesis: Osoba, která drží dítě.
- Czech gold label: Entailment
- Explanation-hypothesis: Osoba
- Explanation-premise: Matka
- Explanation-relation: generalization

Size of the explanation dataset:
- train: 159650
- dev: 2860
- test: 2880

Inter-Annotator Agreement (IAA)
Packages 1 and 12 annotate the same data. The IAA measured by the kappa score is 0.67 (substantial agreement).

The translation was performed using the LINDAT Translation Service. The translated pairs were then checked manually without access to the original English gold label; annotators could consult the original English pair if needed.
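The description reports a kappa score of 0.67 without naming the variant. Assuming it is Cohen's kappa for two annotators (packages 1 and 12 labelling the same pairs), the sketch below shows how such a score is computed: observed agreement corrected by the agreement expected from each annotator's label distribution. The function name and the toy labels are hypothetical illustrations, not data from the dataset.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items (assumed variant)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement: fraction of items that received identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a.keys() | count_b.keys()) / (n * n)
    if p_e == 1:
        # Only possible when both annotators used one identical label everywhere.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical toy labels standing in for two overlapping packages.
    pkg1 = ["entailment", "neutral", "contradiction", "entailment", "bad translation", "neutral"]
    pkg12 = ["entailment", "neutral", "neutral", "entailment", "bad translation", "contradiction"]
    print(round(cohen_kappa(pkg1, pkg12), 3))  # prints 0.538
```

On the commonly used Landis and Koch scale, values between 0.61 and 0.80 are labelled "substantial", which is consistent with the 0.67 reported above.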
Explanations were annotated as follows:
- if a part of the premise or hypothesis is relevant to the annotator's decision, it is marked
- if two such parts are marked and a relation exists between them, the relation is marked as well

Possible relation types:
- generalization: white long skirt - skirt
- specification: dog - bulldog
- similar: couch - sofa
- independence: they have no instruments - they belong to the group
- exclusion: man - woman

Original SNLI dataset: https://nlp.stanford.edu/projects/snli/
LINDAT Translation Service: https://lindat.mff.cuni.cz/services/translation/
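The annotation rules above (mark relevant parts, and mark a relation only when two parts are linked) can be captured in a small in-memory record type. This is a minimal sketch: the class and field names are hypothetical, not the actual CSV column names, only the Czech fields are modelled for brevity, and the check that a marked span occurs verbatim in its sentence is an added assumption that may not hold for every annotation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Relation(Enum):
    # The five relation types listed in the description.
    GENERALIZATION = "generalization"  # white long skirt - skirt
    SPECIFICATION = "specification"    # dog - bulldog
    SIMILAR = "similar"                # couch - sofa
    INDEPENDENCE = "independence"      # they have no instruments - they belong to the group
    EXCLUSION = "exclusion"            # man - woman


CZECH_LABELS = {"entailment", "contradiction", "neutral", "bad translation"}


@dataclass
class ExplainedPair:
    """Hypothetical record for one annotated pair (Czech side only)."""
    csnli_id: str
    cs_premise: str
    cs_hypothesis: str
    cs_label: str
    expl_premise: Optional[str] = None     # marked part of the premise, if any
    expl_hypothesis: Optional[str] = None  # marked part of the hypothesis, if any
    relation: Optional[Relation] = None    # set only when both parts are marked

    def validate(self) -> None:
        # Labels appear with mixed capitalization, so compare case-insensitively.
        if self.cs_label.lower() not in CZECH_LABELS:
            raise ValueError(f"unknown Czech label: {self.cs_label!r}")
        # A relation links two marked parts, so both must be present.
        if self.relation is not None and not (self.expl_premise and self.expl_hypothesis):
            raise ValueError("a relation requires both a premise and a hypothesis part")
        # Assumption: marked parts occur verbatim in the sentence they come from.
        if self.expl_premise and self.expl_premise not in self.cs_premise:
            raise ValueError("marked premise part not found in the premise")
        if self.expl_hypothesis and self.expl_hypothesis not in self.cs_hypothesis:
            raise ValueError("marked hypothesis part not found in the hypothesis")


if __name__ == "__main__":
    # The example record: "Matka" comes from the premise, "Osoba" from the hypothesis.
    pair = ExplainedPair(
        csnli_id="4857558207.jpg#4r1e",
        cs_premise="Matka drží své novorozené dítě.",
        cs_hypothesis="Osoba, která drží dítě.",
        cs_label="Entailment",
        expl_premise="Matka",
        expl_hypothesis="Osoba",
        relation=Relation("generalization"),
    )
    pair.validate()
    print(pair.relation.value)
```

Using an `Enum` rejects misspelled relation names at parse time (`Relation("generalisation")` raises `ValueError`), which is a cheap way to catch inconsistent annotations when loading the batches.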
Files in this item
All files are listed with format application/octet-stream.
- CZtrainSNLI.csv: 132.19 MB, MD5 33b2b79cbfe81c6a22eff078b4c51ade
- CZdevSNLI.csv: 2.53 MB, MD5 90961cb149fc6fbac1b810f20f11afbc
- CZtestSNLI.csv: 2.52 MB, MD5 328bfc8aba8f10ca68502d7bbf3ca1d6
- CSSNLI_relations_1.csv: 407.42 KB, MD5 32cf8f558b5d1c2fd88e84bcbd2f95fa
- CSSNLI_relations_2.csv: 477.63 KB, MD5 b670098020d71c016ffa48d16c6b5592
- CSSNLI_relations_3.csv: 467.25 KB, MD5 50cbf348d7b06f6fe3397d83418fa28b
- CSSNLI_relations_4.csv: 365.28 KB, MD5 45ad1d9920009ec5fb1f113364d08a03
- CSSNLI_relations_5.csv: 331.37 KB, MD5 f6cbd2ecb1dbf67d6c70e0b0821b8524
- CSSNLI_relations_6.csv: 491.68 KB, MD5 fba0eea1354708e2a5701b3f7d58893d
- CSSNLI_relations_7.csv: 599.63 KB, MD5 1759ff46a7980687cd2aaf5142474e1b
- CSSNLI_relations_8.csv: 333.67 KB, MD5 3cc6dd4eec76c6f1843e7153b3e86c16
- CSSNLI_relations_9.csv: 407.17 KB, MD5 f8974890895b2885ba05a443641a7372
- CSSNLI_relations_10.csv: 256.53 KB, MD5 8c9c43bca68e613b207cc39782da5dbe
- CSSNLI_relations_11.csv: 414.12 KB, MD5 97cce46e20aac1dc7553a28236073a0b
- CSSNLI_relations_12.csv: 396.54 KB, MD5 0af176ef75b7b689b30f086b3e6b8c1c
- README.md: 3.8 KB, MD5 bfd9189b667e1c14f0ac9e7be0c200d3
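After downloading, the listed MD5 checksums can be used to confirm that each file arrived intact. The sketch below uses Python's standard `hashlib` and reads files in chunks so the 132 MB training file is not loaded into memory at once. The function names and the assumption that all files sit in one local directory are illustrative; only three of the listed checksums are included, and the remaining ones can be added to the dictionary in the same way.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing; extend with the other files as needed.
EXPECTED_MD5 = {
    "CZtrainSNLI.csv": "33b2b79cbfe81c6a22eff078b4c51ade",
    "CZdevSNLI.csv": "90961cb149fc6fbac1b810f20f11afbc",
    "CZtestSNLI.csv": "328bfc8aba8f10ca68502d7bbf3ca1d6",
}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in 1 MiB chunks by default."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory: Path, expected: dict) -> dict:
    """Map each file name to True (match), False (mismatch) or None (missing)."""
    results = {}
    for name, md5 in expected.items():
        path = directory / name
        results[name] = (md5_of(path) == md5) if path.is_file() else None
    return results


if __name__ == "__main__":
    # Hypothetical download location: the current working directory.
    for name, ok in verify(Path("."), EXPECTED_MD5).items():
        status = {True: "OK", False: "MISMATCH", None: "missing"}[ok]
        print(f"{name}: {status}")
```

MD5 is adequate for detecting accidental corruption during download, which is what the repository checksums are for; it is not meant as protection against deliberate tampering.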