Czech Natural Language Inference Dataset with Explanations
Please use the following text to cite this item:
Víta, Martin and Nevěřilová, Zuzana, 2024,
Czech Natural Language Inference Dataset with Explanations, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL),
http://hdl.handle.net/11234/1-5548.
Date issued: 2024
Size: 17210 entries
Description
The dataset consists of two parts: (1) the original Stanford Natural Language Inference (SNLI) dataset with automatic translations to Czech, and (2) for a subset of SNLI items, manual annotation of the Czech content together with explanations.
The Czech SNLI data contain premise-hypothesis pairs in both Czech and English. The original SNLI split into train/test/dev is preserved:
- CZtrainSNLI.csv: 550152 pairs
- CZtestSNLI.csv: 10000 pairs
- CZdevSNLI.csv: 10000 pairs
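As a quick sanity check after downloading, you can compare the row count of each split file with the pair counts above. The following is a minimal Python sketch. It assumes each file is comma-separated UTF-8 with a single header row. The delimiter and header layout are not documented here, so adjust them to the actual files. `count_pairs` and `check_splits` are hypothetical helper names.

```python
import csv
from pathlib import Path

# Pair counts per split, as stated in the description.
EXPECTED_PAIRS = {
    "CZtrainSNLI.csv": 550152,
    "CZtestSNLI.csv": 10000,
    "CZdevSNLI.csv": 10000,
}


def count_pairs(path: Path, delimiter: str = ",") -> int:
    """Count data rows in a CSV file.

    Assumption: exactly one header row and the given delimiter; neither is
    documented for these files.
    """
    with path.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter=delimiter)
        next(reader, None)  # skip the (assumed) header row
        return sum(1 for _ in reader)


def check_splits(data_dir: Path) -> dict[str, bool]:
    """For each split file present in data_dir, report whether its row count matches."""
    results = {}
    for name, expected in EXPECTED_PAIRS.items():
        path = data_dir / name
        if path.exists():
            results[name] = count_pairs(path) == expected
    return results
```

The `csv` reader is used instead of counting raw lines so that quoted fields containing commas or line breaks are not miscounted.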
The explanation dataset is divided into batches (packages) of premise-hypothesis pairs; each batch contains 1499 pairs. Each pair includes:
- a reference to the original SNLI example
- the English premise and English hypothesis
- the English gold label (one of Entailment, Contradiction, Neutral)
- the premise and hypothesis automatically translated to Czech
- the Czech gold label (one of entailment, contradiction, neutral, bad translation)
- explanations for the Czech label
Example record:
CSNLI ID: 4857558207.jpg#4r1e
English premise: A mother holds her newborn baby.
English hypothesis: A person holding a child.
English gold label: entailment
Czech premise: Matka drží své novorozené dítě.
Czech hypothesis: Osoba, která drží dítě.
Czech gold label: Entailment
Explanation-premise: Matka
Explanation-hypothesis: Osoba
Explanation-relation: generalization
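The record layout can be modelled as a small Python data class. This is only a sketch. The field names are hypothetical and do not necessarily match the CSV column headers. Labels are compared case-insensitively because the example record writes the gold label both as `entailment` and as `Entailment`.

```python
from dataclasses import dataclass
from typing import Optional

# Label sets taken from the description; "bad translation" exists only for Czech.
ENGLISH_LABELS = {"entailment", "contradiction", "neutral"}
CZECH_LABELS = ENGLISH_LABELS | {"bad translation"}


def normalize_label(raw: str, allowed: set[str]) -> str:
    """Strip and lower-case a label, rejecting values outside the allowed set."""
    label = raw.strip().lower()
    if label not in allowed:
        raise ValueError(f"unexpected label: {raw!r}")
    return label


@dataclass
class ExplainedPair:
    """One annotated pair. Field names are hypothetical, not the actual CSV headers."""
    snli_id: str
    premise_en: str
    hypothesis_en: str
    label_en: str
    premise_cs: str
    hypothesis_cs: str
    label_cs: str
    explanation_premise: Optional[str] = None
    explanation_hypothesis: Optional[str] = None
    explanation_relation: Optional[str] = None

    def __post_init__(self) -> None:
        # Normalise both gold labels so that casing differences do not matter.
        self.label_en = normalize_label(self.label_en, ENGLISH_LABELS)
        self.label_cs = normalize_label(self.label_cs, CZECH_LABELS)

    @property
    def label_changed(self) -> bool:
        """True if the Czech label differs from the English gold label."""
        return self.label_cs != self.label_en


# The example record above, expressed with this structure.
example = ExplainedPair(
    snli_id="4857558207.jpg#4r1e",
    premise_en="A mother holds her newborn baby.",
    hypothesis_en="A person holding a child.",
    label_en="entailment",
    premise_cs="Matka drží své novorozené dítě.",
    hypothesis_cs="Osoba, která drží dítě.",
    label_cs="Entailment",
    explanation_premise="Matka",
    explanation_hypothesis="Osoba",
    explanation_relation="generalization",
)
```

A `label_changed` flag like this could be used to collect pairs where translation altered the inference relation or where the translation was marked as bad.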
Size of the explanation dataset:
- train: 159650
- dev: 2860
- test: 2880
Inter-Annotator Agreement (IAA)
Packages 1 and 12 contain annotations of the same data. The IAA, measured by the kappa score, is 0.67 (substantial agreement).
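The description does not state which kappa variant was used. Assuming Cohen's kappa for two annotators, the statistic is kappa = (p_o - p_e) / (1 - p_e). Here p_o is the observed agreement and p_e is the chance agreement implied by each annotator's label distribution. Below is a minimal Python sketch, together with the Landis and Koch (1977) verbal scale, under which values from 0.61 to 0.80 count as "substantial". The helper names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items in the same order."""
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability that both independently pick the same label.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def landis_koch(kappa: float) -> str:
    """Verbal interpretation of kappa following Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    for upper, name in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return name
    return "almost perfect"
```

To apply this to the dataset, you would first need to align the Czech labels from packages 1 and 12 by their SNLI reference. How the files identify the shared items is not documented here.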
The translation was performed using the LINDAT Translation Service.
The translated pairs were then checked manually without access to the original English gold label; annotators could consult the original English pair if needed.
Explanations were annotated as follows:
- any part of the premise or hypothesis that is relevant to the annotator's decision is marked
- if two such parts are marked and a relation holds between them, the relation is marked as well
Possible relation types:
- generalization: white long skirt - skirt
- specification: dog - bulldog
- similar: couch - sofa
- independence: they have no instruments - they belong to the group
- exclusion: man - woman
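The annotation scheme (marked parts plus an optional relation between them) can be represented as follows. This Python sketch assumes, as in the example record, that the two marked parts come one from the premise and one from the hypothesis. It also assumes that empty strings mean "not marked". `Relation`, `Explanation` and `parse_explanation` are hypothetical names, not part of the dataset.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Relation(Enum):
    """Relation types listed in the description, with their example pairs."""
    GENERALIZATION = "generalization"  # white long skirt - skirt
    SPECIFICATION = "specification"    # dog - bulldog
    SIMILAR = "similar"                # couch - sofa
    INDEPENDENCE = "independence"      # they have no instruments - they belong to the group
    EXCLUSION = "exclusion"            # man - woman


@dataclass(frozen=True)
class Explanation:
    """Marked parts of a pair plus an optional relation (hypothetical structure)."""
    premise_part: Optional[str] = None
    hypothesis_part: Optional[str] = None
    relation: Optional[Relation] = None

    def __post_init__(self) -> None:
        # Per the annotation rules, a relation is marked only when two parts are marked.
        if self.relation is not None and (self.premise_part is None or self.hypothesis_part is None):
            raise ValueError("a relation requires both a premise part and a hypothesis part")


def parse_explanation(premise_part: str, hypothesis_part: str, relation: str) -> Explanation:
    """Build an Explanation from raw strings, treating empty strings as 'not marked'."""
    return Explanation(
        premise_part=premise_part.strip() or None,
        hypothesis_part=hypothesis_part.strip() or None,
        relation=Relation(relation.strip().lower()) if relation.strip() else None,
    )
```

Using an `Enum` makes an unknown relation name fail loudly (`Relation("...")` raises `ValueError`) instead of silently passing through.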
Original SNLI dataset: https://nlp.stanford.edu/projects/snli/
LINDAT Translation Service: https://lindat.mff.cuni.cz/services/translation/
This item is publicly available and licensed under:
Files in this item
- CZtrainSNLI.csv: 132.19 MB, application/octet-stream, MD5 33b2b79cbfe81c6a22eff078b4c51ade
- CSSNLI_relations_8.csv: 333.67 KB, application/octet-stream, MD5 3cc6dd4eec76c6f1843e7153b3e86c16
- CSSNLI_relations_6.csv: 491.68 KB, application/octet-stream, MD5 fba0eea1354708e2a5701b3f7d58893d
- CSSNLI_relations_4.csv: 365.28 KB, application/octet-stream, MD5 45ad1d9920009ec5fb1f113364d08a03
- CSSNLI_relations_5.csv: 331.37 KB, application/octet-stream, MD5 f6cbd2ecb1dbf67d6c70e0b0821b8524
- CSSNLI_relations_9.csv: 407.17 KB, application/octet-stream, MD5 f8974890895b2885ba05a443641a7372
- CSSNLI_relations_11.csv: 414.12 KB, application/octet-stream, MD5 97cce46e20aac1dc7553a28236073a0b
- CSSNLI_relations_10.csv: 256.53 KB, application/octet-stream, MD5 8c9c43bca68e613b207cc39782da5dbe
- CSSNLI_relations_7.csv: 599.63 KB, application/octet-stream, MD5 1759ff46a7980687cd2aaf5142474e1b
- CZtestSNLI.csv: 2.52 MB, application/octet-stream, MD5 328bfc8aba8f10ca68502d7bbf3ca1d6
- CZdevSNLI.csv: 2.53 MB, application/octet-stream, MD5 90961cb149fc6fbac1b810f20f11afbc
- README.md: 3.8 KB, application/octet-stream, MD5 bfd9189b667e1c14f0ac9e7be0c200d3
- CSSNLI_relations_12.csv: 396.54 KB, application/octet-stream, MD5 0af176ef75b7b689b30f086b3e6b8c1c
- CSSNLI_relations_2.csv: 477.63 KB, application/octet-stream, MD5 b670098020d71c016ffa48d16c6b5592
- CSSNLI_relations_3.csv: 467.25 KB, application/octet-stream, MD5 50cbf348d7b06f6fe3397d83418fa28b
- CSSNLI_relations_1.csv: 407.42 KB, application/octet-stream, MD5 32cf8f558b5d1c2fd88e84bcbd2f95fa
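You can verify downloaded files against the MD5 checksums listed for this item. The following is a minimal Python sketch using the standard `hashlib` module. It assumes the files are saved in one directory under their original names. `md5_of` and `verify` are hypothetical helper names.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed for the files in this item.
EXPECTED_MD5 = {
    "CZtrainSNLI.csv": "33b2b79cbfe81c6a22eff078b4c51ade",
    "CZtestSNLI.csv": "328bfc8aba8f10ca68502d7bbf3ca1d6",
    "CZdevSNLI.csv": "90961cb149fc6fbac1b810f20f11afbc",
    "README.md": "bfd9189b667e1c14f0ac9e7be0c200d3",
    "CSSNLI_relations_1.csv": "32cf8f558b5d1c2fd88e84bcbd2f95fa",
    "CSSNLI_relations_2.csv": "b670098020d71c016ffa48d16c6b5592",
    "CSSNLI_relations_3.csv": "50cbf348d7b06f6fe3397d83418fa28b",
    "CSSNLI_relations_4.csv": "45ad1d9920009ec5fb1f113364d08a03",
    "CSSNLI_relations_5.csv": "f6cbd2ecb1dbf67d6c70e0b0821b8524",
    "CSSNLI_relations_6.csv": "fba0eea1354708e2a5701b3f7d58893d",
    "CSSNLI_relations_7.csv": "1759ff46a7980687cd2aaf5142474e1b",
    "CSSNLI_relations_8.csv": "3cc6dd4eec76c6f1843e7153b3e86c16",
    "CSSNLI_relations_9.csv": "f8974890895b2885ba05a443641a7372",
    "CSSNLI_relations_10.csv": "8c9c43bca68e613b207cc39782da5dbe",
    "CSSNLI_relations_11.csv": "97cce46e20aac1dc7553a28236073a0b",
    "CSSNLI_relations_12.csv": "0af176ef75b7b689b30f086b3e6b8c1c",
}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(data_dir: Path) -> dict[str, bool]:
    """Map each listed file found in data_dir to whether its checksum matches."""
    return {
        name: md5_of(data_dir / name) == expected
        for name, expected in EXPECTED_MD5.items()
        if (data_dir / name).exists()
    }
```

Files that are missing from the directory are simply left out of the result, so a partial download can be checked as well.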
