dc.contributor.author Víta, Martin
dc.contributor.author Nevěřilová, Zuzana
dc.date.accessioned 2024-07-10T09:34:28Z
dc.date.available 2024-07-10T09:34:28Z
dc.date.issued 2024
dc.identifier.uri http://hdl.handle.net/11234/1-5548
dc.description The dataset contains two parts: the original Stanford Natural Language Inference (SNLI) dataset automatically translated to Czech, and, for a subset of the SNLI items, annotations of the Czech content with explanations.

The Czech SNLI data contain both the Czech and the English premise-hypothesis pairs. The SNLI split into train/test/dev is preserved:
- CZtrainSNLI.csv: 550152 pairs
- CZtestSNLI.csv: 10000 pairs
- CZdevSNLI.csv: 10000 pairs

The explanation dataset is organized into batches of premise-hypothesis pairs. Each batch contains 1499 pairs. Each pair contains:
- reference to the original SNLI example
- English premise and English hypothesis
- English gold label (one of Entailment, Contradiction, Neutral)
- premise and hypothesis automatically translated to Czech
- Czech gold label (one of entailment, contradiction, neutral, bad translation)
- explanations for the Czech label

Example record:
CSNLI ID: 4857558207.jpg#4r1e
English premise: A mother holds her newborn baby.
English hypothesis: A person holding a child.
English gold label: entailment
Czech premise: Matka drží své novorozené dítě.
Czech hypothesis: Osoba, která drží dítě.
Czech gold label: Entailment
Explanation-hypothesis: Matka
Explanation-premise: Osoba
Explanation-relation: generalization

Size of the explanations dataset:
- train: 159650
- dev: 2860
- test: 2880

Inter-Annotator Agreement (IAA): Packages 1 and 12 annotate the same data. The IAA measured by the kappa score is 0.67 (substantial agreement).

The translation was performed via the LINDAT translation service. The translated pairs were then checked manually, without access to the original English gold label; the original English pair could be consulted where needed.

Explanations were annotated as follows:
- if a part of the premise or hypothesis is relevant to the annotator's decision, it is marked
- if there are two such parts and a relation exists between them, the relation is marked

Possible relation types:
- generalization: white long skirt - skirt
- specification: dog - bulldog
- similar: couch - sofa
- independence: they have no instruments - they belong to the group
- exclusion: man - woman

Original SNLI dataset: https://nlp.stanford.edu/projects/snli/
LINDAT Translation Service: https://lindat.mff.cuni.cz/services/translation/
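
The kappa score reported in the description can be illustrated with a minimal sketch. It assumes Cohen's kappa for two annotators (the record only says "kappa score"), and the function name `cohen_kappa` and the toy label lists are hypothetical; they are not taken from the dataset.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items.

    Assumption: the record does not state which kappa variant was used;
    Cohen's kappa is one common choice for two annotators.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two annotators' label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one single label everywhere.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy labels (hypothetical), using the Czech label set from the description.
annotator_1 = ["entailment", "neutral", "contradiction", "entailment"]
annotator_2 = ["entailment", "neutral", "neutral", "entailment"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Applied to the Czech gold labels of two packages that cover the same pairs, a function of this kind yields a single agreement figure comparable to the 0.67 quoted above, provided the same kappa variant is used.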
dc.language.iso ces
dc.publisher Masaryk University, NLP Centre
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri http://creativecommons.org/licenses/by-sa/4.0/
dc.subject natural language inference
dc.subject textual entailment
dc.title Czech Natural Language Inference Dataset with Explanations
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
dc.rights.label PUB
has.files yes
branding LINDAT / CLARIAH-CZ
contact.person Zuzana Nevěřilová xpopelk@fi.muni.cz Masaryk University, NLP Centre
size.info 17210 entries
files.size 148973204
files.count 16


 Files in this item

 Total size of all files in item: 142.07 MB
This item is publicly available and licensed under Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
Format and description are listed as Unknown for all files.

Name                       Size         MD5
CZtrainSNLI.csv            132.19 MB    33b2b79cbfe81c6a22eff078b4c51ade
CZdevSNLI.csv              2.53 MB      90961cb149fc6fbac1b810f20f11afbc
CZtestSNLI.csv             2.52 MB      328bfc8aba8f10ca68502d7bbf3ca1d6
README.md                  3.8 KB       bfd9189b667e1c14f0ac9e7be0c200d3
CSSNLI_relations_1.csv     407.42 KB    32cf8f558b5d1c2fd88e84bcbd2f95fa
CSSNLI_relations_2.csv     477.63 KB    b670098020d71c016ffa48d16c6b5592
CSSNLI_relations_3.csv     467.25 KB    50cbf348d7b06f6fe3397d83418fa28b
CSSNLI_relations_4.csv     365.28 KB    45ad1d9920009ec5fb1f113364d08a03
CSSNLI_relations_5.csv     331.37 KB    f6cbd2ecb1dbf67d6c70e0b0821b8524
CSSNLI_relations_6.csv     491.68 KB    fba0eea1354708e2a5701b3f7d58893d
CSSNLI_relations_7.csv     599.63 KB    1759ff46a7980687cd2aaf5142474e1b
CSSNLI_relations_8.csv     333.67 KB    3cc6dd4eec76c6f1843e7153b3e86c16
CSSNLI_relations_9.csv     407.17 KB    f8974890895b2885ba05a443641a7372
CSSNLI_relations_10.csv    256.53 KB    8c9c43bca68e613b207cc39782da5dbe
CSSNLI_relations_11.csv    414.12 KB    97cce46e20aac1dc7553a28236073a0b
CSSNLI_relations_12.csv    396.54 KB    0af176ef75b7b689b30f086b3e6b8c1c
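
The MD5 values listed for the files can be used to check downloads for corruption. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `verify_downloads` and the local directory are hypothetical, and only three of the sixteen checksums are copied in for brevity.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing of this item (subset only).
EXPECTED_MD5 = {
    "CZtrainSNLI.csv": "33b2b79cbfe81c6a22eff078b4c51ade",
    "CZdevSNLI.csv": "90961cb149fc6fbac1b810f20f11afbc",
    "CZtestSNLI.csv": "328bfc8aba8f10ca68502d7bbf3ca1d6",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory, expected=EXPECTED_MD5):
    """Map each expected file name to True if it exists and its MD5 matches."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == checksum
    return results
```

Calling `verify_downloads("path/to/downloads")` (a hypothetical directory) returns a dictionary of file names to booleans; a `False` entry means the file is missing or its checksum differs from the listed one.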
