COSTRA 1.1: A Dataset of Complex Sentence Transformations and Comparisons
Please use the following text to cite this item:
Barančíková, Petra and Bojar, Ondřej, 2020,
COSTRA 1.1: A Dataset of Complex Sentence Transformations and Comparisons, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL),
http://hdl.handle.net/11234/1-3248.
Authors
Barančíková, Petra; Bojar, Ondřej
Item identifier
http://hdl.handle.net/11234/1-3248
Date issued
2020-06-15
Size
6968 sentences
Language(s)
Description
Costra 1.1 is a new dataset for testing geometric properties of sentence embedding spaces. In particular, it concentrates on examining how well sentence embeddings capture complex phenomena such as paraphrases, tense or generalization. The dataset directly extends Costra 1.0 with more sentences and sentence comparisons.
Acknowledgement
European Union
Project code: EC/H2020/825303
Project name: Bergamot - Browser-based Multilingual Translation
Czech Science Foundation
Project code: 19-26934X
Project name: Neural Representations in Multi-modal and Multi-lingual Modelling
Subject(s)
Collections
This item is Publicly Available
and licensed under:
Files in this item
- Name: data.tsv
- Size: 796.54 KB
- Format: application/octet-stream
- Description: Unknown
- MD5: e30cd60188074f3006eb5f976eddb993

- Name: README
- Size: 3.94 KB
- Format: application/octet-stream
- Description: Unknown
- MD5: ec1d7ad7c25a11b40f9496433a632a3f
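The MD5 values listed for the two files can be used to check that a download arrived intact. The following is a minimal sketch, assuming both files were saved under their listed names in the current directory. The helper names `md5_of_file` and `verify_downloads` are illustrative and not part of the dataset or the repository.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file list of this record.
EXPECTED_MD5 = {
    "data.tsv": "e30cd60188074f3006eb5f976eddb993",
    "README": "ec1d7ad7c25a11b40f9496433a632a3f",
}


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each listed file name to True (match), False (mismatch) or None (not found)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if path.is_file():
            results[name] = md5_of_file(path) == expected
        else:
            results[name] = None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "not found"}
    for name, outcome in verify_downloads().items():
        print(f"{name}: {labels[outcome]}")
```

A mismatch usually points to an incomplete or altered download, in which case fetching the file again is the simplest remedy.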


