
COSTRA 1.1: A Dataset of Complex Sentence Transformations and Comparisons

Please use the following text to cite this item:
Barančíková, Petra and Bojar, Ondřej, 2020, COSTRA 1.1: A Dataset of Complex Sentence Transformations and Comparisons, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), http://hdl.handle.net/11234/1-3248.
Date issued
2020-06-15
Size
6968 sentences
Language(s)
Description
Costra 1.1 is a new dataset for testing geometric properties of sentence embedding spaces. In particular, it examines how well sentence embeddings capture complex phenomena such as paraphrase, tense, or generalization. The dataset is a direct expansion of Costra 1.0, extended with more sentences and sentence comparisons.
Acknowledgement
This item is Publicly Available
and licensed under:
 Files in this item
Name
data.tsv
Size
796.54 KB
Format
application/octet-stream
Description
Unknown
MD5
e30cd60188074f3006eb5f976eddb993
Name
README
Size
3.94 KB
Format
application/octet-stream
Description
Unknown
MD5
ec1d7ad7c25a11b40f9496433a632a3f
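
The MD5 values listed above can be used to check that the files were downloaded intact. The following is a minimal sketch, assuming both files were saved under their listed names (data.tsv and README) in the current directory; the helper names md5_of_file and verify_downloads are illustrative and not part of the dataset.

```python
import hashlib
from pathlib import Path

# MD5 values as listed for the files in this item.
EXPECTED_MD5 = {
    "data.tsv": "e30cd60188074f3006eb5f976eddb993",
    "README": "ec1d7ad7c25a11b40f9496433a632a3f",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each listed file name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name  # assumed local location of the download
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads().items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

A mismatch usually means an incomplete or altered download, in which case fetching the file again is the simplest remedy.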