dc.contributor.author Cinková, Silvie
dc.contributor.author Straková, Jana
dc.contributor.author Hajič, Jakub
dc.contributor.author Hajič, Jan
dc.contributor.author Hajič, Jan, jr.
dc.contributor.author Janoušková, Jolana
dc.contributor.author Straka, Milan
dc.contributor.author Urešová, Miroslava
dc.date.accessioned 2016-10-10T15:11:23Z
dc.date.available 2016-10-10T15:11:23Z
dc.date.issued 2016-06-01
dc.identifier.uri http://hdl.handle.net/11234/1-1713
dc.description Czech translation of WordSim353. Czech translations of the English WordSim353 word pairs were obtained from four translators. All translation variants were scored by 25 Czech annotators, following the lexical similarity/relatedness annotation instructions given to the WordSim353 annotators. The resulting data set consists of two annotation files: "WordSim353-cs.csv" and "WordSim-cs-Multi.csv". Both files are encoded in UTF-8 and have a header row; text is enclosed in double quotes and columns are separated by commas. The rows are numbered. The WordSim-cs-Multi data set has rows numbered from 1 to 634, whereas the row indices in the WordSim353-cs data set reflect the corresponding row numbers in the WordSim-cs-Multi data set. The WordSim353-cs file contains a one-to-one selection of 353 Czech equivalent pairs whose judgments proved most similar to the judgments of their corresponding English originals (measured by the absolute value of the difference between the means over all annotators in each language). In one case ("psychology-cognition"), two Czech equivalent pairs had identical means as well as confidence intervals, so we randomly selected one. The "WordSim-cs-Multi.csv" file contains human judgments for all translation variants. In both data sets, we preserved all 25 individual scores. In the WordSim353-cs data set, we added a column with the Czech means and a column with the original English means, together with separate columns holding the 95% confidence interval of each mean (computed by the CI function in the Rmisc R package). The WordSim-cs-Multi data set contains only the Czech means and confidence intervals. To make lexical search convenient, both data sets provide separate columns with the individual Czech and English words, the entire word pairs, and the complete English-Czech quadruple.
The data set also contains an xls table with the four translations and a preliminary selection of the best variants performed by an adjudicator.
dc.language.iso ces
dc.language.iso eng
dc.publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri http://creativecommons.org/licenses/by/4.0/
dc.source.uri http://ufal.mff.cuni.cz/wordsim353-cs
dc.subject lexical semantics
dc.subject similarity
dc.subject relatedness
dc.subject evaluation
dc.subject distributional semantics
dc.title WordSim353-cs: Evaluation Dataset for Lexical Similarity and Relatedness, based on WordSim353
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.mediaType text
metashare.ResourceInfo#ContentInfo.detailedType wordList
dc.rights.label PUB
has.files yes
branding LINDAT / CLARIAH-CZ
contact.person Silvie Cinková cinkova@ufal.mff.cuni.cz Charles University in Prague, UFAL
sponsor Czech Science Foundation GA 15-20031S Reviving Zellig S. Harris: More linguistic information for distributional lexical analysis of English and Czech nationalFunds
sponsor Grantová agentura Akademie věd České republiky 1ET201120505 Od jazyka ke znalostem a sémantickému webu nationalFunds
sponsor Univerzita Karlova v Praze (mimo GAUK) PRVOUK PRVOUK nationalFunds
sponsor Ministerstvo školství, mládeže a tělovýchovy České republiky LM2015071 LINDAT/CLARIN: Institut pro analýzu, zpracování a distribuci lingvistických dat nationalFunds
size.info 634353 other
files.size 330077
files.count 1
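
The selection rule given in dc.description (for each English pair, keep the Czech variant whose mean judgment is closest to the English mean) can be sketched as follows. This is a minimal illustration, not the authors' script: the column names (row, en_pair, cs_pair, cs_mean, en_mean) and all values in the sample are invented for the example, and the real header is documented in README.txt. The sample only mimics the stated file format (header row, double-quoted text, comma-separated columns).

```python
import csv
import io

# Invented sample in the stated format: header row, double-quoted text,
# comma-separated columns. Column names and numbers are hypothetical.
SAMPLE = '''"row","en_pair","cs_pair","cs_mean","en_mean"
"1","tiger-cat","tygr-kočka","7.10","7.35"
"2","tiger-cat","tygr-kocour","6.20","7.35"
"3","book-paper","kniha-papír","6.90","7.46"
"4","book-paper","kniha-noviny","7.40","7.46"
'''


def select_closest(rows):
    """Keep, per English pair, the variant with the smallest |cs_mean - en_mean|."""
    best = {}
    for row in rows:
        diff = abs(float(row["cs_mean"]) - float(row["en_mean"]))
        key = row["en_pair"]
        # Strict '<' keeps the first variant on a tie; the record states that
        # the single real tie was resolved by random choice instead.
        if key not in best or diff < best[key][0]:
            best[key] = (diff, row)
    return {key: row for key, (diff, row) in best.items()}


reader = csv.DictReader(io.StringIO(SAMPLE))
selected = select_closest(reader)
for en_pair, row in selected.items():
    # The original row number is kept, mirroring how WordSim353-cs
    # reuses the row indices of WordSim-cs-Multi.
    print(row["row"], en_pair, row["cs_pair"])
```

To run the same function on the real file, one would open it with encoding="utf-8" and newline="" and substitute the actual column names.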


 Files in this item

This item is Publicly Available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)
Name WordSim353-cs.zip
Size 322.34 KB
Format application/zip
Description WordSim353-cs + Readme and a paper
MD5 31da4643e80ff87f7a8c95ae5f37db14
 File Preview  
  • WordSim353-cs
    • annotation_preparation
      • ANNOTATION_FORM.csv (10 kB)
      • respondent_instructions_in_Czech.txt (2 kB)
    • README.txt (6 kB)
    • WordSim353-cs.csv (87 kB)
    • cin_wordsim353cs_tsdDRAFT.pdf (241 kB)
    • WordSim-cs-Multi.csv (143 kB)
    • translation
      • translator_instructions_in_Czech.txt (1 kB)
      • czech353-1-2-3-4-for-adjudication.csv (34 kB)
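
Since the record publishes an MD5 checksum for WordSim353-cs.zip, a downloaded copy can be checked against it. The sketch below uses only Python's standard hashlib module; the function names and the local file path are assumptions made for the example, and the expected digest is the one listed in this record.

```python
import hashlib

# Digest copied from the MD5 field of this record.
EXPECTED_MD5 = "31da4643e80ff87f7a8c95ae5f37db14"


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_MD5):
    """True if the file's MD5 equals the digest published in the record."""
    return md5_of_file(path) == expected.lower()
```

Assuming the archive was saved to the current directory, matches_record("WordSim353-cs.zip") would report whether the download is intact. MD5 is suitable here only as a check against corruption in transit, not as a security guarantee.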
