
 
dc.contributor.author Kocmi, Tom
dc.contributor.author Bojar, Ondřej
dc.date.accessioned 2016-06-09T14:02:03Z
dc.date.available 2016-06-09T14:02:03Z
dc.date.issued 2016-05-30
dc.identifier.uri http://hdl.handle.net/11234/1-1730
dc.description We have created a test set of syntactic questions, presented in the paper [1], that is more general than Mikolov's test set [2]. Since we were interested in morphosyntactic relations, we extended only the questions of the syntactic type, with the exception of nationality adjectives, which are already covered completely in Mikolov's test set. We constructed the pairs more or less manually, taking inspiration from the Czech side of the CzEng corpus [3], where explicit morphological annotation makes it possible to identify various pairs of Czech words (different grades of adjectives, words and their negations, etc.). The word-aligned English words often shared the same properties. Other pairs were taken from various webpages, usually written for learners of English. For example, for verb tense we relied on a freely available list of English verbs and their morphological variations. We included 100 to 1000 different pairs for each question set. The questions were constructed from the pairs in the same way as Mikolov's: by generating all possible pairs of pairs. This yields millions of questions, so we randomly selected 1000 instances per question set to keep the test set within the same order of magnitude as Mikolov's. Additionally, we extended the set of questions on opposites to cover not only opposites of adjectives but also opposites of nouns and verbs.
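The question-generation step described above (all pairs of pairs, then random sampling per question set) can be sketched as follows. This is a minimal illustration, not the authors' script: the function name `build_questions`, the choice of ordered pairs (so that (p, q) and (q, p) count as two questions), the fixed seed, and the example word pairs are all assumptions.

```python
import itertools
import random


def build_questions(pairs, n_samples=1000, seed=0):
    """Build analogy questions (a, b, c, d), read as "a is to b as c is to d".

    Hypothetical sketch: every ordered combination of two distinct word pairs
    becomes one candidate question, and a random subset of n_samples is kept.
    """
    # All "pairs of pairs"; assumption: order matters, so (p, q) and (q, p)
    # yield two different questions.
    candidates = [
        (a, b, c, d)
        for (a, b), (c, d) in itertools.permutations(pairs, 2)
    ]
    # Small sets are returned whole; otherwise sample without replacement.
    if len(candidates) <= n_samples:
        return candidates
    rng = random.Random(seed)
    return rng.sample(candidates, n_samples)


# Illustrative pairs for a comparative-adjective question set (made up here).
comparatives = [("good", "better"), ("big", "bigger"),
                ("fast", "faster"), ("cold", "colder")]

for question in build_questions(comparatives, n_samples=5):
    print(" ".join(question))
```

Under the ordered-pairs assumption, a set of 1000 word pairs already produces 1000 × 999 = 999,000 candidate questions, which is why sampling down to 1000 per set keeps the test set small.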
dc.language.iso eng
dc.publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri http://creativecommons.org/licenses/by-sa/4.0/
dc.subject syntactic questions
dc.title Extended Morphosyntactic Testset for Word2Vec
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.mediaType text
metashare.ResourceInfo#ContentInfo.detailedType other
dc.rights.label PUB
has.files yes
branding LINDAT / CLARIAH-CZ
contact.person Tom Kocmi kocmi@ufal.mff.cuni.cz Charles University in Prague, UFAL
sponsor Charles University 8502/2016 GAUK 8502/2016 Other
size.info 8000 entries
files.size 55966
files.count 1


 Files in this item

This item is publicly available and licensed under Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
Name                     Size      Format           Description          MD5
syntactic_questions.zip  54.65 KB  application/zip  syntactic_questions  47d9ecbc800c43558fbe317482ad468e
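After downloading, the archive can be checked against the MD5 sum listed above. A minimal sketch using Python's standard `hashlib`; the function names `md5_of_file` and `is_intact` are hypothetical, and the file is read in chunks so memory use stays constant.

```python
import hashlib

# MD5 listed in this record for syntactic_questions.zip.
EXPECTED_MD5 = "47d9ecbc800c43558fbe317482ad468e"


def md5_of_file(path, chunk_size=1 << 16):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected=EXPECTED_MD5):
    """True if the file's MD5 matches the expected checksum (case-insensitive)."""
    return md5_of_file(path) == expected.lower()
```

For example, `is_intact("syntactic_questions.zip")` should return `True` for an uncorrupted download saved in the current directory (the local path is an assumption).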
Archive contents (file preview):
    • syntactic_questions (241 kB)
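The archive expands to `syntactic_questions` (241 kB). Assuming the questions follow the layout of Mikolov's `questions-words.txt` (a line beginning with `:` names a question set, followed by one four-word question per line), which this record does not state explicitly, they could be loaded with the standard `zipfile` module as sketched below. The function names are hypothetical, UTF-8 encoding is assumed, and every regular file in the archive is parsed because the record does not say whether `syntactic_questions` is a single file or a folder.

```python
import io
import zipfile
from collections import defaultdict


def parse_questions(lines):
    """Group 'a b c d' question lines under the most recent ': section' header."""
    sections = defaultdict(list)
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(":"):
            # Header line, e.g. ": gram3-comparative" in Mikolov's format.
            current = line[1:].strip()
            continue
        words = line.split()
        # Defensive choice: lines that are not exactly four words are skipped.
        if len(words) == 4:
            sections[current].append(tuple(words))
    return dict(sections)


def load_archive(zip_path):
    """Parse every regular file in the archive, merging sections by name."""
    questions = {}
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            with zf.open(info) as fh:
                text = io.TextIOWrapper(fh, encoding="utf-8")
                for section, items in parse_questions(text).items():
                    questions.setdefault(section, []).extend(items)
    return questions
```

Calling `load_archive("syntactic_questions.zip")` would return a dictionary mapping each question-set name to its list of (a, b, c, d) tuples, provided the format assumption holds.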
