Search Results
2. A Gold Standard Word Alignment for English-Swedish (2015-10-12)
- Creator:
- Ahrenberg, Lars and Holmqvist, Maria
- Publisher:
- Linköping University
- Type:
- text, wordList, and lexicalConceptualResource
- Subject:
- word alignment
- Language:
- Swedish and English
- Description:
- A Gold Standard Word Alignment for English-Swedish (GES) is a resource containing 1164 manually word-aligned sentence pairs from the English and Swedish versions of Europarl v. 2.
- Rights:
- Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), http://creativecommons.org/licenses/by-nc/4.0/, and PUB
3. Addressed Arabic Phonetic Rules
- Creator:
- Mustafa, Ebtihal and Bouzoubaa, Karim
- Publisher:
- languages journal
- Type:
- text, wordList, and lexicalConceptualResource
- Subject:
- phonetics and Arabic phonetic System.
- Language:
- Arabic
- Description:
- This XML file describes the Arabic phonetic constraints that are to be applied to Arabic roots. The first rule category lists the letters that may not occur in the same root, regardless of their order. The second category lists the letters that may not be used together in a root word in a specific order. The third and fourth categories show that contiguous letters must not be redundant. ISLRN: 991-445-325-823-5
- Rights:
- Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), http://creativecommons.org/licenses/by-nc/4.0/, and PUB
4. AdjDeriNet: Words Derived from Adjectives in Czech
- Creator:
- Ševčíková, Magda and Žabokrtský, Zdeněk
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text, wordList, and lexicalConceptualResource
- Subject:
- adjectives, derivation, word-formation, and derivational morphology
- Language:
- Czech
- Description:
- The lexical network AdjDeriNet consists of pairs of base adjectives and their derivatives. It contains nearly 18 thousand base adjectives that serve as base words for more than 26 thousand lexemes of several parts of speech.
- Rights:
- Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0), http://creativecommons.org/licenses/by-nc-sa/3.0/, and PUB
5. Arabic Morphological evaluation corpus
- Creator:
- Jaafar, Younes
- Publisher:
- Ibtikarat team
- Type:
- text, wordList, and lexicalConceptualResource
- Subject:
- morphological analysis and benchmarking corpus
- Language:
- Arabic
- Description:
- An annotated corpus dedicated to the benchmarking and evaluation of Arabic morphological analyzers. It consists of 100 words with all their possible analyses. The corpus contains several kinds of morphological information, such as stem, pattern, root, and lemma.
- Rights:
- Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), http://creativecommons.org/licenses/by-nc/4.0/, and PUB
6. Broken plural list
- Creator:
- Ouamer, Meriem, Bouzoubaa, Karim, and Tajmout, Rachida
- Publisher:
- ALELM research group
- Type:
- text, wordList, and lexicalConceptualResource
- Subject:
- Broken plural
- Language:
- Arabic
- Description:
- An LMF-conformant XML-based file containing a comprehensive Arabic broken plural (BP) list. The file contains 12,249 singular words with their corresponding BPs.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
7. Czech Multiword Expressions
- Creator:
- Nevěřilová, Zuzana
- Publisher:
- Faculty of Informatics, Masaryk University
- Type:
- text, wordList, and lexicalConceptualResource
- Subject:
- multiword expressions
- Language:
- Czech
- Description:
- The dataset contains 4731 frozen continuous Czech multiword expressions. Inflectional word forms are generated for those MWEs where applicable. In total, the dataset contains 24,807 MWE forms.
- Rights:
- Public Domain Mark (PD), http://creativecommons.org/publicdomain/mark/1.0/, and PUB
8. Czech SubLex 1.0
- Creator:
- Veselovská, Kateřina and Bojar, Ondřej
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text, lexicalConceptualResource, and wordList
- Subject:
- subjectivity lexicon, sentiment analysis, opinion mining, and polarity clues
- Language:
- Czech
- Description:
- Czech subjectivity lexicon, i.e. a list of subjectivity clues for sentiment analysis in Czech. The list contains 4626 evaluative items (1672 positive and 2954 negative) together with their part-of-speech tags, polarity orientation and source information. The core of the Czech subjectivity lexicon was obtained by automatic translation of a freely available English subjectivity lexicon downloaded from http://www.cs.pitt.edu/mpqa/subj_lexicon.html. For translating the data into Czech, we used the parallel corpus CzEng 1.0, which contains 15 million parallel sentences (233 million English and 206 million Czech tokens) from seven different types of sources, automatically annotated at surface and deep layers of syntactic representation. Afterwards, the lexicon was manually refined by an experienced annotator. The work on this project has been supported by the GAUK 3537/2011 grant and by SVV project number 267 314.
- Rights:
- Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0), http://creativecommons.org/licenses/by-nc-sa/3.0/, and PUB
9. English gustatory adjectives and lexical synaesthesia - data analysis
- Creator:
- Jurčević, Jana
- Publisher:
- Faculty of Humanities and Social Sciences, University of Rijeka
- Type:
- text, wordList, and lexicalConceptualResource
- Subject:
- lexical synaesthesia, metaphorical collocations, metonymy, cross-modal mapping, and embodiment
- Language:
- English
- Description:
- Data were collected with the Sketch Engine program and extracted from the annotated English web corpus enTenTen20. Data collection and analysis were carried out over two months, April and May 2023. The enTenTen20 corpus has since been updated to a newer version, enTenTen21. Nevertheless, the older version is still available, can be worked on and can be compared with the newer one. The differences between the two versions of the English web corpus did not affect the results of this study. The only apparent difference was in slightly different frequency values for specific collocations. This was expected, since the older version of the web corpus consists of 36 billion words, while the new version counts 52 billion words. As noted above, these frequency deviations were not significant enough to refute the hypotheses; rather, they confirmed them once again. This study is one of the results of work on a larger scientific research project called "Metaphorical collocations - syntagmatic relations between semantics and pragmatics". More information about the project is available at the following link: https://metakol.uniri.hr/en/opis-projekta/ The study has been financed by the Croatian Science Foundation. Working with the data/replicating the study: Data collected for the purposes of this study are available in CSV format. Data for each gustatory adjective (collocate) are presented in a separate CSV file. Upon opening each file, widen every column for better visibility of the data. Tables show the different collocational bases (nouns) found in the corpus in combination with a specific gustatory adjective, their collocate. These nouns are listed by their score (the Mutual Information score expresses the extent to which words co-occur compared to the number of times they appear separately).
Tables show what type of mapping is present in a given collocation (e.g., intra-modal or cross-modal). Tables also show what type of meaning or cognitive process underlies the meaning formation (e.g., metonymic or metaphoric). For every analyzed collocation, we provide a contextualized example of its use from the corpus, along with the hyperlink where it can be found.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
10. FASpell
- Creator:
- QasemiZadeh, Behrang
- Publisher:
- Behrang-QasemiZadeh
- Type:
- text, wordList, and lexicalConceptualResource
- Subject:
- spellchecking, spellchecker, and Evaluation Dataset for Automatic Spell Checking
- Language:
- Persian
- Description:
- The FASpell dataset was developed for the evaluation of spell-checking algorithms. It contains a set of pairs of misspelled Persian words and their corresponding corrected forms, similar to the ASpell dataset used for English. The dataset consists of two parts: a) faspell_main: a list of 5050 pairs collected from errors made by elementary school pupils and professional typists; b) faspell_ocr: a list of 800 pairs collected from the output of a Farsi OCR system.
- Rights:
- Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB