Comprehensive Arabic LEMmas is a lexicon covering a large list of Arabic lemmas and their corresponding inflected word forms (stems) with details (POS + Root). Each lexical entry represents a lemma followed by all its possible stems and each stem is enriched by its morphological features especially the root and the POS.
It is composed of 164,845 lemmas representing 7,200,918 stems, detailed as follow:
757 Arabic particles
2,464,631 verbal stems
4,735,587 nominal stems
The lexicon is provided as an LMF conformant XML-based file in UTF8 encoding, which represents about 1,22 Gb of data.
Citation:
– Namly Driss, Karim Bouzoubaa, Abdelhamid El Jihad, and Si Lhoussain Aouragh. “Improving Arabic Lemmatization Through a Lemmas Database and a Machine-Learning Technique.” In Recent Advances in NLP: The Case of Arabic Language, pp. 81-100. Springer, Cham, 2020.
Angabe von orthographischen, morphologischen (Wortformenbildung und Wortbildung) sowie semantischen Informationen (Synonymie; Hyperonymie/Hyponymie); Zuordnung der Wörter zu der jeweiligen syntaktischen Kategorie (bei Substantiven zusätzlich Angabe des Genus)
Sanskrit lexicons. The data is made available as scanned images of the works as well as a digitization of the scanned images, which permits computer-aided analyses and displays of the work. Can be downloaded or queried online.
An XML-based file containing the electronic version of al logha al arabia al moassira (Contemporary Arabic) dictionary. An Arabic monolingual dictionary accomplished by Ahmed Mukhtar Abdul Hamid Omar (deceased: 1424) with the help of a working group
This bilingual thesaurus (French-English), developed at Inist-CNRS, covers the concepts from the emerging COVID-19 outbreak which reminds the past SARS coronavirus outbreak and Middle East coronavirus outbreak. This thesaurus is based on the vocabulary used in scientific publications for SARS-CoV-2 and other coronaviruses, like SARS-CoV and MERS-CoV. It provides a support to explore the coronavirus infectious diseases. The thesaurus can be browsed and queried by humans and machines on the Loterre portal (https://www.loterre.fr), via an API and an rdf triplestore. It is also downloadable in PDF, SKOS, csv and json-ld formats. The thesaurus is made available under a CC-by 4.0 license.