CALEM (Comprehensive Arabic LEMmas)
Please use the following text to cite this item or export to a predefined format:
Namly, Driss; Bouzoubaa, Karim and El Jihad, Abdelhamid, 2020,
CALEM (Comprehensive Arabic LEMmas), LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL),
http://hdl.handle.net/11372/LRT-5102.
Authors
Item identifier
Date issued
2020-10-16
Size
7200918 entries
Language(s)
Description
Comprehensive Arabic LEMmas is a lexicon covering a large list of Arabic lemmas and their corresponding inflected word forms (stems) with details (POS + Root). Each lexical entry represents a lemma followed by all its possible stems and each stem is enriched by its morphological features especially the root and the POS.
It is composed of 164,845 lemmas representing 7,200,918 stems, detailed as follow:
757 Arabic particles
2,464,631 verbal stems
4,735,587 nominal stems
The lexicon is provided as an LMF conformant XML-based file in UTF8 encoding, which represents about 1,22 Gb of data.
Citation:
– Namly Driss, Karim Bouzoubaa, Abdelhamid El Jihad, and Si Lhoussain Aouragh. “Improving Arabic Lemmatization Through a Lemmas Database and a Machine-Learning Technique.” In Recent Advances in NLP: The Case of Arabic Language, pp. 81-100. Springer, Cham, 2020.
Publisher
Subject(s)
Collections
This item isPublicly Available
and licensed under:
Files in this item
- Name
- CALEM.xml
- Size
- 20.02 KB
- Format
- text/xml
- Description
- XML
- MD5
- 1c6a96459de872b9beaa28050bea46eb

The file preview has not been generated yet. Please try again later or contact the system administrator lindat-help@ufal.mff.cuni.cz

