LINDAT/CLARIAH-CZ Repository
Arabic ACL corpus

 
LRT + Open Submissions
  Authors
Salah Elfahal Elebaed, Hoyam; Kasbi, Mohammed; Nasri, Mohammed; Bouzoubaa, Karim
  Item identifier
http://hdl.handle.net/11372/LRT-4768
 Project URL
http://arabic.emi.ac.ma/alelm/?q=Resources
 Demo URL
http://arabic.emi.ac.ma/alelm/?q=Resources
 Referenced by
http://www.ijcstjournal.org/volume-9/issue-6/IJCST-V9I6P8.pdf
 Date issued
2021
 Type
corpus, text
 Size
197 kb
 Language(s)
Arabic
 Description
This corpus contains all sentences representing the Arabic Controlled Language (ACL). It comprises 551 sentences taken from four sources (textbooks and websites) dedicated to teaching the Arabic language to children: a) First grade book, Republic of Sudan (كتاب الصف الاول جمهورية السودان), b) Al Jazeera Educational Site (موقع الجزيرة التعليمي), c) Bella Preparatory School Girls Forum (منتدى مدرسة بيلا الاعدادية بنات), and d) Albahr website (موقع انا البحر). The sentences conform to 52 ACL rules, an average of 10.6 sentences per rule. All sentences in the corpus were parsed with the Farasa syntactic parser, and the parses were validated manually by expert linguists. The corpus consists of a header and a body. The header holds metadata describing the corpus, such as the corpus name, the authors, the sources, and further metadata. The body contains the rules. Each rule has a code, a structure, and all sentences that conform to that rule. For each sentence, we store an id, the vowelled and unvowelled text, and the result of parsing with Farasa.
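The header/body layout described above could be modelled roughly as follows. This is only an illustrative sketch: the class and field names (Corpus, Header, Rule, Sentence, sentence_id, farasa_parse, and so on) are assumptions chosen for readability and are not taken from the corpus file, whose actual element names and serialization are not given here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sentence:
    # All field names are hypothetical; the record only says that an id,
    # the vowelled text, the unvowelled text and the Farasa result are stored.
    sentence_id: str
    vowelled: str
    unvowelled: str
    farasa_parse: str  # parser output kept as an opaque string in this sketch


@dataclass
class Rule:
    code: str       # the rule's code
    structure: str  # the rule's structure
    sentences: List[Sentence] = field(default_factory=list)


@dataclass
class Header:
    corpus_name: str
    authors: List[str]
    sources: List[str]


@dataclass
class Corpus:
    header: Header
    rules: List[Rule] = field(default_factory=list)

    def sentence_count(self) -> int:
        # Total number of sentences across all rules.
        return sum(len(rule.sentences) for rule in self.rules)

    def average_sentences_per_rule(self) -> float:
        # Mean number of sentences per rule; 0.0 for an empty corpus.
        if not self.rules:
            return 0.0
        return self.sentence_count() / len(self.rules)


# Sanity check on the reported figures: 551 sentences over 52 rules.
reported_average = round(551 / 52, 1)
```

With the reported totals, 551 / 52 rounds to the stated average of 10.6 sentences per rule.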
 Publisher
International Journal of Computer Science Trends and Technology (IJCST)
 Acknowledgement

No

Project code: No

Project name: No

 Subject(s)
Controlled Natural Language, Arabic CNL, ACL, Arabic Corpus, TEI
 Collection(s)
LRT + Open Submissions Data & Tools