
BulTreeBank Tokenizer

 
  Authors
Simov, Kiril
  Item identifier
http://hdl.handle.net/11372/LRT-1240
 Project URL
http://www.bultreebank.org/clark/index.html
 Date issued
2014-07-30
 Type
toolService
 Description
The tokenizer covers all languages that use the Latin1, Latin2, Latin3, and Cyrillic tables of Unicode, and it can be extended to cover other Unicode tables if necessary. It is implemented as a cascaded regular grammar in CLaRK. It recognizes over 60 token categories and is easy to adapt to new ones.
 Publisher
Linguistic Modeling Department, IPP, Bulgarian Academy of Sciences
 Collection(s)
LRT + Open Submissions Data & Tools
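
The description mentions a cascaded regular grammar. As a rough illustration of that general idea only, and not of the CLaRK grammar itself, the Python sketch below runs two stages: a first regular expression assigns primitive categories to character runs, and later rules are regular expressions over the sequence of category codes that merge adjacent tokens into larger ones. All category names, codes, and rules here (CYR, LAT, DECIMAL, COMPOUND, and so on) are hypothetical and chosen for the example; the actual tool defines over 60 categories that are not listed in this record.

```python
import re

# Stage 1: primitive categories over characters (hypothetical names).
PRIMITIVES = [
    ("CYR", r"[\u0400-\u04FF]+"),                                   # Cyrillic letters
    ("LAT", r"[A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u024F]+"),  # Latin letters
    ("NUM", r"[0-9]+"),
    ("SPACE", r"\s+"),
    ("DOT", r"\."),
    ("HYPHEN", r"-"),
    ("PUNCT", r"[^\w\s]"),
    ("OTHER", r"."),                                                # fallback
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in PRIMITIVES))

# One-character code per category, so that a token sequence can be
# matched by an ordinary regular expression over its code string.
CODES = {"CYR": "c", "LAT": "l", "NUM": "n", "SPACE": "s", "DOT": "d",
         "HYPHEN": "h", "PUNCT": "p", "OTHER": "o",
         "DECIMAL": "D", "COMPOUND": "W"}

# Stage 2: cascaded rules, applied in order to the output of the previous one.
RULES = [
    ("DECIMAL", r"ndn"),                   # 3.14
    ("COMPOUND", r"l(?:hl)+|c(?:hc)+"),    # state-of-the-art
]


def primitive_tokens(text):
    """Split text into (category, string) pairs using the stage 1 pattern."""
    return [(m.lastgroup, m.group()) for m in MASTER.finditer(text)]


def apply_rule(tokens, name, pattern):
    """Merge every run of tokens whose code string matches the pattern."""
    codes = "".join(CODES[cat] for cat, _ in tokens)
    out, pos = [], 0
    for m in re.finditer(pattern, codes):
        out.extend(tokens[pos:m.start()])
        merged = "".join(s for _, s in tokens[m.start():m.end()])
        out.append((name, merged))
        pos = m.end()
    out.extend(tokens[pos:])
    return out


def tokenize(text):
    """Run the cascade and drop whitespace tokens from the result."""
    tokens = primitive_tokens(text)
    for name, pattern in RULES:
        tokens = apply_rule(tokens, name, pattern)
    return [(cat, s) for cat, s in tokens if cat != "SPACE"]


if __name__ == "__main__":
    for category, string in tokenize("Цена 3.14 лв. state-of-the-art"):
        print(category, string)
```

In this sketch, adding a new token category means appending one entry to `CODES` and one rule to `RULES`, and covering another script means adding one primitive pattern. That loosely mirrors the extensibility claimed in the description, although the real grammar may be organized quite differently.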