LINDAT/CLARIAH-CZ logo
  • Catalog
  • Repository
  • Education
  • Projects
  • Tools
  • Services
  • About
    Partners Mission Statement CLARIN DARIAH Service integrations Project partnerships
  • DARIAH logo
  • CLARIN logo
  •  Login
  • English čeština
  • LINDAT/CLARIAH-CZ Repository Home
  • View Item
  •  
  • LINDAT/CLARIAH-CZ logo
    CLARIN logo
  •   Browse  
    •    All of the Repository  
      •   Issue Date
      •   Authors
      •   Titles
      •   Subjects
      •   Publisher
      •   Language
      •   Type
      •   Rights Label
  •   My Account  
    •    Login
  •   Statistics  
    •    StatisticsBETA
  •   General Information  
    •    Deposit
    •    Cite
    •    Submission Lifecycle
    •    FAQ
    •    About
    •    Help Desk
 
 

YAWA - Yet Another Word Aligner

 
LRT + Open Submissions
  Authors
Tufiş, Dan and Ion, Radu
  Item identifier
http://hdl.handle.net/11372/LRT-1300
 Date issued
2014-07-30
 Type
toolService
 Language(s)
English , Romanian
 Description
YAWA is a four stage lexical aligner that uses bilingual translation lexicons produced by [[http://www.clarin.eu/tools/translation-equivalents-extractor|TREQ]] and phrase boundaries detection to align words of a given bitext. Using this alignment, in stage 2 a language dependent module takes over and produces alignments of the remaining lexical tokens within aligned chunks. Stage 3 is specialized in aligning blocks of consecutive unaligned tokens and stage 4 deletes alignments that are likely to be wrong. Developed in PERL, YAWA is language independent, except for the modules that realise alignments specific to the pairs of aligned languages. So far, it works just for Ro-En pair of languages. It requires a parallel corpus in [[http://www.xces.org|XCES]] format, morpho-syntactically annotated and lemmatized (using [[http://www.clarin.eu/tools/ttl-tokenizing-tagging-and-lemmatizing-free-running-texts|TTL]]), and translation dictionaries produced by [[http://www.clarin.eu/tools/translation-equivalents-extractor|TREQ]]. YAWA’s individual F-measure is 81.22%. Currently YAWA is a part of the [[http://www.clarin.eu/tools/cowal-combined-word-aligner|COWAL]] combined lexical alignment platform. More detailed descriptions are available in [[http://www.racai.ro/~tufis/papers|the following papers]]: -- Radu Ion (2007). Word Sense Disambiguation Methods Applied to English and Romanian. (in Romanian). PhD thesis. Romanian Academy, Bucharest -- Dan Tufiş (2007). Exploiting Aligned Parallel Corpora in Multilingual Studies and Applications. In Toru Ishida, Susan R. Fussell, and Piek T.J.M. Vossen (eds.), Intercultural Collaboration. First International Workshop (IWIC 2007), volume 4568 of Lecture Notes in Computer Science, pp. 103-117. Springer-Verlag, August 2007. ISBN 978-3-540-73999-9. -- Dan Tufiş, Radu Ion, Alexandru Ceauşu, and Dan Ştefănescu (2006). Improved Lexical Alignment by Combining Multiple Reified Alignments. In Toru Ishida, Susan R. Fussell, and Piek T.J.M. Vossen (eds.), Proceedings of the 11th Conference EACL2006, pp. 153-160, Trento, Italy, April 2006. Association for Computational Linguistics. ISBN 1-9324-32-61-2.
 Publisher
Research Institute for Artificial Intelligence, Romanian Academy of Sciences
 Subject(s)
word aligner
 Collection(s)
LRT + Open Submissions Data & Tools
Show full item record
 
 

LINDAT/CLARIAH-CZ

  • Mission Statement
  • Advisory Board
  • Events
  • CLARIN Participation
  • DARIAH Participation

  • FAQ
  • Helpdesk
  • User Feedback Form

  • Acknowledge LINDAT/CLARIAH-CZ

Partners

  • Charles University
    • Faculty of Mathematics and Physics
    • Faculty of Arts
  • Masaryk University
    • Faculty of Arts
    • Faculty of Informatics
  • University of West Bohemia
    • Faculty of Applied Sciences
  • Czech Academy of Sciences
    • Czech Language Institute
    • Library of Academy
    • Institute of History
    • Institute of Philosophy
  • Archives, Libraries and Galleries
    • National Library of the Czech Republic
    • Moravian Library in Brno
    • National Gallery Prague
    • National Film Archive

Services

  • Service Status
  • About and Policies
  • Terms of Use
CLARIN CENTRE B CLARIN CENTRE K CoreTrustSeal Certification
Follow us on Twitter Link to Profile Home Page
THE LINDAT/CLARIAH-CZ PROJECT (LM2023062; formerly LM2010013, LM2015071, LM2018101) IS FULLY SUPPORTED BY THE MINISTRY OF EDUCATION, SPORTS AND YOUTH OF THE CZECH REPUBLIC UNDER THE PROGRAMME LM OF "LARGE INFRASTRUCTURES"
Icons © Smashicons and Freepik from flaticon.com licensed by CC 3.0 BY
website © 2023 by ÚFAL