
 
dc.contributor.author Tufiş, Dan
dc.contributor.author Ceauşu, Alexandru
dc.date.accessioned 2014-07-30T21:34:21Z
dc.date.available 2014-07-30T21:34:21Z
dc.date.issued 2014-07-30
dc.identifier.uri http://hdl.handle.net/11372/LRT-1299
dc.description MEBA is a lexical aligner, implemented in C#, based on an iterative algorithm that relies on several pre-processing steps: sentence alignment ([[http://www.clarin.eu/tools/sal-sentence-aligner|SAL]]), tokenization, POS-tagging and lemmatization (through [[http://www.clarin.eu/tools/ttl-tokenizing-tagging-and-lemmatizing-free-running-texts|TTL]]), and sentence chunking. Similar to the YAWA aligner, MEBA generates the links step by step, beginning with the most probable ones (anchor links). The links added at any later step are supported or restricted by the links created in the previous iterations. The aligner uses different weights and different significance thresholds for each feature and iteration. Each iteration can be configured to align different categories of tokens (named entities, dates and numbers, content words, function words, punctuation), in decreasing order of statistical evidence. MEBA has an individual F-measure of 81.71% and is currently integrated in the [[http://www.clarin.eu/tools/cowal-combined-word-aligner|COWAL]] platform. More detailed descriptions are available in [[http://www.racai.ro/~tufis/papers|the following papers]]:
-- Dan Tufiş (2007). Exploiting Aligned Parallel Corpora in Multilingual Studies and Applications. In Toru Ishida, Susan R. Fussell, and Piek T.J.M. Vossen (eds.), Intercultural Collaboration. First International Workshop (IWIC 2007), volume 4568 of Lecture Notes in Computer Science, pp. 103-117. Springer-Verlag, August 2007. ISBN 978-3-540-73999-9.
-- Dan Tufiş, Radu Ion, Alexandru Ceauşu, and Dan Ştefănescu (2006). Improved Lexical Alignment by Combining Multiple Reified Alignments. In Toru Ishida, Susan R. Fussell, and Piek T.J.M. Vossen (eds.), Proceedings of the 11th Conference EACL2006, pp. 153-160, Trento, Italy, April 2006. Association for Computational Linguistics. ISBN 1-9324-32-61-2.
-- Dan Tufiş, Radu Ion, Alexandru Ceauşu, and Dan Ştefănescu (2005). Combined Aligners. In Proceedings of the ACL Workshop on Building and Using Parallel Texts: Data-Driven Machine Translation and Beyond, pp. 107-110, Ann Arbor, USA, June 2005. Association for Computational Linguistics. ISBN 978-973-703-208-9.
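The iterative, anchor-first linking described above can be illustrated with a minimal Python sketch. It is not MEBA's implementation (MEBA is written in C# and its source is not open): the `Candidate` type, the `locality_bonus` heuristic, the one-to-one link restriction, and all scores and thresholds are assumptions chosen only to show how links from earlier iterations can support or restrict later ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical candidate link between one source and one target token."""
    src: int        # source token index
    tgt: int        # target token index
    category: str   # e.g. "named_entity", "content", "function"
    score: float    # assumed translation-equivalence evidence in [0, 1]


def locality_bonus(cand, links, window=2, bonus=0.1):
    """Assumed support feature: reward candidates close to an existing link."""
    for s, t in links:
        if abs(cand.src - s) <= window and abs(cand.tgt - t) <= window:
            return bonus
    return 0.0


def align(candidates, iterations):
    """Add links iteration by iteration, most reliable categories first.

    `iterations` is a list of (categories, threshold) pairs. Links from
    earlier iterations support later candidates (locality bonus) and
    restrict them (each token is linked at most once in this sketch).
    """
    links, used_src, used_tgt = [], set(), set()
    for categories, threshold in iterations:
        previous = list(links)  # only links from earlier iterations give support
        scored = [
            (c.score + locality_bonus(c, previous), c)
            for c in candidates
            if c.category in categories
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        for total, c in scored:
            if total < threshold:
                break  # remaining candidates are below this iteration's threshold
            if c.src in used_src or c.tgt in used_tgt:
                continue  # restricted by an already created link
            links.append((c.src, c.tgt))
            used_src.add(c.src)
            used_tgt.add(c.tgt)
    return links


candidates = [
    Candidate(0, 0, "named_entity", 0.95),
    Candidate(1, 1, "content", 0.55),
    Candidate(1, 3, "content", 0.58),
    Candidate(2, 2, "function", 0.30),
]
iterations = [
    ({"named_entity"}, 0.90),  # anchor links first
    ({"content"}, 0.60),
    ({"function"}, 0.35),
]
links = align(candidates, iterations)
print(links)  # [(0, 0), (1, 1), (2, 2)]
```

In this toy run the content-word candidate (1, 1) passes its threshold only because the anchor link (0, 0) created in the first iteration lies nearby, while the competing candidate (1, 3) gets no such support and is rejected.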
dc.language.iso eng
dc.language.iso ron
dc.publisher Research Institute for Artificial Intelligence, Romanian Academy of Sciences
dc.subject word aligner
dc.title MEBA word aligner
dc.type toolService
has.files no
additional.metadata Language(s) of input data (field_tool_input_language): English||Romanian; Implementation language(s) (field_tool_implementation_langu): C#; Availability (field_tool_availibility): research purposes; Nid: 1117; Open source code (field_tool_open_source_code): no; Language(s) of output data (field_tool_output_language): English||Romanian
branding LRT + Open Submissions
dc.coverage.placeName Romania
files.size 0
files.count 0

