YAWA - Yet Another Word Aligner

dc.contributor.other	Tufiş, Dan
dc.contributor.other	Ion, Radu
dc.date.accessioned	2014-07-30T21:34:22Z
dc.date.available	2014-07-30T21:34:22Z
dc.date.issued	2014-07-30
dc.identifier.uri	http://hdl.handle.net/11372/LRT-1300
dc.description	YAWA is a four-stage lexical aligner that uses bilingual translation lexicons produced by [[http://www.clarin.eu/tools/translation-equivalents-extractor\|TREQ]] and phrase boundary detection to align the words of a given bitext. Using this initial alignment, in stage 2 a language-dependent module takes over and produces alignments of the remaining lexical tokens within aligned chunks. Stage 3 specializes in aligning blocks of consecutive unaligned tokens, and stage 4 deletes alignments that are likely to be wrong. Developed in Perl, YAWA is language-independent, except for the modules that realise alignments specific to the pair of aligned languages. So far, it works only for the Romanian-English language pair. It requires a parallel corpus in [[http://www.xces.org\|XCES]] format, morpho-syntactically annotated and lemmatized (using [[http://www.clarin.eu/tools/ttl-tokenizing-tagging-and-lemmatizing-free-running-texts\|TTL]]), and translation dictionaries produced by [[http://www.clarin.eu/tools/translation-equivalents-extractor\|TREQ]]. YAWA’s individual F-measure is 81.22%. YAWA is currently part of the [[http://www.clarin.eu/tools/cowal-combined-word-aligner\|COWAL]] combined lexical alignment platform. More detailed descriptions are available in [[http://www.racai.ro/~tufis/papers\|the following papers]]: -- Radu Ion (2007). Word Sense Disambiguation Methods Applied to English and Romanian. (in Romanian). PhD thesis. Romanian Academy, Bucharest. -- Dan Tufiş (2007). Exploiting Aligned Parallel Corpora in Multilingual Studies and Applications. In Toru Ishida, Susan R. Fussell, and Piek T.J.M. Vossen (eds.), Intercultural Collaboration. First International Workshop (IWIC 2007), volume 4568 of Lecture Notes in Computer Science, pp. 103-117. Springer-Verlag, August 2007. ISBN 978-3-540-73999-9. -- Dan Tufiş, Radu Ion, Alexandru Ceauşu, and Dan Ştefănescu (2006). Improved Lexical Alignment by Combining Multiple Reified Alignments. In Toru Ishida, Susan R. Fussell, and Piek T.J.M. Vossen (eds.), Proceedings of the 11th Conference EACL2006, pp. 153-160, Trento, Italy, April 2006. Association for Computational Linguistics. ISBN 1-9324-32-61-2.
dc.language.iso	eng
dc.language.iso	ron
dc.publisher	Research Institute for Artificial Intelligence, Romanian Academy of Sciences
dc.subject	word aligner
dc.title	YAWA - Yet Another Word Aligner
dc.type	toolService
has.files	no
additional.metadata	Documentation language(s) (field_tool_documentation_langua):English Language(s) of input data (field_tool_input_language):English\|\|Romanian Implementation language(s) (field_tool_implementation_langu):Perl Short name (field_tool_short_name):YAWA Availability (field_tool_availibility):research purposes Nid:1118 Open source code (field_tool_open_source_code):no Language(s) of output data (field_tool_output_language):English\|\|Romanian
branding	LRT + Open Submissions
files.size	0
files.count	0
