Provides, for each entry, the word, its count, frequency class, description, subject area, morphology, relations to other words (e.g. synonymy), links to other words, Dornseiff meaning groups, examples (taken from, among others, spiegel.de and sueddeutsche.de), significant co-occurrences, and significant left and right neighbours.
Tool for designing and performing Word Sense Disambiguation (WSD) experiments. The current version (a prototype) facilitates the construction and evaluation of WSD methods in the supervised Machine Learning paradigm.
Xaira is the current name for a new version of SARA, the text searching software originally developed at OUCS for use with the British National Corpus. This new version has been entirely rewritten as a general-purpose XML search engine that will operate on any corpus of well-formed XML documents. It is, however, best used with TEI-conformant documents.
XSH is a powerful command-line tool for querying, processing and editing XML documents. It features a shell-like interface with auto-completion for comfortable interactive work, but it can also be used for off-line (batch) processing of XML data.
YAWA is a four-stage lexical aligner that uses bilingual translation lexicons produced by [[http://www.clarin.eu/tools/translation-equivalents-extractor|TREQ]] and phrase boundary detection to align the words of a given bitext. Using this alignment, in stage 2 a language-dependent module takes over and produces alignments of the remaining lexical tokens within aligned chunks. Stage 3 specializes in aligning blocks of consecutive unaligned tokens, and stage 4 deletes alignments that are likely to be wrong.
Developed in Perl, YAWA is language independent, except for the modules that realise alignments specific to the pairs of aligned languages. So far, it works only for the Romanian-English language pair. It requires a parallel corpus in [[http://www.xces.org|XCES]] format, morpho-syntactically annotated and lemmatized (using [[http://www.clarin.eu/tools/ttl-tokenizing-tagging-and-lemmatizing-free-running-texts|TTL]]), and translation dictionaries produced by [[http://www.clarin.eu/tools/translation-equivalents-extractor|TREQ]].
YAWA’s individual F-measure is 81.22%. Currently YAWA is a part of the [[http://www.clarin.eu/tools/cowal-combined-word-aligner|COWAL]] combined lexical alignment platform.
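For readers unfamiliar with how a word aligner is scored, the following is a minimal sketch of a balanced F-measure computed over alignment links. It assumes that the reported figure is the usual harmonic mean of precision and recall against a gold-standard alignment; the description does not state the evaluation setup, and the function name `alignment_f_measure`, the link representation, and the toy data are purely illustrative and not part of YAWA.

```python
from typing import Set, Tuple

# A link pairs a source token index with a target token index (assumed representation).
Link = Tuple[int, int]


def alignment_f_measure(predicted: Set[Link], gold: Set[Link]) -> float:
    """Balanced F-measure (F1) of predicted alignment links against gold links."""
    if not predicted or not gold:
        return 0.0
    correct = len(predicted & gold)  # links present in both sets
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)  # share of predicted links that are right
    recall = correct / len(gold)          # share of gold links that were found
    return 2 * precision * recall / (precision + recall)


# Toy example (hypothetical data, not YAWA output).
gold = {(0, 0), (1, 2), (2, 1), (3, 3)}
predicted = {(0, 0), (1, 2), (2, 2)}
score = alignment_f_measure(predicted, gold)
print(f"F-measure: {score:.4f}")
```

In this toy case two of the three predicted links are correct and two of the four gold links are found, so precision is 2/3 and recall is 1/2.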
More detailed descriptions are available in [[http://www.racai.ro/~tufis/papers|the following papers]]:
-- Radu Ion (2007). Word Sense Disambiguation Methods Applied to English and Romanian. (in Romanian). PhD thesis. Romanian Academy, Bucharest
-- Dan Tufiş (2007). Exploiting Aligned Parallel Corpora in Multilingual Studies and Applications. In Toru Ishida, Susan R. Fussell, and Piek T.J.M. Vossen (eds.), Intercultural Collaboration. First International Workshop (IWIC 2007), volume 4568 of Lecture Notes in Computer Science, pp. 103-117. Springer-Verlag, August 2007. ISBN 978-3-540-73999-9.
-- Dan Tufiş, Radu Ion, Alexandru Ceauşu, and Dan Ştefănescu (2006). Improved Lexical Alignment by Combining Multiple Reified Alignments. In Toru Ishida, Susan R. Fussell, and Piek T.J.M. Vossen (eds.), Proceedings of the 11th Conference EACL2006, pp. 153-160, Trento, Italy, April 2006. Association for Computational Linguistics. ISBN 1-9324-32-61-2.
A selection of poetic texts (71,490 words) from the Old English Section of the Helsinki Corpus of English Texts, syntactically and morphologically annotated.
Segment from Český zvukový týdeník Aktualita (Czech Aktualita Sound Newsreel) issue no. 51B from 1943 depicts the Youth Basketball Championship organised by the Board of Trustees for the Education of Youth and held in the Great Hall of Lucerna Palace in Prague from 10 to 12 December. The boys' final was won by the Central Bohemia I team, who beat the Brno Region I team 27:13. The girls' final was won by the Brno Region I team, who beat the team from Polabí 17:5.