Subject: text mining - LINDAT/CLARIAH-CZ Catalog Search Results

Creator:: Gadri, S. and Moussaoui, A.
Format:: unmediated and volume
Type:: model:article and TEXT
Subject:: N-grams, language identification, text categorization, text mining, machine learning, Kullback-Leibler distance, χ² distance, and Cavnar-Trenkle distance
Language:: English
Description:: Automatic text classification is an important task that consists of assigning labels (categories, groups, classes) to a given text based on a set of previously labeled texts called the training set. The work presented in this paper addresses the problem of automatic topical text categorization. It is supervised because it works on a predefined set of classes, and topical because it uses the topics or subjects of the texts as classes. In this context, we used a new approach based on the $k$-NN algorithm, together with a new set of pseudo-distances (distance metrics) known from the field of language identification. We also proposed a simple and effective method to improve the quality of the resulting categorization.
Rights:: http://creativecommons.org/publicdomain/mark/1.0/ and policy:public
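
The description above combines $k$-NN with distances borrowed from language identification. The following is a minimal sketch of that general idea, not the authors' implementation: it ranks character n-grams per document and compares documents with a Cavnar-Trenkle style out-of-place distance, one of the measures named in the subject list. The function names (`ngram_profile`, `out_of_place`, `knn_classify`) and the defaults (`n=3`, `top=300`, `k=3`) are assumptions chosen for illustration; the record does not state the paper's actual settings or its improvement method.

```python
from collections import Counter


def ngram_profile(text, n=3, top=300):
    """Return the `top` most frequent character n-grams, most frequent first."""
    # Normalise whitespace and case, then pad so word boundaries form n-grams too.
    padded = " " + " ".join(text.lower().split()) + " "
    counts = Counter(padded[i:i + n] for i in range(len(padded) - n + 1))
    # Sort by descending count, then alphabetically, so ranks are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top]]


def out_of_place(profile_a, profile_b):
    """Out-of-place distance: sum of rank differences between two ranked profiles.

    An n-gram of profile_a that is missing from profile_b costs a fixed
    maximum penalty (here len(profile_b), an illustrative choice).
    """
    rank_b = {gram: rank for rank, gram in enumerate(profile_b)}
    max_penalty = len(profile_b)
    return sum(
        abs(rank - rank_b[gram]) if gram in rank_b else max_penalty
        for rank, gram in enumerate(profile_a)
    )


def knn_classify(text, training_set, k=3):
    """Label `text` by majority vote among its k nearest training documents.

    `training_set` is a list of (document_text, label) pairs.
    """
    query = ngram_profile(text)
    scored = sorted(
        (out_of_place(query, ngram_profile(doc)), label)
        for doc, label in training_set
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]
```

Swapping `out_of_place` for a Kullback-Leibler or χ² based function over n-gram frequencies would leave `knn_classify` unchanged, since it only needs a function that returns a smaller number for more similar documents.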

Creator:: Heiden, Serge
Publisher:: ENS de Lyon - CNRS, ICAR Laboratory and Université de Franche-Comté, laboratoire ELLIADD (Edition, Littératures, Langages, Informatique, Arts, Didactique, Discours)
Type:: tool and toolService
Subject:: textometry, xml, tei, nlp, cqp, r, textual data analysis, statistical text analysis, text mining, and concordance
Description:: TXM is a free and open-source cross-platform Unicode & XML based text/corpus analysis environment and graphical client, supporting Windows, Linux and Mac OS X. It can also be used online as a J2EE standard compliant web portal (GWT based) with access control built in.
Rights:: Not specified
