We present a new Generalized Learning Vector Quantization classifier, called Optimally Generalized Learning Vector Quantization, based on a novel weight-update rule for learning labeled samples. The algorithm attains stable prototype/weight-vector dynamics expressed in terms of the estimated current and previous weights and their updates. The resulting weight-update term is then related to the proximity measure used by Generalized Learning Vector Quantization classifiers. The new algorithm and several major counterparts are tested and compared on synthetic and publicly available datasets. For both datasets studied, the new classifier outperforms its counterparts in training and testing, with accuracy above 80%, and shows greater robustness against variation of the model parameters.
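As a point of reference for the update rule mentioned above, the following is a minimal sketch of the standard GLVQ prototype update that OGLVQ builds on; the squared Euclidean proximity measure, the sigmoid transfer function and the toy usage data are assumptions made for illustration, not details taken from the abstract.

import numpy as np

def glvq_update(x, y, prototypes, labels, lr=0.05):
    """One GLVQ step: pull the closest correct prototype towards x,
    push the closest incorrect prototype away from x (illustrative sketch)."""
    d = np.sum((prototypes - x) ** 2, axis=1)          # assumed squared Euclidean proximity
    correct = labels == y
    j = np.where(correct)[0][np.argmin(d[correct])]    # closest prototype with the same label
    k = np.where(~correct)[0][np.argmin(d[~correct])]  # closest prototype with a different label
    dj, dk = d[j], d[k]
    mu = (dj - dk) / (dj + dk)                         # relative distance difference
    f_prime = np.exp(-mu) / (1.0 + np.exp(-mu)) ** 2   # derivative of the sigmoid transfer function
    prototypes[j] += lr * f_prime * (dk / (dj + dk) ** 2) * (x - prototypes[j])
    prototypes[k] -= lr * f_prime * (dj / (dj + dk) ** 2) * (x - prototypes[k])
    return prototypes

# toy usage: two classes, one prototype per class
X = np.array([[0.0, 0.0], [1.0, 1.0]]); y = np.array([0, 1])
W = np.array([[0.2, 0.1], [0.8, 0.9]]); L = np.array([0, 1])
for xi, yi in zip(X, y):
    W = glvq_update(xi, yi, W, L)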
Time series forecasting, such as stock price prediction, is one of the most challenging problems in finance, since the data are non-stationary and noisy and are affected by many factors. This study applies a hybrid of a Genetic Algorithm (GA) and an Artificial Neural Network (ANN) to develop a method for predicting stock prices and time series. In the GA stage, the output values are fed to a dedicated ANN that corrects the remaining point-wise errors. The analysis suggests that combining the GA and the ANN increases accuracy in fewer iterations. The analysis is conducted on the 200-day main index as well as on five companies listed on the NASDAQ. Applied to the Apple stock dataset, the proposed hybrid of GA and Back Propagation (BP) achieves a 99.99% improvement in SSE and a 90.66% improvement in run time compared with traditional methods. These results demonstrate both the speed and the accuracy of the proposed approach.
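A minimal sketch of the GA-then-BP idea described above, assuming a small feed-forward network whose initial weights are evolved by a genetic algorithm (fitness = SSE on the training window) and then refined by back-propagation; the network size, GA settings and synthetic price series are illustrative stand-ins, not the study's configuration.

import numpy as np

rng = np.random.default_rng(0)
prices = np.cumsum(rng.normal(0, 1, 200)) + 100            # synthetic stand-in for a 200-day index
X = np.array([prices[i:i + 5] for i in range(190)])         # 5-day input windows
y = prices[5:195]                                            # next-day price
X = (X - X.mean()) / X.std()
y = (y - y.mean()) / y.std()

def unpack(w):
    return w[:25].reshape(5, 5), w[25:30], w[30:35], w[35]

def sse(w):
    W1, b1, W2, b2 = unpack(w)
    pred = np.tanh(X @ W1 + b1) @ W2 + b2
    return np.sum((pred - y) ** 2)

# GA phase: evolve candidate weight vectors, fitness = SSE on the training window
pop = rng.normal(0, 0.5, (40, 36))
for _ in range(100):
    pop = pop[np.argsort([sse(w) for w in pop])][:20]        # truncation selection (elitism)
    children = pop[rng.integers(0, 20, 20)] + rng.normal(0, 0.1, (20, 36))  # mutation
    pop = np.vstack([pop, children])

w = min(pop, key=sse).copy()

# BP phase: plain gradient descent refines the GA's best individual
for _ in range(300):
    W1, b1, W2, b2 = unpack(w)
    h = np.tanh(X @ W1 + b1)
    err = 2 * (h @ W2 + b2 - y)                              # dSSE / dprediction
    dh = np.outer(err, W2) * (1 - h ** 2)
    grad = np.concatenate([(X.T @ dh).ravel(), dh.sum(0), h.T @ err, [err.sum()]])
    w -= 1e-4 * grad
print("SSE after GA + BP:", sse(w))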
An Electronic Performance Support System (EPSS) introduces challenges in contextualized and personalized information delivery. Recommender systems aim at delivering and suggesting relevant information according to users' preferences, so EPSSs could take advantage of recommendation algorithms that guide users through a large space of possible options. The JUMP project (JUst-in-tiMe Performance support system for dynamic organizations, co-funded by POR Puglia 2000-2006 - Mis. 3.13, Sostegno agli Investimenti in Ricerca Industriale, Sviluppo Precompetitivo e Trasferimento Tecnologico) aims at integrating an EPSS with a hybrid recommender system.
Collaborative and content-based filtering are the recommendation techniques most widely adopted to date. The main contribution of this paper is a content-collaborative hybrid recommender that computes similarities between users from their content-based profiles, in which user preferences are stored, instead of comparing their rating styles. A distinctive feature of our system is that a statistical model of the user's interests is obtained by machine learning techniques integrated with linguistic knowledge contained in WordNet. This model, named ``semantic user profile'', is exploited by the hybrid recommender in the neighborhood formation process.
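A minimal sketch of the neighborhood formation step described above, in which users are compared through their content-based profiles rather than their rating vectors; the keyword-weight dictionaries below merely stand in for the WordNet-based semantic user profiles, and the cosine measure is an assumption.

import math

# toy content-based profiles (stand-ins for semantic user profiles)
profiles = {
    "alice": {"jazz": 0.9, "guitar": 0.7, "festival": 0.2},
    "bob":   {"jazz": 0.8, "piano": 0.6},
    "carol": {"football": 0.9, "festival": 0.4},
}

def cosine(p, q):
    """Cosine similarity between two sparse keyword-weight profiles."""
    common = set(p) & set(q)
    num = sum(p[t] * q[t] for t in common)
    den = (math.sqrt(sum(v * v for v in p.values()))
           * math.sqrt(sum(v * v for v in q.values())))
    return num / den if den else 0.0

def neighbourhood(user, k=1):
    """Rank the other users by profile similarity and keep the top k."""
    sims = [(other, cosine(profiles[user], profiles[other]))
            for other in profiles if other != user]
    return sorted(sims, key=lambda s: s[1], reverse=True)[:k]

print(neighbourhood("alice", k=2))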
Automatic text classification is a very important task that consists in assigning labels (categories, groups, classes) to a given text on the basis of a set of previously labeled texts called the training set. The work presented in this paper addresses the problem of automatic topical text categorization. The classification is supervised because it works on a predefined set of classes, and topical because it uses the topics or subjects of texts as classes. In this context, we use a new approach based on the $k$-NN algorithm together with a new set of pseudo-distances (distance metrics) known from the field of language identification. We also propose a simple and effective method to improve the quality of the resulting categorization.
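A minimal sketch of this kind of approach, assuming the Cavnar-Trenkle out-of-place measure on character n-gram rank profiles as one representative pseudo-distance from language identification; the toy training texts and the profile size are illustrative assumptions.

from collections import Counter

def profile(text, n=3, size=300):
    """Ranked list of the most frequent character n-grams of a text."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    return [g for g, _ in grams.most_common(size)]

def out_of_place(p, q):
    """Pseudo-distance: sum of rank displacements of p's n-grams inside q."""
    index = {g: r for r, g in enumerate(q)}
    return sum(abs(r - index.get(g, len(q))) for r, g in enumerate(p))

# toy training set (illustrative)
train = [("the team won the match after extra time", "sport"),
         ("the parliament passed the new budget law", "politics"),
         ("the striker scored twice in the final", "sport")]

def knn_classify(text, k=1):
    """k-NN vote over the training texts using the out-of-place pseudo-distance."""
    dists = sorted((out_of_place(profile(text), profile(t)), label) for t, label in train)
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

print(knn_classify("the goalkeeper saved a penalty in the match"))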
The focus of this paper is the application of the genetic programming framework to the problem of knowledge discovery in databases, more precisely to the task of classification. Genetic programming possesses certain advantages that make it suitable for data mining, such as the robustness of the algorithm and a structure convenient for rule generation, to name a few. This study concentrates on one type of parallel genetic algorithm, the cellular (diffusion) model. Emphasis is placed on improving the efficiency and scalability of the data mining algorithm, which can be achieved by integrating the algorithm with databases and employing a cellular framework. The cellular model of genetic programming that exploits SQL queries is implemented and applied to the classification task. The results achieved are presented and compared with other machine learning algorithms.
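A minimal sketch of the two ingredients mentioned above: rule fitness evaluated directly through SQL queries over the database, and a cellular (diffusion) layout in which each individual interacts only with its grid neighbours; the toy table, the encoding of a rule as a WHERE clause and the precision-style fitness are assumptions, not the paper's design.

import sqlite3, random

# toy in-memory table standing in for the mined database
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE samples (f1 REAL, f2 REAL, cls TEXT)")
con.executemany("INSERT INTO samples VALUES (?,?,?)",
                [(random.gauss(5, 1), random.gauss(3, 1),
                  random.choice(["a", "b"])) for _ in range(200)])

def fitness(rule, target="a"):
    """Precision of 'IF rule THEN class = target', computed inside the DBMS."""
    covered = con.execute(f"SELECT COUNT(*) FROM samples WHERE {rule}").fetchone()[0]
    correct = con.execute(
        f"SELECT COUNT(*) FROM samples WHERE ({rule}) AND cls = ?", (target,)).fetchone()[0]
    return correct / covered if covered else 0.0

def random_rule():
    col = random.choice(["f1", "f2"])
    return f"{col} {random.choice(['<', '>'])} {random.uniform(1, 7):.2f}"

# Cellular (diffusion) model: individuals live on a torus grid and are replaced
# by the best rule found in their 4-neighbourhood (plus one random newcomer).
SIZE = 4
grid = [[random_rule() for _ in range(SIZE)] for _ in range(SIZE)]
for _ in range(20):
    for i in range(SIZE):
        for j in range(SIZE):
            neigh = [grid[i][j], grid[(i - 1) % SIZE][j], grid[(i + 1) % SIZE][j],
                     grid[i][(j - 1) % SIZE], grid[i][(j + 1) % SIZE], random_rule()]
            grid[i][j] = max(neigh, key=fitness)

best = max((r for row in grid for r in row), key=fitness)
print(best, fitness(best))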
The event runoff coefficient (Rc) and the recession coefficient (tc) are of theoretical importance for understanding catchment response and of practical importance in hydrological design. We analyse 57 event periods between 2013 and 2015 in the 66 ha Austrian Hydrological Open Air Laboratory (HOAL), where the seven subcatchments are stratified by runoff generation type into wetlands, tile drainage and natural drainage. Three machine learning algorithms (Random Forest (RF), Gradient Boosted Decision Tree (GBDT) and Support Vector Machine (SVM)) are used to estimate Rc and tc from 22 event-based explanatory variables representing precipitation, soil moisture, groundwater level and season. The performance of the SVM algorithm in estimating Rc and tc, measured by the coefficient of determination R2, is generally higher than that of the other two methods, and the performance for Rc is higher than that for tc. The relative importance of the explanatory variables, assessed by a heatmap, suggests that Rc of the tile drainage systems is more strongly controlled by the weather conditions than by the catchment state, while the opposite is true for the natural drainage systems. Overall, model performance strongly depends on the runoff generation type.
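A minimal sketch of the estimation set-up, assuming a support vector regression scored by the coefficient of determination R2; the synthetic events and the three example predictors below merely stand in for the 57 HOAL events and the 22 explanatory variables.

import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 57                                           # number of event periods
X = np.column_stack([rng.gamma(2, 10, n),        # event precipitation (mm), synthetic
                     rng.uniform(0.2, 0.4, n),   # antecedent soil moisture (-), synthetic
                     rng.normal(1.5, 0.3, n)])   # groundwater depth (m), synthetic
rc = np.clip(0.02 * X[:, 0] + 1.5 * X[:, 1] - 0.1 * X[:, 2]
             + rng.normal(0, 0.05, n), 0, 1)     # synthetic runoff coefficient

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.01))
r2 = cross_val_score(model, X, rc, cv=5, scoring="r2")
print("cross-validated R2 for Rc:", r2.mean().round(2))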
At the NLP Centre, dividing text into sentences is currently done with a tool based on a rule-based system (a toy splitter of this kind is sketched after the blog list below). In order to obtain enough training data for machine learning, annotators manually split the corpus of contemporary texts CBB.blog (1 million tokens) into sentences. Each file contains one hundredth of the whole corpus, and all data were processed in parallel by two annotators.
The corpus was created from ten contemporary blogs:
hintzu.otaku.cz
modnipeklo.cz
bloc.cz
aleneprokopova.blogspot.com
blog.aktualne.cz
fuchsova.blog.onaidnes.cz
havlik.blog.idnes.cz
blog.aktualne.centrum.cz
klusak.blogspot.cz
myego.cz/welldone
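The following toy splitter (referenced above) illustrates the kind of rule-based sentence segmentation the corpus currently relies on; the single regular expression and the abbreviation list are assumptions for illustration, not the NLP Centre tool itself.

import re

ABBREVIATIONS = {"e.g.", "i.e.", "cf.", "approx."}   # illustrative list only

def split_sentences(text):
    """Rule-based splitting on ., ! or ? followed by whitespace and a capital letter."""
    pieces = re.split(r'(?<=[.!?])\s+(?=[A-Z])', text)
    sentences, buffer = [], ""
    for piece in pieces:
        buffer = f"{buffer} {piece}".strip() if buffer else piece
        # do not split right after a known abbreviation
        if not any(buffer.endswith(a) for a in ABBREVIATIONS):
            sentences.append(buffer)
            buffer = ""
    if buffer:
        sentences.append(buffer)
    return sentences

print(split_sentences("See e.g. The Blog. It was split manually."))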
The purpose of feature selection in machine learning is at least two-fold: saving measurement acquisition costs and reducing the negative effects of the curse of dimensionality, with the aim of improving the accuracy of models and the classification rate of classifiers on previously unseen data. Yet it has been shown recently that the process of feature selection itself can be negatively affected by the very same curse of dimensionality: feature selection methods may easily over-fit or perform unstably. Such an outcome is unlikely to generalize well, and the resulting recognition system may fail to deliver the expected performance. In many tasks, it is therefore crucial to employ additional mechanisms that make the feature selection process more stable and more resistant to the effects of the curse of dimensionality. In this paper we discuss three different approaches to reducing this problem. We present an algorithmic extension applicable to various feature selection methods, capable of reducing excessive dependence of the selected feature subset not only on the specific training data but also on specific properties of the criterion function. Further, we discuss the concept of criterion ensembles, in which various criteria vote on feature inclusion or removal, and we provide a general definition of feature selection hybridization aimed at combining the advantages of dependent and independent criteria. The presented ideas are illustrated with examples, and summarizing recommendations are given.
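A minimal sketch of the criterion-ensemble idea, assuming three common criteria (ANOVA F-score, mutual information, absolute Pearson correlation) that each nominate a top-k subset and then vote on inclusion; the choice of criteria and the synthetic data are illustrative, not the paper's configuration.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif, mutual_info_classif

X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           random_state=0)

def top_k(scores, k):
    """Indices of the k highest-scoring features under one criterion."""
    return set(np.argsort(scores)[-k:])

k = 5
criteria = [
    top_k(f_classif(X, y)[0], k),                          # ANOVA F-score
    top_k(mutual_info_classif(X, y, random_state=0), k),   # mutual information
    top_k(np.abs(np.corrcoef(X.T, y)[-1, :-1]), k),        # |Pearson correlation| with the label
]

votes = np.zeros(X.shape[1])
for chosen in criteria:
    for f in chosen:
        votes[f] += 1
selected = np.where(votes >= 2)[0]                         # keep features with a majority vote
print("features selected by the ensemble:", selected)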
Breast cancer survival prediction can have a substantial effect on the selection of the best treatment protocols. Many approaches, such as statistical and machine learning models, have been employed to predict the survival prospects of patients, but newer algorithms such as deep learning can be tested with the aim of improving the models and the prediction accuracy. In this study, we used machine learning and deep learning approaches to predict breast cancer survival in 4,902 patient records from the University of Malaya Medical Centre Breast Cancer Registry. The results indicated that the multilayer perceptron (MLP), random forest (RF) and decision tree (DT) classifiers could predict survivorship with 88.2%, 83.3% and 82.5% accuracy, respectively, on the test samples. The support vector machine (SVM) performed lower, at 80.5%. In this study, tumour size turned out to be the most important feature for breast cancer survivability prediction. Both deep learning and machine learning methods produce satisfactory prediction accuracy, but other factors, such as parameter configuration and data transformations, affect the accuracy of the predictive model.
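A minimal sketch of the model comparison described above, training the same four classifier families (MLP, RF, DT, SVM) and scoring them on a held-out split; the scikit-learn breast-cancer dataset is only a publicly available stand-in for the registry data, and the hyper-parameters are assumptions.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# stand-in dataset; the study uses the UMMC Breast Cancer Registry instead
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "MLP": make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000, random_state=0)),
    "RF":  RandomForestClassifier(n_estimators=200, random_state=0),
    "DT":  DecisionTreeClassifier(random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC()),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name}: test accuracy = {model.score(X_te, y_te):.3f}")

# RF feature importances indicate which variables drive the predictions
# (tumour size in the study); here they refer to the stand-in dataset only.
rf = models["RF"]
top = sorted(zip(load_breast_cancer().feature_names, rf.feature_importances_),
             key=lambda p: p[1], reverse=True)[:3]
print("most important features:", top)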
Matrix factorization, or factor analysis, is an important task in the analysis of high-dimensional real-world data. There are several well-known methods and algorithms for factorizing real-valued data, but many application areas, including information retrieval, pattern recognition and data mining, require processing binary rather than real-valued data. Unfortunately, the methods used for real-valued matrix factorization fail in the latter case. In this paper we introduce the background and an initial version of a Genetic Algorithm for binary matrix factorization.
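A minimal sketch of the idea, assuming individuals encode a pair of binary factor matrices (A, B) whose Boolean product approximates the data matrix X and whose fitness is the number of mismatching entries; the population size, mutation rate and toy matrix are illustrative, not the algorithm proposed in the paper.

import numpy as np

rng = np.random.default_rng(0)
k = 3                                            # assumed number of binary factors
A_true = rng.integers(0, 2, (10, k))
B_true = rng.integers(0, 2, (k, 8))
X = (A_true @ B_true > 0).astype(int)            # toy binary data with an exact factorization

def boolean_product(A, B):
    return (A @ B > 0).astype(int)

def fitness(ind):
    """Hamming distance between the Boolean reconstruction and X (lower is better)."""
    A, B = ind
    return np.sum(boolean_product(A, B) != X)

def mutate(ind, rate=0.02):
    """Flip each bit of both factor matrices with a small probability."""
    A, B = ind
    flip = lambda M: np.where(rng.random(M.shape) < rate, 1 - M, M)
    return flip(A), flip(B)

population = [(rng.integers(0, 2, (10, k)), rng.integers(0, 2, (k, 8)))
              for _ in range(60)]
for _ in range(300):
    population.sort(key=fitness)
    parents = population[:30]                    # truncation selection
    population = parents + [mutate(p) for p in parents]

best = min(population, key=fitness)
print("remaining mismatches:", fitness(best))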