When dealing with the curse of dimensionality (a small sample size with many dimensions), feature selection is an important preprocessing strategy for the analysis of biomedical data. This issue is particularly germane to the classification of high-dimensional, class-labeled biomedical spectra, such as those acquired from magnetic resonance and infrared spectrometers. A technique is presented that stochastically selects feature subsets of varying cardinality for automated discrimination using two types of neural network classifiers. The results are benchmarked against classifiers using the entire feature set, with and without averaging. Stochastic feature subset selection produced significantly fewer misclassifications than either benchmark.
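As a rough illustration of the stochastic subset search, the sketch below repeatedly samples feature subsets of varying cardinality and keeps the subset with the best cross-validated accuracy. The synthetic dataset, the single MLP classifier standing in for the paper's two network types, and the trial budget are all illustrative assumptions, not the paper's exact setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
# synthetic stand-in for high-dimensional spectra: many features, few samples
X, y = make_classification(n_samples=100, n_features=50, n_informative=5,
                           random_state=0)

best_subset, best_score = None, -np.inf
for _ in range(30):                        # stochastic trial budget
    k = int(rng.integers(2, 16))           # varying subset cardinality
    subset = rng.choice(X.shape[1], size=k, replace=False)
    clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=500, random_state=0)
    score = cross_val_score(clf, X[:, subset], y, cv=3).mean()
    if score > best_score:
        best_subset, best_score = subset, score

print(f"best subset ({len(best_subset)} features): {sorted(best_subset.tolist())}")
print(f"cross-validated accuracy: {best_score:.3f}")
```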
This work is motivated by the observation that feature selection greatly affects the detection accuracy of a classifier. The goals of this paper are (i) to identify an optimal feature subset using a novel wrapper-based feature selection algorithm called the Shapley Value Embedded Genetic Algorithm (SVEGA), (ii) to show the improvement in the detection accuracy of an Artificial Neural Network (ANN) classifier given the optimal features selected, and (iii) to evaluate the performance of the proposed SVEGA-ANN model on medical datasets. The medical diagnosis system is built using a wrapper-based feature selection algorithm that attempts to maximize specificity and sensitivity (and in turn accuracy), together with an ANN for classification. Two memetic operators, namely "include" and "remove" features (or genes), are introduced to realize the genetic algorithm (GA) solution. The use of the GA for feature selection enables quick improvement of the solution through a fine-tuned search. An extensive experimental evaluation of the proposed SVEGA-ANN method on 26 benchmark datasets from the UCI Machine Learning Repository and the Kent Ridge repository, against three conventional classifiers, shows that it outperforms state-of-the-art systems in terms of classification accuracy, number of selected features, and running time.
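A minimal sketch of the wrapper idea follows: a genetic algorithm evolves binary feature masks whose fitness is the cross-validated accuracy of an ANN, and "include"/"remove" memetic operators nudge each offspring by one feature. The mutual-information ranking below is a stand-in for the paper's Shapley-value ranking, and all hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=150, n_features=30, n_informative=6,
                           random_state=0)
# stand-in feature ranking (best first); the paper uses Shapley values here
rank = np.argsort(-mutual_info_classif(X, y, random_state=0))

def fitness(mask):
    """Wrapper fitness: cross-validated ANN accuracy on the selected features."""
    if not mask.any():
        return 0.0
    clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=400, random_state=0)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

def include(mask):
    """Memetic operator: switch on the best-ranked feature not yet selected."""
    child = mask.copy()
    for f in rank:
        if not child[f]:
            child[f] = True
            break
    return child

def remove(mask):
    """Memetic operator: switch off the worst-ranked selected feature."""
    child = mask.copy()
    for f in rank[::-1]:
        if child[f]:
            child[f] = False
            break
    return child

pop = rng.random((8, X.shape[1])) < 0.3             # random binary population
for _ in range(5):                                  # a few GA generations
    scores = np.array([fitness(m) for m in pop])
    pop = pop[np.argsort(-scores)]                  # elitist sort, best first
    for i in range(4, 8):                           # breed over the worst half
        a, b = pop[rng.integers(0, 4, size=2)]
        cut = int(rng.integers(1, X.shape[1]))
        child = np.concatenate([a[:cut], b[cut:]])  # one-point crossover
        child ^= rng.random(X.shape[1]) < 0.05      # bit-flip mutation
        pop[i] = include(child) if rng.random() < 0.5 else remove(child)

scores = np.array([fitness(m) for m in pop])
best = pop[int(np.argmax(scores))]
print(f"selected {int(best.sum())} features with CV accuracy {scores.max():.3f}")
```

Restricting the memetic step to a single feature flip per offspring keeps the number of expensive wrapper evaluations low, which matters because every fitness call retrains the ANN.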
The purpose of feature selection in machine learning is at least two-fold: saving measurement acquisition costs and reducing the negative effects of the curse of dimensionality, with the aim of improving the accuracy of models and the classification rate of classifiers with respect to previously unknown data. Yet it has been shown recently that the process of feature selection itself can be negatively affected by the very same curse of dimensionality: feature selection methods may easily over-fit or perform unstably. Such an outcome is unlikely to generalize well, and the resulting recognition system may fail to deliver the expected performance. In many tasks, it is therefore crucial to employ additional mechanisms that make the feature selection process more stable and more resistant to the effects of the curse of dimensionality. In this paper we discuss three different approaches to reducing this problem. We present an algorithmic extension applicable to various feature selection methods, capable of reducing excessive feature subset dependency not only on specific training data but also on specific criterion function properties. Further, we discuss the concept of criteria ensembles, in which various criteria vote on feature inclusion/removal, and go on to provide a general definition of feature selection hybridization aimed at combining the advantages of dependent and independent criteria. The presented ideas are illustrated through examples, and summarizing recommendations are given.
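To make the criteria-ensemble idea concrete, the sketch below lets several filter criteria each vote for their top-k features and keeps only the features backed by a majority. The three criteria, the value of k, and the majority threshold are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import chi2, f_classif, mutual_info_classif

X, y = make_classification(n_samples=200, n_features=25, n_informative=5,
                           random_state=0)
X_pos = X - X.min(axis=0)                # chi2 requires non-negative inputs

criteria = [
    f_classif(X, y)[0],                  # ANOVA F-score
    chi2(X_pos, y)[0],                   # chi-squared statistic
    mutual_info_classif(X, y, random_state=0),
]

k = 8
votes = np.zeros(X.shape[1], dtype=int)
for scores in criteria:
    votes[np.argsort(-scores)[:k]] += 1  # each criterion votes for its top-k

selected = np.flatnonzero(votes >= 2)    # majority among the three criteria
print("selected features:", selected)
```

Requiring agreement between criteria with different biases is what reduces the dependency of the final subset on any single criterion's properties.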
The paper gives an overview of feature selection techniques in statistical pattern recognition, with particular emphasis on methods developed within the Institute of Information Theory and Automation research team in recent years. Besides discussing the advances in methodology since the time of Perez's pioneering work, the paper attempts to put the methods into a taxonomical framework. The methods discussed include the latest variants of the optimal algorithms, enhanced sub-optimal techniques, and the simultaneous semi-parametric probability density function modelling and feature space selection method. Some related issues are illustrated on real data by means of the Feature Selection Toolbox software.
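Among the sub-optimal techniques in this taxonomy, sequential searches are the classic example; as a minimal illustration of the family, the sketch below implements plain sequential forward selection with a wrapper criterion. The classifier, criterion, and target subset size are illustrative assumptions and do not reflect the Feature Selection Toolbox internals.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=150, n_features=20, n_informative=4,
                           random_state=0)

selected, remaining = [], list(range(X.shape[1]))
while len(selected) < 5:                  # target subset size
    def crit(f):                          # criterion: CV accuracy with f added
        cols = selected + [f]
        return cross_val_score(KNeighborsClassifier(), X[:, cols], y, cv=3).mean()
    best = max(remaining, key=crit)       # greedily add the best feature
    selected.append(best)
    remaining.remove(best)
    print(f"added feature {best}, subset {selected}")
```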
This paper deals with the application of saliency analysis to Support Vector Machines (SVMs) for feature selection. The importance of each feature is ranked by evaluating the sensitivity of the network output to that feature's input in terms of the partial derivative. A systematic approach to removing irrelevant features based on this sensitivity is developed. Two simulated non-linear time series and five real financial time series are examined in the experiments. Based on the simulation results, it is shown that saliency analysis is effective in SVMs for identifying important features.
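A minimal sketch of the sensitivity ranking follows, assuming an RBF-kernel SVM (scikit-learn's SVR here) so that the partial derivative of the output with respect to each input has a closed form; the synthetic regression data and hyperparameters are illustrative assumptions. Features whose mean absolute derivative stays near zero are candidates for removal.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))
# the target depends only on features 0 and 1; features 2-5 are irrelevant
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.05 * rng.standard_normal(200)

svr = SVR(kernel="rbf", gamma=0.5).fit(X, y)

def gradient(x):
    """Closed-form df/dx for an RBF SVM: f(x) = sum_i a_i K(sv_i, x) + b, so
    df/dx = sum_i a_i * exp(-gamma * ||x - sv_i||^2) * (-2 * gamma) * (x - sv_i)."""
    diff = x - svr.support_vectors_                    # (n_sv, n_features)
    k = np.exp(-svr.gamma * (diff ** 2).sum(axis=1))   # kernel values, (n_sv,)
    return (svr.dual_coef_.ravel() * k) @ (-2.0 * svr.gamma * diff)

# saliency of each feature: mean absolute partial derivative over the data
saliency = np.mean([np.abs(gradient(x)) for x in X], axis=0)
for j in np.argsort(-saliency):
    print(f"feature {j}: mean |df/dx_{j}| = {saliency[j]:.3f}")
```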