This work is motivated by the strong effect that feature selection has on the detection accuracy of a classifier. The goals of this paper are (i) to identify an optimal feature subset using a novel wrapper-based feature selection algorithm called the Shapley Value Embedded Genetic Algorithm (SVEGA), (ii) to show the improvement in the detection accuracy of an Artificial Neural Network (ANN) classifier trained on the selected features, and (iii) to evaluate the performance of the proposed SVEGA-ANN model on medical datasets. The medical diagnosis system is built on a wrapper-based feature selection algorithm that attempts to maximize specificity and sensitivity (and, in turn, accuracy), with an ANN used for classification. Two memetic operators, namely "include" and "remove" features (or genes), are introduced to realize the genetic algorithm (GA) solution. Using a GA for feature selection facilitates quick improvement of the solution through a fine-tuned search. In an extensive experimental evaluation on 26 benchmark datasets from the UCI Machine Learning Repository and the Kent Ridge repository, with three conventional classifiers, the proposed SVEGA-ANN method outperforms state-of-the-art systems in terms of classification accuracy, number of selected features, and running time.
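To make the idea of Shapley-guided "include" and "remove" operators concrete, the following is a minimal sketch and not the SVEGA implementation itself. It assumes a Monte Carlo (permutation sampling) estimate of each feature's Shapley value and a toy additive value function in place of the wrapper's classifier score; the names `shapley_estimates`, `include_feature`, `remove_feature`, `toy_value`, and `WEIGHTS` are all hypothetical.

```python
import random


def shapley_estimates(n_features, value_fn, n_perm=200, seed=0):
    """Estimate Shapley values by averaging marginal gains over random orderings."""
    rng = random.Random(seed)
    phi = [0.0] * n_features
    order = list(range(n_features))
    for _ in range(n_perm):
        rng.shuffle(order)
        subset = set()
        prev = value_fn(frozenset(subset))
        for f in order:
            subset.add(f)
            cur = value_fn(frozenset(subset))
            phi[f] += cur - prev  # marginal contribution of f in this ordering
            prev = cur
    return [p / n_perm for p in phi]


def include_feature(mask, phi):
    """Switch on the unselected feature with the largest Shapley estimate."""
    off = [i for i, bit in enumerate(mask) if not bit]
    out = list(mask)
    if off:
        out[max(off, key=lambda i: phi[i])] = 1
    return out


def remove_feature(mask, phi):
    """Switch off the selected feature with the smallest Shapley estimate."""
    on = [i for i, bit in enumerate(mask) if bit]
    out = list(mask)
    if len(on) > 1:  # always keep at least one feature
        out[min(on, key=lambda i: phi[i])] = 0
    return out


# Toy stand-in for the wrapper score (assumption): each feature adds a fixed
# amount to the score. A real wrapper would train and validate a classifier.
WEIGHTS = [0.30, 0.00, 0.15, -0.05]


def toy_value(subset):
    return sum(WEIGHTS[i] for i in subset)


phi = shapley_estimates(len(WEIGHTS), toy_value)
mask = [0, 1, 0, 1]                # chromosome: 1 = feature selected
mask = include_feature(mask, phi)  # adds the most valuable missing feature
mask = remove_feature(mask, phi)   # drops the least valuable selected feature
```

In a full GA these operators would act as local refinements applied to chromosomes between the usual crossover and mutation steps; how often and to which individuals they are applied is a design choice the abstract does not specify.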
Intrusion detection systems (IDSs) are designed to distinguish between normal and intrusive activities. A critical part of IDS design is the selection of informative features and of an appropriate machine learning technique. In this paper, we investigate the IDS problem from these two perspectives and construct a misuse-based neurotree classifier capable of detecting anomalies in networks. The major contributions of this paper are (a) employing a weighted-sum genetic feature extraction process, which provides better discrimination ability for detecting anomalies in network traffic; (b) realizing the system as a rule-based model using an efficient ensemble machine learning technique, the neurotree, which offers better comprehensibility and generalization ability; and (c) utilizing an activation function targeted at minimizing the error rates of the learning algorithm. An extensive experimental evaluation on a database containing normal and anomalous traffic patterns shows that the proposed scheme, with the selected features and the chosen classifier, is a state-of-the-art IDS that outperforms previous IDS methods.
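As an illustration of what a weighted-sum fitness for genetic feature selection can look like, here is a minimal sketch. The abstract does not state which objectives are combined or how they are weighted, so the three terms used here (sensitivity, specificity, and subset compactness) and the weights `w_sens`, `w_spec`, and `w_size` are assumptions, as is the function name `weighted_sum_fitness`.

```python
def weighted_sum_fitness(y_true, y_pred, mask,
                         w_sens=0.4, w_spec=0.4, w_size=0.2):
    """Score a feature mask by a weighted sum of three objectives (all assumed).

    y_true, y_pred: 0/1 labels, where 1 = intrusion and 0 = normal traffic.
    mask: 0/1 list marking which features the candidate solution selects.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    compactness = 1.0 - sum(mask) / len(mask)  # fewer features scores higher

    return w_sens * sensitivity + w_spec * specificity + w_size * compactness


# Example: one of two attacks missed, no false alarms, one of four features used.
fitness = weighted_sum_fitness(
    y_true=[1, 1, 0, 0],
    y_pred=[1, 0, 0, 0],
    mask=[1, 0, 0, 0],
)
```

Collapsing several objectives into one scalar keeps the GA single-objective, at the cost of having to choose the weights in advance; the values above are placeholders rather than tuned settings.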