The GAIA satellite is scheduled for launch in 2010. GAIA will collect spectral data for about 1 billion celestial objects. Part of the preparation for the GAIA mission is the choice of an efficient method for automatically classifying the observed objects as stars, double stars, quasars or other objects. To this end, two blind testing experiments have been carried out on simulated data. This paper describes the blind testing procedure and the results of a cross-validation experiment designed to select a good classifier from a broad class of methods, including support vector machines, neural networks, nearest neighbor methods, classification trees and random forests. Because too little is known about their nature, no outliers (the "other objects" class) were simulated. A new strategy is proposed that identifies outliers using only "clean" training data and is independent of the chosen classification method.
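The cross-validation step mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the two toy classifiers (1-nearest neighbor and nearest centroid) stand in for the much larger set of methods compared in the paper, the two-dimensional Gaussian data are synthetic rather than simulated GAIA spectra, and the function names are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def nearest_neighbor(train, x):
    """Predict the label of the closest training point (1-NN)."""
    return min(train, key=lambda item: sq_dist(item[0], x))[1]

def nearest_centroid(train, x):
    """Predict the label whose class mean is closest to x."""
    sums, counts = {}, {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}
    return min(centroids, key=lambda lab: sq_dist(centroids[lab], x))

def cross_val_accuracy(classifier, data, k=5, seed=0):
    """Mean accuracy of `classifier` over k disjoint folds."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        # Train on every fold except the one held out for testing.
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        correct = sum(classifier(train, x) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / k

def make_toy_data(n_per_class=50, seed=1):
    """Synthetic two-class data; NOT simulated GAIA observations."""
    rng = random.Random(seed)
    data = []
    for label, center in (("star", (0.0, 0.0)), ("quasar", (4.0, 4.0))):
        for _ in range(n_per_class):
            point = [rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)]
            data.append((point, label))
    return data

data = make_toy_data()
candidates = {"1-NN": nearest_neighbor, "nearest centroid": nearest_centroid}
scores = {name: cross_val_accuracy(clf, data) for name, clf in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "-> selected:", best)
```

Each candidate is scored on the same folds, so the comparison is not affected by how the data happen to be split.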
To hide real monetary gains and avoid taxes, fictitious prices are sometimes recorded in real estate transactions. This paper is concerned with screening real estate transactions using artificial neural networks, with the aim of categorizing them into "clear" and "fictitious" classes. The problem is treated as an outlier detection task, and both unsupervised and supervised approaches are studied. In the unsupervised case, a novelty detector based on the soft minimal hyper-sphere support vector machine (SVM) is employed. In the supervised case, the effectiveness of an SVM, a multilayer perceptron (MLP), and a committee-based classifier is studied. To give the user deeper insight into the decisions provided by the models, the transactions are not only categorized into "clear" and "fictitious" classes but also mapped onto a self-organizing map (SOM), where regions of "clear", "doubtful" and "fictitious" transactions are identified. We demonstrate that the stability of the regions evolved in the SOM during training is rather high. Experimental investigations on two real data sets show that the categorization accuracy of the supervised approaches is considerably higher than that of the unsupervised one. The obtained accuracy is high enough for the technique to be used in practice.
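The idea behind a hypersphere novelty detector can be sketched as follows. This is a deliberate simplification and not the soft minimal hyper-sphere SVM used in the paper: the center is taken to be the centroid of the training data and the radius an empirical quantile of the training distances, instead of solving the SVM-style optimization problem. The two features, the `outlier_fraction` parameter and the function names are hypothetical.

```python
import math
import random

def fit_hypersphere(points, outlier_fraction=0.05):
    """Return (center, radius) of a sphere enclosing most training points.

    Simplification: centroid plus distance quantile, not an optimized
    minimal enclosing sphere with slack variables.
    """
    dim = len(points[0])
    center = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    distances = sorted(math.dist(p, center) for p in points)
    # Leave roughly `outlier_fraction` of the training points outside.
    index = min(len(distances) - 1, int((1.0 - outlier_fraction) * len(distances)))
    return center, distances[index]

def is_novel(point, center, radius):
    """Flag a point that falls outside the fitted sphere."""
    return math.dist(point, center) > radius

# Synthetic "clear" transactions described by two hypothetical,
# already normalised features (not data from the paper).
rng = random.Random(0)
training = [[rng.gauss(1.0, 0.1), rng.gauss(1.0, 0.1)] for _ in range(200)]

center, radius = fit_hypersphere(training)
print(is_novel([1.02, 0.97], center, radius))  # close to the training cloud
print(is_novel([2.0, 0.2], center, radius))    # far from the training cloud
```

Allowing a small fraction of training points to fall outside the sphere is what makes the boundary "soft": a few unusual training transactions do not inflate the radius.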
Sensing and classifying drought stress levels is very important to agricultural production. In this work, rice drought stress levels were classified with a Support Vector Machine (SVM) using three kinds of input: the commonly used chlorophyll a fluorescence (ChlF) parameter Fv/Fm, feature data (induction features), and the whole OJIP induction (induction curve). The classification accuracies were compared with those obtained by K-Nearest Neighbors (KNN) and an Ensemble model (Ensemble) for the corresponding inputs. The results show that the SVM classifies rice drought stress levels more accurately than the KNN and the Ensemble. They also show that the classification accuracy with the induction curve as input (86.7%) is higher than with Fv/Fm as input (43.9%) or with induction features as input (72.7%). The results imply that the induction curve carries important information on plant physiology. This work provides a method of determining rice drought stress levels based on ChlF.
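For context, Fv/Fm is conventionally computed as (Fm - F0) / Fm, where F0 is the initial fluorescence (the O step) and Fm the maximum fluorescence (the P step). The sketch below shows how this single number is derived from an induction curve, which may help explain why a classifier given the whole curve has more information to work with. The curve values are invented for illustration, and the assumption that the first sample approximates F0 is a simplification.

```python
def fv_fm(curve):
    """Fv/Fm from a fluorescence induction curve.

    Assumes the first sample approximates F0 (the O step) and the
    largest sample is Fm (the P step).
    """
    f0 = curve[0]
    fm = max(curve)
    if fm <= 0:
        raise ValueError("fluorescence values must be positive")
    return (fm - f0) / fm

# Hypothetical induction curve in arbitrary units (not measured data).
curve = [500.0, 900.0, 1400.0, 2000.0, 2500.0, 2400.0]
ratio = fv_fm(curve)  # (2500 - 500) / 2500
print(ratio)
```

The scalar depends only on the first and the largest sample, so two curves with different intermediate shapes can yield the same Fv/Fm.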