A boundary vector generator is a data barrier amplifier that improves the distribution model of the samples to increase the classification accuracy of a feed-forward neural network. It generates two new types of samples: one amplifies the barrier of its own class (fundamental multi-class outpost vectors) and the other extends the barrier of the nearest class (additional multi-class outpost vectors). However, these sets of boundary vectors are enormous. The reduced boundary vector generators therefore introduced three boundary vector reduction techniques that scale down the fundamental and additional multi-class outpost vectors. Nevertheless, these techniques do not consider the intervals of the attributes, so some attributes dominate others in the Euclidean distance calculation. This study explores whether six normalization techniques (min-max, Z-score, mean and mean absolute deviation, median and median absolute deviation, modified hyperbolic tangent, and hyperbolic tangent estimator) can improve the classification performance of the boundary vector generator and the reduced boundary vector generators in maximizing the class boundary. Each normalization technique pre-processes the original training set before the boundary vector generator or any of the three reduced boundary vector generators is applied. The experimental results on real-world datasets generally confirmed that (1) the final training set containing only FF-AA reduced boundary vectors can be integrated effectively with one of the normalization techniques when accuracy and precision are prioritized, (2) the final training set containing only the boundary vectors can be integrated effectively with one of the normalization techniques when recall and F1-score are prioritized, (3) Z-score normalization generally improves the accuracy and precision of all types of training sets, (4) modified hyperbolic tangent normalization generally improves the recall of all types of training sets, (5) min-max normalization generally improves the accuracy and F1-score of all types of training sets, and (6) the choice of normalization technique and training set type depends on the key performance measure for the dataset.
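The six techniques named above are standard column-wise rescalings of the attribute intervals. Below is a minimal NumPy sketch of how each could pre-process a training matrix (rows are samples, columns are attributes) before the generator runs; the 0.01 gain in the tanh estimator and the exact form of the modified hyperbolic tangent are assumptions, since the abstract does not spell them out.

```python
import numpy as np

def min_max(X):
    # Rescale each attribute to [0, 1] so no attribute dominates
    # the Euclidean distance calculation.
    return (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

def z_score(X):
    # Center on the mean, scale by the standard deviation.
    return (X - X.mean(axis=0)) / X.std(axis=0)

def mean_mad(X):
    # Mean absolute deviation around the mean as the scale estimate.
    mu = X.mean(axis=0)
    return (X - mu) / np.mean(np.abs(X - mu), axis=0)

def median_mad(X):
    # Median absolute deviation around the median; robust to outliers.
    med = np.median(X, axis=0)
    return (X - med) / np.median(np.abs(X - med), axis=0)

def tanh_estimator(X):
    # Hampel-style tanh estimator with the commonly cited 0.01 gain.
    return 0.5 * (np.tanh(0.01 * (X - X.mean(axis=0)) / X.std(axis=0)) + 1.0)

def modified_tanh(X):
    # Assumed variant: tanh squashing without the affine shift to [0, 1].
    return np.tanh(0.01 * (X - X.mean(axis=0)) / X.std(axis=0))
```

Whichever technique is chosen, its statistics should be fitted on the original training set only and then reused unchanged on the test set.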
We show that predicting travel time on a 28-km highway section from on-line video-based travel time measurements is feasible with a data mining method. We introduce a new prediction model, the result of a GUHA-style data mining analysis and the Total Fuzzy Similarity method. Compared with the existing Traficon model, our model improves the travel time class prediction. The results obtained by our method are also comparable to those of an MLP neural network model.
KL-Miner [9] is a data mining procedure that, given an input data matrix M and a set of parameters, generates patterns of the form R ~ C/γ. Here R and C are categorical attributes corresponding to columns of M, and γ is a Boolean condition defined in terms of the remaining columns of M. The pattern R ~ C/γ means that R and C are strongly correlated on the submatrix of M formed by all the rows of M that satisfy γ. What is meant by "strong correlation" and how R, C, and γ are generated is determined by the input parameters of the procedure. KL-Miner conforms to the GUHA principle formulated in [1]. It revives two older GUHA procedures described in [2]; it is closely related to CORREL and contains a new implementation of COLLAPS as a module.
In this paper, we present the motivation that led to the design of KL-Miner, describe our new implementation of COLLAPS, and give application examples that illustrate the main features of KL-Miner.
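As an illustration of the kind of pattern KL-Miner evaluates, the sketch below checks a single candidate R ~ C/γ on a NumPy matrix; Kendall's tau is used here as a stand-in strength measure, since the actual correlation criterion is set by the procedure's input parameters.

```python
import numpy as np
from scipy.stats import kendalltau

def pattern_holds(M, r_col, c_col, gamma, threshold=0.5):
    # gamma is a Boolean mask over the rows of M (the condition);
    # restrict to the submatrix of rows satisfying it.
    sub = M[gamma]
    # Measure the correlation of the categorical attributes R and C
    # on that submatrix (Kendall's tau is an illustrative choice).
    tau, _ = kendalltau(sub[:, r_col], sub[:, c_col])
    return abs(tau) >= threshold
```

In the procedure itself, the candidate pairs R, C and conditions γ are enumerated systematically from the input parameters rather than supplied by hand.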
Relations between two Boolean attributes derived from data can be quantified by truth functions defined on four-fold tables corresponding to pairs of the attributes. Several classes of such quantifiers (implicational, double implicational, and equivalence quantifiers) with truth values in the unit interval have been investigated within the theory of data mining methods. In fuzzy logic theory, there are well-defined classes of fuzzy operators, namely t-norms representing various types of evaluation of fuzzy conjunction (and t-conorms representing fuzzy disjunction), and fuzzy implication operators.
In this contribution, several types of constructions of quantifiers using fuzzy operators are described. Definitions and theorems presented by the author in previous contributions to the WUPES workshops are summarized and illustrated with examples of well-known quantifiers and operators.
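For concreteness, the sketch below builds the four-fold table of two Boolean attributes and evaluates one representative quantifier from each class; the specific ratio-style formulas and the product t-norm are illustrative choices, not the constructions of the original contribution.

```python
def fourfold(A, B):
    # Four-fold table of Boolean attribute vectors A and B:
    # a = |A & B|, b = |A & not B|, c = |not A & B|, d = |not A & not B|.
    a = sum(x and y for x, y in zip(A, B))
    b = sum(x and not y for x, y in zip(A, B))
    c = sum(not x and y for x, y in zip(A, B))
    d = sum(not x and not y for x, y in zip(A, B))
    return a, b, c, d

def implicational(a, b, c, d):
    # Implicational quantifier: truth value of "A implies B" in [0, 1].
    return a / (a + b) if a + b else 0.0

def double_implicational(a, b, c, d):
    # Double implicational: both directions penalized by b and c.
    return a / (a + b + c) if a + b + c else 0.0

def equivalence(a, b, c, d):
    # Equivalence quantifier: agreement of A and B over all rows.
    return (a + d) / (a + b + c + d) if a + b + c + d else 0.0

def product_t_norm(u, v):
    # One fuzzy-conjunction operator usable when composing quantifiers.
    return u * v
```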
Credit risk assessment, credit scoring, and loan application approval are typical tasks that can be performed using machine learning or data mining techniques. From this viewpoint, loan application evaluation is a classification task in which the final decision can be either a crisp yes/no decision about the loan or a numeric score expressing the financial standing of the applicant. The knowledge to be used is inferred from data about past decisions. These data usually consist of socio-demographic and economic characteristics of the applicant (e.g., age, income, and deposit), the characteristics of the loan, and the loan approval decision. A number of machine learning algorithms can be used for this purpose. In this paper we show how this task can be performed using the LISp-Miner system, a tool under development at the University of Economics, Prague. LISp-Miner is primarily focused on mining for various types of association rules, but unlike the "classical" association rules proposed by Agrawal, LISp-Miner introduces a greater variety of types of relations between the left-hand and right-hand sides of a rule. Two other procedures that can be used for the classification task are implemented in LISp-Miner as well. We describe the 4ft-Miner and KEX procedures and show how they can be used to analyze data related to loan applications. We also compare the results obtained using the presented algorithms with results from standard rule-learning methods.
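As a rough illustration of a compositional rule-based classifier of the kind KEX implements, the sketch below folds the weights of all applicable rules into one score; the pseudo-Bayesian combining function is an assumption here, and the rule set is hypothetical.

```python
def combine(w1, w2):
    # Pseudo-Bayesian composition of two rule weights in (0, 1);
    # 0.5 acts as the neutral element.
    return (w1 * w2) / (w1 * w2 + (1 - w1) * (1 - w2))

def score(applicant, rules, neutral=0.5):
    # rules: list of (condition predicate, weight) pairs; every rule
    # whose condition matches the applicant contributes its weight.
    w = neutral
    for condition, weight in rules:
        if condition(applicant):
            w = combine(w, weight)
    return w  # e.g., approve the loan when the score exceeds 0.5

# Hypothetical rule set for loan data:
rules = [
    (lambda a: a["income"] > 40000, 0.8),  # high income supports approval
    (lambda a: a["age"] < 21, 0.3),        # very young applicants score lower
]
print(score({"income": 50000, "age": 30}, rules))
```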
Population fluctuations of the well-known oak defoliator, the oak processionary moth (Thaumetopoea processionea L.), were studied using light trap data and basic meteorological parameters (monthly average temperatures and precipitation) at three locations in Western Hungary over a period of 25 years (1988-2012). The fluctuations in the numbers caught by the three traps were strongly synchronized; one possible explanation for this synchrony is similar weather at the three trapping locations. Cyclic Reverse Moving Interval Techniques (CReMIT) were used to identify the period of the year that most strongly influences the catches. For this period, we defined a species-specific aridity index for Thaumetopoea processionea (the THAU index). This index explains 54.8-68.9% of the variation in the yearly catches, indicating that aridity, particularly in the May-July period, was the major determinant of population fluctuations. Our results predict an increasing future risk of oak processionary moth (OPM) outbreaks and further spread if the frequency of severe spring/summer droughts increases with global warming.