The focus of this paper is the application of the genetic programming framework to the problem of knowledge discovery in databases, more precisely to the task of classification. Genetic programming has several advantages that make it suitable for data mining, for example the robustness of the algorithm and a structure that lends itself naturally to rule generation. This study concentrates on one type of parallel genetic algorithm: the cellular (diffusion) model. Emphasis is placed on improving the efficiency and scalability of the data mining algorithm, which can be achieved by integrating the algorithm with databases and employing a cellular framework. A cellular model of genetic programming that exploits SQL queries is implemented and applied to the classification task. The results achieved are presented and compared with those of other machine learning algorithms.
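The abstract does not describe the implementation, so the following is only a minimal sketch, under stated assumptions, of how the two ingredients it names might fit together: fitness evaluated by an SQL query inside the database, and selection restricted to a local neighborhood on a grid, as in the cellular (diffusion) model. Individuals are simplified to single-condition rules `IF attr op thr THEN 'yes' ELSE 'no'` rather than full GP trees. The table `data`, its columns `a`, `b`, `label`, and all function names (`fitness`, `neighbors`, `cellular_gp`, and so on) are hypothetical, and SQLite from Python's standard library stands in for the database.

```python
import random
import sqlite3

ATTRS = ("a", "b")   # hypothetical, whitelisted column names
OPS = ("<", ">=")    # whitelisted comparison operators

def make_db():
    """Hypothetical toy table standing in for the mined database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (a REAL, b REAL, label TEXT)")
    conn.executemany("INSERT INTO data VALUES (?, ?, ?)",
                     [(1.0, 5.0, "yes"), (2.0, 4.0, "yes"),
                      (6.0, 1.0, "no"), (7.0, 2.0, "no")])
    return conn

def fitness(conn, rule):
    """Accuracy of IF attr op thr THEN 'yes' ELSE 'no', computed inside the database."""
    attr, op, thr = rule
    if attr not in ATTRS or op not in OPS:
        raise ValueError("identifier not whitelisted")
    sql = (f"SELECT AVG(CASE WHEN ({attr} {op} ?) = (label = 'yes') "
           "THEN 1.0 ELSE 0.0 END) FROM data")
    return conn.execute(sql, (thr,)).fetchone()[0]

def neighbors(x, y, w, h):
    """Von Neumann neighborhood on a toroidal w x h grid."""
    return [((x - 1) % w, y), ((x + 1) % w, y), (x, (y - 1) % h), (x, (y + 1) % h)]

def crossover(p1, p2, rng):
    # Uniform crossover: each rule field is taken from one of the two parents.
    return tuple(rng.choice(pair) for pair in zip(p1, p2))

def mutate(rule, rng):
    # Change exactly one field: the attribute, the operator, or the threshold.
    attr, op, thr = rule
    r = rng.random()
    if r < 0.2:
        attr = rng.choice(ATTRS)
    elif r < 0.4:
        op = rng.choice(OPS)
    else:
        thr += rng.gauss(0.0, 1.0)
    return (attr, op, thr)

def cellular_gp(conn, w=4, h=4, generations=20, seed=0):
    rng = random.Random(seed)
    grid = {(x, y): (rng.choice(ATTRS), rng.choice(OPS), rng.uniform(0.0, 8.0))
            for x in range(w) for y in range(h)}
    fit = {cell: fitness(conn, rule) for cell, rule in grid.items()}
    for _ in range(generations):
        new_grid, new_fit = dict(grid), dict(fit)   # synchronous update
        for (x, y), rule in grid.items():
            mate = max(neighbors(x, y, w, h), key=fit.get)  # local selection only
            child = mutate(crossover(rule, grid[mate], rng), rng)
            f = fitness(conn, child)
            if f >= fit[(x, y)]:                    # replace only if not worse
                new_grid[(x, y)], new_fit[(x, y)] = child, f
        grid, fit = new_grid, new_fit
    best = max(fit, key=fit.get)
    return grid[best], fit[best]

conn = make_db()
print(cellular_gp(conn))
```

Because each cell mates only with its four neighbors, good rules spread gradually across the grid rather than through a global selection step, which is what makes the model convenient to parallelize. Column names and operators are whitelisted before being formatted into the SQL text, while the threshold is passed as a bound parameter.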
Fuzzy IF-THEN rules are among the most popular, most effective, and most user-friendly knowledge representations. For this reason, extracting such rules is becoming an increasingly important part of the Data Mining stage of the Knowledge Discovery in Databases Process. In this paper, a direct algorithm for extracting fuzzy IF-THEN rules based on linguistic variable elimination is described. The algorithm is implemented within Fuzzy Rule Miner, an object-oriented software library designed for this purpose. Besides the introduced algorithm, the library implements two algorithms for fuzzy rule extraction based on fuzzy decision trees of the ID3 kind. Experiments are an essential precondition for comparing the implemented algorithms and for verifying the validity of the introduced algorithm. Their goal is to observe the behavior of the algorithms on test databases from the UCI Repository of Machine Learning Databases and to compare the algorithms with one another. In the conducted experiments, the introduced algorithm achieves high accuracy of the discovered knowledge. The paper also contains a classification of rules and a specification of the Fuzzy Rule Discovery in Databases Process.
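To make the knowledge representation concrete, here is a minimal sketch of how a set of fuzzy IF-THEN rules can be applied to classify a sample. It is not the paper's extraction algorithm (linguistic variable elimination or fuzzy ID3); it only shows one common way such rules are evaluated, assuming triangular membership functions, `min` as the fuzzy AND, and winner-takes-all classification. The variables `temperature` and `humidity`, their linguistic terms, and the rules themselves are hypothetical.

```python
# Hypothetical linguistic variables; the paper does not specify its membership functions.
def triangular(a, b, c):
    """Triangular membership function: 0 outside (a, c), peak 1 at b (requires a < b < c)."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu

TERMS = {
    "temperature": {"low": triangular(-10, 0, 10),
                    "medium": triangular(0, 10, 20),
                    "high": triangular(10, 20, 30)},
    "humidity": {"low": triangular(-10, 0, 10),
                 "high": triangular(0, 10, 20)},
}

# Each rule: (antecedent as {variable: linguistic term}, consequent class).
RULES = [
    ({"temperature": "high", "humidity": "low"}, "play"),
    ({"temperature": "low"}, "stay"),
    ({"humidity": "high"}, "stay"),
]

def firing_strength(rule, sample):
    """Degree to which a sample satisfies the IF part, using min as the fuzzy AND."""
    antecedent, _ = rule
    return min(TERMS[var][term](sample[var]) for var, term in antecedent.items())

def classify(rules, sample):
    """Winner-takes-all: return the consequent of the most strongly firing rule."""
    best = max(rules, key=lambda r: firing_strength(r, sample))
    return best[1], firing_strength(best, sample)

label, strength = classify(RULES, {"temperature": 18, "humidity": 3})
print(label, round(strength, 2))  # play 0.7
```

Here the sample is "high" to degree 0.8 in temperature and "low" to degree 0.7 in humidity, so the first rule fires with strength min(0.8, 0.7) = 0.7 and outranks the third rule (0.3). Other t-norms (such as the product) or weighted voting across rules are equally common design choices.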
Frequent pattern discovery is one of the most important data mining tasks. We introduce RAP, the first system for finding first-order maximal frequent patterns. We describe search strategies and methods for pruning the search space. RAP, which generates long patterns much faster than other systems, has been used for feature construction for propositional as well as multi-relational data. We demonstrate that a partial search for maximal frequent patterns used as new features is competitive with other approaches and leads to an increase in classification accuracy.
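As an illustration of what "maximal frequent patterns as new features" means, the sketch below uses the simpler propositional analogue: itemsets over sets of items instead of RAP's first-order patterns. It is a naive stand-in, not RAP: it enumerates all frequent itemsets level by level (Apriori-style) and only then keeps the maximal ones, which is exactly the work that dedicated maximal-pattern search strategies try to avoid. The function names and the toy data are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search for all itemsets contained in >= min_support examples."""
    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = list(level)
    k = 2
    while level:
        prev = set(level)
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune candidates with an infrequent (k-1)-subset (anti-monotonicity of support).
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        level = [c for c in candidates if support(c) >= min_support]
        frequent.extend(level)
        k += 1
    return frequent

def maximal(frequent):
    """Keep only frequent itemsets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < t for t in frequent)]

def to_features(transactions, patterns):
    """One boolean feature per pattern: 1 if the example contains the pattern."""
    return [[int(p <= t) for p in patterns] for t in transactions]

data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c", "d"}]
patterns = sorted(maximal(frequent_itemsets(data, 3)), key=sorted)
print([sorted(p) for p in patterns])  # [['a', 'b'], ['a', 'c'], ['b', 'c']]
print(to_features(data, patterns)[:2])  # [[1, 1, 1], [1, 0, 0]]
```

With a minimum support of 3, the itemset {a, b, c} occurs in only two examples, so the three pairs are the maximal frequent patterns; each becomes one binary column that a standard propositional learner can consume.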