A single-step information-theoretic algorithm that identifies possible clusters in a dataset is presented. The proposed algorithm represents data scatter in terms of similarity-based entropy and probability descriptions of the data points. Using these quantities, an information-theoretic association metric called the mutual ambiguity between data points is defined and then employed to determine particular data points called cluster identifiers. A cluster relevance rule is defined to form the individual cluster associated with each identifier. Since cluster identifiers and their associated member points can be found without recursive or iterative search, the algorithm is single-step. The algorithm is tested and validated experimentally on synthetic and anonymized real datasets. Simulation results demonstrate that the proposed algorithm also exhibits statistically more reliable performance than major clustering algorithms.
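The abstract does not give the exact definitions of the entropy and probability descriptions, the mutual ambiguity metric, or the cluster relevance rule. The following is a minimal sketch of the single-step idea under assumed choices: a Gaussian similarity kernel, row-normalized similarities as the probability description of each point, and a fixed similarity fraction as the relevance rule. The names `single_step_clusters`, `sigma`, and `relevance` are illustrative, not the paper's.

```python
import numpy as np

def single_step_clusters(X, sigma=1.0, relevance=0.5):
    # Pairwise squared distances and similarities (assumed Gaussian kernel).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    S = np.exp(-d2 / (2 * sigma ** 2))
    # Similarity-based probability description of each point (row-normalized).
    P = S / S.sum(axis=1, keepdims=True)
    # Pointwise entropy: low entropy means a concentrated neighborhood.
    H = -(P * np.log(P + 1e-12)).sum(axis=1)
    # Single pass in order of increasing entropy: each still-unlabeled point
    # becomes a cluster identifier and claims its relevant neighbors, so no
    # recursive or iterative search is needed.
    labels = -np.ones(len(X), dtype=int)
    identifiers = []
    for i in np.argsort(H):
        if labels[i] == -1:
            identifiers.append(i)
            # Assumed cluster relevance rule: attach unlabeled points whose
            # similarity to the identifier exceeds a fixed fraction of its
            # self-similarity (S[i, i] == 1 for the Gaussian kernel).
            members = (S[i] >= relevance * S[i, i]) & (labels == -1)
            labels[members] = len(identifiers) - 1
    return labels, identifiers

# Usage on a toy 2-D dataset.
X = np.random.default_rng(0).normal(size=(60, 2))
labels, identifiers = single_step_clusters(X)
```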
In my previous papers ([18], [19]), the entropy of fuzzy partitions was defined. The concept of the entropy of a fuzzy partition was used to define the entropy of a fuzzy dynamical system and to propose an ergodic theory for fuzzy dynamical systems ([19], [20]). In this paper, using my previous results on the entropy of fuzzy partitions, a measure of the average mutual information of fuzzy partitions is defined, and some of its properties are proved. It is shown that the entropy of fuzzy partitions can be considered a special case of their mutual information, and that the subadditivity and additivity of the entropy of fuzzy partitions are simple consequences of these properties. The suggested measures can be applied whenever we need to know the amount of information obtained by realizing experiments whose results are fuzzy events.
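The definitions are not reproduced in the abstract. By analogy with the classical Shannon quantities, and writing m for the underlying measure and \wedge for the meet of fuzzy sets, the measure presumably takes a form like the following; this is a hedged reconstruction, not the paper's exact conventions.

```latex
% Hedged reconstruction by analogy with the classical Shannon definitions;
% the paper's conventions for the measure m and the meet A_i \wedge B_j may differ.
I(\xi, \eta) = \sum_{i=1}^{k} \sum_{j=1}^{l}
  m(A_i \wedge B_j)\, \log \frac{m(A_i \wedge B_j)}{m(A_i)\, m(B_j)}
\quad \text{for fuzzy partitions } \xi = \{A_1, \dots, A_k\},\ \eta = \{B_1, \dots, B_l\}.
```

On this reading, the claim that entropy is a special case of mutual information would correspond to H(\xi) = I(\xi, \xi), mirroring the classical identity for crisp partitions.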
We investigate the sets of joint probability distributions that maximize the average multi-information over a collection of margins. These functionals serve as proxies for maximizing the multi-information of a set of variables or the mutual information of two subsets of variables, at lower computational and estimation complexity. We describe the maximizers and their relations to the maximizers of the multi-information and of the mutual information.
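For a discrete joint distribution, the multi-information (total correlation) of a margin is the sum of its single-variable entropies minus its joint entropy, and the objective averages this quantity over the chosen collection of margins. A small sketch for a distribution stored as an array with one axis per variable; the pairwise collection of margins at the end is an example, not the paper's specific choice.

```python
import itertools
import numpy as np

def entropy(p):
    # Shannon entropy in nats, ignoring zero-probability cells.
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def multi_information(p):
    # Total correlation: sum of single-variable entropies minus joint entropy.
    axes = range(p.ndim)
    marginals = [p.sum(axis=tuple(a for a in axes if a != i)) for i in axes]
    return sum(entropy(m) for m in marginals) - entropy(p)

def average_multi_information(p, margins):
    # Each margin is a tuple of variable indices; sum out the remaining axes.
    values = []
    for S in margins:
        keep = set(S)
        q = p.sum(axis=tuple(a for a in range(p.ndim) if a not in keep))
        values.append(multi_information(q))
    return np.mean(values)

# Example: three binary variables, averaged over all pairwise margins.
rng = np.random.default_rng(0)
p = rng.random((2, 2, 2))
p /= p.sum()
print(average_multi_information(p, list(itertools.combinations(range(3), 2))))
```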