A single-step information-theoretic algorithm for identifying candidate clusters in a dataset is presented. The proposed algorithm represents data scatter in terms of similarity-based entropy and probability descriptions of the data points. From these quantities, an information-theoretic association metric called mutual ambiguity between data points is defined, which is then employed to determine particular data points called cluster identifiers. To form the individual cluster corresponding to each cluster identifier, a cluster relevance rule is defined. Since both the cluster identifiers and their associated cluster member data points are identified without recursive or iterative search, the algorithm is single-step. The algorithm is tested and validated experimentally on synthetic and anonymized real datasets. Simulation results demonstrate that the proposed algorithm also exhibits more reliable performance, in a statistical sense, than major clustering algorithms.
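The abstract does not give the exact definitions of the point entropies, the mutual ambiguity metric, or the relevance rule, so the following is only a minimal sketch of the single-step pipeline under illustrative assumptions: a Gaussian similarity kernel, row-normalized similarities as the per-point probability description, Shannon entropy of that row as the point entropy, and a similarity-weighted average entropy as a stand-in for mutual ambiguity. The function name and all of these formulas are hypothetical, not the authors' method.

```python
import numpy as np

def single_step_clusters(X, sigma=1.0, n_clusters=2):
    """Hypothetical sketch of a single-step, entropy-based clustering pass.

    The kernel, probability/entropy descriptions, mutual-ambiguity score,
    and relevance rule below are illustrative assumptions only.
    """
    # Pairwise Gaussian similarities (assumed kernel).
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    S = np.exp(-d2 / (2.0 * sigma**2))

    # Similarity-based probability description: each row of S normalized
    # into a distribution over the data points.
    P = S / S.sum(axis=1, keepdims=True)

    # Similarity-based entropy of each data point.
    H = -np.sum(P * np.log(P + 1e-12), axis=1)

    # Stand-in "mutual ambiguity": average point entropy weighted by
    # how similar the two points are.
    A = S * (H[:, None] + H[None, :]) / 2.0

    # Cluster identifiers: greedily pick points with the highest total
    # ambiguity, skipping points too similar to one already chosen.
    order = np.argsort(A.sum(axis=1))[::-1]
    identifiers = []
    for i in order:
        if all(S[i, j] < 0.5 for j in identifiers):
            identifiers.append(i)
        if len(identifiers) == n_clusters:
            break

    # Cluster relevance rule (assumed): each point joins the identifier
    # with which its mutual ambiguity is largest. One pass, no iteration.
    ids = np.array(identifiers)
    labels = ids[np.argmax(A[:, ids], axis=1)]
    return labels, ids

# Demo: two well-separated blobs, clustered in a single pass.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
labels, ids = single_step_clusters(X)
print(labels)
```

Note that nothing is re-estimated after the single pass: the identifiers and the assignment are both read directly off the precomputed ambiguity matrix, which is what makes the procedure single-step.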
Given a fixed dependency graph $G$ that describes a Bayesian network of binary variables $X_1,\dots,X_n$, our main result is a tight bound on the mutual information $I_c(Y_1,\dots,Y_k) = \frac{1}{c}\sum_{j=1}^{k} H(Y_j) - H(Y_1,\dots,Y_k)$ of an observed subset $Y_1,\dots,Y_k$ of the variables $X_1,\dots,X_n$. Our bound depends on certain quantities that can be computed from the connectivity structure of the nodes in $G$. It therefore allows one to discriminate between different dependency graphs for a probability distribution, as we demonstrate in numerical experiments.
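As a concrete illustration of the quantity being bounded, the sketch below evaluates $I_c$ by brute-force enumeration on a small binary chain network $X_1 \to X_2 \to X_3$; for $c=1$ it reduces to the total correlation $\sum_j H(Y_j) - H(Y_1,\dots,Y_k)$, and for two observed variables to the ordinary mutual information. The chain network, its parameters, and the helper names are illustrative choices, not taken from the paper.

```python
import itertools
import numpy as np

def entropy(p):
    """Shannon entropy (in bits) of a probability vector."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def joint_chain(p1, p21, p32):
    """Joint distribution of an assumed binary chain X1 -> X2 -> X3.

    p1     : P(X1 = 1)
    p21[a] : P(X2 = 1 | X1 = a)
    p32[b] : P(X3 = 1 | X2 = b)
    """
    P = np.zeros((2, 2, 2))
    for a, b, c in itertools.product((0, 1), repeat=3):
        pa = p1 if a else 1 - p1
        pb = p21[a] if b else 1 - p21[a]
        pc = p32[b] if c else 1 - p32[b]
        P[a, b, c] = pa * pb * pc
    return P

def I_c(P, observed, c=1.0):
    """I_c(Y_1,...,Y_k) = (1/c) * sum_j H(Y_j) - H(Y_1,...,Y_k),

    where the Y_j are the axes of the joint table P listed in `observed`."""
    axes = tuple(observed)
    others = tuple(i for i in range(P.ndim) if i not in axes)
    joint = P.sum(axis=others) if others else P
    marginal_sum = sum(
        entropy(P.sum(axis=tuple(i for i in range(P.ndim) if i != j)).ravel())
        for j in axes
    )
    return marginal_sum / c - entropy(joint.ravel())

# Observe X1 and X3 of the chain; with c = 1 this is the ordinary
# mutual information I(X1; X3), mediated entirely by X2.
P = joint_chain(p1=0.5, p21=(0.1, 0.9), p32=(0.2, 0.8))
print(I_c(P, observed=(0, 2), c=1.0))
print(I_c(P, observed=(0, 2), c=2.0))
```

Brute-force enumeration like this only scales to a handful of variables; the point of a structural bound on $I_c$ is precisely to say something about the observed subset without computing the full joint table.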