Decomposable (probabilistic) models are log-linear models generated by acyclic hypergraphs, and they are known to enjoy a number of useful properties. In many applications the following selection problem naturally arises: given a probability distribution p over a finite set V of n discrete variables and a positive integer k, find a decomposable model with tree-width k that best fits p. If H is the generating hypergraph of a decomposable model and p_H is the estimate of p under the model, we can measure the closeness of p_H to p by the information divergence D(p:p_H), so that the problem above reads: given p and k, find an acyclic, connected hypergraph H of tree-width k such that D(p:p_H) is minimum. It is well known that this problem is NP-hard. However, for k=1 it was solved by Chow and Liu in a very efficient way; thus, starting from an optimal Chow-Liu solution, a few forward-selection procedures have been proposed with the aim of finding a 'good' solution for an arbitrary k. We propose a backward-selection procedure which starts from the (trivial) optimal solution for k=n−1, and we show that, in a case study taken from the literature, our procedure succeeds in finding an optimal solution for every k.
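Since the k=1 case solved by Chow and Liu underlies the selection procedures mentioned above, a minimal sketch may help fix ideas: the optimal tree (tree-width 1) model is a maximum-weight spanning tree whose edge weights are pairwise empirical mutual informations. The function names and the assumption that samples are 0-based integer-coded are ours, not the paper's.

```python
import numpy as np
from itertools import combinations

def mutual_information(joint):
    """Mutual information I(X;Y) in nats from a 2-D joint probability table."""
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float((joint[mask] * np.log(joint[mask] / (px @ py)[mask])).sum())

def chow_liu_tree(samples):
    """Edges of a maximum-weight spanning tree where the weight of edge
    (i, j) is the empirical mutual information I(X_i; X_j).
    `samples` is an (N, n) array of 0-based integer-coded observations."""
    N, n = samples.shape
    weights = {}
    for i, j in combinations(range(n), 2):
        ki, kj = samples[:, i].max() + 1, samples[:, j].max() + 1
        joint = np.zeros((ki, kj))
        for a, b in zip(samples[:, i], samples[:, j]):
            joint[a, b] += 1.0 / N
        weights[(i, j)] = mutual_information(joint)
    # Kruskal's algorithm with union-find, edges taken in decreasing MI order.
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    tree = []
    for (i, j), w in sorted(weights.items(), key=lambda kv: -kv[1]):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree
```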
Hypergeometric distributions have many important applications, but they have received little attention in information theory. Hypergeometric distributions can be approximated by binomial or Poisson distributions. In this paper we present upper and lower bounds on the information divergence in these approximations. These bounds are important for statistical testing and for a better understanding of the notion of exchangeability.
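As a numerical illustration of the quantity being bounded (our sketch, not taken from the paper), the exact divergence of a hypergeometric distribution from its with-replacement binomial approximation can be computed directly; it shrinks as the population grows relative to the sample.

```python
import numpy as np
from scipy.stats import hypergeom, binom

def kl_hypergeom_binom(M, K, N):
    """Exact D(hypergeometric || binomial) in nats, where the hypergeometric
    counts successes in N draws without replacement from a population of
    size M containing K successes, and the binomial is the with-replacement
    approximation Bin(N, K/M)."""
    k = np.arange(max(0, N - (M - K)), min(K, N) + 1)  # common support
    p = hypergeom.pmf(k, M, K, N)
    q = binom.pmf(k, N, K / M)
    return float(np.sum(p * np.log(p / q)))

# The divergence decays as the population size M grows with the sample fixed.
for M in (50, 500, 5000):
    print(M, kl_hypergeom_binom(M, M // 2, 10))
```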
Database records can often be interpreted as state descriptions of some world, system, or generic object whose states occur independently and are described by binary properties. If records contain no missing values, then there is a close relationship between association rules and propositions about state properties. In data mining we usually obtain a large number of association rules with high confidence and high support. Since their interpretation is often cumbersome, a quantitative measure of their informativeness would be very helpful. The main aim of the paper is to define a measure of the amount of information contained in an association rule. For this purpose we make use of the tight correspondence between association rules and logical implications. First, a quantitative measure of the information content of logical formulas is introduced and studied. The information content of an association rule is then defined as the information content of the corresponding logical implication in the situation where no knowledge about the dependence among the properties of world states is available. The intuitive meaning of the measure is that an association rule which allows a more appropriate correction of the distribution of world states, acquired under the unwarranted assumption of independence of state properties, also contains a larger amount of information. A more appropriate correction here means a correction of the current probability distribution of states that leads to a distribution closer to the true distribution in the sense of the Kullback-Leibler divergence.
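The following toy computation illustrates one plausible reading of this idea; the numbers, the choice of conditioning as the "correction", and the function names are ours and may differ from the paper's exact definitions. An exact rule A → B corrects the independence-based distribution by forbidding the state (A=1, B=0), and the information gained is the resulting drop in Kullback-Leibler divergence from the truth.

```python
import numpy as np

def kl(p, q):
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# States of a world with two binary properties (A, B),
# enumerated as (0,0), (0,1), (1,0), (1,1).
p_true = np.array([0.40, 0.10, 0.00, 0.50])   # hypothetical true distribution

# Distribution built under the (here false) assumption that A and B
# are independent, using the true marginals.
pA = p_true[2] + p_true[3]
pB = p_true[1] + p_true[3]
q_indep = np.array([(1-pA)*(1-pB), (1-pA)*pB, pA*(1-pB), pA*pB])

# Correct q_indep using the exact rule A -> B: zero out the state
# (A=1, B=0) that the rule forbids, then renormalize.
q_rule = q_indep.copy()
q_rule[2] = 0.0
q_rule /= q_rule.sum()

gain = kl(p_true, q_indep) - kl(p_true, q_rule)
print(f"information gained from the rule: {gain:.4f} nats")
```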
This work studies the standard exponential families of probability measures on Euclidean spaces that have finite supports. In such a family, parameterized by means, the mean is made to move along a segment inside the convex support towards an endpoint on the boundary of the support. The limit behavior of several quantities related to the exponential family is described explicitly. In particular, the variance functions and information divergences are studied near the boundary.
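A one-dimensional toy instance (ours, far simpler than the paper's general setting) shows the kind of boundary behavior at issue: for the family p_θ(x) ∝ exp(θx) on the support {0, 1, 2}, driving the mean towards the boundary point 0 of the convex support makes the variance vanish.

```python
import numpy as np

# A standard exponential family with finite support {0, 1, 2} on the line:
# p_theta(x) proportional to exp(theta * x); convex support is [0, 2].
support = np.array([0.0, 1.0, 2.0])

def mean_and_variance(theta):
    w = np.exp(theta * support)
    p = w / w.sum()
    mu = float(p @ support)
    var = float(p @ (support - mu) ** 2)
    return mu, var

# As theta -> -infinity the mean approaches the boundary point 0
# and the variance function vanishes.
for theta in (-1.0, -3.0, -6.0, -10.0):
    mu, var = mean_and_variance(theta)
    print(f"theta={theta:6.1f}  mean={mu:.5f}  variance={var:.6f}")
```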
Standard properties of ϕ-divergences of probability measures are widely applied in various areas of information processing. Among the desirable supplementary properties facilitating the employment of mathematical methods is the metricity of ϕ-divergences, or the metricity of their powers. This paper extends the previously known family of ϕ-divergences with these properties. The extension consists of a continuum of ϕ-divergences which are squared metric distances and which are mostly new, but it also includes some classical cases such as the Le Cam squared distance. The paper also establishes basic properties of the ϕ-divergences from the extended class, including the range of values and the upper and lower bounds attained under fixed total variation.
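The Le Cam case can be probed numerically (our sketch, not part of the paper): the Le Cam divergence is the ϕ-divergence with ϕ(t) = (t−1)²/(2(t+1)), and metricity of its square root means the triangle inequality should hold on every triple of distributions, which the check below samples at random.

```python
import numpy as np

def le_cam(p, q):
    """Le Cam squared distance, the phi-divergence with
    phi(t) = (t - 1)**2 / (2 * (t + 1))."""
    s = p + q
    mask = s > 0
    return float(np.sum((p[mask] - q[mask]) ** 2 / (2 * s[mask])))

rng = np.random.default_rng(0)

def random_dist(n):
    x = rng.random(n)
    return x / x.sum()

# Empirically check the triangle inequality for the square root of the
# Le Cam divergence on random triples of distributions.
d = lambda a, b: np.sqrt(le_cam(a, b))
for _ in range(10000):
    p, q, r = (random_dist(5) for _ in range(3))
    assert d(p, r) <= d(p, q) + d(q, r) + 1e-12
print("triangle inequality held on all sampled triples")
```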
This article studies exponential families E on finite sets such that the information divergence D(P∥E) of an arbitrary probability distribution from E is bounded by some constant D>0. A particular class of low-dimensional exponential families that have low values of D can be obtained from partitions of the state space. The main results concern optimality properties of these partition exponential families. The case where D=log(2) is studied in detail. This case is special, because if D<log(2), then E contains all probability measures with full support.
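A small numerical sketch of the partition construction as we read it (block layout and function names are ours): the exponential family generated by a partition consists of the distributions that are uniform inside each block, its reverse I-projection of P keeps each block's mass and spreads it uniformly within the block, and with blocks of size at most 2 the divergence stays below log(2) for every P.

```python
import numpy as np

def divergence_from_partition_family(p, blocks):
    """D(P || E) for the exponential family E generated by a partition of
    the state space: E consists of all distributions that are uniform
    inside each block.  The reverse I-projection of P keeps each block's
    mass and spreads it uniformly within the block."""
    d = 0.0
    for block in blocks:
        pb = p[block]
        proj = pb.sum() / len(block)      # projected probability per state
        nz = pb > 0
        d += float(np.sum(pb[nz] * np.log(pb[nz] / proj)))
    return d

# With blocks of size at most 2, D(P || E) <= log(2) for every P;
# the bound is approached by concentrating P on one state of a block.
blocks = [np.array([0, 1]), np.array([2, 3])]
for eps in (0.5, 0.1, 0.01):
    p = np.array([1 - eps, eps, 0.0, 0.0])
    print(eps, divergence_from_partition_family(p, blocks), np.log(2))
```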