Decomposable (probabilistic) models are log-linear models generated by acyclic hypergraphs, and a number of nice properties enjoyed by them are known. In many applications the following selection problem naturally arises: given a probability distribution p over a finite set V of n discrete variables and a positive integer k, find a decomposable model with tree-width k that best fits p. If H is the generating hypergraph of a decomposable model and pH is the estimate of p under the model, we can measure the closeness of pH to p by the information divergence D(p:pH), so that the problem above reads: given p and k, find an acyclic, connected hypergraph H of tree-width k such that D(p:pH) is minimum. It is well-known that this problem is NP-hard. However, for k=1 it was solved by Chow and Liu in a very efficient way; thus, starting from an optimal Chow-Liu solution, a few forward-selection procedures have been proposed with the aim at finding a `good' solution for an arbitrary k. We propose a backward-selection procedure which starts from the (trivial) optimal solution for k=n−1, and we show that, in a study case taken from literature, our procedure succeeds in finding an optimal solution for every k.
.
We generalize Jiroušek's (\emph {right}) \emph {composition operator} in such a way that it can be applied to distribution functions with values in a "semifield", and introduce (parenthesized) \emph {compositional expressions}, which in some sense generalize Jiroušek's "generating sequences" of compositional models. We say that two compositional expressions are \emph {equivalent} if their evaluations always produce the same results whenever they are defined. Our first result is that a set system H is star-like with centre X \emph {if and only if} every two compositional expressions with "base scheme" H and "key" X are equivalent. This result is stronger than Jiroušek's result which states that, if H is star-like with centre X, then every two generating sequences with base scheme H and key X are equivalent. Then, we focus on \emph {canonical expressions}, by which we mean compositional expressions θ such that the sequence of the sets featured in θ and arranged in order of appearance enjoys the "running intersection property". Since every compositional expression, whose base scheme is a star-like set system with centre X and whose key is X, is a canonical expression, we investigate the equivalence between two canonical expressions with the same base scheme and the same key. We state a graphical characterization of those set systems H such that every two canonical expressions with base scheme H and key X are equivalent, and also provide a graphical algorithm for their recognition. Finally, we discuss the problem of detecting conditional independences that hold in a compositional model.d
In the framework of models generated by compositional expressions, we solve two topical marginalization problems (namely, the \emph{single-marginal problem} and the \emph{marginal-representation problem}) that were solved only for the special class of the so-called "canonical expressions". We also show that the two problems can be solved "from scratch" with preliminary symbolic computation.
The sum-product algorithm is a well-known procedure for marginalizing an "acyclic'' product function whose range is the ground set of a commutative semiring. The algorithm is general enough to include as special cases several classical algorithms developed in information theory and probability theory. We present four results. First, using the sum-product algorithm we show that the variable sets involved in an acyclic factorization satisfy a relation that is a natural generalization of probability-theoretic independence. Second, we show that for the Boolean semiring the sum-product algorithm reduces to a classical algorithm of database theory. Third, we present some methods to reduce the amount of computation required by the sum-product algorithm. Fourth, we show that with a slight modification the sum-product algorithm can be used to evaluate a general sum-product expression.
In probability theory, Bayesian statistics, artificial intelligence and database theory the minimum cross-entropy principle is often used to estimate a distribution with a given set P of marginal distributions under the proportionality assumption with respect to a given "prior'' distribution q. Such an estimation problem admits a solution if and only if there exists an extension of P that is dominated by q. In this paper we consider the case that q is not given explicitly, but is specified as the maximum-entropy extension of an auxiliary set Q of distributions. There are three problems that naturally arise: (1) the existence of an extension of a distribution set (such as P and Q), (2) the existence of an extension of P that is dominated by the maximum entropy extension of Q, (3) the numeric computation of the minimum cross-entropy extension of P with respect to the maximum entropy extension of Q. In the spirit of a divide-and-conquer approach, we prove that, for each of the three above-mentioned problems, the global solution can be easily obtained by combining the solutions to subproblems defined at node level of a suitable tree.