Binary Factor Analysis (BFA) aims to discover latent binary structures in high-dimensional data. Parameter learning in BFA faces exponential computational complexity and a large number of local optima, which makes model selection, i.e., determining the latent binary dimension, difficult. Traditionally, model selection is implemented in two separate stages with two different objectives: first, parameters are learned for each candidate model scale by maximising the likelihood; then the optimal scale is selected by minimising a model selection criterion. Such a two-phase implementation suffers from high computational cost and degraded learning performance on large-scale structures. In contrast, Bayesian Ying-Yang (BYY) harmony learning starts from a high-dimensional model and automatically reduces the dimension during learning. This paper investigates model selection on a subclass of BFA called Orthogonal Binary Factor Analysis (OBFA). The Bayesian inference of the latent binary code is solved analytically, and a BYY machine is constructed on this basis. The harmony measure, which serves as the objective function in BYY learning, is estimated more accurately by recovering a regularisation term. Experimental comparison with two-phase implementations shows that the proposed approach performs better.
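To illustrate why exact inference in general BFA is exponential while an orthogonal variant can be analytic, the sketch below assumes a common BFA form that the abstract does not spell out: x = A y + c + e, with e ~ N(0, σ²I) and independent Bernoulli priors p(y_j = 1) = q_j. It also assumes that "orthogonal" means the columns a_j of A are mutually orthogonal. Under these assumptions, y_j² = y_j makes the quadratic term separable, and the posterior factorises as p(y_j = 1 | x) = sigmoid(log(q_j / (1 − q_j)) + (a_jᵀ(x − c) − ‖a_j‖²/2) / σ²). This is a derivation under the stated assumptions, not necessarily the paper's own result, and the function names are hypothetical. The brute-force routine enumerates all 2^m codes, and for orthogonal A the two routines should agree.

```python
import itertools
import math


def log_joint(x, y, A, c, q, sigma2):
    """log p(x, y) up to an additive constant, assuming x = A y + c + e, e ~ N(0, sigma2 I)."""
    d, m = len(x), len(y)
    resid = [x[i] - c[i] - sum(A[i][j] * y[j] for j in range(m)) for i in range(d)]
    log_lik = -sum(r * r for r in resid) / (2.0 * sigma2)
    log_prior = sum(math.log(q[j]) if y[j] else math.log(1.0 - q[j]) for j in range(m))
    return log_lik + log_prior


def posterior_brute_force(x, A, c, q, sigma2):
    """Exact marginals p(y_j = 1 | x) by enumerating all 2^m binary codes (exponential in m)."""
    m = len(q)
    codes = list(itertools.product([0, 1], repeat=m))
    logs = [log_joint(x, y, A, c, q, sigma2) for y in codes]
    top = max(logs)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [sum(w for w, y in zip(weights, codes) if y[j]) / total for j in range(m)]


def posterior_orthogonal(x, A, c, q, sigma2):
    """Closed-form marginals, valid only when the columns of A are mutually orthogonal."""
    d, m = len(x), len(q)
    out = []
    for j in range(m):
        a_j = [A[i][j] for i in range(d)]
        proj = sum(a_j[i] * (x[i] - c[i]) for i in range(d))  # a_j^T (x - c)
        norm2 = sum(v * v for v in a_j)                        # ||a_j||^2
        logit = math.log(q[j] / (1.0 - q[j])) + (proj - 0.5 * norm2) / sigma2
        out.append(1.0 / (1.0 + math.exp(-logit)))
    return out


if __name__ == "__main__":
    # Hypothetical toy model: columns [1, 1, 0] and [2, -2, 1] are orthogonal.
    A = [[1.0, 2.0], [1.0, -2.0], [0.0, 1.0]]
    c, q, sigma2 = [0.5, -0.5, 0.0], [0.3, 0.6], 0.25
    x = [1.6, 0.4, 0.1]
    print(posterior_brute_force(x, A, c, q, sigma2))
    print(posterior_orthogonal(x, A, c, q, sigma2))
```

The closed form costs O(m·d) per observation, compared with O(2^m·m·d) for enumeration. This gap is what makes starting from a high-dimensional model affordable. If A is not orthogonal, the cross terms y_j y_k a_jᵀa_k no longer vanish, and the closed form stops being exact.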
In many natural language processing applications, two or more models must usually be combined to achieve high accuracy. However, it is difficult for minor models, such as “backoff” taggers in part-of-speech tagging, to cooperate smoothly with the major probabilistic model. We introduce a two-stage approach for model selection between hidden Markov models and other minor models. In the first stage, the major model is extended to produce a set of candidates for model selection: a parameter-weighted hidden Markov model, which uses a weighted ratio, is presented to create the candidate set. In the second stage, heuristic rules and features serve as evaluation functions that give extra scores to the candidates in the set. These scores are calculated with a diagnostic likelihood ratio test based on sensitivity and specificity criteria. The selection procedure can be carried out with a swarm optimization technique. Experimental results on public tagging data sets show the applicability of the proposed approach.
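The diagnostic likelihood ratios mentioned above have standard definitions: LR+ = sensitivity / (1 − specificity) when a test fires, and LR− = (1 − sensitivity) / specificity when it does not. The sketch below shows one way such ratios could turn heuristic rules into extra candidate scores. It rests on several assumptions that the abstract does not state. Each rule's confusion counts are assumed to come from held-out data. Add-0.5 smoothing keeps the rates strictly inside (0, 1). Log-ratios are added to the HMM log-score as if the rules were independent. A plain `max` over a small candidate set stands in for the swarm optimization step. `RuleStats`, `select`, and the rule name `suffix_ly` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class RuleStats:
    """Hypothetical confusion counts for one heuristic rule on held-out data."""
    tp: int  # rule fired and the candidate tag was correct
    fp: int  # rule fired and the candidate tag was wrong
    fn: int  # rule did not fire but the candidate tag was correct
    tn: int  # rule did not fire and the candidate tag was wrong

    def sensitivity(self) -> float:
        # Add-0.5 smoothing (an assumption) keeps the rate strictly inside (0, 1).
        return (self.tp + 0.5) / (self.tp + self.fn + 1.0)

    def specificity(self) -> float:
        return (self.tn + 0.5) / (self.tn + self.fp + 1.0)

    def log_lr(self, fired: bool) -> float:
        sens, spec = self.sensitivity(), self.specificity()
        if fired:
            return math.log(sens / (1.0 - spec))   # log LR+ = log(sens / (1 - spec))
        return math.log((1.0 - sens) / spec)       # log LR- = log((1 - sens) / spec)


def select(candidates, rules):
    """Return the tag whose HMM log-score plus summed rule log-LRs is highest.

    candidates: list of (tag, hmm_log_score, set of rule names that fire for this tag)
    rules: dict mapping rule name -> RuleStats
    Summing log-LRs assumes the rules are conditionally independent.
    """
    def total(candidate):
        _tag, base, fired = candidate
        return base + sum(stats.log_lr(name in fired) for name, stats in rules.items())

    return max(candidates, key=total)[0]


if __name__ == "__main__":
    rules = {"suffix_ly": RuleStats(tp=90, fp=10, fn=10, tn=90)}
    candidates = [("JJ", -1.0, set()), ("RB", -2.0, {"suffix_ly"})]
    print(select(candidates, rules))
```

In this toy usage, the HMM alone prefers `JJ`. The informative rule adds roughly +2.15 to `RB` and about −2.15 to `JJ`, which is enough to flip the choice. An uninformative rule, whose sensitivity plus specificity equals 1, has LR = 1 and contributes nothing.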