Font Size: a A A

An Improved Ips Algorithm By Partitioning Cliques

Posted on:2017-03-14Degree:MasterType:Thesis
Country:ChinaCandidate:J B SunFull Text:PDF
GTID:2180330503979689Subject:Statistics
Abstract/Summary:PDF Full Text Request
Graphical models intuitively and clearly express the structural relationship between variables via graph and decompose complex large-scale global problems into relatively simple local problems, which not only greatly reduces computations, but also improves the efficacy of inference. In the inference and calculation of graphical models, parameter estimation and model selection are the two core problems. This paper will study the maximum likelihood estimation of complete data and the model selection for incomplete data.The maximum likelihood estimation of the graphical models is generally solved by the iterative proportional scaling(IPS) algorithm. Due to the stability and ease of implementation of the IPS algorithm, since it was first proposed by Deming and Stephan, many scholars have done further research and improvement in many aspects, such as geometric interpretation, convergence, space complexity and time complexity. However, many practical problems are often complex and involve a lot of variables, so the complexity of the IPS algorithm is high.This paper further studies the IPS procedure from the perspective of local computation. Inspired by Xu et al. paper published in JCGS for local computation in Gaussian graphical models by partitioning cliques, we give the theoretical results for the local computation by partitioning cliques of graphical models with multinomial districution. By these theoretical results, the local computation by partitioning cliques does not apply the conditional independence relations between variables, so even if the graph structure is very complex, partitioning cliques strategy is still valid, which is the advantage that many previous algorithms proposed by senior scholars do not have. Further, we propose the algorithm implementation, called the IPSP algorithm, by partitioning cliques strategy. In this implementation, we first divide cliques into non-overlapping and non-empty blocks, and then adjust locally and successively the clique marginals in each block. It is not difficult to find a partition resulting in less complexity than the IPS algorithm, but it is difficult to find the overall optimal partition. This paper uses the simulated annealing(SA) algorithm to find an approximate optimal partition. In addition, we also prove that the optimal partition of the n-cycle graphical model is the continuous bisection partition. The partitioning cliques strategy does not apply graphical model structure and the conditional independence relations, and the UPS-JT algorithm proposed by Teh and Welling does not have this property. So the partitioning cliques strategy can speed up the calculation of the fitting steps of the UPS-JT algorithm and improve the computational efficiency of the UPS-JT algorithm. We call it the UPSP-JT algorithm. Then, the calculation efficiency of IPS, IPSP, UPS-JT and UPSP-JT are compared via simulation study in this paper. We found that the UPS-JT algorithm and UPSP-JT algorithm with using the junction trees are better than the IPS algorithm and IPSP algorithm without using the junction trees, and that the IPSP algorithm and UPSP-JT algorithm run respectively faster than the IPS algorithm and UPS-JT algorithm. Hence the local computation by partitioning cliques can effectively improve computational efficiency.For model selection with incomplete data, the traditional method solves the maximum likelihood estimation of each alternative model, and then uses the information criterion(such as BIC, etc.) to select the model. However, Jiang et al. paper published in JASA points out that the traditional method may not converge, or even may choose the wrong model, and Jiang et al. proposes the E-MS algorithm which is convergent under certain conditions. In this paper, we will use the E-MS algorithm to solve the model selection problem of the graphical model with incomplete data. But due to the nested computations in the E-MS algorithm, the computation complexity increases rapidly with the increase of the number of variables. To reduce the computation complexity, we will maximize the Q function of the E-MS algorithm by the IPSP algorithm. Simulation study is carried out.This paper is organized as follows. In the first chapter, we briefly introduce the research background and development of the IPS algorithm, as well as the organization structure of this paper. In the second chapter, we briefly introduce the definitions, concepts and notations of graphical model. In the third chapter, this paper describes the joint probability distribution of contingency tables of categorical data and the classical IPS procedure. We give the theoretical results of local computation by partitioning cliques, and propose the IPSP algorithm. We also propose the MHDP algorithm to find the approximate optimal partition used in simulated annealing algorithm. It is proved that the optimal partition for n-cycle graphical model is the continuous bisection partition. Then, we use the clique partition strategy to improve the UPS-JT algorithm, and compare the computation efficiency of IPS, IPSP, UPS-JT and UPSP-JT via simulation study. In the fourth chapter, the theoretical achievement for model selection with incomplete data is introduced, namely the E-MS algorithm, and we analyze its complexity. Then, the simulation study for model selection of graphical models with incomplete data is implemented. In the fifth chapter, summary of this thesis is given. And prospect of improved IPS algorithm by local computation in Bayesian estimation are discussed.
Keywords/Search Tags:Graph model, IPS algorithm, Local computation, Clique partition, E-MS algorithm
PDF Full Text Request
Related items