Research And Application On Sparse Structure Learning Of Bayesian Networks Based On Regularization

Posted on: 2020-02-14; Degree: Doctor; Type: Dissertation
Country: China; Candidate: M Guo
GTID: 1367330575460490; Subject: Statistics

Abstract:
A Bayesian network is an effective tool for modeling the joint distribution of random variables and is widely used for modeling and reasoning about uncertain systems. It uses a directed acyclic graph to represent, from a global perspective, the direct and indirect relationships among all random variables, and it quantifies the degree of dependence among variables through their probability distributions. Automatically identifying the optimal Bayesian network structure from data is an active and difficult research topic. With high-dimensional data, traditional structure learning methods, which are suited to small and medium-sized networks, struggle to find the optimal Bayesian network. The space of directed acyclic graphs grows super-exponentially with the number of nodes, so high-dimensional data makes this space explode, and searching it for the optimal network requires heavy computation, incurs high time costs, and is often ineffective. High-dimensional data also makes the learned structure more complex, which leads to overfitting and to poor generalization and interpretability.

To address these problems, researchers simplify the task of learning large, complex network structures by imposing soft or hard constraints that sparsify the network. Hard constraints manually cap the number of neighbors or parents of each node at a small constant, which ignores the fact that connection density differs across nodes. Soft constraints add an ℓ1- or ℓ0-norm penalty on the network structure, so that a sparse structure is learned automatically from the data. This class of methods, called regularization-based Bayesian network structure learning, alleviates to some extent the difficulties of structure learning in high-dimensional settings. Regularized sparse learning is an effective tool in generalized linear models, covariance estimation, matrix decomposition, image processing, and other areas, but it is still a relatively new research direction in Bayesian network structure learning. This thesis focuses on the methods and applications of regularization-based Bayesian network structure learning. Its main work and innovations are as follows.

(1) Safe feature screening for penalized generalized linear models is applied to Bayesian network structure learning. Although regularization methods have clear advantages for learning large-scale Bayesian networks, existing regularization-based structure learning methods run slowly and have limited accuracy, and for very large networks the computational cost can be so high that existing sparse learning algorithms cannot be used at all. Sparse learning of a large Bayesian network typically involves hundreds of penalized generalized linear models, and safe feature screening can accelerate the solution of each individual penalized model in high-dimensional settings.

(2) By incorporating the GAP safe feature screening strategy into ℓ1-regularized structure learning of Gaussian Bayesian networks, a sparse structure learning algorithm for Gaussian Bayesian networks on high-dimensional data, GBN-GAP, is proposed. Because structure learning of discrete Bayesian networks is much more complex than that of Gaussian Bayesian networks, ℓ1-regularized Gaussian Bayesian networks are taken as the starting point. GBN-GAP adopts the two-stage approach of the L1MB-DAG algorithm. In the first stage, the acyclicity constraint is ignored, and a series of LASSO problems, solved together with the GAP safe screening strategy, is used to construct an undirected skeleton. In the second stage, this skeleton is used to shrink the search space of directed acyclic graphs, and the hill-climbing algorithm is used to find the Bayesian network with the best BIC score within the reduced space. GAP safe screening rules identify and eliminate in advance most of the variables whose coefficients are 0 in the optimal sparse LASSO solution, and the optimal solution on the reduced design matrix is the same as that of the original LASSO problem. GBN-GAP therefore has several advantages. In the first stage, GAP screening is interleaved with the iterative LASSO solver, so each LASSO problem is solved on a reduced design matrix; this speeds up structure learning without losing the accuracy of L1MB-DAG. Meanwhile, the GAP-based results serve as constraints that limit the size of the DAG search space, which further reduces the time complexity of GBN-GAP. Simulation results show that GBN-GAP is particularly suitable for large Gaussian Bayesian networks with limited samples: it can capture the key structure of the network within a reasonable time, making up for the shortcomings of traditional Bayesian network learning methods on high-dimensional data.

(3) Based on bootstrap resampling and the GBN-GAP algorithm, this thesis constructs an average Bayesian network structure. In practical applications whose goal is to identify the network structure, the local structure of the network should be estimated as accurately as possible to ensure the overall quality of the network. For high-dimensional data with small sample sizes, GBN-GAP correctly learns more edges of the underlying network than other algorithms, but it may also learn more edges that are absent from the skeleton of the underlying network. Therefore, to keep as few spurious edges as possible in the final estimated graph, an average Bayesian network structure is constructed using the idea of model averaging: it consists of the edges whose confidence values are greater than or equal to a given confidence threshold. Simulation results show that as the confidence threshold is gradually increased, the number of true positive edges in the average network structure decreases, while the number of false positive edges decreases more rapidly. A reasonable confidence threshold should therefore be set according to the actual requirements of the application. Compared with GBN-GAP and other classical Bayesian network learning methods, the model-averaged Bayesian network structure achieves higher precision.

(4) The thesis applies GBN-GAP-based sparse structure learning of Gaussian Bayesian networks to Arabidopsis thaliana gene data. Gene expression data are usually high-dimensional continuous data with small sample sizes, so GBN-GAP is used to identify a sparse Arabidopsis thaliana gene network from gene expression data, and the effectiveness of the algorithm is evaluated. GBN-GAP can identify dependence paths between different genes in the Arabidopsis thaliana gene network and can also discover some hub nodes with regulatory effects. After the structural characteristics of the network are obtained, the model-averaging approach based on GBN-GAP is used to obtain a more accurate and sparser Arabidopsis thaliana gene network from the gene data.

(5) The thesis also discusses the application of group-regularized sparse structure learning of discrete Bayesian networks to P2P re-lending. Practical problems in high-dimensional settings usually involve mixed data composed of continuous and discrete variables. In this case, neither sparse structure learning of Gaussian Bayesian networks nor traditional structure learning of discrete Bayesian networks is suitable for identifying the probabilistic dependencies among variables. Therefore, this thesis employs a Bayesian network sparse learning method based on Multi-Logit regression and group regularization to explore the key factors and paths affecting re-lending by P2P borrowers. Existing research focuses mainly on the behavior of P2P online lending platforms, the investment behavior of lenders, and similar aspects, while P2P re-lending has received less attention. In addition, research on the impact of online social relationships on online lending rarely distinguishes between the effects of directed and undirected social relationships, or between the effects of different types and intensities of directed social relationships. This thesis refines the notion of online social relationships and studies their impact on P2P re-lending. Specifically, it studies the sparse Bayesian network structures that affect P2P re-lending from the perspectives of different products. Then, by combining Bayesian posterior parameter estimation with the results of exact inference in the Bayesian network, the factors and paths affecting P2P borrowers' re-lending under different financial credit products are analyzed quantitatively, in order to provide a basis for marketing plans and other activities on the online lending platform. The results show that the factors affecting P2P borrowers' re-lending differ markedly across products, mainly in the factors derived from online social relationships and from borrowers' bank lending and consumption behavior. In addition, because the behavioral maturity of P2P borrowers on the mobile end of the studied platform is low, the network structures obtained from all product perspectives show that mobile behavioral maturity has no influence on P2P re-lending.

In summary, this thesis carries out basic research on sparse structure learning methods for Bayesian networks and their applications in the context of high-dimensional data. Future work includes improving GBN-GAP to further increase its learning accuracy; attempting to speed up sparse learning of discrete Bayesian networks with the GAP safe screening strategy; incorporating more prior information about the network structure into regularized sparse Bayesian network learning models to improve their accuracy; and exploring applications of sparse Bayesian network structure learning more deeply and broadly.
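The GAP safe screening step that the second contribution builds into GBN-GAP can be illustrated on a single LASSO problem. The following is a minimal pure-Python sketch, not the thesis's code, assuming the objective ½‖y − Xβ‖² + λ‖β‖₁ and plain cyclic coordinate descent as the solver. Every few sweeps it rescales the residual into a dual feasible point θ, computes the duality gap, and discards feature j whenever |x_jᵀθ| + r‖x_j‖ < 1 with r = √(2·gap)/λ; a feature removed this way is guaranteed to have a zero coefficient at the LASSO optimum. The names `lasso_gap_safe`, `screen_every` and `tol` are hypothetical.

```python
import math


def soft_threshold(z, t):
    """Scalar soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def lasso_gap_safe(X, y, lam, n_iter=500, screen_every=10, tol=1e-10):
    """Hypothetical sketch: coordinate descent for 0.5*||y - X b||^2 + lam*||b||_1
    with dynamic GAP safe screening. X is a list of n rows with p entries each.
    Returns the coefficients and the indices of features that survived screening."""
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    sq_norms = [sum(v * v for v in c) for c in cols]
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta
    active = [j for j in range(p) if sq_norms[j] > 0.0]
    for it in range(n_iter):
        if it % screen_every == 0:
            # Rescale the residual into the dual feasible set {theta: |x_j^T theta| <= 1}.
            corr = [sum(c[i] * resid[i] for i in range(n)) for c in cols]
            scale = max(lam, max(abs(v) for v in corr))
            theta = [r / scale for r in resid]
            primal = 0.5 * sum(r * r for r in resid) + lam * sum(abs(b) for b in beta)
            dual = 0.5 * sum(v * v for v in y) - 0.5 * lam ** 2 * sum(
                (t - v / lam) ** 2 for t, v in zip(theta, y))
            gap = max(primal - dual, 0.0)
            radius = math.sqrt(2.0 * gap) / lam  # safe sphere around theta
            kept = []
            for j in active:
                if abs(corr[j]) / scale + radius * math.sqrt(sq_norms[j]) < 1.0:
                    # Safe rule: the optimal coefficient of feature j is exactly 0.
                    if beta[j] != 0.0:
                        for i in range(n):
                            resid[i] += cols[j][i] * beta[j]
                        beta[j] = 0.0
                else:
                    kept.append(j)
            active = kept
            if gap < tol:
                break
        for j in active:  # one coordinate-descent sweep over surviving features only
            c, old = cols[j], beta[j]
            rho = sum(c[i] * resid[i] for i in range(n)) + sq_norms[j] * old
            new = soft_threshold(rho, lam) / sq_norms[j]
            if new != old:
                for i in range(n):
                    resid[i] -= c[i] * (new - old)
                beta[j] = new
    return beta, active
```

For example, `lasso_gap_safe([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]], [3.0, 0.5, 0.0, 0.0], lam=1.0)` returns `([2.0, 0.0], [0])`: the first screening pass keeps both features because the gap is still large, and once the gap closes the second feature, whose LASSO coefficient is zero, is removed from the design matrix.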
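The second stage of GBN-GAP restricts the DAG search to the skeleton produced by the first stage. The sketch below illustrates that restriction under stated assumptions: a Gaussian BIC local score (each node regressed on its parents by least squares, penalized by (k/2)·log n) and a greedy hill climber whose add, delete and reverse moves are only allowed between pairs that are adjacent in the skeleton. The skeleton is passed in directly as a set of `frozenset` pairs, standing in for the output of the screened LASSO regressions; all function names are hypothetical, and this is not the thesis's implementation.

```python
import math


def center(data):
    """Subtract column means so each regression can omit the intercept."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in data]


def rss(data, child, parents):
    """Residual sum of squares of the OLS fit of `child` on `parents`
    (centred data), via the normal equations and Gaussian elimination."""
    k = len(parents)
    if k == 0:
        return sum(row[child] ** 2 for row in data)
    A = [[sum(row[a] * row[b] for row in data) for b in parents] for a in parents]
    c = [sum(row[a] * row[child] for row in data) for a in parents]
    for col in range(k):
        piv = max(range(col, k), key=lambda i: abs(A[i][col]))  # partial pivoting
        A[col], A[piv], c[col], c[piv] = A[piv], A[col], c[piv], c[col]
        for i in range(col + 1, k):
            f = A[i][col] / A[col][col]
            for j in range(col, k):
                A[i][j] -= f * A[col][j]
            c[i] -= f * c[col]
    coef = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        coef[i] = (c[i] - sum(A[i][j] * coef[j] for j in range(i + 1, k))) / A[i][i]
    return sum((row[child] - sum(b * row[a] for b, a in zip(coef, parents))) ** 2
               for row in data)


def local_bic(data, child, parents):
    """Gaussian BIC of one node (constants dropped): log-likelihood at the MLE
    minus (k/2) log n, with k = parents + intercept + noise variance."""
    n = len(data)
    r = max(rss(data, child, parents), 1e-12)
    return -0.5 * n * math.log(r / n) - 0.5 * math.log(n) * (len(parents) + 2)


def creates_cycle(parents, u, v):
    """True if adding u -> v would close a directed cycle (v is an ancestor of u)."""
    stack, seen = [u], set()
    while stack:
        x = stack.pop()
        if x == v:
            return True
        if x not in seen:
            seen.add(x)
            stack.extend(parents[x])
    return False


def hill_climb(data, skeleton, min_gain=1e-6):
    """Greedy add/delete/reverse search; only skeleton pairs may be joined."""
    parents = {v: set() for v in range(len(data[0]))}
    score = {v: local_bic(data, v, []) for v in parents}
    while True:
        best_gain, best = min_gain, None
        for edge in skeleton:
            a, b = sorted(edge)
            for u, v in ((a, b), (b, a)):
                if u in parents[v]:
                    new_v = local_bic(data, v, sorted(parents[v] - {u}))
                    if new_v - score[v] > best_gain:  # delete u -> v
                        best_gain, best = new_v - score[v], ("del", u, v)
                    parents[v].discard(u)  # temporarily drop u -> v to test v -> u
                    if not creates_cycle(parents, v, u):
                        new_u = local_bic(data, u, sorted(parents[u] | {v}))
                        gain = new_v - score[v] + new_u - score[u]
                        if gain > best_gain:  # reverse u -> v into v -> u
                            best_gain, best = gain, ("rev", u, v)
                    parents[v].add(u)
                elif v not in parents[u] and not creates_cycle(parents, u, v):
                    new_v = local_bic(data, v, sorted(parents[v] | {u}))
                    if new_v - score[v] > best_gain:  # add u -> v
                        best_gain, best = new_v - score[v], ("add", u, v)
        if best is None:
            return parents  # no move improves the total BIC by more than min_gain
        op, u, v = best
        if op == "add":
            parents[v].add(u)
        else:  # "del" and "rev" both remove u -> v
            parents[v].discard(u)
            if op == "rev":
                parents[u].add(v)
        for x in (u, v):
            score[x] = local_bic(data, x, sorted(parents[x]))
```

The sketch is called as `hill_climb(center(rows), skeleton)`. Gaussian BIC is score-equivalent, so for two correlated variables the single learned edge may be oriented either way; the search determines only the adjacency, and pairs outside the skeleton are never joined.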
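The average structure of the third contribution can be sketched as follows, assuming that confidence is measured on undirected edges (the abstract does not say whether edge direction is taken into account) and that `learn` is any structure learner standing in for GBN-GAP. An edge's confidence is the fraction of bootstrap replicates in which it is learned, and the averaged structure keeps the edges whose confidence is greater than or equal to the threshold. The function names are hypothetical.

```python
import random
from collections import Counter


def bootstrap_edge_confidence(data, learn, n_boot=100, seed=0):
    """Fraction of bootstrap replicates in which each undirected edge is learned.

    `learn` is a hypothetical stand-in for GBN-GAP: it maps a list of rows
    to an iterable of (u, v) edges."""
    rng = random.Random(seed)
    n = len(data)
    counts = Counter()
    for _ in range(n_boot):
        sample = [data[rng.randrange(n)] for _ in range(n)]  # resample rows with replacement
        # Count each undirected edge at most once per replicate.
        counts.update({frozenset(e) for e in learn(sample)})
    return {edge: k / n_boot for edge, k in counts.items()}


def average_structure(confidence, threshold):
    """Keep the edges whose confidence is at least the threshold."""
    return {edge for edge, c in confidence.items() if c >= threshold}
```

Because the set returned by `average_structure` can only shrink as `threshold` grows, raising the threshold trades some true positive edges for fewer false positives, which is the trade-off the simulation results describe.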
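In the fifth contribution, group regularization of a Multi-Logit regression removes a candidate parent only when all of its coefficients vanish together. A common way to achieve this, assumed here because the abstract does not name the solver, is a proximal-gradient method whose key step is block soft-thresholding: each coefficient group v is scaled by max(0, 1 − t/‖v‖₂), where t is the step size times the penalty weight. The helpers below, with hypothetical names, show only that step for the groups of one regression.

```python
import math


def group_soft_threshold(groups, t):
    """Proximal operator of t * sum_g ||v_g||_2 applied to a dict of coefficient groups.

    Each group (e.g. all dummy-coded coefficients of one candidate parent across
    the Multi-Logit classes, an assumed grouping) is shrunk towards zero as a
    block, or set exactly to zero when its norm does not exceed t."""
    out = {}
    for name, v in groups.items():
        norm = math.sqrt(sum(x * x for x in v))
        scale = max(0.0, 1.0 - t / norm) if norm > 0.0 else 0.0
        out[name] = [scale * x for x in v]
    return out


def selected_parents(groups):
    """Candidate parents whose coefficient group survived the penalty."""
    return sorted(name for name, v in groups.items() if any(x != 0.0 for x in v))
```

A group with norm 5 and t = 1 is scaled by 0.8, while a group with norm 0.5 is zeroed, so that candidate parent drops out of the local structure entirely.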
Keywords: regularization, sparse structure learning of Bayesian network, GAP safe rules, model averaging, P2P lending