With the rapid development and popularization of the Internet, data acquisition, information processing, and artificial intelligence, systems for intelligent recommendation, intelligent consultation, and intelligent search are growing rapidly and have become a focus of both academia and industry. The entities in such intelligent systems often exhibit qualitative and quantitative dependencies, and inference and search based on these dependency relationships are a vital foundation for personalized recommendation, disease prediction, and decision support. Without loss of generality, the dependency relationships among entities can be modeled as a graph, where nodes represent the entities and edges represent the dependencies. To provide computational methods and a scientific basis for practical tasks in intelligent systems, such as analysis, prediction, inference, and decision-making, it is of great significance to represent dependency relationships both qualitatively and quantitatively and to perform knowledge inference on such a graphical model.

A Bayesian network (BN) is a directed acyclic graph (DAG) equipped with a conditional probability table (CPT) for each node, and it can express both the qualitative and quantitative dependencies among nodes. We therefore adopt the BN as the basic framework for knowledge representation and inference. In BNs constructed for the aforementioned applications, such as intelligent recommendation and intelligent consultation, the values of the nodes that describe user preferences or patients' diseases cannot be observed directly. Hence, we represent these nodes with missing values as latent variables and introduce them into the BN, constructing a BN with latent variables (BNLV) that describes the dependency relationships of an intelligent system accurately and intuitively from a global perspective, while using the latent variables to simplify the network structure and improve the interpretability of the represented knowledge. Probabilistic inference over a BN provides effective computational methods for inferring information such as user preferences and patients' diseases, as well as for intelligent analysis tasks such as similarity search. Accordingly, we focus on the construction, inference, and application of BNs, and study methods for BNLV learning, probabilistic inference, and similarity search.

Nevertheless, there are three challenges in BNLV learning, multiple BN inferences, and BN-based similarity search, respectively: (1) parameter learning methods for BNLV based on likelihood iteration, such as the Expectation-Maximization (EM) algorithm, achieve high accuracy but maximize the expected log-likelihood over fractional samples and thus incur high time complexity; (2) exact and approximate inference methods take exponential time when multiple inferences are performed on the same large-scale BN and the CPTs of the nodes are queried repeatedly during inference; (3) similarity search over a BN amounts to performing multiple inferences on the BN, which likewise takes exponential time. Graph embedding maps a high-dimensional graph into a low-dimensional vector space and thereby provides an efficient means of graph computation based on low-dimensional vectors. In this paper, BN embedding, an extension of graph embedding to this specific graphical model, aims to convert complex probability calculations on a large-scale BN into similarity calculations over low-dimensional embeddings.
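To make the core idea of the embedding step concrete, the following minimal Python sketch (an illustration, not the paper's implementation; the exact construction of the PMI matrix and the handling of its negative entries are assumptions here, with negative values simply clipped to zero) factorizes a PMI-style matrix with truncated SVD to obtain low-dimensional node embeddings, whose pairwise similarities then stand in for expensive computations on the original graph.

```python
import numpy as np

def embed_pmi_matrix(pmi: np.ndarray, dim: int) -> np.ndarray:
    """Factorize a PMI-style matrix with truncated SVD and return
    low-dimensional node embeddings (illustrative sketch only)."""
    # Assumption: clip negative PMI entries to zero (positive PMI),
    # a common choice when the matrix is sparse with negative values.
    ppmi = np.maximum(pmi, 0.0)
    u, s, _ = np.linalg.svd(ppmi, full_matrices=False)
    # Split the singular values symmetrically between the two factors.
    return u[:, :dim] * np.sqrt(s[:dim])

def cosine_similarity(e: np.ndarray, i: int, j: int) -> float:
    """Similarity between the embeddings of nodes i and j."""
    a, b = e[i], e[j]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```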
To improve the efficiency of BN model construction, multiple BN inferences, and BN-based similarity search, we propose a parameter learning method for BNLV based on dynamic embeddings, a BN embedding method based on singular value decomposition (SVD), and approximate inference and similarity search methods based on BN embedding, respectively. The work is described in detail as follows.

(1) Parameter learning of BNLV based on dynamic embeddings. EM is regarded as the classical parameter learning method for BNLV. To lower the time complexity of the E-step of EM, we reconstruct the E-step so that the similarities between node embeddings are used to update the weights of the fractional samples. We then transform the DAG and the updated CPTs into a dynamic pointwise mutual information (PMI) matrix that preserves them, and factorize this PMI matrix by SVD to generate low-dimensional, dynamic embeddings. Experimental results show that the proposed parameter learning method outperforms the comparison methods, runs far faster than classical EM, converges fastest, and thus supports efficient construction of BNLV.

(2) BN embedding based on SVD. We transform the DAG and CPTs of a BN into a set of directed weighted graphs that is preserved by a PMI matrix, thereby addressing the problem of how a PMI matrix can preserve a BN. To lower the time complexity of computing PMI, we use maximum likelihood estimation to approximate the PMI between two nodes from random samples. Since the PMI matrix is sparse and contains negative entries, we propose a PMI matrix factorization based on SVD. Experimental results show that the BN embedding method maps a BN into a low-dimensional vector space while preserving its DAG and CPTs.

(3) Probabilistic inference and similarity search based on BN embedding. To improve the computational efficiency of multiple inferences on the same BN, we sample the non-evidence variables according to the topology of the BN, assign each sampled node a value according to the embedding similarity between that node and its neighbors, and generate multiple independent random samples to approximate the posterior probability distribution. In addition, to address the high time complexity of BN-based similarity search and to improve the efficiency and accuracy of K-nearest-neighbor search over a graph index, we propose an improved nearest-neighbor-descent algorithm, an edge selection strategy, and a method for generating a depth-first-search spanning tree rooted at the navigating node. Experimental results show that performing multiple inferences on the embeddings is more efficient without affecting inference accuracy, and that our graph index better captures the neighbor relationships among indexed items and checks fewer neighbor points than other methods at the same search precision.

While guaranteeing the accuracy of parameter learning, inference, and search, the proposed matrix-factorization-based BN embedding methods effectively resolve the inefficiency of BNLV parameter learning, BN probabilistic inference, and BN-based similarity search, extend traditional methods for BN learning and inference, and make BN-based intelligent systems more practical.
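As a concrete illustration of contribution (2), the sketch below estimates the PMI between the states of two nodes from random samples drawn from the BN, using simple maximum-likelihood frequency estimates; the sample format and function names are hypothetical, and the paper's actual transformation of the DAG and CPTs into the PMI matrix is not reproduced here.

```python
import numpy as np
from collections import Counter

def estimate_pmi(samples, node_i, node_j):
    """Approximate the PMI between the value pairs of two BN nodes from
    random samples (a list of dicts mapping node -> sampled value).
    Illustrative only; names and the sample format are assumptions."""
    n = len(samples)
    joint = Counter((s[node_i], s[node_j]) for s in samples)
    marg_i = Counter(s[node_i] for s in samples)
    marg_j = Counter(s[node_j] for s in samples)
    pmi = {}
    for (vi, vj), c in joint.items():
        p_ij = c / n                        # MLE of the joint probability
        p_i, p_j = marg_i[vi] / n, marg_j[vj] / n
        pmi[(vi, vj)] = float(np.log(p_ij / (p_i * p_j)))
    return pmi
```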
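The following sketch illustrates the embedding-guided sampling idea of contribution (3): non-evidence variables are sampled in topological order, each candidate value is scored by the embedding similarity between the node's states and the states already assigned to its neighbors, and the posterior of a query node is estimated from multiple independent samples. The data structures, the softmax-style normalization of similarity scores, and all names are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def approximate_posterior(order, domains, neighbors, embeddings,
                          evidence, query, n_samples=1000, rng=None):
    """Embedding-guided forward sampling on a BN (illustrative sketch).

    order      : nodes in topological order of the DAG
    domains    : dict node -> list of possible values
    neighbors  : dict node -> list of neighboring nodes in the BN
    embeddings : dict (node, value) -> embedding vector
    evidence   : dict node -> observed value
    query      : non-evidence node whose posterior is estimated
    """
    rng = np.random.default_rng() if rng is None else rng
    counts = {v: 0 for v in domains[query]}
    for _ in range(n_samples):
        sample = dict(evidence)
        for node in order:
            if node in sample:
                continue
            assigned = [m for m in neighbors[node] if m in sample]
            scores = []
            for value in domains[node]:
                vec = embeddings[(node, value)]
                sims = [cosine(vec, embeddings[(m, sample[m])]) for m in assigned]
                # Softmax-style weight; uniform if no neighbor is assigned yet.
                scores.append(np.exp(np.mean(sims)) if sims else 1.0)
            probs = np.array(scores) / np.sum(scores)
            sample[node] = domains[node][rng.choice(len(domains[node]), p=probs)]
        counts[sample[query]] += 1
    return {v: c / n_samples for v, c in counts.items()}
```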
To provide efficient computational means for practical applications such as intelligent recommendation, intelligent consultation, and intelligent search, the proposed methods establish a framework for the representation, inference, and application of uncertain knowledge based on matrix factorization and graph embedding techniques.