| In recent years,Heterogeneous Information Network(HIN)has been widely used because of its excellent data modeling ability.HIN analysis has become a hot research issue in many fields such as data mining and machine learning.HINs can integrate complex and diverse relationship structures in a comprehensive and interconnected manner,which ensures the credibility and accuracy of the analysis results.However,HIN is a kind of irregular non-Euclidean data with highly interconnected objects,making it difficult for current machine learning methods that are rooted in Euclidean space with conditional independence assumptions to carry the HIN analysis tasks.In this background,this thesis is committed to studying structure-based machine learning methods on HIN for HIN analysis.Structure learning methods are based on link-based node similarity to fully exploit structural information,use explicit structure to interpret existing structure and predict future structure.Therefore,our research starts from node similarity computation and then studies the information recommendation problem,which is a validation of structural learning methods.At present,node similarity computation on HIN mainly faces 3 challenges:(1)since the edges in HIN contain rich semantics,similarity computation not only needs to identify the node types,but also needs to identify and analyze the semantic information of edges to provide the results with semantic logic;(2)the link structure of nodes is irregular,so the similarity calculation needs to consider the structure equivalence;(3)the similarity calculation needs to guarantee the credibility through the global structure of HINs,so efficiency must be considered especially in the large-scale HINs.Existing recommendation methods on HIN are mainly based on complex matrix decomposition and deep learning.These methods usually have the following problems:(1)feature extraction or graph embedding on HIN will lead to effective information loss and noise introduction;(2)user behavior has a mainstream tendency,reflected in the extreme irregularity of node structure on HINs,which causes that traditional recommendation algorithms tend to recommend popular items and ignore the long tail items;(3)traditional machine learning methods have some common drawbacks,including difficulty in explaining the learning parameters,unstable performance on large-scale datasets and limited incremental learning ability.Especially on HINs,the learning model usually have high complexity and low robustness.Sometimes the optimization function of deep learning may be non-differentiable,and the hidden layers are limited.In summary,traditional machine learning methods are difficult to fully and deeply mine the structure information in HIN.Therefore,there is an urgent need for structure learning methods to solve data mining tasks on HIN in a comprehensive,interconnected,and developmental manner.In this thesis,we first study node similarity computation on HIN,laying the foundation for the research of structure learning methods.Then,we study information recommendation on HIN and apply structure learning methods to HIN analysis tasks.The main work and contributions are summarized as follows:1.The general form of SimRank is proposed to measure similarities on HIN,called General SimRank(GSR).Through semantic recognition and analysis,GSR provides a link-based global measure with logical semantics.In the pairwise surfers of GSR,it identifies the semantics behind edges to ensure the semantic consistency of any two meeting paths,so as realizes that "any neighbor pair can transform its similarity to the target node pair only if the two edges between corresponding nodes share the same semantic." Secondly,GSR analyzes the contribution weights of different semantics to similarity and proposes a weight analysis algorithm based on entropy.Then,we show the good properties of GSR with corresponding proofs,and propose the equivalent form of GSR formula,i.e.,Constrained Expected Meeting Distance.Finally,we proof the convergence of GSR and give the iterative solution method,and analysis the upper bound of the convergence speed.2.A fast computation algorithm of all-pair General SimRank is proposed based on linear system.Under the premise of accuracy,it greatly improves the computational efficiency on both static and dynamic HINs.General SimRank inherits the recursive form of SimRank and needs many iterations to reach the convergence state in the solution process.Consequently,its time and space complexity are very high,which limits its applications on large-scale HINs.To speed up the computation of GSR,we convert the "pairwise surfer model" of GSR on HIN into a new random walk model on "node pair graph".Based on the new model,an equivalent linear system is proposed for GSR.Then we design a novel local push based algorithm to compute all-pair GSR scores with accuracy guarantee.Further,we extend the local push method into dynamic HINs and propose the incremental algorithm.The experimental results show that the proposed method based on local push is significantly better than the original iterative method in computational efficiency.3.We propose the graph filtering(GF)model for recommendation on HINs.It is a purely structure-based machine learning method,which uses structure to interpret and predict structure.Graph filtering model also has interpretability and evolutionary learning capability.The graph filtering algorithm first models the user behavior as a rating pair on the HIN,then uses the GSR method to calculate the similarity between user behaviors and finally predicts the future behavior based on the similar historical behaviors.Further,in order to analyze the contributions of different types of data on users’ behaviors,an adaptive learning method is designed to calculate the weight of the relevant edges of objects(users,goods)in HIN.Compared with deep learning and other traditional recommendation methods,the structure-based graph filtering method has the following advantages:it directly skips the steps of feature extraction and information transformation,and avoids the loss of effective information and the introduction of noise;since GSR has complete semantic logic,the learning parameters of graph filtering based on similar behavior are also interpretable;without limitation of path length,GSR can deeply mine various semantic combinations and efficiently integrate complex information in HINs;The optimization algorithm of GSR makes graph filtering capable for processing large-scale data.Finally,we verify the evolutionary learning ability of GF in dynamic environments.4.A novel long tail recommendation method based on HIN is proposed,which models the recommendation task as a structure learning problem on HIN.We design a peer similarity measure to find long tail objects and make predictions based on graph filtering.Further,an enhancement method is proposed based deep neural network to fit recommendation results under different factors.Most existing recommender systems tend to push popular items to users and ignore long tail items.There are two main reasons for this phenomenon:(1)information is not enough to help the system train accurate personalized model due to data sparsity;(2)users’ behaviors have public tendency,which will inevitably lead to the subordination of individual behavior to the public behavior since the learning model always uses global objective function mining personality characteristics.Therefore,we first integrate and explore the auxiliary data through HIN to handle the data sparsity problem,and then propose a degree aware GSR(dGSR)measure to calculate peer similarities of structure.Finally,dGSR is applied to the graph filtering model for the basic long tail recommendation.Further,in order to estimate the contributions of multi-typed auxiliary data,we decompose the HIN into many independent single-aspect networks.On each network,we compute the predication via the basic method for a rating pair,and then compose a feature vector to feed a deep neural network for long tail recommendation.Experiments show that the proposed algorithms have much higher precision and recall rate for the long tail objects recommendation than current classic methods. |