| With the development of the big data era, real-world data increasingly exhibits diverse entity types and complex interactions among these entities, which is modeled as heterogeneous information networks (HINs), such as academic networks, biological networks, and social networks. The goal of heterogeneous information network representation learning is to learn low-dimensional representations of entities and their interactions in the network while preserving features and structural characteristics. However, as network scale rapidly increases, efficiently representing complex interactions among diverse entity types faces several challenges. First, because the data is sparse, it is difficult to capture the structural characteristics of entities. Second, as the heterogeneity of the network increases, expert knowledge is required to preprocess the data extensively in order to describe the complex heterogeneous structures in the network, and the resulting descriptions often suffer from semantic redundancy. Third, the incompleteness of real network data is increasingly apparent, reflected in incomplete characteristics and structures; how to use raw data effectively without losing information or introducing noise is therefore also an urgent issue. This paper tackles the problem of learning representations for heterogeneous information networks by focusing on high-order structures. Four research objectives are pursued, progressing from simple to complex node structures and from local to global structure: representation learning of high-order node neighborhood structures, integrated representation learning of metapaths and metastructures, representation learning of generalized heterogeneous graphlets, and representation learning of incomplete high-order structures. Through background research, model design, and experimental verification, this paper systematically proposes innovative models for learning representations of high-order structures in heterogeneous
information networks. The main contributions of this paper are summarized as follows: (1) To explore the representation ability of high-order neighborhood structures on nodes, this paper proposes a graph neural network model based on the similarity of node neighborhood structures. The model takes the degree sequences of node neighborhoods as the structural features of nodes and proposes a similarity algorithm for degree sequences of different sizes to calculate the structural similarity between nodes, thereby characterizing the high-order structure of node neighborhoods. The structural similarity is also used to modify message passing between nodes in the graph convolutional network, so that information is aggregated between structurally similar nodes. Node classification experiments are conducted on open datasets to verify the representation ability of high-order structures on nodes. The results show that incorporating node structural similarity improves node classification accuracy. (2) To explore the representation ability of metapaths and metastructures, this paper proposes a heterogeneous graph neural network that integrates metapaths and metastructures, addressing the problem that metapaths can only represent chain structures. The model learns the weights of nodes within each metapath or metastructure through a local attention mechanism, learns the weights across different metapaths and metastructures through a global attention mechanism, and aggregates node information based on these adaptive weights. Supplementing metapaths with the semantics of metastructures overcomes the restriction of metapaths to simple structures. Experiments on several open datasets explore the representation ability of different structure combinations. The results verify the effectiveness of the model and demonstrate the different roles that metapaths and metastructures play. (3) In order to address
the limitations that metapaths and metastructures can represent only a limited variety of structures and depend on expert knowledge to predefine heterogeneous structures, this paper proposes a heterogeneous graph neural network based on generalized heterogeneous graphlets. The model introduces a frequent heterogeneous graphlet mining algorithm, which uses the geometric distribution to randomize the mining process and automatically generates generalized heterogeneous graphlets, removing the restriction to single symmetric structures and the dependence on expert knowledge. To address semantic redundancy across multiple structures, this paper proposes a redundancy-free sorting method that finds optimized combinations of high-order structures and transforms heterogeneous graph elements into corresponding Eulerian trails. By learning adaptive weights for aggregating node information along the Eulerian trails, the model obtains node representations with minimal semantic redundancy. Experiments on several open heterogeneous network datasets demonstrate the effectiveness of the proposed model. (4) To address the problem of incomplete high-order structures in real-world data, this paper proposes a lattice-based incomplete heterogeneous graph neural network model. The model constructs characteristic lattices and structure lattices by calculating node features and the partial order relationships of structures, thereby avoiding information loss and noise when modeling incomplete heterogeneous graphs. The model updates node representations by aggregating information from the characteristic lattice and the structure lattice, and the generated structure lattice contains all structure combinations that appear in the original data and fully covers the semantic of. The validity and stability of the model in representing incomplete heterogeneous graphs are verified through experiments on several open incomplete graph datasets. In
summary, this paper provides a comprehensive investigation into the challenge of learning high-order structure representations in heterogeneous information networks. To this end, corresponding models are proposed to enhance the representation capability of high-order structures, reduce structural semantic redundancy, and enrich the semantic diversity of node structures. |
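
As an illustration of contribution (1), the sketch below shows one way to compare the degree sequences of two node neighborhoods when the sequences differ in length. The abstract does not specify the paper's similarity algorithm, so this sketch substitutes dynamic time warping (DTW) over sorted degree sequences, a technique used by other structural embedding methods. The function names, the restriction to one-hop neighborhoods, the exponential mapping from distance to similarity, and the toy graph are all assumptions made for illustration.

```python
import math


def neighborhood_degree_sequence(adj, node):
    """Sorted degrees of the node's one-hop neighbors (assumed structural feature)."""
    return sorted(len(adj[n]) for n in adj[node])


def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    DTW aligns sequences of different lengths, which is why it is used here
    as a stand-in; it is not claimed to be the paper's algorithm.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def structural_similarity(adj, u, v):
    """Map the DTW distance to a similarity in [0, 1] (assumed mapping)."""
    seq_u = neighborhood_degree_sequence(adj, u)
    seq_v = neighborhood_degree_sequence(adj, v)
    return math.exp(-dtw_distance(seq_u, seq_v))


# Hypothetical toy graph as an adjacency list.
adj = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e", "f"],
    "e": ["d"],
    "f": ["d"],
}

sim_ab = structural_similarity(adj, "a", "b")  # identical neighbor-degree sequences
sim_ae = structural_similarity(adj, "a", "e")  # sequences of different lengths
```

In a model of the kind described, such pairwise similarities could serve as weights for aggregating messages between structurally similar nodes, in addition to or instead of plain adjacency; how the paper combines them is not stated in the abstract.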