| Real life is an interactive network composed of many types of entities and the relationships among them. Studying its heterogeneous characteristics from a network perspective, and combining this with mainstream machine learning techniques to improve the quality of production and daily life, is an important research direction in data processing. Heterogeneous network embedding aims to mine the structural and semantic information in a network to form feature representations that provide data support for downstream tasks. Research on heterogeneous network embedding algorithms has achieved many results, but two problems remain: first, the implicit semantic information of the network is difficult to discover and mine, so network information is captured incompletely; second, traditional network embedding models do not incorporate the influence of the downstream community detection task on the embedding process, which makes the embedding results difficult to apply to that task. This thesis studies these two problems in depth, and its results are as follows: (1) To address insufficient mining of implicit semantics in heterogeneous network embedding, a heterogeneous network embedding algorithm, CEIHNE, that fuses explicit and implicit information is proposed. First, the original terrorism crime data set is processed to extract the multiple types of entities and the multiple types of relationships between them, forming a network of terrorist crime behaviors; second, network information is captured comprehensively from both explicit and implicit perspectives, and the embedding representation of nodes is obtained through subnetwork extraction, subnetwork embedding, and subnetwork fusion; finally, the effectiveness of CEIHNE is verified by comparison with baseline models in two evaluation experiments, node classification and link prediction. (2) Starting
from the observation that traditional network embedding models do not fully consider the influence of the community detection task on the embedding process, an attributed heterogeneous network embedding algorithm, IICDAHE, that integrates the effect of community detection is proposed. The model introduces meta-path-based methods for preserving semantic information and high-order structural information, uses an unsupervised loss function to measure the structural information of the attributed heterogeneous network, and uses a supervised clustering loss to measure the influence of community detection on network embedding. The final embedding representation of nodes is obtained through the steps of mapping, feature aggregation, and loss calculation. Finally, the effectiveness of IICDAHE is verified by comparison with baseline models in two evaluation experiments, node clustering and node classification. (3) According to the actual needs of the supporting project, an application visualization system based on heterogeneous network embedding algorithms is designed and implemented. The system consists of three parts: a data layer, a data processing and feature extraction layer, and an application and visualization layer. The data layer covers the specific data content and the data storage methods used by the system. The data processing and feature extraction layer preprocesses the data in the system and obtains object feature representations using feature extraction methods such as heterogeneous network embedding algorithms. The application and visualization layer builds a visualization platform that displays the feature extraction results and the prediction results of practical application tasks. |
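
The abstract refers to meta-paths as the means of preserving semantic and high-order structural information, without giving details. As a rough illustration only, the sketch below shows what following a meta-path means on a small typed network. It is not the CEIHNE or IICDAHE implementation; the node names, node types, edges, and the helper `metapath_neighbors` are all hypothetical and chosen purely for the example.

```python
from collections import defaultdict

# Toy heterogeneous network. Every node carries a type; edges are undirected.
# All names, types, and edges are hypothetical illustrations, not thesis data.
NODE_TYPE = {
    "e1": "event", "e2": "event", "e3": "event",
    "g1": "group", "g2": "group",
    "l1": "location",
}
EDGES = [("e1", "g1"), ("e2", "g1"), ("e3", "g2"), ("e1", "l1"), ("e3", "l1")]


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def metapath_neighbors(start, metapath, adj, node_type):
    """Return the nodes reached from `start` by following `metapath`.

    `metapath` is a list of node types, e.g. ["event", "group", "event"].
    At each hop only neighbors of the next required type are kept.
    """
    if node_type[start] != metapath[0]:
        return set()
    frontier = {start}
    for wanted in metapath[1:]:
        frontier = {v for u in frontier for v in adj[u] if node_type[v] == wanted}
    frontier.discard(start)  # a node is not its own meta-path neighbor
    return frontier


adj = build_adjacency(EDGES)
# Events linked to e1 through a shared group, and through a shared location.
same_group = metapath_neighbors("e1", ["event", "group", "event"], adj, NODE_TYPE)
same_place = metapath_neighbors("e1", ["event", "location", "event"], adj, NODE_TYPE)
print(sorted(same_group))  # ['e2']
print(sorted(same_place))  # ['e3']
```

The two meta-paths give `e1` different neighbor sets even though neither `e2` nor `e3` is directly connected to it, which is the kind of indirect, type-dependent relation that a meta-path-based embedding method can draw on.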