| Network representation learning aims to learn low-dimensional representations of the nodes in a network and then apply these representations to tasks such as clustering, classification, community discovery, and link prediction. A commonly used approach is to generate node sequences with random walks and extract low-dimensional node features from those sequences. For this type of algorithm, the generation of random walk sequences is crucial. When computing resources are limited, generating random walk sequences that carry more information helps the feature extraction phase of the subsequent network representation learning. This paper focuses on two shortcomings of the traditional random walk, discusses methods of random walk sequence generation, and explores more effective ones. First, the traditional random walk retains only local information and ignores global information. For the entire network, adjacency information represents the local similarity of the network, while high-order similarity preserves its global information. This paper proposes an improved method to address this deficiency. Before the walk, the method calculates the second-order similarity between each central node and its neighbor nodes. Starting from a given central node, the next node is then selected according to the probability distribution given by the second-order similarity ratios rather than with equal probability. The author applies this improved random walk to the DeepWalk algorithm. Comparisons on experimental scenarios such as clustering, classification, and community discovery show that the improved method outperforms the network representation learning method based on the traditional random walk on various indicators. Second, traditional random walks do not 
consider attribute similarity. For a network whose nodes have attributes, a random walk should retain not only local and global similarity but also attribute similarity. If the traditional random walk is applied directly to such a network, only local information is retained. To address this shortcoming, this paper proposes another improved random walk method. It calculates the attribute similarity between each node and its surrounding nodes, generates a probability distribution from these similarities, and selects the next node according to that distribution. The algorithm thus integrates the attribute features of nodes into the routing process of the random walk, so the generated sequences contain not only the adjacency information of the network but also the attribute information of the nodes. The author conducts experiments on a citation dataset, which show that the clustering result is more stable as the number of "interference edges" increases. |
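
The two biased walks summarized above share one mechanism: score each neighbor of the current node with a similarity measure, normalize the scores into a probability distribution, and sample the next node from it. The sketch below illustrates that mechanism only. The abstract does not state which similarity measures the paper uses, so the Jaccard overlap of neighbor sets (for second-order similarity), the cosine similarity (for attributes), the smoothing constant `eps`, and all function names are assumptions made for illustration.

```python
import math
import random


def second_order_similarity(adj, u, v):
    """Jaccard overlap of the neighbor sets of u and v (an assumed measure)."""
    nu, nv = adj[u], adj[v]
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def attribute_similarity(attrs, u, v):
    """Cosine similarity of the attribute vectors of u and v (an assumed measure)."""
    a, b = attrs[u], attrs[v]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def transition_probs(adj, node, sim, eps=1e-3):
    """Turn similarities between `node` and its neighbors into probabilities.

    `eps` is a hypothetical smoothing term that keeps zero-similarity
    neighbors reachable; negative similarities are clamped to zero.
    """
    neighbors = sorted(adj[node])
    weights = [max(sim(node, n), 0.0) + eps for n in neighbors]
    total = sum(weights)
    return neighbors, [w / total for w in weights]


def biased_walk(adj, start, length, sim, rng=random):
    """Generate one walk whose steps are sampled from the similarity distribution."""
    walk = [start]
    while len(walk) < length:
        neighbors, probs = transition_probs(adj, walk[-1], sim)
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choices(neighbors, weights=probs, k=1)[0])
    return walk


# Toy undirected graph and attribute vectors (made-up data).
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a", "e"},
    "e": {"d"},
}
attrs = {"a": [1, 0], "b": [1, 0], "c": [0, 1], "d": [1, 1], "e": [0, 1]}

structural_walk = biased_walk(
    adj, "a", 6, lambda u, v: second_order_similarity(adj, u, v))
attribute_walk = biased_walk(
    adj, "a", 6, lambda u, v: attribute_similarity(attrs, u, v))
print(structural_walk, attribute_walk)
```

Passing the similarity as a function lets the same walk routine serve both variants. The resulting sequences could then be fed to a skip-gram model as in DeepWalk, a step not shown here.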