| As an important subject in the field of complex networks,link prediction is one of the important bridges between complex networks and information science.Link prediction is to predict the possibility of connection between two nodes in the network that have not yet generated edges through known network nodes and network structure.Network representation learning abstracts the complex system into a network model:nodes represent entity objects and edges represent the relationship between entity objects.The nodes in the network are mapped to the low dimensional vector space through a specific algorithm.On the premise of shielding the characteristics of high dimensionality and sparsity of complex networks,the network structure information is captured to reduce the time and space complexity of the algorithm.Based on network representation learning,this thesis studies link prediction algorithms on multiple real data sets.The main contents are as follows:(1)In view of the shortcomings that the current link prediction algorithm does not make full use of the network structure information:it only considers the node information,and does not consider the contribution of the edge information and community attributes to the formation of the link.This thesis makes full use of the information of nodes,links and community structure,and adopts two quantitative indicators:link prediction algorithm combining collective influence and edge clustering information(CELP),which integrates collective influence and edge clustering information,measures the importance of nodes and edges at the same time;The Jaccard coefficient of Community relevance(CRJC)measures Community relevance.At the same time,in view of the deficiency that the random walk process of Deepwalk algorithm can not fully mine the structural information of the network,combined with the above two indicators,the random walk is truncated based on CELP value and CRJC value respectively,and the walk sequence is input into Skip-gram model for training to obtain the corresponding node vector representation matrix respectively,and the similarity of node pairs is measured by the euclidean distance of node vector,The weighted sum is carried out to obtain the node similarity ranking,so as to measure the link probability between node pairs.This algorithm is named as NSCA-LPA,a link prediction algorithm based on node similarity and community attribute.The experimental results of several real data sets verify the performance of the algorithm,and also confirm that the information such as nodes,links and community structure plays a positive role in the formation of edges between nodes.(2)Since the calculation of quantitative index in NSCA-LPA algorithm can not be applied to large-scale complex networks,a generative adversarial network link prediction algorithm WBGGAN-LPA based on wasserstein distance is proposed.On the premise of retaining the characteristics of network structure,the link prediction task is completed under low time complexity.Firstly,the difficulty in training and convergence of GraphGAN model is analyzed.The reason is that when the discriminator is approximately optimal,the loss function of the generator is equivalent to minimizing the JS divergence between the generated sample distribution and the real sample distribution.When there is no overlap between the generated distribution and the real distribution or the overlap can be ignored,the JS divergence is constant,resulting in the gradient obtained by the generator is 0 and the gradient disappears,The generator cannot proceed with optimization.Firstly,wasserstein distance and gradient penalty term are introduced to construct loss function,which not only speeds up the training and convergence speed,but also provides meaningful gradient.Secondly,considering that GraphGAN model samples negative sample nodes by constructing node connected tree,the sampling strategy is inefficient and the sampling time of each node is uneven,the truncation offset random walk sampling negative sample node is used to replace the original sampling strategy.While improving the sampling efficiency,the walk mode can be controlled through parameter setting to make the sampling strategy more flexible.The improved model is named WBG-GAN.Finally,aiming at the sparse characteristics of large-scale complex networks with"more nodes and fewer links",NetFold algorithm is used to compress the network scale on the premise of preserving the first-order proximity of the network.After initializing the sub network graph with the smallest scale,node2vec algorithm is used to input it into WBG-GAN model for generation countermeasure training,and then the WBG-GAN model is trained layer by layer.The WBG-GAN parameters of each layer use the parameters of the previous layer as the initial parameters,Finally,the vector representation matrix of the nodes in the original network graph is obtained.The euclidean distance of the node vector is used to measure the similarity of the node pairs,so as to measure the link probability between the node pairs and complete the link prediction task.The experimental results of multiple real data sets verify the superior performance of the algorithm in dealing with large-scale complex networks,overcome the shortcomings of GraphGAN model,and significantly improve the link prediction effect. |