Research On Clustering-based Hyperlink Prediction

Posted on: 2020-12-21 | Degree: Master | Type: Thesis
Country: China | Candidate: P F Qi | Full Text: PDF
GTID: 2370330575489336 | Subject: Computer technology
Abstract/Summary:
Entities in the real world connect to one another to form information networks. Link prediction is an important research direction in the study of information networks and has produced fruitful results. However, traditional link prediction algorithms mainly predict whether a link exists between two nodes. In the real world, many network relationships involve more than two nodes, which motivates the concept of a hyperlink: a group of any number of nodes, of the same or different types, connected together to form a multi-way relationship. The hyperlink prediction problem is therefore to predict relationships among multiple nodes. It removes the restriction of link prediction to pairs of nodes and can capture the rich and diverse information in a network, which makes it worth studying.
Existing hyperlink prediction methods are usually based on the similarity of hyperlinks across the whole network. If one category contains more hyperlinks than the others, or its hyperlinks are more closely related to one another, the information carried by the other categories is masked and the predicted hyperlinks are biased toward the dominant category. Categories with few observed samples are effectively ignored, so the predictions do not reflect the network as a whole. For example, if a recipe network contains mostly spicy recipes, the predicted recipes will also be spicy, and recipes with other flavors will be masked and cannot be predicted.
To address these shortcomings, this thesis proposes a clustering-based hyperlink prediction method. First, the observed hyperlinks are grouped by a clustering algorithm; then a hyperlink prediction model is built for each cluster. As long as a cluster is formed, predictions can be obtained for it, even if only a few hyperlinks of that category were observed. The method thus makes full use of the information in the observed samples of every cluster and ensures that the prediction results cover the network comprehensively.
The main work of this thesis is as follows:
(1) Clustering hyperlinks. Because the matrix formed by the hyperlinks is high-dimensional, traditional clustering methods do not handle it well. This thesis therefore applies non-negative matrix factorization (NMF) to the hyperlink matrix, represents it with a low-dimensional feature matrix, and clusters that low-dimensional matrix with the K-means algorithm. The results show high similarity among hyperlinks within the same cluster and low similarity between hyperlinks in different clusters.
(2) A clustering-based hyperlink prediction algorithm. A hyperlink prediction model is built for each cluster, making full use of the information in each cluster's observed samples so that the predictions cover the whole network. This overcomes the incomplete category coverage of existing methods and shortens prediction time.
(3) Experimental validation on three real-world data sets. The effectiveness and efficiency of the clustering-based algorithm are evaluated from three aspects (prediction accuracy, category coverage of the prediction results, and execution efficiency) and compared with other hyperlink prediction algorithms. The results show that the clustering-based hyperlink prediction algorithm performs well.
(4) A prototype system. A clustering-based hyperlink prediction prototype is designed and implemented in C# using the MVC (Model-View-Controller) pattern. It presents three modules, namely data preprocessing, hyperlink clustering, and hyperlink prediction, which together reproduce the complete method used in this thesis.
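The clustering step in (1) can be illustrated with a minimal NumPy sketch. It assumes the hyperlink matrix is a node × hyperlink 0/1 incidence matrix (the abstract does not fix its orientation), factors it with Lee-Seung multiplicative-update NMF, and runs K-means on the resulting per-hyperlink feature vectors. The function names, the rank, the farthest-first K-means seeding, and the L2 normalisation are illustrative assumptions, not details taken from the thesis.

```python
import numpy as np


def nmf(V, rank, n_iter=500, seed=0, eps=1e-9):
    """Approximate non-negative V (m x n) as W @ H, W (m x rank), H (rank x n),
    using Lee-Seung multiplicative updates for the loss ||V - W H||_F^2."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def kmeans(X, k, n_iter=100):
    """Lloyd's K-means on the rows of X with deterministic farthest-first seeding
    (an assumed seeding choice, used here so the example is reproducible)."""
    idx = [0]
    for _ in range(1, k):
        # Distance from every row to its nearest already-chosen seed.
        d = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    centroids = X[idx].astype(float)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


def cluster_hyperlinks(incidence, rank, k):
    """Hypothetical helper: cluster hyperlinks from a node x hyperlink incidence matrix."""
    _, H = nmf(incidence, rank)
    features = H.T  # one low-dimensional row per hyperlink
    # L2-normalise so hyperlinks are compared by direction rather than size.
    features = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    labels, _ = kmeans(features, k)
    return labels


# Toy data: 8 nodes, 6 hyperlinks. Hyperlinks 0-2 use only nodes 0-3,
# hyperlinks 3-5 use only nodes 4-7.
members = [{0, 1, 2}, {1, 2, 3}, {0, 2, 3}, {4, 5, 6}, {5, 6, 7}, {4, 6, 7}]
incidence = np.zeros((8, len(members)))
for j, nodes in enumerate(members):
    incidence[list(nodes), j] = 1.0

labels = cluster_hyperlinks(incidence, rank=2, k=2)
print(labels)
```

On this toy matrix the first three hyperlinks are expected to share one label and the last three the other. On real data the NMF rank and the number of clusters would need to be tuned.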
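The abstract does not say which prediction model is built inside each cluster, so the following sketch uses a hypothetical stand-in to show why per-cluster ranking keeps small categories represented. Each candidate hyperlink is scored by its best Jaccard similarity to the observed hyperlinks of one cluster, and every cluster returns its own top-ranked candidates. The scorer, the names `predict_per_cluster` and `jaccard`, and the recipe data are all assumptions for illustration.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two node sets."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def predict_per_cluster(observed, labels, candidates, top_n=1):
    """Hypothetical per-cluster predictor.

    observed   -- list of frozensets, the observed hyperlinks
    labels     -- cluster label of each observed hyperlink
    candidates -- list of frozensets, hyperlinks not yet observed
    Returns {cluster: top_n candidates ranked within that cluster}.
    """
    clusters = defaultdict(list)
    for h, c in zip(observed, labels):
        clusters[c].append(h)
    predictions = {}
    for c, members in sorted(clusters.items()):
        # Stand-in score: best Jaccard match against this cluster's members only.
        scored = sorted(
            candidates,
            key=lambda cand: max(jaccard(cand, m) for m in members),
            reverse=True,
        )
        predictions[c] = scored[:top_n]
    return predictions


# Hypothetical recipe network: nodes are ingredients, hyperlinks are recipes.
observed = [
    frozenset({"chili", "garlic", "ginger"}),
    frozenset({"chili", "garlic", "pepper"}),
    frozenset({"chili", "pepper", "ginger"}),
    frozenset({"garlic", "pepper", "ginger"}),
    frozenset({"sugar", "cream", "vanilla"}),
]
labels = [0, 0, 0, 0, 1]  # four spicy recipes, one sweet recipe
candidates = [
    frozenset({"chili", "garlic", "pepper", "ginger"}),
    frozenset({"chili", "garlic"}),
    frozenset({"sugar", "cream", "egg"}),
]

preds = predict_per_cluster(observed, labels, candidates, top_n=1)
print(preds)
```

Ranking the same candidates once over all observed hyperlinks would put the two spicy candidates first (best Jaccard scores 0.75 and about 0.67) ahead of the sweet one (0.5). Ranking inside each cluster instead returns one candidate per cluster, so the sweet category is not masked.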
Keywords/Search Tags: Information network, Link prediction, Clustering, Hyperlink prediction