Font Size: a A A

Link Prediction Algorithm Based On Time Series Combined From Node Similarities And Link Occurrences

Posted on:2018-08-03Degree:MasterType:Thesis
Country:ChinaCandidate:G W XuFull Text:PDF
GTID:2310330515974039Subject:Engineering
Abstract/Summary:PDF Full Text Request
In networks,nodes represent entities and links represent the relations between them.With the acquisition of more and more real-world network data,there has been a lot of interests in mining some valuable rules by analyzing networks.Link prediction,which is one of the most important problems in link mining,aims to estimate the existence likelihood of a link between two nodes by considering other links and the node attributes.Driven by many applications powerfully,the research on link prediction has made the rich achievement.Until now,the methods based on nodes similarity has been used widely,which estimate the existence likelihood of a link by the score of nodes similarity.Nodes similarity can be mainly estimated by the network structure and nodes attributes,besides,community information may be helpful.The network tends to evolve all the time in the real world,while the static link prediction methods make a good achievement in some applications,they have defects as the neglect of temporal information.Changes of network are ignored in static methods.If only the most recent snapshot of network is concerned,the prediction performance would decline when the network changes frequently.If the snapshots of all time periods are overlapped,the methods can't be used in domains where the interest is recurring links,for example,phone call and mail communication.As a consequence,the research pay more and more attention to temporal information.At present,there are mainly two directions on the research of link prediction.On the one hand,it aims to improve the static methods,to make better use of the observed network information,such as topology information,attribute information,community information.On the other hand,it aims to add temporal information to the spatial structure,to describe the change of network over time consequently and to predict preferably.Time series performs well in making use of temporal information,which describe the networks in past periods as discrete snapshots.Firstly,link occurrences between nodes can be modeled by time series.For the prediction of future links,the connections between nodes over time are the inputs.Decent prediction performance close to static neighborhood based methods are obtained by predicting future links based only on the past links.The method is also extended to combine time series method with other static methods,and the combination performs even preferably.Using the number of link occurrences instead of existence of a link is a major advantage of the approach,and the time series model make good use of change over time and recency.The hybrid model combines the ability to predict new links by static methods and the ability to predict recurring links by time series,which is a comprehensive method.The problem is,the hybrid model degrades to static similarity method for new links as a result of missing of link occurrences time series;besides,the hybrid model is the product of the final predicted value of static method and time series method,which is hard to catch the network information of each period.Another time series method makes an improvement by the use of nodes similarity time series,which predict future nodes similarity by the nodes similarities of past periods.This method also tries to combine nodes similarity computed from the entire network and the real link occurrence that period.The hybrid model adds nodes similarity to link occurrence after normalize them,which is the input of time series.Afterwards,the predicted value by the time series model is used as future link score.However,the performance of the hybrid model is not as good as the similarity time series model because the hybrid model is too simple too catch the correlation between nodes similarity and link occurrence,and they do not evolve in the same way.For the above shortcomings,this paper presents a new link prediction method based on a time series model combined from nodes similarities and link occurrences called SOTS.Firstly,compute nodes similarities of all the past periods by a tendency random walk.Secondly,combine it with the real link occurrence between the nodes pair that period by two time series model,which is used to predict the probability of link between the nodes pair in next period.The time series model can describe the changes of nodes similarities and real link occurrences well,at the same time,we discuss the correlation between nodes similarity and real link occurrence,which is creative.This method can be used to predict new links and recurring links in evolving networks.The contributions of this paper are as follows:(1)Combine node attributes and network topology by a new approach to computesimilarity scores.(2)Study the correlation between link occurrence and nodes similarity.(3)Integrate nodes similarity with link occurrence,to catch information in eachperiod adequately.Especially the bivariate time series model,which describesthe correlation between link occurrence and nodes similarity more reasonably.By the analysis of time series and static information,we combine the temporal dimension and spacial dimension of the network.Our method utilizes more comprehensive network information and works more effectively over traditional methods.We test this method by detail experiments against traditional methods on a Chinese DBLP dataset,which shows almost 15 percent accuracy improvement over state-of-the-art,which achieves the desired goal of this paper.
Keywords/Search Tags:Link Prediction, Nodes Similarity, Time Series, Evolving Networks
PDF Full Text Request
Related items