Research On Signle-source Top-k Query Algorithm Based On SimRank

Posted on:2022-12-23

Degree:Master

Type:Thesis

Country:China

Candidate:M T Min

Full Text:PDF

GTID:2480306779471624

Subject:Computer Software and Application of Computer

Abstract/Summary:

PDF Full Text Request

SimRank is used to evaluate node similarity based on graph topology.The single-source Top-k query problem based on SimRank is to find the k nodes with the highest similarity to a given query node,which is widely used in large-scale graph mining,collaborative filtering and other fields.When calculating the Top-k results,existing algorithms need to calculate the SimRank between all nodes and the query node,and then sort and filter out the Top-k result.Because the existing algorithms have many invalid calculations in the calculation process,the efficiency of solving the Top-k result is low.To this problem,this paper studies the single-source Top-k query problem.The specific research content is as follows.First,this paper proposes a Top-k query algorithm HitSimbased on the upper and lower bounds of SimRank.Unlike existing algorithms that directly calculate the SimRank of all nodes,HitSimimproves the efficiency of Top-k query by reducing the number of nodes to be calculated.The algorithm first classifies the nodes,obtains the corresponding upper bound of SimRank for different types of nodes,and then maintains the lower bound of the Top-k SimRank during calculation.By comparing the upper and lower bounds,the number of nodes to be calculated can be reduced.Secondly,this paper proposes an optimization algorithm HitSim-OPT for dense graphs.When HitSimclassifies the nodes of dense graphs,the nodes may be concentrated in one node set,resulting in poor pruning effect of HitSim.To solve this problem,HitSim-OPT directly calculates the SimRank upper bound of all nodes,maintains the SimRank lower bound of the Top-k result during the calculation process,and determines whether the corresponding node needs to be processed by comparing the upper and lower bounds.At the same time,for the nodes that need to be processed,the upper bound of SimRank can be used to reduce the number of random walks.Finally,based on 8 real datasets for testing,the experimental results show that the HitSimalgorithm and HitSim-OPT algorithm are nearly three orders of magnitude faster than the existing algorithms in terms of query efficiency,while at the same time ensures the accuracy of query results.

Keywords/Search Tags:

data mining, node similarity, single-source SimRank, Top-k query, random walk

PDF Full Text Request

Related items

1	Top-k Query In Graph With Simrank-based Similarity Measure
2	Research On Network Node Similarity Measurement Based On Relative Entropy
3	Research On Important Nodes Identification Of Complex Network Based On Random Walk
4	Research Of Link Prediction Algorithm Based On Network Structure And Random Walk Theory
5	Node Similarity Study Of Complex Networks Based On The Relative Entropy Of Multi-order Path Distribution
6	Identification Of Similarity Between Complex Network Systems And Its Application
7	Random Walk Problems On Two Classes Of Weighted Self-similar Networks
8	Random Walk And Its Applications In Risk Theory
9	Research On Node Influence Mining Algorithm In Complex Networks
10	Research On Semantic Query Method Of Genome Variation Data Based On Ontology