| With the rapid development of information technology,query and recommendation play an increasingly important role in many aspects.People need to recommend based on the closeness of similarity.SimRank,as a commonly used similarity measure model,has been widely used in many fields such as recommendation system,collaborative filtering and link prediction.The single source Top-k problem based on SimRank metric in graph attracts more and more attention due to its wide application.Most of the existing algorithms face the problem of inefficient query time and space when solving the single source Top-k problem,especially in relatively dense graph.Therefore,it is necessary to design a single source Top-k algorithm that is efficient in time and space.This paper proposes two strategies to solve this problem.The first strategy uses the local detection method to solve the single source Top-k problem,mainly by local path enumeration and combined with pruning technology to reduce the detection space and accelerate the solution process.The second strategy uses Monte Carlo simulation to solve the single source Top-k problem,and it designs new sampling methods.Path sampling is used instead of path enumeration to solve the single source Top-k problem efficiently.The specific work is as follows:Single source Top-k algorithm based on local detection.The method mainly uses local detection to avoid global calculation,and uses the upper bound of SimRank value to prun the detection space to accelerate the solution of Top-k.Firstly,the idea and basic method of local detection are given,and then the efficient LST algorithm is further proposed.The algorithm mainly consists of two steps:In the first step,an initial Top-k candidate set is generated according to the neighborhood information of the vertex to be queried;In the second step,the candidate set is pruned to reduce the detection space by using the SimRank value upper bound of the vertices that do not enter the candidate set and the current SimRank minimum of the candidate set.The time complexity of the algorithm is O(d2l),and the space complexity is O(dl),where d is the average indegree of the vertices in the graph,and l is the maximum step size of the random walk.Single source Top-k algorithm based on Monte Carlo simulation.This method mainly uses path sampling instead of path enumeration to reduce computational complexity.Firstly,the LMST-US algorithm is proposed,which performs uniform sampling from the vertices to be queried to detect Top-k candidate set.Then the LMST-SS algorithm based on stratified sampling is proposed.Based on it,the LMST-Tree algorithm using sampling path tree structure and the LMST-δalgorithm using threshold truncation are proposed respectively.The time complexity of the algorithm is O(rdl)and the space complexity is O(dl),where r is the number of samples.Different methods of this kind have different effects in practice,and the appropriate method can be selected according to the specific situation.Experimental results and analysis.The experiment first tested the influence of different parameters on the proposed algorithm.Then the methods proposed in this paper are compared with mainstream TopSim class algorithm,KM-SR algorithm and Sling algorithms in terms of query time,space and accuracy.The experimental results show that the proposed algorithms have excellent performance in terms of query time and space while maintaining a certain accuracy.In terms of query time,they are better than KM-SR and TopSim algorithms.Among them,the LST algorithm and the Sling that needs to be preprocessed are close in query time,and LST is even better than Sling in some data.In terms of space,the algorithms proposed in this paper take up less space,especially the Monte Carlo simulation class algorithm space occupancy is about 10%-55%of the comparison algorithm. |