| Online social networks(OSNs),as an important part of the Internet,are receiving more and more attention because of the huge economic value they contain.How to estimate the number of users of online social networks efficiently and accurately has been a fundamental problem in the field of network science.Usually,it is almost impossible to get the exact number of users of the whole social network,because the huge user size of social networks makes it costly to acquire and process the data with huge storage and computation costs.Even if it is possible to spend huge resources to process these data,it is likely to miss the best time when facing time-sensitive tasks such as business decisions.Therefore,the sampling estimation method is a more practical approach.In this paper,simple random walk sampling(RW)and MH random walk sampling(MHRW)are used as the main sampling methods to focus on the shortcomings in the traditional algorithms for estimating the number of social network users,and the research revolves around both sampling methods as well as statistical inference to solve the problems of the traditional estimation algorithms such as the excessive relative errors in the estimation results.The main work of this paper is as follows.1.The estimation algorithm based on random wandering sampling achieves the estimation of the number of social network users by randomly visiting some network users and counting the number of repeated visits of each user.However,the relative errors of various current social network user size estimation algorithms are very large when the sampling rate is small.It is found that the main reason for the large relative error is the large number of "spurious collisions" generated by the low degree nodes in the sampling process,which makes the actual number of collisions in the sample much larger than its expected value.In order to improve the problem of large errors when the sampling rate is low,this study proposes a correction algorithm based on collision counts based on the estimation algorithm of random walk sampling to improve the accuracy of the estimation algorithm based on random walk sampling by correcting the collision errors generated by low nodes.The experimental validation using real social network dataset shows that the error correction algorithm can effectively reduce the relative error of estimation compared with the traditional estimation algorithm based on random wandering sampling.2.The traditional social network user size estimation algorithm based on MH random walk sampling usually has a large estimation error.It is found that because MH random walk is more likely to accept low-degree nodes into the sample,the sampling is easily caught in a loop at low-degree nodes,making some nodes to be repeatedly sampled,resulting in a large relative error of estimation results.Therefore,a subgraph sampling algorithm based on MH random wandering is proposed.First,by sampling a very small portion of nodes on the original social network using a breadth-first search algorithm to form a subgraph,the subgraph retains the key topological features of the original graph.Then an iterative algorithm is designed to achieve a more accurate estimate of the social network user size by executing the MH random walk sampling algorithm on the subgraph and then iteratively correcting the estimation error of the original graph. |