Font Size: a A A

Research And Application Of Social Network Data Acquisition Strategy

Posted on:2019-09-01Degree:MasterType:Thesis
Country:ChinaCandidate:B X ChenFull Text:PDF
GTID:2428330545496911Subject:Signal and Information Processing
Abstract/Summary:PDF Full Text Request
Due to the rise of the Internet and the development of communication technologies,online social networks(OSN)have become a part of people's life.Online social networks have infiltrated people's lives and are nowadays the most important mobile Internet applications.Many organizations are interested in online soc ial networks where sociologists collect relevant data to study online user behavior.Market investigators use OSN's data to designate marketing strategies.Social network providers improve the user experience by understanding social graph and user behavior,optimizing data-storage design and cloud services,or providing personalized service.The huge amount of data in social networks has brought many challenges to research.First of all,considering the business secrets and information security of users,service providers are unwilling to share their company's data even in anonymized form.Secondly,It is impractical to get a complete data set from large-scale OSNs because it is extremely time-consuming to capture hundreds of millions of user data.At the sa me time,high-performance computer clusters need a lot of time to deal with such a large social graph.Finally,the number of online social networks users grows rapidly and the relationships between users change frequently.Therefore,it is very important to designing effective sampling algorithms in social networks.A breadth-first search algorithm(BFS)is a commonly used traverse method for graphs,but BFS excessively selects nodes with high degree,and this offset is hard to correct.Random walk(RW)is a classic method of network crawling,but it also tends to extract some high degree nodes and have been low sampling efficiency.Metropolis-Hastings random walk(MHRW)is a typical unbiased sampling algorithm,but the algorithm repeatedly collects low degree nodes in some high-clustered subnet.The main research results of this paper are as follows:(1)The purpose of this paper is to study an unbiased sampling method that can extract a representative sample from the social graph.We propose an improved algorithm based on MHRW,called Unbiased Delay sampling(UD algorithm).We find that UD can adapt to all kinds of different network connectivity.O n one hand,UD has a better degree distribution when the sample does not consider repeated nodes;on the other hand,UD algorithm can reduce the probability of reiterated nodes selected to sample and improve the ability of network discovery.(2)In the real social network Weibo we implemented the UD sampling algorithm,and use other sampling algorithm BFS,RW and MHRW collected 100,000 data in Weibo respectively.The experiment found that UD sampling algorithm in the real network environment can reduce the sample repetition rate.(3)In actual,sampling algorithm does not know when to stop sampling,stop sampling earlier will make the data inadequate.We propose an algorithm to determine whether the data is adequate during the sampling process.The algorithm is based on the Geweke convergence criterion.Experiments show that our data convergence method can guide the safety of the sampling stop,and determine whether the sampling data is sufficient or not.
Keywords/Search Tags:social network, Metropolis-Hastings random walk, degree distribution, convergence criterion, topological attributes
PDF Full Text Request
Related items