Font Size: a A A

Research And Design Of Online Social Network Sampling Based On Sample Subgraph

Posted on:2024-04-24Degree:MasterType:Thesis
Country:ChinaCandidate:Z LiFull Text:PDF
GTID:2530306917956769Subject:Engineering
Abstract/Summary:PDF Full Text Request
Online Social Networks(OSNs)maintain a huge amount of data,and how to sample the most representative subgraphs at a lower cost becomes an important research topic.Most of the existing sampling algorithms are only embodied in the unbiased characteristics of the sample list.In many cases,the induced subgraphs constructed from the sampled samples are difficult to represent the original graph structure.Some sampling algorithms overestimate the degree of the original graph,or the clustering coefficient of the collected subgraph is too low.This study investigated the unbiased characteristics of existing sampling algorithms and the structural characteristics of sample induced subgraphs,and proposed a new sampling method.This algorithm recalculates the sampling jump probability based on the original random walk algorithm,correcting the deviation of sample induced subgraphs,so that they can better represent the original image.The following three aspects have been improved:Firstly,the sampling algorithm in this paper collects adjacent nodes by calculating weights.After correcting the deviation during sampling,the obtained sample subgraphs are closer to the original image.Secondly,due to the elimination of the self circulation process,the sampling efficiency is greatly improved.Lastly,in the actual crawling process,when a node with a higher degree jumps,it is more complex to obtain the information of its adjacent nodes one by one.This paper provides a formula for estimating the expected jump probability,which roughly estimates the jump probability of the current node according to the degree distribution.For nodes with a high jump probability,it is not necessary to crawl their neighboring nodes one by one,in exchange for other sampling crawler strategies to reduce the frequency of OSNs access and improve crawler efficiency.The experimental results show that the sampling algorithm proposed in this paper is more close to the original image attribute structure in terms of degree distribution,cluster,and assortativity.In most cases,the algorithm perfects better than the existing sampling algorithm.This paper takes a variety of social network graphs as the research object,and compares them with the current mainstream random walk algorithm.Taking the Douban community network as the real research object for social network sampling,combined with the algorithm proposed in this paper,a social network sampling system for crawling,analyzing,and storing data on Douban websites is implemented.
Keywords/Search Tags:online social networkks (OSNs), Vertex sampling, Unbiased
PDF Full Text Request
Related items