
Network Sampling And Statistical Inference On Social Networks

Posted on: 2019-02-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S R Chen
Full Text: PDF
GTID: 1360330611993114
Subject: Management Science and Engineering
Abstract/Summary:
The explosive growth of social network data has brought both great opportunities and significant challenges to the development of network science. Researchers are trying to acquire and analyze large, complete network data sets. In reality, however, it is often infeasible to obtain, store, retrieve, analyze and mine the complete data of large-scale social networks with billions of nodes and edges. Even if the whole network can be obtained and analyzed with costly hardware and computing resources, conclusions cannot be drawn within a reasonable period of time, and the best opportunity to handle time-sensitive tasks in public opinion monitoring, business decision-making, emergency management, and disease prevention and control is lost. To solve this problem, the methodology of network sampling and statistical inference has been proposed and well studied. It allows nodes and edges to be extracted from a network and the network characteristics to be estimated within a complete theoretical framework. Because it acquires local network data in a principled way, estimates the population characteristics of networks efficiently, and reduces the time and cost of mining tasks, network sampling has become a critical technique for studying large-scale complex systems such as social networks. This thesis addresses shortcomings in the theory and application of network sampling and statistical inference on social networks, covering the performance of existing techniques, the improvement of sampling and estimation theory, and the application of classic methods. Specifically, the work of this thesis is summarized as follows.

(1) The feasibility and effectiveness of applying network sampling methods to bipartite networks are studied. Because little attention has been paid to sampling methods for bipartite networks, we focus on eight popular crawling methods that are widely used in one-mode networks and evaluate the feasibility and effectiveness of applying them to bipartite networks. Through simulations, we analyze the effects of different network structures and sampling settings on these methods and study the factors that affect their performance. Through comprehensive comparisons, we summarize the performance of these methods and give suggestions for selecting crawling methods on bipartite networks in different situations.

(2) Methods for estimating the in-degree information and the population variables in directed networks are proposed. The traditional method, which depends on out-degree information, introduces large biases when the studied network is directed. To address this problem, we propose a new method for statistical inference on directed networks. Within the framework of random walk-based sampling, this method establishes the relationship between the visit frequency and the inclusion probability of sample nodes, and provides a good estimate of the in-degree information, which is naturally hidden during the random walk process. We then use the estimated in-degree information to adjust the samples and provide a new estimator for the population variables of directed networks. Although the estimates obtained by the proposed methods are biased, they are all closer to the true values than those of the traditional method, which depends on out-degree information.

(3) An ego-centric network sampling method is proposed. A sampling method can generate biases or become invalid when the collected attribute information of nodes is unreliable or unknown. To address this problem, we propose the ego-centric network sampling method. Instead of directly collecting the information of the ego nodes (the samples), the proposed method collects the information of their neighbors and estimates the population variables based on the underlying structure of reciprocal edges. In addition, we propose a bootstrap-based method to construct confidence intervals. Finally, two improved versions of this method are provided. One can conduct the inference with the information of only a small number of neighbors, and the other uses second-order information to reduce the effect of the activity ratio on the performance of the method.

(4) A local immunization strategy based on network sampling is proposed. As an application of network sampling methods, we propose a local immunization strategy built on random walk-based sampling. Without sorting the nodes by global degree information, this strategy integrates well with the existing random walk-based sampling technique and can determine whether a visited node needs to be immunized from the local information obtained during the sampling process. In addition, the efficiency of the proposed strategy in eradicating epidemics is second only to that of the targeted strategy, which requires comprehensive global information, and is much better than that of the acquaintance strategy and the random strategy, which are based on random sampling.
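As a rough illustration of the random walk-based sampling and degree-dependent estimation that the abstract refers to, the following Python sketch runs a simple random walk on a small undirected toy graph and compares a naive sample mean with a degree-reweighted estimate of the mean degree. It is not the estimator proposed in the thesis for directed networks; it only shows the standard idea that an undirected random walk visits nodes in proportion to their degree, so each draw is weighted by 1/degree. The function names (`build_wheel`, `random_walk`, `reweighted_mean`), the wheel-shaped toy graph, and the seed are all assumptions made for this example.

```python
import random


def build_wheel(n_rim):
    """Hypothetical toy graph: hub node 0 joined to a ring of n_rim nodes."""
    adj = {0: list(range(1, n_rim + 1))}
    for i in range(1, n_rim + 1):
        left = n_rim if i == 1 else i - 1
        right = 1 if i == n_rim else i + 1
        adj[i] = [0, left, right]
    return adj


def random_walk(adj, start, steps, rng):
    """Return the nodes visited by a simple random walk of `steps` draws."""
    node = start
    visited = []
    for _ in range(steps):
        visited.append(node)
        node = rng.choice(adj[node])  # move to a uniformly chosen neighbor
    return visited


def reweighted_mean(sample, adj, value):
    """Estimate the population mean of value[v], weighting each draw by 1/degree."""
    num = sum(value[v] / len(adj[v]) for v in sample)
    den = sum(1.0 / len(adj[v]) for v in sample)
    return num / den


adj = build_wheel(9)                       # hub has degree 9, rim nodes degree 3
degree = {v: len(adj[v]) for v in adj}
true_mean = sum(degree.values()) / len(degree)

walk = random_walk(adj, start=0, steps=20000, rng=random.Random(1))
naive = sum(degree[v] for v in walk) / len(walk)   # over-counts the hub
adjusted = reweighted_mean(walk, adj, degree)      # corrects for degree bias

print("true mean degree:", true_mean)
print("naive sample mean:", naive)
print("reweighted estimate:", adjusted)
```

On this toy graph the true mean degree is 3.6, while the long-run naive mean of a degree-biased walk is 4.5; the reweighted estimate should land near the true value. In a directed network the walk can only observe out-edges, which is why the abstract describes estimating the hidden in-degree information before adjusting the sample.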
Keywords/Search Tags: network sampling, statistical inference, social networks, big data of social networks, random walk, directed network, bipartite network, ego-centric sampling, estimation of the population characteristics of networks, local immunization strategy