Dynamic Sampling Design For Internet Data

Posted on: 2021-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: F L Xiao
Full Text: PDF
GTID: 2370330623972761
Subject: Statistics
Abstract/Summary:
Information technology keeps advancing, and the digitization of information is an inevitable result of that progress. More and more of the information in everyday life is being digitized. The resulting data volumes are very large, and handling them is made possible by today's highly developed technology. Data now comes from many kinds of sources, the largest of which is the Internet, where huge and complex volumes of data are generated at all times. These data have the four characteristics of big data: large volume, many complex types, low value density, and high speed and timeliness, that is, Volume, Variety, Value, and Velocity.

Much current Internet research aims to improve the efficiency of data analysis. Doing so requires changing the traditional way samples are obtained. Only then can statistics about the original overall network be estimated more accurately and useful information be obtained efficiently, which is why the design of sampling schemes has become a hot topic in Internet research. This thesis is set against that background. For social network data, it designs two sampling methods: dynamic sampling based on a dynamic sampling frame with equal time intervals, and dynamic sampling based on a dynamic sampling frame with equal data amounts. Both are used to retain data and obtain the corresponding sample network. The estimates obtained from the dynamically sampled network are then compared with those of the original overall network to determine which method performs better. First, simulation experiments are used to verify the feasibility of the constructed dynamic sampling frames. Second, real data are used: real user reviews from Douban Movies serve as the research object for an empirical analysis. The results of the two dynamic sampling methods are analyzed and compared in detail using word clouds and the LDA topic model.

Based on the random simulation experiments and the empirical analysis, the conclusions are as follows:

(1) Compared with the traditional sampling method, both sampling methods are more effective. Both are non-probability sampling methods, obtain samples more efficiently, and yield samples of very good quality.

(2) Dynamic sampling based on the frame with equal time intervals can describe how a hot event evolves as it spreads, and clearly reflects the network data flow rate at each burst point from the start of the event to its end. Dynamic sampling based on the frame with equal data amounts can accurately reflect the time at which each burst point of a hot event occurs. In practice, identifying when a hot event bursts is of vital importance to enterprises and governments in public relations, public opinion work, and related areas, and it can provide a reference for governments and enterprises conducting public opinion analysis and control.

(3) In the random simulation phase, both sampling methods show good adaptability, which verifies their feasibility. In real cases, the dynamic sampling frame can be chosen according to the amount of data a hot event generates. Of the two, the method based on the frame with equal time intervals is more convenient than the method based on the frame with equal data amounts: its simulation experiments take less time and its sampling efficiency is higher.

(4) For sampling real social network data, Douban film review data were selected. After text mining, semantic analysis, and a comparison of word clouds and LDA topic models, it was verified again that the dynamic sampling methods designed in this thesis have practical value. They offer a new approach to sampling in the era of big data.
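The two kinds of dynamic sampling frame described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names `equal_time_frames` and `equal_count_frames`, the toy timestamps, and the choice to represent each record only by its timestamp are all assumptions made for illustration, and the sketch covers only how the stream is partitioned into frames, not how units are then drawn from each frame.

```python
from typing import List, Sequence


def equal_time_frames(timestamps: Sequence[float], width: float) -> List[List[float]]:
    """Partition timestamps into consecutive frames of equal duration (hypothetical helper)."""
    if width <= 0:
        raise ValueError("width must be positive")
    if not timestamps:
        return []
    ts = sorted(timestamps)
    start = ts[0]
    n_frames = int((ts[-1] - start) // width) + 1
    frames: List[List[float]] = [[] for _ in range(n_frames)]
    for t in ts:
        # Frame index is the number of whole windows elapsed since the first record.
        frames[int((t - start) // width)].append(t)
    return frames


def equal_count_frames(timestamps: Sequence[float], size: int) -> List[List[float]]:
    """Partition timestamps into consecutive frames holding the same number of records
    (the last frame may be shorter). Hypothetical helper."""
    if size <= 0:
        raise ValueError("size must be positive")
    ts = sorted(timestamps)
    return [ts[i:i + size] for i in range(0, len(ts), size)]


# Toy stream (assumed data): a burst of posts between t=20 and t=25.
stream = [0, 5, 10, 20, 21, 22, 23, 24, 25, 40, 50, 59]

# Equal time intervals: the record count per frame tracks the data flow rate.
counts = [len(f) for f in equal_time_frames(stream, width=10)]
print(counts)     # [2, 1, 6, 0, 1, 2]

# Equal data amounts: a short frame duration marks when the burst happened.
durations = [f[-1] - f[0] for f in equal_count_frames(stream, size=4)]
print(durations)  # [20, 3, 34]
```

In this toy stream the equal-time frames show the burst as a spike in the per-frame count, while the equal-count frames show it as an unusually short frame, which matches the contrast drawn in conclusion (2).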
Keywords/Search Tags: Big data, Dynamic sampling frame design, Text analysis, LDA topic model