Density Bias Sampling Algorithm Based On Big Data And Its Application Research

Posted on: 2018-11-19
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Pan
Full Text: PDF
GTID: 2357330539475024
Subject: Statistics
Abstract/Summary:
Since the concept of big data was proposed, it has become a focus of the information technology field. Mining big data, however, requires considerable computing time, so how to process huge data sets efficiently is a crucial question. The efficiency of cluster analysis can be improved in two main ways: one is to improve the classical clustering algorithms, and the other is to reduce the size of the original data set by sampling. Because data is growing far faster than algorithms are being improved and updated, sampling techniques have become particularly important in cluster analysis. When traditional sampling techniques are applied to data sets with heavily skewed and unknown distributions, they lead to many problems, such as poor sampling results, poorly representative samples, and class loss. Density biased sampling, by contrast, can effectively handle sampling from unevenly distributed data. This thesis focuses on density biased sampling algorithms and proposes a more effective sampling algorithm for large data sets with uneven distributions. In recent years, research on density biased sampling has mainly focused on how to divide the grid space so that it is consistent with the data set. Because building a variable grid is time-consuming, this thesis improves the existing variable grid division method. First, the partition granularity of each dimension is determined flexibly from the mean value of that dimension in the original data set. Second, the variable grid division is completed by adjusting intervals according to the density similarity between intervals. Finally, a new optimized density biased sampling algorithm based on variable grid division is proposed, which combines the grid space with the principles of density biased sampling. Verification and analysis show that the algorithm achieves better sampling results and sample quality. The sample data set not only reflects the distribution characteristics of the original data set and avoids class loss, but the algorithm also has a certain advantage in execution efficiency.
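The following is a minimal sketch of the general idea behind grid-based density biased sampling, not the algorithm proposed in the thesis. It assumes a fixed equal-width grid rather than the variable grid described above, since the abstract does not give the details of the variable division. The function name `density_biased_sample`, its parameters, and the exponent `e` are assumptions made for illustration: a point in a cell holding n points is kept with probability proportional to n^(-e), so e = 0 reduces to uniform sampling and e = 1 gives every occupied cell the same expected number of sampled points.

```python
import random
from collections import defaultdict


def density_biased_sample(points, bins_per_dim, target_size, e=0.5, seed=0):
    """Illustrative density biased sampling on an equal-width grid.

    points       : list of equal-length numeric tuples
    bins_per_dim : number of equal-width intervals per dimension (assumed fixed)
    target_size  : expected sample size (before probabilities are capped at 1)
    e            : bias exponent in [0, 1]; 0 = uniform, 1 = equal share per cell
    """
    rng = random.Random(seed)
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]

    def cell_of(p):
        # Map a point to the index tuple of the grid cell that contains it.
        idx = []
        for d in range(dims):
            width = (highs[d] - lows[d]) / bins_per_dim or 1.0
            i = int((p[d] - lows[d]) / width)
            idx.append(min(i, bins_per_dim - 1))  # clamp the upper boundary
        return tuple(idx)

    # Group points by cell; the cell population is the density estimate.
    cells = defaultdict(list)
    for p in points:
        cells[cell_of(p)].append(p)

    # Choose the scale so that the expected sample size equals target_size:
    # sum over cells of n * (scale * n**-e) = scale * sum(n**(1 - e)).
    norm = sum(len(members) ** (1.0 - e) for members in cells.values())
    scale = target_size / norm

    sample = []
    for members in cells.values():
        keep_prob = min(1.0, scale * len(members) ** (-e))
        sample.extend(p for p in members if rng.random() < keep_prob)
    return sample


# Usage: one dense cluster and one small sparse cluster.
rng = random.Random(42)
dense = [(rng.random() * 0.9, rng.random() * 0.9) for _ in range(1000)]
sparse = [(9 + rng.random() * 0.9, 9 + rng.random() * 0.9) for _ in range(20)]
points = [(0.0, 0.0), (10.0, 10.0)] + dense + sparse
sample = density_biased_sample(points, bins_per_dim=10, target_size=50, e=1.0)
```

With e = 1.0 the small cluster is retained in full while the dense cluster is thinned heavily, which is the class-preserving behavior the abstract attributes to density biased sampling. Uniform sampling (e = 0.0) at the same target size would be expected to keep only about one point from the small cluster.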
Keywords/Search Tags:Big data, Data mining, Density biased sampling, Variable grid division