Design And Implementation Of Online Aggregation Technology Based On Stratified Sampling

Posted on: 2016-05-13
Degree: Master
Type: Thesis
Country: China
Candidate: C Xin
GTID: 2310330509959728
Subject: Computer software and theory
Abstract/Summary:
As data volumes grow, approximate query processing can return approximate results quickly by processing only a small portion of the data, which makes querying more efficient than exact query processing. Online aggregation is a widely used approximate query technique, but it is sensitive to the data distribution. When the target data are sparsely distributed in the original data set, it may produce large errors, and it may have to process most of the data before the approximate result becomes more accurate. In particular, there is no reliable upper bound on the running time when the amount of data is very large. Sampling can keep the running time within an acceptable range by reducing the scale of the data.

This thesis presents a method for online aggregation based on stratified sampling. It builds several samples over different query column sets so that most queries can be answered accurately, while reducing the size of the data and keeping the upper bound within an acceptable range. Applying continuous query technology on top of stratified sampling means that the accuracy of the result increases the longer the user waits, and that the query can be terminated as soon as the user is satisfied with the result. Following the sequential access policy of online aggregation, each round of the continuous query processes data drawn from every stratum, which yields approximate results with higher accuracy and smaller error. The method therefore reduces both the influence of skewed data distributions and the average waiting time.

The thesis describes in detail the key issues and the implementation of the method, including the stratification strategy, the merging method based on basic domain matching, the implementation and optimization of stratified sampling using a reverse index, the continuous query method, the storage format of the sample data, and the calculation and correctness analysis of the results. The experimental results show that the method effectively reduces the influence of skewed data distributions.
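The general idea of refining an aggregate round by round, with every round reading from every stratum, can be illustrated with a minimal sketch. This is not the thesis's implementation: it leaves out the per-column-set samples, the reverse index, and the storage format, and it uses a textbook stratified estimator of AVG with a normal-approximation error bound. The names `stratify`, `online_stratified_avg`, `key`, `value`, and `batch` are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import defaultdict


def stratify(rows, key):
    """Group rows into strata using a (hypothetical) stratification key."""
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    return strata


def online_stratified_avg(rows, key, value, batch=50, z=1.96, seed=0):
    """Yield (estimate, half_width) for AVG(value) after each round.

    Every round reads up to `batch` more rows from each stratum, so small
    (sparse) strata are represented from the first round onward.
    """
    rng = random.Random(seed)
    strata = stratify(rows, key)
    if not strata:
        return
    total = sum(len(s) for s in strata.values())
    for s in strata.values():
        rng.shuffle(s)  # random access order within each stratum

    taken = 0
    largest = max(len(s) for s in strata.values())
    while taken < largest:
        taken += batch
        estimate, variance = 0.0, 0.0
        for s in strata.values():
            sample = [value(r) for r in s[:taken]]
            n, size = len(sample), len(s)
            mean = sum(sample) / n
            weight = size / total  # share of the population in this stratum
            estimate += weight * mean
            if n > 1:
                s2 = sum((x - mean) ** 2 for x in sample) / (n - 1)
                # Finite population correction: the term vanishes once the
                # stratum has been read completely.
                variance += weight ** 2 * (s2 / n) * (1 - n / size)
            # With n == 1 the stratum variance cannot be estimated and is
            # ignored here, which makes the bound optimistic in that round.
        yield estimate, z * math.sqrt(variance)
```

A caller can consume the generator and stop as soon as the half-width is small enough, which mirrors terminating the query once the result is satisfactory:

```python
rows = [("common", 10.0)] * 1000 + [("rare", 1000.0)] * 10
for estimate, half_width in online_stratified_avg(
    rows, key=lambda r: r[0], value=lambda r: r[1], batch=5
):
    if half_width < 0.5:
        break
```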
Keywords/Search Tags:Stratified sampling, Online aggregation, Approximate query processing, Data distribution