Computing and querying datacubes

Posted on:2002-11-16

Degree:Ph.D

Type:Thesis

University:Columbia University

Candidate:Zaman, Kazi Atif-Uz

Full Text:PDF

GTID:2468390014450259

Subject:Computer Science

Abstract/Summary:

PDF Full Text Request

Datacube queries compute aggregates over database relations at a variety of granularities, and they constitute an important class of decision support queries. In this thesis we study problems pertaining to the computation of datacubes and frameworks for querying them.; Often one wants only datacube output tuples whose aggregate value satisfies a certain condition, such as exceeding a given threshold. For example, one might ask for all combinations of model, color, and year of cars (including the special value “ALL” for each of the dimensions) for which the total sales exceeded a given amount of money.; Computing a selection over a datacube can naively be done by computing the entire datacube and checking if the selection condition holds for each tuple in the result. However, it is often the case that selections are relatively restrictive, meaning that a lot of work computing datacube tuples is “wasted” since those tuples don't satisfy the selection condition.; Our approach is to develop algorithms for processing a datacube query using the selection condition internally during the computation. By making use of the selection condition within the datacube computation, we can safely prune parts of the computation and end up with a more efficient computation of the answer. Our first technique, called “specialization”, uses the fact that a tuple in the datacube does not meet the given threshold to infer that all finer level aggregates cannot meet the threshold. We propose a scheme of specialization transformations on the underlying data sets, using properties of the aggregates and threshold functions.; Our second technique is called “generalization”, and applies in the case where the actual value of the aggregate is not needed in the output, but used just to compare with the threshold. We refer to these as “projected datacube” queries. Generalization uses the fact that a tuple meets the given threshold to infer that all coarser level aggregates also meet the threshold. We also propose a scheme of generalization transformations. We demonstrate that computing the median is easier for projected datacubes.; In the second major piece of work we study a main memory based framework for querying datacubes. For large datasets with many dimensions, the complete datacube may be very large. In order to support on-line access to datacube results, one would like to perform some precomputation to enhance query performance.; We propose a main memory based framework which provides rapid response to queries and requires considerably less maintenance cost than a disk based scheme in an append-only environment. (Abstract shortened by UMI.)...

Keywords/Search Tags:

Datacube, Queries, Computing, Selection condition, Querying, Aggregates

PDF Full Text Request

Related items

1	Querying Optimization Research Based On XML Database
2	Near Ultraviolet Holographic Storage Based On The Modulation Of Merocyanine Aggregates
3	The Research Of Approximate Computing For Big Data Based On Sampling And Coreset
4	Queries And Computations On Ciphertexts In The Cloud
5	Computing Sliding Window Aggregates over Data Streams in a Scientific Workflow System
6	Study On Complex Query Processing Techniques For Advanced Spatial Applications
7	Research Of Querying Moving Object Trajectories In Spatio-Temporal Databases
8	Skyline Query Research For Massive RDF Data Under Distributed Computing Environments
9	An interface to grid file method for datacube storage and range sum query
10	Processing range-monitoring queries in mobile computing environment