Computing and querying datacubes | | Posted on:2002-11-16 | Degree:Ph.D | Type:Thesis | | University:Columbia University | Candidate:Zaman, Kazi Atif-Uz | Full Text:PDF | | GTID:2468390014450259 | Subject:Computer Science | | Abstract/Summary: | PDF Full Text Request | | Datacube queries compute aggregates over database relations at a variety of granularities, and they constitute an important class of decision support queries. In this thesis we study problems pertaining to the computation of datacubes and frameworks for querying them.; Often one wants only datacube output tuples whose aggregate value satisfies a certain condition, such as exceeding a given threshold. For example, one might ask for all combinations of model, color, and year of cars (including the special value “ALL” for each of the dimensions) for which the total sales exceeded a given amount of money.; Computing a selection over a datacube can naively be done by computing the entire datacube and checking if the selection condition holds for each tuple in the result. However, it is often the case that selections are relatively restrictive, meaning that a lot of work computing datacube tuples is “wasted” since those tuples don't satisfy the selection condition.; Our approach is to develop algorithms for processing a datacube query using the selection condition internally during the computation. By making use of the selection condition within the datacube computation, we can safely prune parts of the computation and end up with a more efficient computation of the answer. Our first technique, called “specialization”, uses the fact that a tuple in the datacube does not meet the given threshold to infer that all finer level aggregates cannot meet the threshold. We propose a scheme of specialization transformations on the underlying data sets, using properties of the aggregates and threshold functions.; Our second technique is called “generalization”, and applies in the case where the actual value of the aggregate is not needed in the output, but used just to compare with the threshold. We refer to these as “projected datacube” queries. Generalization uses the fact that a tuple meets the given threshold to infer that all coarser level aggregates also meet the threshold. We also propose a scheme of generalization transformations. We demonstrate that computing the median is easier for projected datacubes.; In the second major piece of work we study a main memory based framework for querying datacubes. For large datasets with many dimensions, the complete datacube may be very large. In order to support on-line access to datacube results, one would like to perform some precomputation to enhance query performance.; We propose a main memory based framework which provides rapid response to queries and requires considerably less maintenance cost than a disk based scheme in an append-only environment. (Abstract shortened by UMI.)... | | Keywords/Search Tags: | Datacube, Queries, Computing, Selection condition, Querying, Aggregates | PDF Full Text Request | Related items |
| |
|