Font Size: a A A

Cost estimation techniques for database systems

Posted on:2003-10-05Degree:Ph.DType:Dissertation
University:The University of Wisconsin - MadisonCandidate:Aboulnaga, Ashraf IsmailFull Text:PDF
GTID:1469390011985739Subject:Computer Science
Abstract/Summary:
This dissertation addresses three issues related to selectivity and cost estimation for database system query optimizers.; The first part of the dissertation deals with estimating the cost of spatial selections where the query regions and the data objects are general polygons. Previously proposed cost estimation techniques only handle rectangular query regions over rectangular data objects, thus ignoring the significant cost of exact geometry comparison. We develop a cost model for spatial selections that takes this cost into account. We also introduce a new type of histogram for spatial data that helps in estimating the parameters of the cost model. We experimentally verify the accuracy of our proposed techniques on real and synthetic spatial data.; The second part of the dissertation introduces self-tuning histograms. While similar in structure to traditional histograms, self-tuning histograms are built not by examining the data, but by using feedback from the query execution engine about the selectivities of range selections on the histogram attributes to progressively refine the histogram. Since self-tuning histograms have a low up-front cost and the cost of building them is independent of the data size, they are attractive as a low-cost alternative to traditional histograms, especially multi-dimensional histograms. We present an experimental study of the accuracy of self-tuning histograms using real and synthetic data.; The third part of this dissertation is about estimating the selectivity of XML path expressions. XML is increasingly becoming the data representation format of choice for data on the Internet. Queries over XML data use path expressions to navigate through the structure of the data, and optimizing these queries requires estimating the selectivity of these path expressions. We propose two techniques for estimating the selectivity of simple XML path expressions over complex large-scale XML data: path trees and Markov tables. Both techniques work by summarizing the structure of the XML data in a small amount of memory and using this summary for selectivity estimation. We experimentally demonstrate the accuracy of our proposed techniques over real and synthetic XML data and explore the different situations that would favor one technique over the other.
Keywords/Search Tags:Cost estimation, XML data, XML path expressions, Estimating the selectivity, Real and synthetic, Self-tuning histograms, Dissertation
Related items