Cost estimation techniques for database systems

Posted on:2003-10-05

Degree:Ph.D

Type:Dissertation

University:The University of Wisconsin - Madison

Candidate:Aboulnaga, Ashraf Ismail

Full Text:PDF

GTID:1469390011985739

Subject:Computer Science

Abstract/Summary:

This dissertation addresses three issues related to selectivity and cost estimation for database system query optimizers.; The first part of the dissertation deals with estimating the cost of spatial selections where the query regions and the data objects are general polygons. Previously proposed cost estimation techniques only handle rectangular query regions over rectangular data objects, thus ignoring the significant cost of exact geometry comparison. We develop a cost model for spatial selections that takes this cost into account. We also introduce a new type of histogram for spatial data that helps in estimating the parameters of the cost model. We experimentally verify the accuracy of our proposed techniques on real and synthetic spatial data.; The second part of the dissertation introduces self-tuning histograms. While similar in structure to traditional histograms, self-tuning histograms are built not by examining the data, but by using feedback from the query execution engine about the selectivities of range selections on the histogram attributes to progressively refine the histogram. Since self-tuning histograms have a low up-front cost and the cost of building them is independent of the data size, they are attractive as a low-cost alternative to traditional histograms, especially multi-dimensional histograms. We present an experimental study of the accuracy of self-tuning histograms using real and synthetic data.; The third part of this dissertation is about estimating the selectivity of XML path expressions. XML is increasingly becoming the data representation format of choice for data on the Internet. Queries over XML data use path expressions to navigate through the structure of the data, and optimizing these queries requires estimating the selectivity of these path expressions. We propose two techniques for estimating the selectivity of simple XML path expressions over complex large-scale XML data: path trees and Markov tables. Both techniques work by summarizing the structure of the XML data in a small amount of memory and using this summary for selectivity estimation. We experimentally demonstrate the accuracy of our proposed techniques over real and synthetic XML data and explore the different situations that would favor one technique over the other.

Keywords/Search Tags:

Cost estimation, XML data, XML path expressions, Estimating the selectivity, Real and synthetic, Self-tuning histograms, Dissertation

Related items

1	The Research Of Cost Estimating On WY Real Estate Project
2	Estimating nonlinear cross section and panel data models with endogeneity and heterogeneity
3	Modeling approaches for cost and cost-effectiveness estimation using observational data
4	Research On The Current Situation And Improvement Of General Contractors' Cost Estimating Competence In Real Estate Development Projects
5	Study On The Estimation Of Environmental Cost Of The Regional Industrial Development
6	Research On The Selectivity And Market-timing,persistence Of Chinese Big Data Funds
7	Reform Of The Engineering Cost Evaluation Methods-Practice And Study Of The Analytical Cost Estimating Method
8	Method For Quickly Estimating Process Cost Of Customized Products Based On Knowledge Map And Its Application
9	Estimating and predicting merger effects and pass-through rates
10	Estimation And Analysis Cost Of Equity Capital Of Listed Companies