Font Size: a A A

Research On Vector Approximation Method In High-dimensional Index Technology

Posted on:2006-11-13Degree:DoctorType:Dissertation
Country:ChinaCandidate:J T CuiFull Text:PDF
GTID:1118360182960131Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
An important research issue in the field of image databases is the content-basedimage retrieval. High-dimensional indexing has been considered an important means toimprove the performance in querying large image databases, and it has been a veryactive research area in multimedia and database applications over the last few years. Ithas been observed that the performance of traditional indexing methods deterioraterapidly as dimensionality increases-a phenomenon which has become known as the'curse of dimensionality'. Efficient indexing schemes for high-dimensional data arerequired for real-time querying in large-scale image database. High-dimensionalindexing also can be used in other research area, such as data mining, patternrecognition, machine learning and data analysis.The phenomenon of 'curse of dimensionality'is introduced in this dissertationfirstly, and the current status and future development trend of high-dimensionalindexing are summarized. The vector approximation approach is an efficienthigh-dimensional indexing method, which can reduce the difficulty of 'curse ofdimensionality'. Most of the researches on high-dimensional indexing are based onvector approximation approach. This dissertation focuses research on the k-nearestneighbor search in large-scale image databases. Some new modified methods based onvector approximation approach are presented. Main innovative contributions are asfollows.1. The vector approximation approaches suggest accelerating the sequential scanby use of data compression, and they need to access all of the approximate vectors. Anew high-dimensional indexing structure based vector approximation method wasintroduced. The approximate vector is built at the discrete wavelet transform domain,and the first component is chosen as principal component. A sequence index is builtaccording to the principal component, which uses B+ tree to manage the approximatevectors. When performing k-nearest neighbor search, the partial distortion searchingalgorithm is used to reject the improper approximate vectors. Only a small set ofapproximate vectors are accessed during the search, which can reduce thecomputational complexity and I/O cost. The new method also can support quadraticform distance and L1 distance. For quadratic form distance, singular valuedecomposition technique is used to get the principal component. For L1 distance, anew multi-resolution data structure is built according to the sum of adjacent elements.The experiment results on large image databases show that the new method can improvethe performance of vector approximation approach.2. A new high-dimensional indexing structure called PCR-tree (principalcomponents R tree) was presented, which employs low-dimensional R-tree to managethe approximate vectors at principal components. The principal components in thePCR-tree are multi-dimensional, and the low dimensional MBRs (minimum boundingrectangles) can be built on approximate vectors at principal components. Whenperforming k-nearest neighbor search, a new lower-bound filtering algorithm is used toreject the improper nodes of PCR-tree. It's very important to select the dimension ofprincipal components. When the dimension of principal components is low, there ismore information lost leading more nodes to be examined. When it is high, the nodesaccess cost also becomes high due to dimensionality curse. The experiment results onlarge image databases show that the new approach provides a faster search speed thanother tree-structured vector approximation approaches.3. The approximate vectors are built according to scalar quantization scheme. Anew high-dimensional indexing structure based vector quantization is introduced. Thevector quantization is the best quantization scheme, which achieves a good balancebetween quality of vector reconstruction and the number of bits used to describe thevectors. A vector can be described using the cell in which the vector lies. In the newstructure, approximate vector is the index of codebook. The sphere is used to managethe data in the same cell. According to the sphere, the lower and upper bound ofdistance between query and vector can be calculated. When performing k-nearestneighbor search, the reproduction vector and the sphere can be used to filter theapproximate vectors. In order to reduce the size of codebook, the split vectorquantization structure is adapted. The experiment results on high-dimensional imagefeature databases show that the new approach can reduce the I/O cost dramatically thanvector approximation approach.4. Relevance feedback techniques are important approaches closing up thesemantic gap between high-level concepts and low-level features in image retrieval. Anew k-nearest neighbor search algorithm based on vector approximation approach forrelevance feedback image retrieval is introduced. Based on the feedback, thecorrelations of the underlying similarity metric between two consecutive searches isexploited, then the search result and feedbacks are used to filtering the approximatevectors in the next search round. Experiments show a remarkable reduction of vectorsaccessed, and they also show an improvement on the indexing performance comparedwith the existing search algorithms.5. In the image retrieval based on low-level feature, the extraction of featurevectors is a heuristic process that attempts to approximately capture relevantinformation. Approximate similarity search is an efficient way to reduce the difficulty ofcurse of dimensionality. For all of the indexing structures presented in this dissertation,the approximate similarity search algorithms are also introduced. (1) For the linear scanstructure, early termination strategies are used, in which search algorithms areprematurely stopped when current result is judged to be satisfying the approximationrequirements. (2) For PCR-tree, the indexing structure is modified to satisfy theapproximate search. (3) For the approximate search algorithm based on vectorquantization, a new structure is presented which store the approximate vectors byInverted File and organize the inverted file using Hash table. Experiments results showa remarkable speedup of performance in k-nearest neighbor search allowing a smallreduction of result quality.
Keywords/Search Tags:High-dimensional indexing, Content-based image retrieval, curse of dimensionality, approximate similarity search, relevance feedback, vector approximation approach, principal component sorting, principal components R tree, vector quantization
PDF Full Text Request
Related items