
The Theory And Application Of The Dimension Reduction On The High-Dimensional Data Set

Posted on: 2006-01-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Tan
GTID: 1100360155472165
Subject: Applied Mathematics
Abstract:
The analysis of high-dimensional data, such as data from spaceborne remote sensing, biological networks and financial markets, faces two puzzles. The first is the curse of dimensionality, which challenges pattern recognition and the discovery of regularities in high-dimensional data. The second is the blessing of dimensionality: the abundant information in a high-dimensional data set also brings new possibilities. How to represent high-dimensional data in a low-dimensional space and discover its intrinsic structure is the central problem of high-dimensional information processing. Dimension reduction, as an effective way to overcome the curse of dimensionality, has therefore attracted broad attention, and related research is flourishing. This thesis introduces and analyses in detail the concept of data set structure and its application to dimension reduction.
The main achievements of this thesis are as follows.
Part one summarizes the development of high-dimensional data processing and its main difficulties, such as the curse of dimensionality and the geometry of high-dimensional spaces. It also gives an intuitive analysis of linear methods, including PCA (Principal Component Analysis) and PP (Projection Pursuit), and of nonlinear methods, including MDS (Multidimensional Scaling), ISOMAP (Isometric Feature Mapping), LLE (Locally Linear Embedding) and Laplacian Eigenmaps. The analysis shows that every dimension reduction process, linear or nonlinear, can be divided into three distinct but related phases. The first describes the structure of the data set quantitatively. The second defines a measure of that structure, building on the first phase. The third constructs the reduction rule.
Therefore, discovering and formulating a reduction method involves three steps: constructing a mathematical model of the data set structure, introducing a measure of that structure (or a rule for choosing it), and constructing the reduction rule (or loss criterion) on the basis of the data set structure.
Part two proposes and investigates the concept of data set structure and related issues, such as the characteristics of a structure, how a structure is chosen, and the relationship between a structure and its data set. The analysis yields three results. First, the notion of structure better explains the underlying principles of linear and nonlinear dimension reduction, the choice among dimension reduction results, the differences between methods, and which method suits which situation. Second, a thorough study of data set structures, including new structures, can improve the efficiency of existing dimension reduction methods and lead to better ones. Third, studying linear and nonlinear methods as a whole can yield new dimension reduction methods that combine the strengths of linear methods with the characteristics of nonlinear ones.
Part three describes linear methods (PCA, PP) and nonlinear methods (LLE, Laplacian Eigenmaps) in terms of data set structure, explaining for each what the data set structure is and what the reduction rule is. It then proposes a new linear dimension reduction method called Locality Unchanged Projection; a new nonlinear method called dimension reduction with original topological structure; an improvement of LLE called Robust Locally Linear Embedding; and a new method for estimating intrinsic dimension, called Intrinsic Dimension Estimation Based On LLE.
Finally, experiments confirm the validity of these methods.
Part four presents two applications of dimension reduction in multi-source data processing: image recognition and region-of-interest extraction. The results show that data set structure, and dimension reduction built on it, provide a new platform for high-dimensional data analysis. They make high-dimensional data processing more intelligent and its methods more effective, can gradually be extended to other areas of high-dimensional data processing, and may ultimately have a significant impact on data processing as a whole.
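The three-phase view of dimension reduction from Part one (describe the structure, measure it, derive the reduction rule) can be made concrete with a minimal NumPy sketch. It writes two of the surveyed methods, standard PCA and standard LLE, in exactly those three phases. These are the textbook formulations, not the thesis's Locality Unchanged Projection or Robust Locally Linear Embedding. The function names `pca_reduce` and `lle_reduce`, the neighbour count `k` and the regularisation constant `reg` are hypothetical choices made for illustration only.

```python
import numpy as np


def pca_reduce(X, d):
    """Textbook PCA written as three phases: structure, measure, rule."""
    # Phase 1 (structure): the covariance matrix of the centred data.
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / (len(X) - 1)
    # Phase 2 (measure): variance along each principal direction,
    # i.e. the eigenvalues of C (eigh returns them in ascending order).
    variances, directions = np.linalg.eigh(C)
    order = np.argsort(variances)[::-1]
    # Phase 3 (rule): project onto the d directions of largest variance.
    return Xc @ directions[:, order[:d]]


def lle_reduce(X, d, k=10, reg=1e-3):
    """Textbook LLE in the same three phases (dense, O(n^2) memory)."""
    n = len(X)
    # Phase 1 (structure): each point's k nearest neighbours and the weights
    # that best reconstruct it linearly from them, with each row summing to 1.
    dist2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(dist2, np.inf)  # a point is not its own neighbour
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dist2[i])[:k]
        Z = X[nbrs] - X[i]                  # neighbours relative to x_i
        G = Z @ Z.T                         # local Gram matrix (k x k)
        G += reg * np.trace(G) * np.eye(k)  # regularise: G is singular if k > dim
        w = np.linalg.solve(G, np.ones(k))
        W[i, nbrs] = w / w.sum()
    # Phase 2 (measure): the embedding cost sum_i ||y_i - sum_j W_ij y_j||^2,
    # which equals trace(Y^T M Y) with M = (I - W)^T (I - W).
    I = np.eye(n)
    M = (I - W).T @ (I - W)
    # Phase 3 (rule): keep the eigenvectors of M with the smallest eigenvalues,
    # skipping the first one (the constant vector, eigenvalue 0).
    _, vecs = np.linalg.eigh(M)
    return vecs[:, 1:d + 1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Points on a 2-D plane embedded in 5-D: PCA with d=2 keeps all the variance.
    X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 5)) + 3.0
    Y = pca_reduce(X, 2)
    print(np.isclose(Y.var(axis=0).sum(), X.var(axis=0).sum()))  # True
    # A noisy helix in 3-D, embedded in 1-D by LLE.
    t = np.sort(rng.uniform(0, 3 * np.pi, 300))
    helix = np.c_[np.cos(t), np.sin(t), 0.3 * t] + 0.01 * rng.normal(size=(300, 3))
    print(lle_reduce(helix, 1, k=8).shape)  # (300, 1)
```

In both functions only the first two phases differ in substance (a global covariance versus local reconstruction weights); the third phase is an eigenproblem in each case. The brute-force neighbour search keeps the sketch short; a practical implementation would use a spatial index and sparse matrices.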
Keywords: High-dimensional data set structure, dimension reduction, structure projection, reduction rule, multi-source spatial data