
Low Dimension/Rank Data Representation And Embedding

Posted on: 2015-02-06 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: Y Xie | Full Text: PDF
GTID: 1228330467456144 | Subject: Computer Science and Technology
Abstract/Summary:
Machine learning techniques have been widely applied to scientific domains as diverse as engineering, astronomy, biology, remote sensing, and economics. The dimensionality of scientific data, such as digital images and videos, gene expressions and DNA copy numbers, documents, and financial time series, can reach into the thousands. As a result, data analysis on such data sets suffers from the curse of dimensionality. To address this problem, dimensionality reduction and low-rank algorithms have been proposed to project the original high-dimensional feature space to a low-dimensional space in which the important statistical properties are well preserved.

We first consider the problem of recovering cleaner data drawn from subspaces and corrupted by noise. We define this as a noise reduction problem, where the goal is to decompose the data matrix into the sum of a clean, self-expressive, low-rank matrix and a noise matrix. Our key contribution is to show that, for noisy data sets, this non-convex problem can be solved very efficiently and in closed form via the SVD of the noisy data matrix, reducing the rank and the noise simultaneously. We study a robust data noise reduction model that uses the ℓ1 norm for robust data representation and the Frobenius norm to keep the proposed model convex. We perform extensive experiments on clustering and semi-supervised image classification, comparing the proposed method with other methods on both tasks, and then analyze accuracy, parameter settings, and subspace recovery. The results validate the effectiveness of the proposed approach, showing noise reduction, accuracy improvement, and increased performance on both tasks.
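As an illustration of the kind of closed-form SVD step referred to above (the dissertation's exact model is not reproduced here), singular value thresholding is one standard way to split a noisy matrix into a low-rank estimate plus a residual; the threshold tau below is an assumed free parameter, not a value taken from the dissertation.

import numpy as np

def svd_denoise(X, tau):
    # Generic closed-form denoising via the SVD: shrink small singular values
    # to zero, keep the low-rank part A, and treat the remainder E as noise.
    # (Illustrative sketch only; tau is an assumed parameter.)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)   # soft-threshold the singular values
    A = (U * s_shrunk) @ Vt               # low-rank "clean" estimate
    E = X - A                             # residual treated as noise
    return A, E

# Toy usage: a rank-2 matrix corrupted by Gaussian noise.
rng = np.random.default_rng(0)
clean = rng.standard_normal((100, 2)) @ rng.standard_normal((2, 50))
A, E = svd_denoise(clean + 0.1 * rng.standard_normal((100, 50)), tau=2.0)
print(np.linalg.matrix_rank(A))           # rank is reduced relative to the noisy input

Because a single SVD produces both the rank reduction and the noise removal, such a closed-form step is attractive compared with iterative solvers for the same decomposition.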
Many real-life applications of modern technologies produce noisy data from different sources. We study a robust low-rank representation model to handle such noisy real-world data, and we develop a novel optimization approach for learning the model that is guaranteed to converge to a global optimizer. Unlike recent low-rank representation methods, we compute the sparsest representation of a collection of vectors jointly, denoising data in which a fraction of the entries is corrupted while enforcing that the rank be reduced to the intrinsic rank of the entire data set. Our method thus provides another way to approximate the well-known low-rank optimization problem. Both theoretical and experimental results show that our low-rank representation is a promising tool for data preprocessing.

Lasso-type variable selection has found increasingly broad machine learning applications. We propose the uncorrelated Lasso for variable selection, in which variable de-correlation is considered simultaneously with variable selection so that the selected variables are as uncorrelated as possible. An effective iterative algorithm, together with a proof of convergence, is presented to solve the sparse optimization problem. Experiments on benchmark data sets show that the proposed method achieves better classification performance than many state-of-the-art variable selection methods.

Finally, real-life applications often produce high-dimensional, noise-contaminated data from different sources. We address de-noising together with dimensionality reduction by proposing a novel method named Robust Integrated Locally Linear Embedding. The method combines the two steps of LLE into a single framework and handles de-noising by solving an ℓ2,1-ℓ2 mixed-norm optimization problem. We also derive an efficient algorithm to solve the proposed model. Extensive experiments demonstrate that the proposed method better captures the relationships among data points and yields visible improvements in de-noising, embedding, and clustering tasks.
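For reference, the ℓ2,1 norm of a matrix is the sum of the ℓ2 norms of its rows, which drives entire rows (for example, per-sample errors) to zero. The sketch below shows the two classical LLE steps that the proposed method integrates into one framework; it is the standard baseline, not the robust ℓ2,1-ℓ2 formulation, and the neighborhood size and regularization constant are assumed values.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def classic_lle(X, n_neighbors=10, n_components=2, reg=1e-3):
    # Step 1: reconstruct each point from its neighbors (local weights W).
    # Step 2: find the low-dimensional embedding that preserves those weights.
    # (Standard LLE baseline for reference; not the proposed robust variant.)
    n = X.shape[0]
    _, idx = NearestNeighbors(n_neighbors=n_neighbors + 1).fit(X).kneighbors(X)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = idx[i, 1:]                              # drop the point itself
        Z = X[nbrs] - X[i]                             # neighbors centered at x_i
        G = Z @ Z.T
        G += reg * np.trace(G) * np.eye(len(nbrs))     # regularize the local Gram matrix
        w = np.linalg.solve(G, np.ones(len(nbrs)))
        W[i, nbrs] = w / w.sum()                       # step 1: local reconstruction weights
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    _, vecs = np.linalg.eigh(M)
    return vecs[:, 1:n_components + 1]                 # step 2: bottom non-trivial eigenvectors

Solving the weight and embedding problems separately, as above, lets noise in the first step propagate into the embedding, which motivates merging them into a single robust objective.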
Keywords/Search Tags: low-rank, data representation, data embedding