
Error correction for high-dimensional data via convex programming

Posted on: 2010-09-11
Degree: Ph.D
Type: Thesis
University: University of Illinois at Urbana-Champaign
Candidate: Wright, John N
Full Text: PDF
GTID: 2440390002973073
Subject: Engineering
Abstract/Summary:
Modern data processing applications in signal and image processing, web search and ranking, and bioinformatics are increasingly characterized by large quantities of very high-dimensional data. As datasets grow larger, however, methods of data collection have necessarily become less controlled, introducing corruption, outliers, and missing data. This thesis addresses the basic question of when and how simple data representations can be recovered from such non-ideal observations. In particular, we develop theoretical explanations for the efficacy of convex programming approaches to error correction for both vector- and matrix-valued data.

For vector-valued observations, we prove that if a signal is sufficiently sparse with respect to a highly correlated basis, then as long as the fraction of errors is bounded away from one, it can be recovered from almost any error by solving a simple convex program. This result suggests that accurate recovery of sparse and nonnegative signals is possible and computationally feasible even with almost all of the observations corrupted.

For matrices, we consider the fundamental problem of recovering a low-rank matrix from large but sporadic corruption. We prove that "almost all" matrices of low enough rank can be efficiently and exactly recovered from almost all error sign-and-support patterns, again by solving a simple convex program. This result holds even when the rank is proportional to the observation dimension (up to a logarithmic factor) and a non-vanishing fraction of the observations are corrupted. It also implies the first proportional growth result for completing a low-rank matrix from a small fraction of its entries.

Finally, we show how these theoretical developments lead to simple, scalable, and robust algorithms for face recognition in the presence of varying illumination and occlusion. The idea is extremely simple: seek the sparsest representation of the test image as a linear combination of training images plus a sparse error term due to occlusion. In addition to achieving excellent performance on public databases, this approach sheds light on several important issues in face recognition, such as the choice of features and robustness to corruption and occlusion.
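The two convex programs described above can be stated directly in an off-the-shelf modeling tool. The sketch below is illustrative only and is not code from the thesis: it assumes Python with the NumPy and CVXPY packages, and the function names sparse_error_correction and robust_pca are hypothetical. The first program recovers a sparse signal together with a sparse error term by L1 minimization; the second splits an observed matrix into a low-rank part plus sparse corruption via a nuclear-norm-plus-entrywise-L1 program. The same vector program also underlies the face-recognition scheme in the final paragraph, with the columns of A taken to be training images.

    import numpy as np
    import cvxpy as cp

    def sparse_error_correction(A, y):
        # Recover a sparse signal x and a sparse error e from y = A x + e
        # by solving:  minimize ||x||_1 + ||e||_1  subject to  A x + e = y.
        m, n = A.shape
        x = cp.Variable(n)
        e = cp.Variable(m)
        problem = cp.Problem(cp.Minimize(cp.norm(x, 1) + cp.norm(e, 1)),
                             [A @ x + e == y])
        problem.solve()
        return x.value, e.value

    def robust_pca(M, lam=None):
        # Split an observed matrix M into low-rank L plus sparse corruption S
        # by solving:  minimize ||L||_* + lam * sum_ij |S_ij|  subject to  L + S = M.
        m, n = M.shape
        if lam is None:
            lam = 1.0 / np.sqrt(max(m, n))  # common weighting in the robust PCA literature
        L = cp.Variable((m, n))
        S = cp.Variable((m, n))
        problem = cp.Problem(cp.Minimize(cp.norm(L, "nuc") + lam * cp.sum(cp.abs(S))),
                             [L + S == M])
        problem.solve()
        return L.value, S.value

For large matrices, a generic conic solver will be slow; specialized first-order methods (e.g. augmented Lagrangian schemes) are typically used in practice, but the underlying convex program is the same.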
Keywords/Search Tags: Data, Error, Convex