Randomized algorithms for matrix operations

Posted on:2004-12-31

Degree:Ph.D

Type:Thesis

University:Yale University

Candidate:Drineas, Petros S

Full Text:PDF

GTID:2460390011970997

Subject:Computer Science

Abstract/Summary:

PDF Full Text Request

In this thesis, we present fast, Monte-Carlo algorithms for performing useful computations on large matrices (e.g. matrix multiplication, Singular Value Decomposition (SVD), etc.). Our algorithms are quite different from traditional numerical analysis approaches and generally fit the following model: we assume that the input matrices are prohibitively large to store in RAM and only external memory storage is possible. We are allowed to read the matrices a few times and keep a “sketch” of the matrices in RAM. We only allow constant processing time after reading each element of the matrices; computing the “sketch” should be a very fast process. Indeed, one of the contributions of our work is to demonstrate that a “sketch” consisting only of a random sample of rows and/or columns of the matrices is adequate for efficient approximations. A crucial issue is how to form the random sample; an obvious choice is “blind” sampling (e.g. uniform sampling). Instead, we use adaptive sampling, namely, in one pass through the data we compute sampling probabilities—we keep rows/columns with probability proportional to the square of their lengths—and, in a second pass, we draw the sample. Adaptive sampling offers substantial gains; it allows us to approximately solve problems in sparse matrices as well as dense matrices. After these two passes, we perform computations only on the “sketch” (rows/columns that we kept).; The main focus of this thesis is a thorough presentation of the algorithms; we also address the issue of their optimality. We expect our algorithms to be useful in a large variety of settings where large data sets are modelled by matrices; indeed, we demonstrate their power by using them as tools to design approximation algorithms for various problems (mostly Information Retrieval and Data Mining applications). We believe that the underlying principle of this thesis—using adaptive sampling to create “sketches” of the data in a small number of passes—constitutes an appealing and intuitive direction to address the size and nature of modern data sets.

Keywords/Search Tags:

Algorithms, Matrices, Data, Large

PDF Full Text Request

Related items

1	Iterative solvers for large, dense matrices
2	Fast Algorithms For Some Structured Matrices
3	Parallel Algorithms And Preconditioning For Large Sparse Linear Sytems
4	Homogeneity Test Of K Covariance Matrices For Large-dimensional Data
5	Convergence Of The Empirical Spectral Distribution Function Of Large Dimensional Random Matrices
6	Algorithms for Large Scale Problems in Eigenvalue and SVD Computations and in Big Data Applications
7	On The Population Covariance Matrix In Large Dimensional Data
8	Efficient algorithms for eigenspace decompositions of Toeplitz matrices
9	Hypothesis Testing On High-dimensional Covariance Matrices And Beyond
10	Algorithms for normal forms for matrices of polynomials and Ore polynomials