Font Size: a A A

Randomized algorithms for matrix operations

Posted on:2004-12-31Degree:Ph.DType:Thesis
University:Yale UniversityCandidate:Drineas, Petros SFull Text:PDF
GTID:2460390011970997Subject:Computer Science
Abstract/Summary:PDF Full Text Request
In this thesis, we present fast, Monte-Carlo algorithms for performing useful computations on large matrices (e.g. matrix multiplication, Singular Value Decomposition (SVD), etc.). Our algorithms are quite different from traditional numerical analysis approaches and generally fit the following model: we assume that the input matrices are prohibitively large to store in RAM and only external memory storage is possible. We are allowed to read the matrices a few times and keep a “sketch” of the matrices in RAM. We only allow constant processing time after reading each element of the matrices; computing the “sketch” should be a very fast process. Indeed, one of the contributions of our work is to demonstrate that a “sketch” consisting only of a random sample of rows and/or columns of the matrices is adequate for efficient approximations. A crucial issue is how to form the random sample; an obvious choice is “blind” sampling (e.g. uniform sampling). Instead, we use adaptive sampling, namely, in one pass through the data we compute sampling probabilities—we keep rows/columns with probability proportional to the square of their lengths—and, in a second pass, we draw the sample. Adaptive sampling offers substantial gains; it allows us to approximately solve problems in sparse matrices as well as dense matrices. After these two passes, we perform computations only on the “sketch” (rows/columns that we kept).; The main focus of this thesis is a thorough presentation of the algorithms; we also address the issue of their optimality. We expect our algorithms to be useful in a large variety of settings where large data sets are modelled by matrices; indeed, we demonstrate their power by using them as tools to design approximation algorithms for various problems (mostly Information Retrieval and Data Mining applications). We believe that the underlying principle of this thesis—using adaptive sampling to create “sketches” of the data in a small number of passes—constitutes an appealing and intuitive direction to address the size and nature of modern data sets.
Keywords/Search Tags:Algorithms, Matrices, Data, Large
PDF Full Text Request
Related items