As modern applications of machine learning and data mining are forced to deal with ever more massive quantities of data, practitioners quickly run into difficulty with the scalability of even the most basic and fundamental methods. We propose to provide scalability through a marriage between classical, empirical-style Monte Carlo approximation and deterministic multi-tree techniques. This union entails a critical compromise: losing determinism in order to gain speed. In the face of large-scale data, such a compromise is arguably often not only the right but the only choice. We refer to this new approximation methodology as Multi-Tree Monte Carlo. In particular, we have developed the following fast approximation methods: (1) fast training for kernel conditional density estimation by injecting Monte Carlo into state-of-the-art dual-tree methods, with speedups as high as 10^5 shown on datasets of up to 1 million points; (2) fast training for general kernel estimators (kernel density estimation, kernel regression, etc.) by injecting multiple trees into Monte Carlo, with speedups as high as 10^6 shown on tens of millions of points; and (3) fast singular value decomposition using a new form of sampling tree called the cosine tree, with speedups as high as 10^5 shown on dataset matrices containing billions of entries.

The level of acceleration shown by our methods represents an improvement over the prior state of the art by several orders of magnitude. Such improvement not only speeds existing applications; it represents a qualitative shift, a commoditization, that opens doors to all manner of new applications and method concepts that were previously invisible, outside the realm of practicality. Further, we show how the diverse operations of our approximation methods can be unified in a Multi-Tree Monte Carlo meta-algorithm that lends itself as scaffolding for the development of fast approximations for other methods we have not yet considered.
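The Monte Carlo half of this marriage can be illustrated with a minimal sketch: estimating a kernel density sum by averaging the kernel over a uniform random subsample of the reference points, rather than summing over all of them. This plain-sampling version is only for illustration and omits the tree component entirely; all names and parameters below are illustrative, not from the methods described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_kernel(d2, h):
    # Gaussian kernel evaluated on squared distances, bandwidth h
    return np.exp(-d2 / (2.0 * h * h))

def kde_exact(query, refs, h):
    # Exact kernel density sum: O(N) work per query point
    d2 = np.sum((refs - query) ** 2, axis=1)
    return gaussian_kernel(d2, h).mean()

def kde_monte_carlo(query, refs, h, n_samples=500):
    # Monte Carlo estimate: average the kernel over a uniform random
    # subsample of the references, O(n_samples) work per query point.
    # (The multi-tree methods sketched above replace this naive uniform
    # sampling with tree-guided sampling and error control.)
    idx = rng.integers(0, len(refs), size=n_samples)
    d2 = np.sum((refs[idx] - query) ** 2, axis=1)
    return gaussian_kernel(d2, h).mean()

# Demo: the subsampled estimate tracks the exact O(N) sum
refs = rng.normal(size=(100_000, 3))
query = np.zeros(3)
exact = kde_exact(query, refs, h=1.0)
approx = kde_monte_carlo(query, refs, h=1.0)
```

The trade made here is exactly the one named above: the estimate is no longer deterministic, but its cost is independent of the number of reference points.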
Thus, our contribution includes not just the particular algorithms we have derived but also the Multi-Tree Monte Carlo methodological framework, which we hope will lead to many more fast algorithms, extending the kind of scalability we have shown here to other important methods from machine learning and related fields.