Considering the large amounts of data collected every day in domains such as health care, financial services, and astrophysics, there is a pressing need to convert this information into knowledge. Machine learning and data mining are both concerned with achieving this goal in a scalable fashion. The main theme of my work has been to analyze and better understand prevalent classification techniques and paradigms, which are an integral part of machine learning and data mining research, with the aim of narrowing the gap between theory and practice.

Machine learning and data mining researchers have developed a plethora of classification algorithms. Unfortunately, no single algorithm is superior in all scenarios, nor is it clear which algorithm should be preferred under which circumstances. Hence, an important question is: what is the best choice of classification algorithm for a particular application? This problem, termed classification model selection, is a central problem in machine learning and data mining. The primary focus of my research has been to propose a novel methodology for studying classification algorithms accurately and efficiently in the non-asymptotic regime. In particular, we propose a moment-based method: by focusing on the probabilistic space of classifiers induced by a classification algorithm and datasets of size N drawn independently and identically distributed (i.i.d.) from a joint distribution, we obtain efficient characterizations for computing the moments of the generalization error. Moreover, model selection techniques such as cross-validation, leave-one-out, and hold-out estimation can also be studied within the proposed framework.
This is possible because we have also established general relationships between the moments of the generalization error and the moments of the hold-out-set, cross-validation, and leave-one-out errors. Deploying this methodology, we were able to provide interesting explanations for the observed behavior of cross-validation. The methodology thus aims at bridging the gap between the results predicted by theory and the behavior observed in practice.
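To make the quantities concrete, the moments discussed above can be estimated empirically by Monte Carlo: repeatedly draw i.i.d. datasets of size N, train a classifier on each, and collect both its true generalization error and its leave-one-out error estimate. The sketch below does this for a deliberately simple setup of my own choosing (a 1-D Gaussian two-class problem and a midpoint-threshold rule), which is only an illustration of the moment viewpoint, not the characterization derived in the work itself.

```python
import math
import random

random.seed(0)

# Assumed synthetic setup (not from the original work): two equiprobable
# classes with unit-variance Gaussian class-conditionals.
MU = {0: -1.0, 1: 1.0}

def sample_dataset(n):
    """Draw n i.i.d. labeled points from the joint distribution."""
    data = []
    for _ in range(n):
        y = random.randint(0, 1)
        data.append((random.gauss(MU[y], 1.0), y))
    return data

def train(data):
    """Toy learning rule: threshold at the midpoint of the class means."""
    c0 = [x for x, y in data if y == 0]
    c1 = [x for x, y in data if y == 1]
    if not c0 or not c1:          # degenerate one-class sample: default threshold
        return 0.0
    return (sum(c0) / len(c0) + sum(c1) / len(c1)) / 2.0

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def true_error(t):
    """Exact generalization error of 'predict 1 iff x > t' under MU."""
    return 0.5 * (1.0 - phi(t - MU[0])) + 0.5 * phi(t - MU[1])

def loo_error(data):
    """Leave-one-out error estimate computed on one dataset."""
    wrong = 0
    for i, (x, y) in enumerate(data):
        t = train(data[:i] + data[i + 1:])
        wrong += ((1 if x > t else 0) != y)
    return wrong / len(data)

def moments(values):
    """First two moments (mean, variance) of a sample."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / len(values)

N, RUNS = 20, 400
gen, loo = [], []
for _ in range(RUNS):
    d = sample_dataset(N)
    gen.append(true_error(train(d)))
    loo.append(loo_error(d))

print("E[gen err], Var[gen err] =", moments(gen))
print("E[LOO err], Var[LOO err] =", moments(loo))
```

In this setup the two error distributions concentrate near the Bayes error of about 0.159, and comparing their first two moments is exactly the kind of relationship (generalization error vs. leave-one-out error) that the framework characterizes analytically rather than by simulation.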