
Research On Several Classification Problems

Posted on: 2016-10-11
Degree: Master
Type: Thesis
Country: China
Candidate: X Lin
Full Text: PDF
GTID: 2180330473965234
Subject: Probability theory and mathematical statistics
Abstract/Summary:
This thesis focuses on comparing linear and nonlinear methods for classification problems.

For the linear methods, I compare OLS (ordinary least squares), LDA (linear discriminant analysis) and logistic regression, as well as OLS and LDA applied to data whose dimension has been reduced by PCA and LDA. OLS is one of the most fundamental and commonly used linear models; beyond regression, it can also handle classification. The only difference is that the response variable becomes an indicator matrix rather than a single vector as in regression: rows of the indicator matrix represent observations, columns represent classes, and within each row a one in a column means the observation belongs to that class while a zero means it does not. For linear classification problems OLS usually gives good results; however, it suffers from a masking problem, especially when the classes are arranged in parallel in the feature space, in which case it can completely ignore the class in the middle. LDA, like OLS, is well suited to linear classification boundaries, and it is preferable because it avoids the masking problem that OLS has. Logistic regression, originally designed for two-class problems, uses the probability ratio to convert the 0-1 response variable into a continuous one and then solves the classification problem; here I extend it to multi-class classification, and because of the characteristics of the model it consistently classifies well.

For the nonlinear methods, this thesis focuses on SVM (support vector machine), decision trees, bagging (bootstrap aggregating) and random forests. By setting the kernel argument to 'linear', 'polynomial' or 'radial', an SVM can adapt to linear, polynomial and radial classification boundaries, and these options make it a strong classification method. A single decision tree, because of its structure, tends to have high variance and low accuracy, especially when the classification boundary is linear. Bagging, by averaging a large number of trees, largely resolves the high variance and low accuracy of a single tree. However, if one variable is especially informative for the classification problem, most of the trees that bagging builds may place that variable at the top node; the trees then become correlated, which reduces the efficiency of the bagging procedure. Lastly, a random forest forces a random selection of candidate variables for the different trees, so it does not suffer from this correlation problem.

Finally, by analyzing the real ISOLET data set, I compare all of the methods above and identify the best ones for this specific data set.
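As a rough illustration of the linear methods discussed above, the following sketch (not taken from the thesis) fits OLS on an indicator matrix, LDA, and multinomial logistic regression, and compares their test accuracy. It assumes scikit-learn and uses a synthetic three-class data set as a stand-in for ISOLET; all variable names are illustrative.

# Sketch: indicator-matrix OLS vs. LDA vs. multinomial logistic regression.
# Assumes scikit-learn; synthetic 3-class data stands in for ISOLET.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = make_classification(n_samples=1500, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# OLS on an indicator matrix: one 0/1 column per class, regress each column on X,
# then assign each test point to the class with the largest fitted value.
Y_ind = np.eye(3)[y_tr]                      # rows = observations, columns = classes
ols = LinearRegression().fit(X_tr, Y_ind)
ols_pred = ols.predict(X_te).argmax(axis=1)

lda_pred = LinearDiscriminantAnalysis().fit(X_tr, y_tr).predict(X_te)
log_pred = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict(X_te)

for name, pred in [("OLS", ols_pred), ("LDA", lda_pred), ("logistic", log_pred)]:
    print(name, "accuracy:", (pred == y_te).mean())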
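In the same spirit, here is a minimal sketch of the nonlinear methods, again assuming scikit-learn and synthetic data in place of ISOLET. The kernel names mirror the 'linear'/'polynomial'/'radial' options mentioned above (scikit-learn calls them 'linear', 'poly' and 'rbf'); the comparison is illustrative, not the thesis's actual experiment.

# Sketch: SVM kernels, a single decision tree, bagging, and a random forest,
# compared on synthetic 3-class data (a stand-in for ISOLET).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier

X, y = make_classification(n_samples=1500, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "SVM (linear)": SVC(kernel="linear"),
    "SVM (polynomial)": SVC(kernel="poly", degree=3),
    "SVM (radial)": SVC(kernel="rbf"),
    "single tree": DecisionTreeClassifier(random_state=0),
    # Bagging averages many trees grown on bootstrap samples (all features
    # considered at every split), which lowers the variance of a single tree.
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=200,
                                 random_state=0),
    # A random forest additionally restricts each split to a random subset of
    # features, which decorrelates the trees.
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
for name, model in models.items():
    acc = model.fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{name}: accuracy {acc:.3f}")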
Keywords/Search Tags: classification, LDA, logistic regression, FDA, SVM, decision tree, bagging, random forest