Application Of Decision Tree And Boosting Algorithm In Practical Problems

Posted on: 2019-05-16
Degree: Master
Type: Thesis
Country: China
Candidate: L L Zhang
Full Text: PDF
GTID: 2429330563458864
Subject: Applied statistics
Abstract/Summary:
The decision tree is an algorithm commonly used in machine learning, and it is a powerful classifier. It uses a tree structure to model the relationship between features and potential outcomes. Unlike neural networks and SVMs, a decision tree produces a prediction model that is easy to understand and results that are easy to explain. Decision trees are insensitive to missing values and to the distribution of variables, and they can be applied to almost any type of data set. They are therefore widely used in many fields. However, when fitting a model, one must consider the cost of making different types of errors and find ways to optimize the model. In this paper, decision tree models are fitted using the C5.0 algorithm and the CART algorithm, and boosting is added to each algorithm to improve model performance. Model performance is often compared using accuracy or error rate. In practical problems, however, one cannot simply compare the accuracy of different models on the test set, because the result depends in part on how the test set was selected, and because accuracy is not a reliable basis for comparison when the problem has a cost matrix. Therefore, this paper compares models using their sensitivity together with statistical hypothesis tests. Because the groups of samples are not independent, Friedman's nonparametric test was used to compare the sensitivities of multiple models. The test results showed significant differences between the samples. When comparing the performance of two models, the samples are not independent but pass a normality test, so the t test is used to compare the two samples. The test results showed that the C5.0 model has the best sensitivity and classifies potential customers best. It provides the most effective guidance for achieving banking business targets.
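The comparison procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's own code: the function names, the use of a paired form of the t statistic, and the sensitivity values in `blocks` are all assumptions made for illustration. Each row of `blocks` stands for one hypothetical resample and each column for one model. The sketch computes only the test statistics; it omits the tie correction for the Friedman statistic, the normality check, and the p-values.

```python
from math import sqrt
from statistics import mean, stdev


def sensitivity(true_pos, false_neg):
    """Sensitivity (true positive rate) = TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def average_ranks(values):
    """Rank values from 1 (smallest) upward; ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def friedman_statistic(rows):
    """Friedman chi-square statistic without tie correction.

    rows: one row per resample (block), one column per model.
    """
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    sum_sq = sum(s * s for s in rank_sums)
    return 12.0 / (n * k * (k + 1)) * sum_sq - 3.0 * n * (k + 1)


def paired_t_statistic(a, b):
    """t statistic for paired samples: mean difference over its standard error."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Hypothetical sensitivities (invented for illustration, not thesis results).
# Columns: model A, model B, model C; rows: five resamples.
blocks = [
    [0.80, 0.70, 0.75],
    [0.82, 0.71, 0.74],
    [0.79, 0.72, 0.76],
    [0.81, 0.69, 0.77],
    [0.83, 0.73, 0.75],
]

chi2 = friedman_statistic(blocks)
model_a = [row[0] for row in blocks]
model_c = [row[2] for row in blocks]
t_stat = paired_t_statistic(model_a, model_c)
print(f"Friedman statistic: {chi2:.3f}, paired t statistic: {t_stat:.3f}")
```

To reach a decision, the Friedman statistic would be compared against a chi-square distribution with k - 1 degrees of freedom and the t statistic against a t distribution with n - 1 degrees of freedom, where k is the number of models and n the number of resamples.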
Keywords/Search Tags:Decision tree, Boosting algorithm, Hypothesis Test