| In recent years, machine learning grounded in statistical theory has developed rapidly, most visibly in ensemble learning, deep learning, and related algorithms. This thesis explores how to improve model goodness of fit by combining statistical methods such as hypothesis testing with reinforcement learning. For the binary classification problem, logistic regression is combined with tree models to overcome the weak predictive ability of a single logistic regression or a single tree model. Following the idea of combining multiple weak classifiers into a strong classifier, the base model is first built with the CART algorithm, so that both linear and nonlinear relationships between the independent and dependent variables can be captured. Multiple tree models are then generated sequentially, each depending on the previous one, and the outputs of the individual trees are fitted jointly with logistic regression to produce a combined model. First, this thesis introduces the relevant theory of logistic regression and decision trees: it starts from the least-squares loss function, based on the sum of squared errors in linear regression, and the gradient descent method, which uses the partial derivatives with respect to the unknown parameters, and then moves from the probabilistic perspective of maximum likelihood estimation to logistic regression for binary classification and to classification and regression tree (CART) theory. Next, the fusion algorithm of decision trees and logistic regression is introduced, and the dependency relationship between the successive tree models is formulated mathematically. After each round, the fitting error of that round's model is analyzed, the sample weights are adjusted, and the next round of training begins. This process is repeated until no significantly improved model can be generated by 
adjusting the sample weights. Second, the fusion algorithm is compared with logistic regression, support vector machine, random forest, and XGBoost on the fitting and prediction results for different data samples. The comparison shows that studying and applying the fusion of decision trees and logistic regression to binary classification is of practical value as an exploration of learning methods: the fusion algorithm improves on logistic regression and support vector machine, performs slightly better than random forest, and, compared with XGBoost, saves time without a large loss of generalization ability. Finally, the thesis summarizes the study of the fusion algorithm, which both broadens the application of these two kinds of algorithms and offers a more flexible way to improve model prediction ability. |
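
The procedure summarized in the abstract (sequential trees trained on reweighted samples, then a logistic regression fitted on the tree outputs) can be illustrated with a minimal pure-Python sketch. It is not the thesis's implementation. Several choices are assumptions made only for illustration: depth-1 stumps stand in for full CART trees, misclassified samples are reweighted by a fixed factor `boost`, and training stops after at most `rounds` rounds, when a stump repeats, or when no sample is misclassified. The function names are hypothetical as well.

```python
import math


def fit_stump(X, y, w):
    """Depth-1 CART-style split: pick (feature, threshold) with the lowest weighted Gini impurity."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            # sides[True] / sides[False]: [class-0 weight, class-1 weight] on each side of the split
            sides = {True: [0.0, 0.0], False: [0.0, 0.0]}
            for row, label, wi in zip(X, y, w):
                sides[row[j] <= t][label] += wi
            gini = 0.0
            for n0, n1 in sides.values():
                tot = n0 + n1
                if tot > 0:
                    gini += tot * (1.0 - (n0 / tot) ** 2 - (n1 / tot) ** 2)
            if best is None or gini < best[0]:
                left = int(sides[True][1] > sides[True][0])    # weighted majority class, left leaf
                right = int(sides[False][1] > sides[False][0])  # weighted majority class, right leaf
                best = (gini, j, t, left, right)
    return best[1:]


def predict_stump(stump, row):
    j, t, left, right = stump
    return left if row[j] <= t else right


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(F, y, lr=0.5, epochs=500):
    """Logistic regression by batch gradient descent on the negative log-likelihood."""
    beta = [0.0] * (len(F[0]) + 1)  # beta[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * len(beta)
        for f, label in zip(F, y):
            err = sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], f))) - label
            grad[0] += err
            for k, v in enumerate(f):
                grad[k + 1] += err * v
        beta = [b - lr * g / len(y) for b, g in zip(beta, grad)]
    return beta


def fit_fusion(X, y, rounds=5, boost=2.0):
    """Train stumps on reweighted samples, then fuse their outputs with logistic regression."""
    w = [1.0 / len(y)] * len(y)
    stumps = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        if stump in stumps:  # assumed stopping rule: reweighting produced nothing new
            break
        stumps.append(stump)
        wrong = [predict_stump(stump, row) != label for row, label in zip(X, y)]
        if not any(wrong):
            break
        w = [wi * boost if bad else wi for wi, bad in zip(w, wrong)]  # up-weight the errors
        total = sum(w)
        w = [wi / total for wi in w]
    F = [[predict_stump(s, row) for s in stumps] for row in X]
    return stumps, fit_logistic(F, y)


def predict_proba(model, row):
    stumps, beta = model
    f = [predict_stump(s, row) for s in stumps]
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], f)))


# Toy usage: the label is 1 only when both features exceed 0.5.
grid = (0.1, 0.3, 0.7, 0.9)
X_demo = [[a, b] for a in grid for b in grid]
y_demo = [int(a > 0.5 and b > 0.5) for a, b in X_demo]
model = fit_fusion(X_demo, y_demo)
hits = sum((predict_proba(model, x) > 0.5) == bool(t) for x, t in zip(X_demo, y_demo))
print(f"{len(model[0])} stumps, training accuracy {hits / len(y_demo):.2f}")
```

In this sketch the logistic regression stage plays the role of the combiner: each stump contributes one binary feature, and the fitted coefficients decide how much each round counts. A fuller version would presumably use deeper trees and a significance-based stopping test, in line with the hypothesis-testing idea mentioned in the abstract.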