Research On Sufficient Dimension Reduction Method Based On Regression Tree

Posted on: 2021-01-10 | Degree: Master | Type: Thesis
Country: China | Candidate: B W Wu
GTID: 2370330620468097 | Subject: Statistics
Abstract/Summary:
With the advent of the big data era, data are becoming more and more complex. The theory of sufficient dimension reduction is of great significance for studying such complex data, but many problems arise when the response is multivariate. This thesis therefore focuses on sufficient dimension reduction based on regression trees, which addresses the curse of dimensionality in the multivariate-response case. With a univariate response, traditional methods usually partition the response variable into slices. As the dimension of the response increases, however, slicing easily leaves many slices with too few sample points. A regression tree can partition a multi-dimensional space, and the value at each leaf node is exactly the mean over the corresponding cell of the partition. Based on this idea, this thesis presents a new sufficient dimension reduction method based on regression trees, in which the tree model can be GBDT, RF, XGBoost, etc. For SIR (sliced inverse regression), SAVE (sliced average variance estimation), and DR (directional regression), the thesis gives a method for estimating the kernel matrix. Finally, extensive simulations and one example are used to verify the effectiveness of the method for multivariate responses. Compared with existing methods, the proposed method performs better when the variables are high-dimensional. For both linear and non-linear models, it estimates the dimension reduction directions better in the presence of a certain degree of noise. When the sample size is small, RF works better. When the sample size is relatively large, GBDT, RF, and XGBoost perform comparably. Ensemble learning models often have many hyperparameters, and there is still no theoretical basis for setting them, but the default parameters generally achieve good results. Because ensemble learning models can handle missing values efficiently, this thesis also applies them to sufficient dimension reduction with missing values and analyzes how sufficient dimension reduction performs when the response variables contain missing values. Using the information in samples with missing values gives significantly better dimension reduction than simply discarding those samples.
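The abstract does not spell out the estimator, so the following is only a minimal sketch of one way the tree-based idea could be applied to SIR. It rests on an assumption that is not confirmed by the thesis: that the standardized predictors are regressed on the (possibly multivariate) response with a tree, so that each leaf acts as a "slice" of the response space and its leaf value is the within-cell mean of the predictors. The function name `tree_sir_directions`, its parameters, and the use of a single scikit-learn `DecisionTreeRegressor` (rather than GBDT, RF, or XGBoost) are all illustrative choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def tree_sir_directions(X, Y, d, min_samples_leaf=20, random_state=0):
    """Hypothetical tree-sliced SIR: returns (directions, kernel matrix).

    X : (n, p) predictors, Y : (n,) or (n, q) responses, d : number of directions.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if Y.ndim == 1:
        Y = Y.reshape(-1, 1)
    n, p = X.shape

    # Standardize the predictors: Z = (X - mean) Sigma^{-1/2}.
    mu = X.mean(axis=0)
    sigma = np.atleast_2d(np.cov(X, rowvar=False))
    evals, evecs = np.linalg.eigh(sigma)
    sigma_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mu) @ sigma_inv_sqrt

    # Assumption: the tree takes Y as input and Z as (multi-output) target,
    # so each leaf is a cell of the response space and the leaf value is
    # the mean of Z in that cell, i.e. a piecewise-constant estimate of E[Z | Y].
    tree = DecisionTreeRegressor(
        min_samples_leaf=min_samples_leaf, random_state=random_state
    )
    tree.fit(Y, Z)
    Z_hat = tree.predict(Y).reshape(n, p)

    # SIR kernel matrix: sample covariance of the estimated E[Z | Y].
    # Z is centered, so the fitted leaf means average to zero over the sample.
    M = Z_hat.T @ Z_hat / n

    # Leading eigenvectors of M, mapped back to the original X scale.
    w, v = np.linalg.eigh(M)
    top = v[:, np.argsort(w)[::-1][:d]]
    B = sigma_inv_sqrt @ top
    return B / np.linalg.norm(B, axis=0), M
```

Under these assumptions, `min_samples_leaf` plays the role that the number of slices plays in classical SIR: larger leaves give smoother cell means but a coarser partition of the response space.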
Keywords/Search Tags: regression tree, ensemble learning, sufficient dimension reduction, multivariate responses