Research On Sufficient Dimension Reduction Method Based On Regression Tree

Posted on:2021-01-10

Degree:Master

Type:Thesis

Country:China

Candidate:B W Wu

GTID:2370330620468097

Subject:Statistics

Abstract/Summary:

With the advent of the big data era, data are becoming increasingly complex, and the theory of sufficient dimension reduction is of great significance for studying such data. Many problems remain in the case of multivariate responses. This paper therefore focuses on sufficient dimension reduction based on regression trees, which addresses the curse of dimensionality in the multivariate-response setting. With a univariate response, traditional methods usually slice the range of the response variable. As the dimension of the response increases, however, slicing easily leaves many slices with too few sample points. A regression tree can partition a multi-dimensional space, and the value at each leaf node is exactly the mean over the corresponding cell of the partition. Based on this idea, this paper presents a new sufficient dimension reduction method based on regression trees, where the tree model can be a gradient boosting decision tree (GBDT), a random forest (RF), XGBoost, etc. For sliced inverse regression (SIR), sliced average variance estimation (SAVE) and directional regression (DR), this paper gives the corresponding method for estimating the kernel matrix.

Finally, extensive simulations and one example are used to verify the effectiveness of the method with multivariate responses. Compared with existing methods, the proposed method performs better when the variables are high-dimensional. For both linear and non-linear models, it estimates the dimension reduction directions better in the presence of a certain degree of noise. When the sample size is small, RF works better; when the sample size is relatively large, GBDT, RF and XGBoost perform comparably. Ensemble learning models often have many hyperparameters, and there is still no theoretical basis for setting them, but the default parameters generally give good results.

Ensemble learning models can handle missing values efficiently, so this paper also applies them to sufficient dimension reduction with missing values and analyzes how sufficient dimension reduction performs when the response variables contain missing values. Using the information in samples with missing values gives significantly better dimension reduction than simply discarding those samples.
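The abstract does not spell out the estimator, so the following is only a minimal sketch of one plausible reading for SIR: fit a multi-output regression tree that predicts the standardized predictors from the responses, treat each leaf as a slice of the response space, and build the SIR kernel matrix from the leaf means. The use of scikit-learn's `DecisionTreeRegressor`, the function name `tree_sir`, the `min_samples_leaf` value, and the simulated model are all assumptions made for illustration, not the thesis's implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def tree_sir(X, Y, n_directions, min_samples_leaf=25, random_state=0):
    """Sketch of SIR with regression-tree leaves as slices (hypothetical helper).

    X: (n, p) predictors, Y: (n, q) responses.
    Returns (B, M): estimated directions (p, n_directions) and the kernel matrix.
    """
    n, p = X.shape

    # Standardize the predictors: Z = (X - mean) Sigma^{-1/2}.
    mu = X.mean(axis=0)
    sigma = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(sigma)
    sigma_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mu) @ sigma_inv_sqrt

    # Assumption: the tree takes Y as input and Z as multi-output target,
    # so its leaves partition the response space and each leaf value is
    # the within-cell mean of Z, an estimate of E[Z | Y in cell].
    tree = DecisionTreeRegressor(
        min_samples_leaf=min_samples_leaf, random_state=random_state
    )
    tree.fit(Y, Z)
    leaf = tree.apply(Y)  # leaf index of every sample

    # SIR kernel matrix: weighted sum of outer products of the leaf means.
    M = np.zeros((p, p))
    for h in np.unique(leaf):
        mask = leaf == h
        m_h = Z[mask].mean(axis=0)
        M += mask.mean() * np.outer(m_h, m_h)

    # Leading eigenvectors of M, mapped back to the original X scale.
    w, v = np.linalg.eigh(M)
    top = v[:, np.argsort(w)[::-1][:n_directions]]
    B = sigma_inv_sqrt @ top
    return B / np.linalg.norm(B, axis=0), M


# Simulated example (illustrative only): two responses, one linear and one
# non-linear, depending on the first two of six predictors.
rng = np.random.default_rng(0)
n, p = 2000, 6
X = rng.normal(size=(n, p))
Y = np.column_stack([
    X[:, 0] + 0.3 * rng.normal(size=n),
    np.exp(X[:, 1]) + 0.3 * rng.normal(size=n),
])

B, M = tree_sir(X, Y, n_directions=2)

# How much of each true direction (e1, e2) lies in the estimated subspace;
# values close to 1 mean the direction is recovered.
Q, _ = np.linalg.qr(B)
P = Q @ Q.T
captured = np.linalg.norm(P[:, :2], axis=0)
print(np.round(captured, 3))
```

A single tree is used here for brevity. An ensemble such as RF could supply one partition per tree, with the resulting kernel matrices averaged, although the abstract does not say whether the thesis combines trees in this way.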

Keywords/Search Tags:

regression tree, ensemble learning, sufficient dimension reduction, multivariate responses

Related items

1	Optimal sufficient dimension reduction for the multivariate conditional mean in multivariate regression
2	Robust Dimension Reduction Based On MCD Method In Sufficient Dimension Reduction
3	Some Extensions For Multi-response Sufficient Dimension Reduction Methods
4	Research On Dimension Reduction Methods Of Several Important Data Types
5	Extending The Scope Of Sufficient Dimension Reduction Theory And Its Related Methods
6	Supervised Dimension Reduction Of Multiple Compositional Data In Simplex Space
7	Multivariate dimension reduction and graphics
8	High-dimensional Missing Data Dimensionality Reduction Based On Slice Inverse Regression
9	Sparse SIR: Optimal Rates And Adaptive Estimation
10	Dimension reduction and variable selection in regression