
Research And Application Of Dimensionality Reduction Algorithm Based On SIR

Posted on: 2019-02-13
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Xie
Full Text: PDF
GTID: 2417330566999459
Subject: Applied statistics

Abstract/Summary:
With the rapid development of Internet technology, large volumes of high-dimensional data are generated in many fields. Characterizing the internal structure of such data and extracting useful information from it is therefore an important problem, and sufficient dimension reduction is one effective way to address it. The quality of a sufficient dimension reduction method (its dimension-reduction effect) is usually measured by the squared multiple correlation coefficient. Motivated by applications, this thesis proposes improved sufficient dimension reduction methods for datasets with three different types of data structure. The main work is as follows.

First, for datasets containing outliers, we combine the K-medoids clustering algorithm with sliced inverse regression and propose a K-medoids inverse regression algorithm. Simulation experiments on data generated from both linear and nonlinear models show that, compared with the traditional sufficient dimension reduction methods of sliced inverse regression, sliced average variance estimates, and principal Hessian directions, the new algorithm not only achieves higher estimation accuracy but also maintains a good dimension-reduction effect. The method is then applied to real data, and the resulting analysis further illustrates its effectiveness.

Second, traditional sufficient dimension reduction methods, as represented by sliced inverse regression, are mostly unavailable when the response consists of two variables. Taking such bivariate-response datasets as the research object, this thesis studies sliced inverse regression, sliced average variance estimates, and principal Hessian directions, and gives the specific form of the kernel matrix and the corresponding test procedure for each of the three methods. The study shows that the improved methods can be applied to bivariate-response datasets and achieve a very good dimension-reduction effect on data of this structure.

Finally, for datasets whose predictor variables contain categorical attributes, a new sufficient dimension reduction method is proposed by combining the idea of dummy variables with an object-similarity clustering algorithm that replaces the original slicing step. Comparison with the partial sliced inverse regression algorithm shows that the new method handles practical regression problems in which the predictors include many categorical variables, or in which the categorical variables have many levels. The method makes full use of the categorical attributes and improves regression with such predictors. Its application to mixed datasets containing categorical attributes shows a good dimension-reduction effect.
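To make the slicing-based machinery concrete, the following is a minimal sketch of standard sliced inverse regression (the baseline method the thesis builds on), written in Python with NumPy. It is an illustration of the generic algorithm only, not the thesis's K-medoids variant; that variant, as described above, presumably replaces the equal-frequency slicing step with cluster assignments to gain robustness to outliers. All function and variable names here are illustrative.

```python
import numpy as np

def sir(X, y, n_slices=10, n_directions=1):
    """Sliced inverse regression: estimate effective dimension-reduction
    directions from the slice means of the inverse regression curve E[X | y]."""
    n, p = X.shape
    # 1. Standardize X to zero mean and (approximately) identity covariance.
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T  # cov^{-1/2}
    Z = (X - mu) @ inv_sqrt
    # 2. Partition the observations into slices by the order of y.
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)
    # 3. Build the SIR kernel matrix: the weighted covariance of slice means.
    M = np.zeros((p, p))
    for idx in slices:
        m_h = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m_h, m_h)
    # 4. Leading eigenvectors of M, mapped back to the original X scale.
    eigvals, eigvecs = np.linalg.eigh(M)          # ascending order
    directions = inv_sqrt @ eigvecs[:, ::-1][:, :n_directions]
    directions /= np.linalg.norm(directions, axis=0)
    return directions
```

On a single-index model y = f(beta' X) + noise, the leading estimated direction should align closely with beta; the kernel-matrix step is also where the methods discussed above differ (SAVE and PHD use slice variances and Hessian information, respectively, in place of slice means).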
Keywords/Search Tags: sufficient dimension reduction, squared multiple correlation coefficient, sliced inverse regression, sliced average variance estimates, principal Hessian directions