Comparative Study And Application Of Several Variable Selection Methods

Posted on: 2024-07-17
Degree: Master
Type: Thesis
Country: China
Candidate: J L Xu
GTID: 2557307061477184
Subject: Master of Statistics in Applied Statistics
Abstract/Summary:
With the rapid development of science and technology, data pervade every part of life and carry a huge amount of information, but not all of it is useful. The resulting problem is how to organize and analyze these data. Variable selection is one of the key steps, because it affects a model's interpretability, its predictive ability, and its robustness to noise.

First, the basic theory of the penalty functions and variable selection methods involved in this thesis is introduced and summarized. Next, several variable selection methods are compared on stability, accuracy, and computational complexity under different types of data. (1) Under different degrees of correlation: on conventional data, principal component regression (PCR) performed better when the correlation was moderate or lower; when the correlation coefficient exceeded 0.8, the SPCR and spcLasso methods showed their advantages in handling highly correlated data. (2) Under different data dimensions: on conventional data, the methods that involve sparse principal components were less stable; as the dimensionality increased, the stability and accuracy of several methods declined, while the stability and accuracy of spcLasso remained at a good level. (3) Running time in different data settings: the sparse principal component methods took longer because of their high iterative complexity.

These results indicate that the variable selection method should be chosen to suit the type of data set. The analysis is meant to uncover the characteristics of each method, not to show that one method has an absolute advantage: different models suit different types of data. No comparable study was found, so the comparison is also intended as a reference for choosing a variable selection method in practical applications.

Finally, taking research on China's railway passenger traffic as a case study, relevant index data from 2010 to 2022 are selected and the methods discussed in the thesis are applied to them. The case analysis results are of practical significance and indicate that the resulting variable selection models can offer new ideas for dealing with real problems.
Keywords/Search Tags: High-dimensional data, Variable selection, Prediction accuracy, Stability