
Random Lasso Method In Logistic Regression

Posted on: 2019-04-11 | Degree: Master | Type: Thesis
Country: China | Candidate: J Wu | Full Text: PDF
GTID: 2370330545479163 | Subject: Operational Research and Cybernetics
Abstract/Summary:
The Lasso opened a new avenue for the study of variable selection and has been widely used to solve variable selection problems. In practice, however, it has two limitations: it tends to select only one or a few variables from a group of highly correlated variables, and the number of variables it can select is bounded by the sample size. The Lasso therefore needs to be improved to reduce these limitations. The main work of this thesis is as follows:
(1) For a single dataset, a Random Lasso method based on Logistic regression is studied for variable selection in classification problems. The method consists of two main steps. In the first step, the Lasso based on Logistic regression is applied to many bootstrap samples, each using a set of randomly selected variables; the purpose of this step is to produce an importance measure for each variable. In the second step, a similar procedure is carried out, except that for each bootstrap sample a subset of variables is randomly selected with unequal selection probabilities determined by the variables' importance measures. The final set of variables and their coefficients are determined by averaging the bootstrap results obtained in the second step. The Random Lasso method based on Logistic regression alleviates some limitations of the Lasso and the Elastic-Net in microarray data analysis. It tends to select all highly correlated variables, retains maximal flexibility in estimating their coefficients (particularly when the coefficients have different signs), and the number of selected variables is no longer limited by the sample size. Extensive simulation studies show that the method has better prediction performance than other methods. Applied to gene expression data from acute leukemia, it achieved good results.
(2) For multiple datasets, a Random Lasso method based on Meta-analysis is studied. When the datasets are high-dimensional, variable selection is incorporated into the Meta-analysis to improve the predictability and interpretability of the model. Heterogeneity is prevalent in Meta-analysis, and the Random Lasso method based on Meta-analysis takes heterogeneity among datasets into account. By averaging the variables' importance measures across datasets, it improves the ability to identify important variables across multiple datasets while maintaining flexibility of selection within each dataset. Simulation studies show that the method selects almost all important variables and removes unimportant ones. The method is applied to a Meta-analysis of five cardiovascular and cerebrovascular diseases, and the results show good selection performance.
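The two-step procedure described in (1) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names, the parameters `q1`, `q2`, `lam` and `B`, the fixed penalty (rather than one tuned by cross-validation), the use of the absolute averaged coefficient as the importance measure, and the simple proximal-gradient solver standing in for a Lasso-Logistic fit are all choices made here for illustration.

```python
import math
import random


def soft_threshold(z, t):
    """Proximal operator of the L1 penalty."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def lasso_logistic(X, y, lam, n_iter=100):
    """L1-penalised logistic regression via proximal gradient descent.

    Assumes roughly standardised columns; the intercept is not penalised.
    """
    n, q = len(X), len(X[0])
    step = 2.0 / (q + 1)  # conservative step size for standardised inputs
    b0, beta = 0.0, [0.0] * q
    for _ in range(n_iter):
        g0, grad = 0.0, [0.0] * q
        for xi, yi in zip(X, y):
            eta = b0 + sum(b * v for b, v in zip(beta, xi))
            eta = max(-30.0, min(30.0, eta))  # guard against overflow in exp
            r = 1.0 / (1.0 + math.exp(-eta)) - yi  # fitted probability minus label
            g0 += r
            for j in range(q):
                grad[j] += r * xi[j]
        b0 -= step * g0 / n
        beta = [soft_threshold(b - step * g / n, step * lam)
                for b, g in zip(beta, grad)]
    return beta


def weighted_sample(rng, p, q, weights):
    """Draw q distinct indices from range(p), probability proportional to weights."""
    pool, w, chosen = list(range(p)), list(weights), []
    for _ in range(q):
        k = rng.choices(range(len(pool)), weights=w)[0]
        chosen.append(pool.pop(k))
        w.pop(k)
    return chosen


def random_lasso_logistic(X, y, q1, q2, lam, B=30, seed=0):
    """Two-step Random Lasso (hypothetical interface, fixed penalty lam)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])

    def one_pass(q, weights):
        total = [0.0] * p
        for _ in range(B):
            rows = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
            cols = weighted_sample(rng, p, q, weights)   # random variable subset
            Xb = [[X[i][j] for j in cols] for i in rows]
            yb = [y[i] for i in rows]
            for j, c in zip(cols, lasso_logistic(Xb, yb, lam)):
                total[j] += c  # variables left out of the subset contribute 0
        return [t / B for t in total]

    # Step 1: equal selection probabilities -> importance measure per variable.
    importance = [abs(b) for b in one_pass(q1, [1.0] * p)]
    # Step 2: selection probabilities proportional to importance, then average.
    weights = [imp + 1e-12 for imp in importance]  # keep total weight positive
    return one_pass(q2, weights), importance


# Synthetic demo: only x0 (positive effect) and x1 (negative effect) matter.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(6)] for _ in range(80)]
y = [1 if 2 * x[0] - 2 * x[1] + data_rng.gauss(0, 0.5) > 0 else 0 for x in X]
coef, importance = random_lasso_logistic(X, y, q1=3, q2=3, lam=0.1, B=30)
print([round(c, 3) for c in coef])
```

In a realistic analysis the penalty and the subset sizes would normally be tuned (for example by cross-validation), and an established Lasso-Logistic solver would replace the toy optimiser used here.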
Keywords/Search Tags: variable selection, Logistic regression, Lasso-Logistic, Random Lasso, Meta-analysis