Bayesian Variable Selection Method Based On The Non-local Prior And Its Application In The Ultra-high Dimensional Data Analysis

Posted on: 2018-12-18
Degree: Master
Type: Thesis
Country: China
Candidate: X Q Dong
GTID: 2334330536474425
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Objective: To compare, by simulation, the performance of a Bayesian variable selection method based on the non-local prior with that of ISIS-SCAD and ISIS-MCP in ultra-high dimensional data analysis, and to apply the three methods to DLBCL gene expression data in order to identify genes associated with the classification of DLBCL and provide a basis for its clinical diagnosis and treatment.

Methods: We introduce the basic principle of the Bayesian variable selection method based on the non-local prior (piMOM) and apply it, together with ISIS-SCAD and ISIS-MCP, to binary logistic regression. In the simulation study we considered three correlation structures for the covariates (independent, compound symmetry, and autoregressive), sample sizes n = 50, 100, 200, 400, 600, and variable dimensions p = 1000, 3000, and evaluated the three variable selection methods in these ultra-high dimensional settings in terms of model selection consistency and prediction accuracy. In the real data analysis we split the DLBCL data (350 patients, 3237 genes) into a training set (n = 245) and a test set (n = 105), fitted models with piMOM, ISIS-SCAD, and ISIS-MCP, and validated and evaluated each model with the AUC.

Results: The simulations showed that the average number of true positives (TP) was approximately equal across the three methods, whereas the average number of false positives (FP), the PMSE, and the RMSE were clearly higher for ISIS-SCAD and ISIS-MCP than for the non-local prior method. The non-local prior method was also more stable than ISIS-SCAD and ISIS-MCP as the dimension increased from p = 1000 to p = 3000. For the DLBCL gene expression data, piMOM selected 4 significant genes (MYBL1, CYB5R2, MAML3, BTLA) with an AUC of 0.989; ISIS-SCAD selected 7 significant genes (MYBL1, CYB5R2, MAML3, TNFRSF13B, S1PR2, SLC25A27, GAB1) with an AUC of 0.981; and ISIS-MCP selected 5 significant genes (MYBL1, CYB5R2, MAML3, CHST2, SUB1) with an AUC of 0.962. MYBL1, CYB5R2, and MAML3 were selected by all three methods.

Conclusion: The Bayesian variable selection method based on the non-local prior controls the false positive rate better and is, to some extent, superior to the traditional penalized methods in model selection and prediction accuracy. MYBL1, CYB5R2, and BTLA may be associated with the classification of DLBCL.
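
The simulation design and the evaluation criteria described in the Methods can be illustrated with a minimal Python sketch. It is not the thesis code: the correlation parameter `rho = 0.5`, the true coefficient values, the seeds, and all function names are assumptions made for illustration, since the abstract does not state them. The sketch generates covariates under the three correlation structures, draws a binary outcome from a logistic model, and computes the TP and FP counts and the AUC for a given set of selected variables or predicted scores. The variable selection step itself (piMOM, ISIS-SCAD, ISIS-MCP) is not implemented here.

```python
import math
import random


def simulate_design(n, p, structure, rho=0.5, rng=None):
    """Draw n rows of p unit-variance normal covariates.

    structure: "independent", "compound_symmetry", or "autoregressive".
    rho is an assumed correlation parameter (not given in the abstract).
    """
    rng = rng or random.Random(0)
    rows = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        if structure == "independent":
            x = z
        elif structure == "compound_symmetry":
            # A shared factor gives corr(x_j, x_k) = rho for all j != k.
            shared = rng.gauss(0.0, 1.0)
            x = [math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * zj for zj in z]
        elif structure == "autoregressive":
            # AR(1) recursion gives corr(x_j, x_k) = rho ** |j - k|.
            x = [z[0]]
            for j in range(1, p):
                x.append(rho * x[-1] + math.sqrt(1.0 - rho ** 2) * z[j])
        else:
            raise ValueError("unknown structure: " + structure)
        rows.append(x)
    return rows


def simulate_binary_outcome(X, beta, rng=None):
    """Draw y ~ Bernoulli(sigmoid(x . beta)); beta maps column index -> coefficient."""
    rng = rng or random.Random(1)
    y = []
    for x in X:
        eta = sum(b * x[j] for j, b in beta.items())
        prob = 1.0 / (1.0 + math.exp(-eta))
        y.append(1 if rng.random() < prob else 0)
    return y


def count_tp_fp(selected, truth):
    """Return (true positives, false positives) for a set of selected variable indices."""
    selected, truth = set(selected), set(truth)
    return len(selected & truth), len(selected - truth)


def auc(scores, labels):
    """AUC as the probability that a positive case outscores a negative one (ties count 1/2)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((sp > sn) + 0.5 * (sp == sn) for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


# One simulated data set: n = 50, p = 1000, autoregressive correlation.
true_beta = {0: 1.5, 1: -1.5, 2: 1.0}  # hypothetical true model
X = simulate_design(50, 1000, "autoregressive")
y = simulate_binary_outcome(X, true_beta)

# Suppose some method selected variables 0, 1 and 7 (hypothetical result).
tp, fp = count_tp_fp([0, 1, 7], true_beta.keys())
```

In a full simulation study this would be repeated over many replicates for each combination of n, p, and correlation structure, and the TP and FP counts averaged per method.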
Keywords/Search Tags: non-local prior, Bayesian variable selection method, product exponential moment prior, ultra-high dimensional data, diffuse large B cell lymphoma