
Variable Selection Methods Based On Variable Importance Measurement From Random Forest And Its Application In Diagnosis Of Tumor Typing

Posted on: 2022-09-09
Degree: Master
Type: Thesis
Country: China
Candidate: J S Ma
Full Text: PDF
GTID: 2504306518975309
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Objective: To evaluate the performance of six variable selection methods (RFE, biosigner, Boruta, altmann, vita, r2vim) based on random forest variable importance measures in high-dimensional variable selection, and then to use the appropriate methods to build a diagnostic model for diffuse large B-cell lymphoma (DLBCL).

Methods: In a simulation study, we evaluated the sensitivity, specificity, Youden index, positive predictive value, negative predictive value, total number of selected variables, prediction accuracy, stability, and computation time of the six methods. We searched the Gene Expression Omnibus (GEO) database for gene expression data on DLBCL subtypes. In the case study, we used the method with the higher sensitivity for the preliminary selection and the method with the higher positive predictive value to build the DLBCL prediction model.

Results: In the simulation study, vita showed higher sensitivity in multiple simulation scenarios, and biosigner showed higher positive predictive value. We obtained nine DLBCL data sets from GEO, containing 1362 usable samples. After the preliminary selection with vita, we obtained 1019 genes that were differentially expressed between ABC and GCB DLBCL. After the final selection with biosigner, we obtained 77 genes that were differentially expressed between DLBCL types.

Conclusion: vita and biosigner can be used for preliminary selection and final selection, respectively, because of their higher sensitivity and higher positive predictive value. The DLBCL case study showed that a diagnostic model based on vita and biosigner can effectively classify DLBCL subtypes.
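
To make the simulation criteria concrete, the following is a minimal sketch of how a selected variable set could be scored against the known relevant variables in one simulated data set. It is not the thesis code: the function name `selection_metrics`, its arguments, and the toy numbers are illustrative assumptions, and prediction accuracy, stability, and computation time are not covered here.

```python
def selection_metrics(selected, truly_relevant, n_variables):
    """Score one variable selection result against the simulated ground truth.

    selected        -- indices of variables the method kept (hypothetical input)
    truly_relevant  -- indices of variables that truly affect the outcome
    n_variables     -- total number of candidate variables in the data set
    """
    selected, truly_relevant = set(selected), set(truly_relevant)
    tp = len(selected & truly_relevant)      # relevant and selected
    fp = len(selected - truly_relevant)      # irrelevant but selected
    fn = len(truly_relevant - selected)      # relevant but missed
    tn = n_variables - tp - fp - fn          # irrelevant and not selected

    def ratio(num, den):
        # Return NaN instead of raising when a denominator is zero.
        return num / den if den else float("nan")

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden_index": sensitivity + specificity - 1,
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "total_selected": len(selected),
    }


# Toy example: 100 candidate variables, 5 truly relevant, 4 selected.
metrics = selection_metrics(
    selected=[0, 1, 2, 7],
    truly_relevant=[0, 1, 2, 3, 4],
    n_variables=100,
)
```

Under this scoring, a method tuned for high sensitivity suits a preliminary screen because it misses few relevant variables, while a method with high positive predictive value suits the final selection because most of what it keeps is truly relevant.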
Keywords/Search Tags:random forest, variable selection, DLBCL