Application Research Of Model Averaging In High-dimensional Biometric Data

Posted on: 2019-09-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Lin
GTID: 2480305453499864
Subject: Statistics
Abstract/Summary:
With the development of biotechnology in the age of big data, a large amount of high-dimensional biological data has emerged. For example, gene chip technology has greatly improved the efficiency of gene sequencing and reduced its cost. The dimension of such data ranges from tens to thousands of variables, and the data are both voluminous and complex, so redundant and irrelevant variables become more common. To reduce the noise in high-dimensional data and improve research efficiency, variable selection methods have attracted attention and been developed further. Model averaging does not rely on a single optimal model; instead it builds a combined predictor by assigning a suitable weight to each candidate model. In this way it pools the useful information in the individual models and reduces the influence of uncertainty on any single model. In this study, model averaging is used to model and analyze high-dimensional biological data. The research is divided into three parts:

1. Disease diagnosis based on model averaging of Logistic regression. Three penalized Logistic regression models (SCAD-L, gMCP-L and GB-L) and four corresponding combined models (gMCP+SCAD-L, gMCP+GB-L, gMCP+cMCP-L and cMCP+SCAD-L) were fitted and compared on six types of data generated by Monte Carlo simulation. On the Arrhythmia dataset from the UCI repository, the model combining gMCP-L with GB-L achieved higher classification accuracy than the three single models and the other combined models. The averaged model integrates the information in the single models and can improve the accuracy of disease diagnosis for doctors.

2. Applied research on model averaging for survival data. Based on a breast cancer dataset, event times and statuses corresponding to different censoring ratios are simulated. First, variables are selected by the random forest method. Second, the selected variables are analyzed by Bayesian model averaging. Compared with the Cox regression model, Bayesian model averaging performs better and is more accurate.

3. Applied research on model averaging for high-dimensional gene data. Model averaging is applied to the case where the number of explanatory variables p exceeds the sample size n. The method proceeds as follows: first, the explanatory variables are ordered by the p-value of a significance test and divided into groups accordingly; second, a regression model is fitted for each group; finally, the weight of each model is computed under criteria such as the jackknife and Mallows criteria, and the final result is the weighted average of the regression models. The results show that this improved model averaging method yields higher accuracy.

In summary, model averaging performs well in disease diagnosis, in high-dimensional survival data and in high-dimensional gene data.
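The three-step procedure for the p > n case (order and group the variables, fit one regression per group, weight the models) can be sketched in a few lines. The following is only a minimal illustration under stated assumptions, not the thesis's implementation: the data are synthetic, the function names and the group size of 10 are hypothetical, ranking by absolute marginal correlation stands in for ranking by the p-value of a marginal t-test (the two orderings coincide for a fixed sample size), and the weights are chosen by the jackknife (leave-one-out) criterion alone, minimized over the simplex with a simple Frank-Wolfe loop.

```python
import numpy as np

def group_by_marginal_rank(X, y, group_size):
    """Order predictors by marginal association with y, then split into groups.

    Assumption: |correlation| is used as a stand-in for the p-value of a
    marginal t-test; for fixed n both give the same ordering.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    order = np.argsort(-np.abs(r))  # strongest association first
    return [order[i:i + group_size] for i in range(0, len(order), group_size)]

def loo_predictions(X, y):
    """Leave-one-out OLS predictions, using the hat-matrix shortcut."""
    Z = np.column_stack([np.ones(len(y)), X])   # add an intercept
    H = Z @ np.linalg.pinv(Z)                   # hat matrix
    resid = y - H @ y
    # Leave-one-out residual for observation i is resid_i / (1 - h_ii).
    return y - resid / (1.0 - np.diag(H))

def jackknife_weights(P, y, n_iter=500):
    """Minimise ||y - P w||^2 over the simplex (w >= 0, sum(w) = 1).

    P holds one column of leave-one-out predictions per candidate model.
    Frank-Wolfe with exact line search keeps w on the simplex at every step.
    """
    m = P.shape[1]
    w = np.full(m, 1.0 / m)
    for _ in range(n_iter):
        resid = y - P @ w
        grad = -2.0 * (P.T @ resid)
        d = -w.copy()
        d[np.argmin(grad)] += 1.0               # direction towards best vertex
        Pd = P @ d
        denom = Pd @ Pd
        if denom <= 1e-12:
            break
        step = np.clip((resid @ Pd) / denom, 0.0, 1.0)
        if step == 0.0:
            break
        w = w + step * d
    return w

# Synthetic example with p > n (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
n, p = 60, 200
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = [3.0, -2.0, 2.0, 1.5, -1.0]
y = X @ beta + rng.standard_normal(n)

groups = group_by_marginal_rank(X, y, group_size=10)
P = np.column_stack([loo_predictions(X[:, g], y) for g in groups])
w = jackknife_weights(P, y)

cv_equal = np.mean((y - P.mean(axis=1)) ** 2)   # equal-weight averaging
cv_avg = np.mean((y - P @ w) ** 2)              # jackknife-weighted averaging
print("weights:", np.round(w, 3))
print("leave-one-out MSE, equal weights:", cv_equal)
print("leave-one-out MSE, jackknife weights:", cv_avg)
```

To predict for new observations, each group model would be refitted on the full sample and the fitted values combined with the weights `w`. A Mallows-type weighting would replace the leave-one-out residuals with in-sample residuals and add a penalty proportional to the weighted number of parameters; that variant is not shown here.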
Keywords/Search Tags: High-dimensional biological data, model averaging, Logistic regression model, Cox proportional hazards regression model