
Applied Research Of Penalty Function On Genome Association Analysis

Posted on: 2019-06-01
Degree: Master
Type: Thesis
Country: China
Candidate: L T Li
GTID: 2370330545489976
Subject: Statistics
Abstract/Summary:
With the rapid development of the Internet and computers, massive amounts of data have been generated in fields such as biology, computer science, and finance, and much of it is complex. It is therefore necessary to extract the valuable information from these large volumes of data for analysis. In recent years, variable selection has become a hot topic in statistics, especially regularization with penalty terms. Its distinguishing feature is that the penalty function itself performs variable selection, and it can handle high-dimensional and collinear data. This thesis studies improvements to penalty-based variable selection and their application to genome association analysis: it adds the network structure among variables to the original penalty function model, and it applies penalty-term regularization to regression models with multiple response variables. The research is divided into two parts, as follows.

1. The penalty function model based on network structure among variables

Starting from the model for selecting individual variables, this part takes the network structure among variables into account. Through a simulation study on four different types of data, and a worked example using prostate cancer gene expression profile data, the merits and demerits of variable selection with the network-structured penalty function are compared with those of variable selection with the original penalty function. The results show that the penalty function model based on network structure performs better and is more stable than the original penalty function model. In particular, the network structure model based on the MCP penalty function has higher predictive ability for detecting patients with prostate cancer.

2. Application of the penalized multivariate linear regression model with multiple response variables

(1) The multivariate linear regression model with multiple response variables based on covariance estimation is described. In computer simulations, the model is used for prediction and variable selection in six cases. It is then applied to multi-trait QTL mapping in a DH population of rice. Compared with the sparse partial least squares method, the results show that the covariance-estimation-based model is more effective at variable selection.

(2) Because the multivariate regression model based on covariance estimation can only be fitted when the number of explanatory variables is smaller than the sample size, this thesis proposes a multivariate regression model for high-dimensional data, in which the number of explanatory variables is much larger than the sample size. Its prediction accuracy and variable selection performance are analysed by computer simulations. The high-dimensional multivariate linear regression model and sparse partial least squares are both applied to joint multi-trait genome association analysis in indica hybrid rice. The study shows that the multivariate linear regression model is more adaptable and more precise than sparse partial least squares.
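
As a rough illustration of the two ingredients named in part 1, the sketch below evaluates the standard MCP penalty and a graph-Laplacian smoothness term, and combines them into one penalized least-squares objective. The abstract does not give the thesis's exact formulation, so the Laplacian form of the network term, the way the terms are weighted, and all names (`mcp_penalty`, `network_penalty`, `penalized_objective`, `lam1`, `lam2`, `gamma`) are assumptions made for illustration only.

```python
import numpy as np

def mcp_penalty(beta, lam, gamma):
    """Standard minimax concave penalty (MCP), summed over coefficients."""
    b = np.abs(np.asarray(beta, dtype=float))
    inner = lam * b - b**2 / (2.0 * gamma)   # applies where |beta| <= gamma * lam
    outer = 0.5 * gamma * lam**2             # constant once |beta| > gamma * lam
    return float(np.sum(np.where(b <= gamma * lam, inner, outer)))

def network_penalty(beta, adjacency):
    """Assumed network term: beta^T L beta with graph Laplacian L = D - A."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    beta = np.asarray(beta, dtype=float)
    return float(beta @ L @ beta)

def penalized_objective(X, y, beta, adjacency, lam1, lam2, gamma=3.0):
    """Least-squares loss plus MCP plus the assumed network term."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.asarray(beta, dtype=float)
    resid = y - X @ beta
    loss = float(resid @ resid) / (2.0 * X.shape[0])
    return (loss
            + mcp_penalty(beta, lam1, gamma)
            + 0.5 * lam2 * network_penalty(beta, adjacency))
```

With an unweighted adjacency matrix, the Laplacian term equals the sum of squared coefficient differences over connected pairs, so it is zero when linked variables share the same coefficient and grows as they diverge. A fitting routine (for example coordinate descent) would minimize `penalized_objective` over `beta`; that step is omitted here.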
Keywords/Search Tags: Network penalty term, Multiple dependent variables, High-dimensional data, Genome association analysis