Research Of Gene Expression Prediction Model

Posted on: 2015-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: X H Meng
Full Text: PDF
GTID: 2180330464968752
Subject: Computer technology
Abstract/Summary:
With the completion of human genome sequencing, the regulation of gene expression has become a focus of bioinformatics. According to the central dogma, transcription plays the dominant role in regulating gene expression, and transcription factors, histone modifications, and DNase I hypersensitivity are all closely linked to transcription. Advances in molecular biology experiments allow gene expression levels to be measured more accurately, which makes it possible to model how transcription factors, histone modifications, and DNase I hypersensitivity regulate gene expression. Meanwhile, using regression models to predict gene expression has become an active research topic in bioinformatics.

The main work of this thesis is to build three kinds of regression models for predicting gene expression: a multiple linear regression model, a support vector regression model, and a regression tree model. Transcription factor, histone modification, and DNase I data serve as the explanatory variables of each model. We conclude that transcription factor, histone modification, and DNase I data improve the predictive ability of the regression models, and that gene expression levels can be estimated in silico rather than measured in vivo.

First, we use a multiple linear regression model to predict gene expression and analyze its goodness of fit and predictive ability. The experimental results show that the model achieves the intended effect, but its treatment of the individual factors is too simple.

Second, we use a support vector regression model to predict gene expression and compare it with the multiple linear regression model in both predictive ability and goodness of fit. The experimental results indicate that support vector regression improves both the fit and the predictive ability.
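The model comparisons above rest on two quantities: goodness of fit (how well a model explains the data it was trained on) and predictive ability (how well it explains held-out data). The following is a minimal sketch of that evaluation for the multiple linear regression baseline only. Everything in it is an assumption made for illustration: the synthetic "genes", the three generic explanatory variables, the coefficients in TRUE_COEFS, the 150/50 split, and the use of R^2 as the score. It is not the thesis's data or code, and in practice a numerical library would replace the hand-written solver.

```python
import random

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy [a | b] so the inputs are left unchanged.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        partial = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - partial) / m[r][r]
    return x

def fit_ols(features, targets):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + row for row in features]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict(coefs, features):
    return [coefs[0] + sum(c * v for c, v in zip(coefs[1:], row)) for row in features]

def r_squared(targets, predictions):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_y = sum(targets) / len(targets)
    ss_res = sum((y - p) ** 2 for y, p in zip(targets, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in targets)
    return 1.0 - ss_res / ss_tot

# Hypothetical data: 200 "genes", 3 explanatory variables, noisy linear response.
rng = random.Random(0)
TRUE_COEFS = [1.0, 2.0, -0.5, 0.25]  # intercept, then one weight per variable
features = [[rng.uniform(0, 1) for _ in range(3)] for _ in range(200)]
targets = [
    TRUE_COEFS[0] + sum(c * v for c, v in zip(TRUE_COEFS[1:], row)) + rng.gauss(0, 0.05)
    for row in features
]

train_x, test_x = features[:150], features[150:]
train_y, test_y = targets[:150], targets[150:]

coefs = fit_ols(train_x, train_y)
r2_train = r_squared(train_y, predict(coefs, train_x))  # goodness of fit
r2_test = r_squared(test_y, predict(coefs, test_x))     # predictive ability
print("coefficients:", [round(c, 3) for c in coefs])
print("R^2 on training genes:", round(r2_train, 3))
print("R^2 on held-out genes:", round(r2_test, 3))
```

Scoring a second model, such as a support vector regressor, on the same split with the same two numbers would give the kind of side-by-side comparison the abstract describes.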
To test whether this improvement is attributable to the kernel function of support vector regression, we ran experiments with and without a kernel function to check this assumption. We also treat histone modification data as a prior probability and combine it with the transcription factor affinity score to compute a log posterior probability score, which is then used as a feature of the regression model. The experimental results show that adding histone modification data to the model improves its ability to predict gene expression. We therefore conclude that transcription factors, histone modifications, and DNase I hypersensitivity all influence the regulation of gene expression, and that combining this information effectively can greatly improve the predictive ability of the model.

Finally, we use a regression tree model to predict gene expression. Its goodness of fit and predictive ability are slightly lower than those of the other models, so we propose an improved model that combines linear regression with the regression tree. This model first builds linear regressions on combinations of the explanatory factors, selects the appropriate factors to form a new set of explanatory variables, and then builds a regression tree on that set. The improved model fits and predicts better than both the original regression tree and the multiple linear regression model, but still slightly worse than the support vector regression model.

In short, all of these regression models achieve good results, and the computational models of transcription factors, histone modifications, and DNase I are a prerequisite for the regression models.
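The two-stage idea behind the improved regression tree described above (use linear regression to choose explanatory factors, then grow a regression tree on the chosen factors) can be sketched as follows. This is only one possible reading, and the specifics are assumptions: the abstract does not state the selection criterion, so the sketch screens each factor with a single-variable linear regression and keeps those whose R^2 reaches a hypothetical cutoff of 0.1. The synthetic data, the tree depth, and the minimum leaf size are likewise illustrative.

```python
import random

def mean(values):
    return sum(values) / len(values)

def single_factor_r2(xs, ys):
    """R^2 of a one-variable least-squares line (the squared correlation)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return 0.0 if sxx == 0 or syy == 0 else sxy * sxy / (sxx * syy)

def sse(ys):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)

def build_tree(rows, ys, columns, depth, min_leaf=5):
    """Grow a regression tree; each split minimises the children's total SSE."""
    best = None
    if depth > 0:
        for col in columns:
            for threshold in sorted({row[col] for row in rows}):
                left = [y for row, y in zip(rows, ys) if row[col] <= threshold]
                right = [y for row, y in zip(rows, ys) if row[col] > threshold]
                if len(left) < min_leaf or len(right) < min_leaf:
                    continue
                cost = sse(left) + sse(right)
                if best is None or cost < best[0]:
                    best = (cost, col, threshold)
    if best is None:
        return mean(ys)  # leaf: mean response of the samples that reach it
    _, col, threshold = best
    left = [(r, y) for r, y in zip(rows, ys) if r[col] <= threshold]
    right = [(r, y) for r, y in zip(rows, ys) if r[col] > threshold]
    return (
        col,
        threshold,
        build_tree([r for r, _ in left], [y for _, y in left], columns, depth - 1, min_leaf),
        build_tree([r for r, _ in right], [y for _, y in right], columns, depth - 1, min_leaf),
    )

def predict_one(tree, row):
    """Walk from the root to a leaf; internal nodes are 4-tuples."""
    while isinstance(tree, tuple):
        col, threshold, left, right = tree
        tree = left if row[col] <= threshold else right
    return tree

def r_squared(targets, predictions):
    ss_res = sum((y - p) ** 2 for y, p in zip(targets, predictions))
    return 1.0 - ss_res / sse(targets)

# Hypothetical data: 4 candidate factors, only factors 0 and 2 drive the response.
rng = random.Random(1)
rows = [[rng.uniform(0, 1) for _ in range(4)] for _ in range(400)]
ys = [3 * row[0] + 3 * row[2] + rng.gauss(0, 0.1) for row in rows]
train_rows, test_rows = rows[:300], rows[300:]
train_y, test_y = ys[:300], ys[300:]

# Stage 1 (assumed criterion): keep factors whose single-variable R^2 is >= 0.1.
selected = [
    c for c in range(4)
    if single_factor_r2([r[c] for r in train_rows], train_y) >= 0.1
]
# Stage 2: grow the regression tree on the selected factors only.
tree = build_tree(train_rows, train_y, selected, depth=4)
r2_test = r_squared(test_y, [predict_one(tree, r) for r in test_rows])
print("selected factors:", selected)
print("R^2 on held-out samples:", round(r2_test, 3))
```

Restricting the candidate split variables is the only way the first stage influences the tree here. A fuller version could also add combined or interaction terms to the candidate set, which the abstract hints at but does not specify.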
Keywords/Search Tags: Transcription Factor, Histone Modifications, Support Vector Regression, Tree Regression, Gene Expression