Research Of Transcription Factor Binding Affinities Prediction

Posted on:2015-04-19

Degree:Master

Type:Thesis

Country:China

Candidate:S G Wang

Full Text:PDF

GTID:2180330464468612

Subject:Computer software and theory

Abstract/Summary:

Transcriptional regulation is an important part of gene expression regulation. Identifying transcription factor binding sites is one of the central problems in decoding transcriptional regulation and an important step toward understanding its mechanism. However, accurately identifying transcription factor binding affinities across a huge number of base pairs remains a severe challenge. To address this problem, the main work of this thesis is as follows:

1. We propose a logistic regression model that predicts transcription factor binding affinities from high-throughput ChIP-Seq data. The model enumerates all 5-mer subsequences over A, T, C, and G, matches them against the DNA sequences, and counts how many times each subsequence occurs in each DNA sequence to construct an affinity matrix. This affinity matrix accurately captures the transcription factor binding affinities. We then use the affinity matrix to build the logistic regression model and apply the stability selection algorithm to optimize model selection. Compared with models based on PWM (position weight matrix) and PSSM (position-specific scoring matrix), this model improves the accuracy of transcription factor binding affinity prediction.

2. We improve the existing multiple linear regression model to develop a new multiple linear regression model based on PBM (protein binding microarray) data. The model enumerates all subsequences of length 3 to 8 bases, matches them against the DNA probe sequences, and records whether each subsequence appears in each probe sequence to construct a new affinity matrix. We then use this affinity matrix to fit the new multiple linear regression model, and we use SLEP to optimize model selection and improve prediction accuracy. Compared with the existing multiple linear regression model, the new PBM-based model is very competitive in predicting transcription factor binding affinities.
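The two affinity matrices described above can be illustrated with a minimal Python sketch. It only shows one plausible way to build the features: the function names (`all_kmers`, `count_matrix`, `presence_matrix`) are hypothetical, occurrences are counted with overlaps on the forward strand only, and windows containing characters other than A, C, G, T are skipped. These are assumptions, not details given in the abstract. The regression fitting, stability selection, and SLEP steps are not shown.

```python
from itertools import product

ALPHABET = "ACGT"


def all_kmers(k):
    """Enumerate every k-mer over A, C, G, T in lexicographic order."""
    return ["".join(p) for p in product(ALPHABET, repeat=k)]


def count_matrix(seqs, k=5):
    """Rows are sequences, columns are k-mers, entries are occurrence counts.

    Assumption: overlapping occurrences are counted, forward strand only.
    """
    kmers = all_kmers(k)
    index = {kmer: j for j, kmer in enumerate(kmers)}
    matrix = []
    for seq in seqs:
        row = [0] * len(kmers)
        for i in range(len(seq) - k + 1):
            j = index.get(seq[i:i + k])
            if j is not None:  # skip windows with non-ACGT characters
                row[j] += 1
        matrix.append(row)
    return kmers, matrix


def presence_matrix(probes, k_min=3, k_max=8):
    """Rows are probes, columns are k-mers of length k_min..k_max.

    Entries are 1 if the k-mer appears in the probe, else 0.
    """
    kmers = [kmer for k in range(k_min, k_max + 1) for kmer in all_kmers(k)]
    index = {kmer: j for j, kmer in enumerate(kmers)}
    matrix = []
    for probe in probes:
        row = [0] * len(kmers)
        for k in range(k_min, k_max + 1):
            for i in range(len(probe) - k + 1):
                j = index.get(probe[i:i + k])
                if j is not None:
                    row[j] = 1
        matrix.append(row)
    return kmers, matrix


if __name__ == "__main__":
    # Toy sequences, made up for illustration only.
    kmers, counts = count_matrix(["AAAAAA", "ACGTACGT"], k=5)
    print(len(kmers))                        # 4**5 = 1024 columns
    print(counts[0][kmers.index("AAAAA")])   # 2 overlapping occurrences
```

Each row of the count matrix could then serve as the feature vector for a logistic regression on ChIP-Seq labels, and each row of the presence matrix as the feature vector for a linear regression on PBM probe intensities. With the full 3 to 8 range the presence matrix has 87,360 columns per probe, so a sparse representation would likely be preferable on real data.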

Keywords/Search Tags:

Binding affinities, ChIP-Seq, PBM, Multiple linear regression model, Logistic regression model

Related items

1	Application Of Bayesian Method Based On Linear Regression Model
2	Classification Variables Of Logistic Regression Model And Its Application Research
3	Effect Of Random Forest-Lasso Logistic Regression Model On Screening Health Risk Factors Of Fatty Liver
4	Improved Ridge Regression Estimators For The Logistic Regression Model
5	Regularized Logistic Regression And Its Application
6	Multiple Linear Regression Model With Linear Constraints, Statistical Diagnostics
7	Evaluation Of The Risk Of Loess Earthquake Landslides Based On Statistical Model
8	Readability Analysis Of HSK Intermediate And Advanced Reading Texts Based On Multiple Logistic Regression Model
9	Research On Regression Analysis Algorithm Based On Differential Privacy
10	Improvement And Application Of GM(1,1) Forecasting Model In Urban Water Consumption