
The Analysis Of Transcription Factors' Binding Sites Based On Structural Data

Posted on: 2006-02-22 | Degree: Master | Type: Thesis
Country: China | Candidate: L H Tang
GTID: 2144360212482259 | Subject: Biomedical engineering
Abstract/Summary:
Although protein–DNA binding is vital to cell function, its mechanism is not yet completely understood. There have been numerous attempts to develop ad hoc procedures for analyzing DNA binding-site sequences. Here, amino acid–base pair interactions are analyzed using the 3-D structure data of protein–DNA complexes, and methods for predicting transcription factor binding sites are studied.

All protein–DNA complexes in the PDB database were used. Based on their 3-D structure data, the NUCPLOT software was used to compute all interactions between amino acid side chains and DNA. Using the SWISS-PROT annotations of the proteins in these complexes, the complexes were divided into sets: several related to the gene regulation process and one that is not.

The interactions between amino acid side chains and DNA base edges in the protein–DNA complexes, namely hydrogen bonds and non-bonded interactions, were analyzed. Detailed analysis of the binding residues shows that certain three- and five-residue segments frequently bind to DNA and that these binding sequence motifs play a major role in binding. We therefore assumed that a residue's binding state is determined by its sequence neighborhood and applied machine learning methods to predict transcription factor binding sites.

Using non-redundant databases of protein–DNA complexes, back-propagation (BP) neural network models were developed that exploit this relationship to predict DNA-binding proteins and their binding residues. For the proteins considered, a residue's sequence neighborhood was found to provide sufficient information to predict its probability of binding to DNA, with 65.85% NP at 53.28% precision.

We also developed a novel method for predicting transcription factor binding sites by extracting sequence features and classifying them with a support vector machine (SVM).
Using a radial basis function (RBF) kernel, this method classified binding sites in the nucleic acids of protein–DNA complexes with 66.71% sensitivity at 89.72% precision. Machine learning methods can predict transcription factor binding sites fairly well, and the growing amount of structure data should make this approach increasingly promising. Moreover, a strength of this method is that it allows the structural effects on binding specificity to be examined quantitatively.
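The abstract does not state how sequence neighborhoods were encoded, how the RBF kernel was parameterized, or how sensitivity and precision were computed. The following minimal Python sketch illustrates those three ingredients under stated assumptions and is not the thesis's implementation. It assumes one-hot encoding of a window of 2·half_width + 1 residues (half_width=1 gives a three-residue segment, 2 gives five), zero vectors for positions beyond the chain ends, and an illustrative gamma of 0.1. The names `window_features`, `rbf_kernel`, and `sensitivity_precision` are hypothetical.

```python
import math

# Assumed 20-letter amino-acid alphabet used for one-hot encoding (hypothetical choice).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def window_features(sequence: str, center: int, half_width: int = 2) -> list[float]:
    """Hypothetical encoder: one-hot encode the 2*half_width + 1 residues centred
    on `center`. Positions outside the chain (or unknown letters) stay all zeros."""
    features: list[float] = []
    for pos in range(center - half_width, center + half_width + 1):
        vec = [0.0] * len(AMINO_ACIDS)
        if 0 <= pos < len(sequence) and sequence[pos] in AMINO_ACIDS:
            vec[AMINO_ACIDS.index(sequence[pos])] = 1.0
        features.extend(vec)
    return features


def rbf_kernel(x: list[float], y: list[float], gamma: float = 0.1) -> float:
    """Radial basis function kernel K(x, y) = exp(-gamma * ||x - y||^2).
    gamma=0.1 is an illustrative value, not one reported in the thesis."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def sensitivity_precision(predicted: list[bool], actual: list[bool]) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); precision = TP / (TP + FP)."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, precision


if __name__ == "__main__":
    seq = "MKRIAEQLRK"  # hypothetical protein fragment
    f0 = window_features(seq, 0)  # five-residue window around the first residue
    f1 = window_features(seq, 1)
    print(len(f0), rbf_kernel(f0, f1))
    print(sensitivity_precision([True, True, False, False], [True, True, True, True]))
```

Windows that share more residues at the same offsets yield kernel values closer to 1. In practice, such feature vectors could be passed to an existing SVM implementation, for example scikit-learn's `SVC(kernel="rbf")`, rather than a hand-written classifier.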
Keywords/Search Tags: gene regulation, protein–DNA complex, transcription factor, binding site, artificial neural network, support vector machine