
Sequence-based Structure and Function Prediction for Membrane Proteins

Posted on: 2013-02-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Q Wang
GTID: 1110330371985694
Subject: Chemical informatics
Abstract/Summary:
Membrane proteins are crucial players in the cell and play central roles in processes ranging from the transport of ions and small molecules to sophisticated signaling pathways. Many are also prime current or future drug targets; it has been estimated that about 60% of approved drugs act on membrane proteins. Despite their biological importance, membrane proteins remain notoriously difficult to study structurally and functionally, because they are hard to purify and to obtain in stable forms suitable for X-ray crystallography and electron microscopy (EM). Membrane proteins therefore remain important yet challenging research objects in a number of disciplines.

This dissertation studies the structure and function of membrane proteins, using various mathematical and bioinformatics approaches to investigate the relationship between sequence, structure and function. The ultimate goal is to build sequence-based models that predict the structure and function of membrane proteins. Most importantly, we hope that these models can address major questions about membrane proteins (structure determination, subcellular localization and function) from sequence information alone.

In Chapter 1, we first review the development of the field and discuss its consequences for our understanding of membrane protein structure, biogenesis, folding and function. We then discuss current structure and function prediction methods against the background of what has been learned about membrane proteins. Finally, we introduce the data resources, sequence representations and mathematical prediction methods used in this dissertation for membrane protein structure and function prediction.

In Chapter 2, we present a novel and concise method for predicting the burial status of transmembrane (TM) residues in α-helical membrane proteins, that is, whether a residue is exposed to the lipid bilayer or buried within the protein core.
Using a sliding window, we first extracted the sequence information contained in the immediate neighbors of each central residue. Two strategies were then used to encode the window as features; the main features were the conservation index and sequence-based structural and physicochemical features. Features highly correlated with burial status were selected with recursive feature elimination (RFE). Finally, least squares support vector machines (LS-SVMs) were used to build the classification model, owing to their good performance and relatively short training time. The model was developed from 43 membrane protein chains, and its predictive ability was evaluated on an independent test set of ten other, non-redundant membrane protein chains. The prediction accuracy of the method was satisfactory. In addition, features describing the position and composition of hydrophobic amino acids proved to be very important for the burial status of a TM residue.

A burial status model can only qualitatively identify exposed transmembrane residues; it cannot estimate how much surface area is exposed. We therefore developed a sequence-based computational model for predicting the solvent accessible surface area of α-helical and β-barrel transmembrane residues, described in Chapter 3. The model was developed from 78 α-helical membrane protein chains and 24 β-barrel membrane proteins. First, the evolutionary conservation of residues in the α-helical and β-barrel transmembrane proteins was extracted with a sliding window. Next, all variables were ranked by the decrease in residual sum of squares, and the conservation scores most highly correlated with the accessible surface area of transmembrane residues were selected to build the model. Finally, prediction models were developed using support vector machine (SVM) and random forest methods.
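
The sliding-window encoding described above for Chapters 2 and 3, followed by an LS-SVM classifier of the kind used for burial-status prediction in Chapter 2, can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the per-residue score list stands in for a real conservation index, and the zero padding at chain ends, the RBF kernel, the `gamma` and `sigma2` values and the helper names (`window_features`, `rbf`, `solve`, `train_lssvm`) are all hypothetical. The RFE feature-selection step is omitted. The LS-SVM part follows the standard least-squares SVM formulation, in which training reduces to solving a single linear system rather than a quadratic program.

```python
import math


def window_features(scores, center, half_width, pad=0.0):
    """Collect per-residue scores in a window centred on `center`.

    Positions beyond either chain end are filled with `pad`
    (zero padding is an assumption made for this sketch).
    """
    return [
        scores[i] if 0 <= i < len(scores) else pad
        for i in range(center - half_width, center + half_width + 1)
    ]


def rbf(x, z, sigma2=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2.0 * sigma2))


def solve(a, rhs):
    """Solve a dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def train_lssvm(xs, ys, gamma=10.0, sigma2=1.0):
    """Train an LS-SVM classifier (labels in {-1, +1}) by solving
        [0   y^T              ] [b    ]   [0]
        [y   Omega + I / gamma] [alpha] = [1]
    where Omega[k][j] = y_k * y_j * K(x_k, x_j).
    """
    n = len(xs)
    a = [[0.0] + [float(y) for y in ys]]
    for k in range(n):
        row = [float(ys[k])]
        for j in range(n):
            ridge = 1.0 / gamma if k == j else 0.0
            row.append(ys[k] * ys[j] * rbf(xs[k], xs[j], sigma2) + ridge)
        a.append(row)
    sol = solve(a, [0.0] + [1.0] * n)
    b, alpha = sol[0], sol[1:]

    def predict(x):
        # Decision function: sign(sum_k alpha_k * y_k * K(x, x_k) + b).
        s = sum(al * y * rbf(x, xk, sigma2) for al, y, xk in zip(alpha, ys, xs)) + b
        return 1 if s >= 0 else -1

    return predict


# Toy usage with made-up data (not from the dissertation).
print(window_features([0.3, 0.8, 0.5], center=0, half_width=1))  # [0.0, 0.3, 0.8]
xs = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
ys = [-1, -1, -1, 1, 1, 1]  # arbitrary coding: -1 buried, +1 lipid-exposed
predict = train_lssvm(xs, ys)
print(predict([0.2, 0.3]), predict([5.5, 5.5]))  # -1 1
```

In a real pipeline each row of `xs` would be the concatenated window features of one TM residue; the toy 2-D points only show that the linear-system training and the decision function behave as expected.
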
The results show that the model performs well for both types of transmembrane residues and outperforms other prediction models developed for only one specific type of transmembrane residue. They also show that a random forest model incorporating conservation scores is an effective sequence-based computational approach for predicting the solvent accessible surface area of transmembrane residues.

Knowledge of the subcellular localization of membrane proteins is fundamental to understanding their function in many contexts, such as cellular function, biological processes, signal transduction, metabolic pathways and drug design. In Chapter 4, we aimed to develop a model that predicts the subcellular localization of membrane proteins across all localization sites in eukaryotes. First, the dataset was downloaded from the UniProt database and divided into a development set and an independent test set. To represent membrane proteins (MPs) comprehensively, we used sequence-derived structural and physicochemical features and evolutionary information, extracted using Chou's concept of pseudo amino acid composition. The model was built with the K-nearest neighbor (KNN) algorithm combined with Chou's score function, and its performance was evaluated by cross-validation and by prediction on the test set. The results show that the method performs well in predicting multiple subcellular localization sites of eukaryotic membrane proteins.

In Chapter 5, we present the first sequence-based model for predicting the function of membrane proteins. It can assign eukaryotic membrane proteins to 26 functional classes and is flexible, particularly in dealing with proteins that have multiple functions.
Sequence-based structural and physicochemical information and evolutionary information are fused in the predictor. Satisfactory results from cross-validation and an independent test set show that the method reliably predicts multiple functions of eukaryotic membrane proteins.
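
Chapters 4 and 5 encode each protein with Chou's pseudo amino acid composition (PseAAC) and must handle proteins that belong to several classes at once. A minimal sketch of both ideas follows, under stated assumptions. The PseAAC part implements the common type-1 form: 20 normalized amino acid frequencies plus λ (`lam`) sequence-order correlation factors weighted by `w`. Chou's original formulation uses standardized hydrophobicity, hydrophilicity and side-chain mass; those tables are not reproduced here, so the usage example passes a made-up property. The classifier is a generic multi-label KNN vote (a label is predicted when at least a `threshold` fraction of the k nearest neighbours carry it), swapped in for illustration only; it is not Chou's score function, whose exact form this abstract does not give. All function names and default parameter values are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def standardize(raw):
    """Scale one property to zero mean and unit (population) std over the 20 amino acids."""
    mean = sum(raw[a] for a in AMINO_ACIDS) / 20.0
    std = math.sqrt(sum((raw[a] - mean) ** 2 for a in AMINO_ACIDS) / 20.0)
    return {a: (raw[a] - mean) / std for a in AMINO_ACIDS}


def pseaac(seq, properties, lam=2, w=0.05):
    """Type-1 pseudo amino acid composition: 20 + lam values that sum to 1.

    `seq` must contain only the 20 standard one-letter codes and be longer than `lam`.
    """
    props = [standardize(p) for p in properties]
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    freq = [seq.count(a) / n for a in AMINO_ACIDS]

    def corr(r1, r2):
        # Mean squared difference of the standardized properties of two residues.
        return sum((p[r2] - p[r1]) ** 2 for p in props) / len(props)

    # taus[j-1] averages the correlation over all residue pairs j positions apart.
    taus = [
        sum(corr(seq[i], seq[i + j]) for i in range(n - j)) / (n - j)
        for j in range(1, lam + 1)
    ]
    denom = sum(freq) + w * sum(taus)
    return [f / denom for f in freq] + [w * t / denom for t in taus]


def knn_multilabel(train, query, k=3, threshold=0.5):
    """Predict every label carried by at least `threshold` of the k nearest neighbours.

    `train` is a list of (feature_vector, set_of_labels) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = {}
    for _, labels in nearest:
        for label in labels:
            votes[label] = votes.get(label, 0) + 1
    return {label for label, v in votes.items() if v / len(nearest) >= threshold}


# Toy usage: a made-up property (NOT a real hydrophobicity scale) and made-up labels.
toy_property = {a: float(i) for i, a in enumerate(AMINO_ACIDS)}
vec = pseaac("MKTAYIAKQR", [toy_property], lam=3)
print(len(vec), round(sum(vec), 6))  # 23 1.0

train = [
    (pseaac("AAAAGGGG", [toy_property]), {"plasma membrane"}),
    (pseaac("AAAGGGGA", [toy_property]), {"plasma membrane", "endosome"}),
    (pseaac("WWWYYYWW", [toy_property]), {"mitochondrion"}),
]
query = pseaac("AAGGAAGG", [toy_property])
print(sorted(knn_multilabel(train, query, k=2)))  # ['endosome', 'plasma membrane']
```

Because every neighbour contributes all of its labels, a query can receive several localization or function labels at once, which is the multi-label behaviour the chapters emphasize.
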
Keywords/Search Tags: Membrane protein, α-Helical membrane protein, β-Barrel membrane protein, Burial status, Solvent accessible surface area, Subcellular localization, Function prediction