
The Structure Prediction Of Membrane Protein Based On Machine Learning And Statistics Calculation Method

Posted on: 2018-09-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Yin
GTID: 1360330590470366
Subject: Control Science and Engineering
Abstract/Summary:
Proteins play crucial roles in biological activities and perform critical functions in living organisms. They are distributed throughout cellular structures and, as macromolecules, carry out diverse functions that express genetic information. Membrane proteins are one of the main types of proteins and reside in biological membranes. Because their structural complexity makes wet-lab experiments difficult, far fewer structures have been solved for membrane proteins than for globular proteins. Membrane proteins account for only 30% of the proteins in the protein database, yet they are central to membrane-associated functions such as membrane anchoring, communication, pore formation, transport across the membrane and immune functions. In addition, about 60% of drugs use membrane proteins as candidate targets. Solving the structures of membrane proteins is therefore highly desirable.

In general, proteins with similar structures have similar biological functions, so known structures help to elucidate functional mechanisms. In the current post-genome era, protein sequence databases have reached the scale of millions of entries and are growing rapidly. The gap between the number of experimentally solved structures and the number of recorded sequences is especially large, so the need for structure determination is urgent. Structures are traditionally determined by X-ray crystallography, NMR and cryo-EM. The membrane-spanning segments of membrane proteins are embedded in the lipid layer and are hydrophobic, which makes these proteins hard to crystallize, and the experimental approaches are costly and time-consuming. This leaves considerable room for computational methods of protein structure prediction.

In recent years, with the development of pattern recognition theory, machine learning and artificial intelligence methods have been widely used in bioinformatics, and protein structure prediction has made considerable progress by applying them. Solved structures are selected from the protein database as samples to construct datasets for training machine learning models, and the annotated structural information is used to label the training data as positive or negative samples. Characteristic features are extracted by diverse approaches based on an analysis of the particular structure and function of membrane proteins. After the data are divided into training and test sets, a model is built with optimized parameters, and its performance is evaluated by cross-validation. Separately, methods based on statistical calculation of residue evolutionary information have been proposed in recent research. These approaches require large numbers of homologous sequences and high-quality multiple sequence alignments (MSAs). By analyzing co-evolutionary correlations, they calculate interactions between residues, which serve as constraints for modeling the spatial conformation of proteins.

Although existing methods have made much progress, many aspects still need improvement, such as prediction accuracy, the use of protein structural characteristics, robustness and general applicability. Current algorithms treat each residue independently, without considering the correlation among adjacent residues in the local sequence. To enhance structure prediction for transmembrane β-barrel proteins, we propose new protocols and design pipelines for predicting topology, protein contact maps and β-strand interactions. Our research consists of the following parts:

(1) We construct a benchmark dataset by selecting high-resolution solved structures from the PDB (Protein Data Bank). To reduce homology redundancy and cover more protein superfamilies, sequences are culled with the PISCES software at a sequence identity cutoff of 40%. The resulting larger dataset, with high-quality structural annotation, was built to improve the predictions of the proposed model.

(2) For feature selection, we obtain evolutionary conservation information from multiple sequence alignments generated by searching a protein database with PSI-BLAST. A sliding window of optimized size is used to fuse the multiple extracted features. We also apply a sparse coding algorithm to extract features, reducing dimensional redundancy and noise.

(3) We propose a chain learning approach to improve topology prediction. To account for the correlation between neighboring residues, we add local state properties as input features in combination with global characteristics. The pipeline has two stages: the predictions from the first stage are refined by the prediction model in the second. Chain learning addresses problems such as abrupt changes in the predicted state of residues and smooths the probability profile, improving prediction accuracy.

(4) In post-processing of the predicted topology, we design a dynamic threshold for interpreting the probability profiles. Starting from optimized initial threshold values, we process particular segments such as tight turns or short loops with the dynamic threshold method to reduce misjudgments. Based on an analysis of the length distributions of strands and loops, we adopt a special procedure to identify irregular structures and improve the performance of our protocol.

(5) We present a method for predicting residue-residue contacts and strand-strand interactions. We combine co-evolutionary correlation analysis with machine learning so that the two complement each other and improve performance across different contact patterns. A deep learning algorithm is used to extract features, reducing dimensional redundancy and eliminating noise. This approach improves the efficiency of model training and of contact map prediction for β-barrel proteins.
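
Items (2) and (4) of the abstract mention fusing per-residue features with a sliding window and turning a per-residue probability profile into structural segments. The Python sketch below illustrates both ideas in their simplest form. It is not the dissertation's implementation: the function names, the zero padding at the sequence termini, the window size, the fixed threshold and the minimum segment length are all assumptions made for illustration. In particular, the fixed threshold is a stand-in for the dynamic threshold, whose details the abstract does not give.

```python
from typing import List, Sequence, Tuple


def sliding_window_features(
    profile: Sequence[Sequence[float]], window: int = 5
) -> List[List[float]]:
    """Concatenate the feature vectors of each residue and its neighbours.

    `profile` holds one feature vector per residue (for example one row of a
    position-specific scoring matrix). Positions that fall outside the
    sequence are zero-padded (an assumption of this sketch).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    dim = len(profile[0]) if profile else 0
    pad = [0.0] * dim
    fused = []
    for i in range(len(profile)):
        row: List[float] = []
        for j in range(i - half, i + half + 1):
            # Use the neighbour's features, or padding beyond the termini.
            row.extend(profile[j] if 0 <= j < len(profile) else pad)
        fused.append(row)
    return fused


def segments_above_threshold(
    probs: Sequence[float], threshold: float = 0.5, min_len: int = 3
) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of runs where probs >= threshold.

    Runs shorter than `min_len` are discarded. The threshold is fixed here;
    this is a simplification, not the dynamic threshold of the abstract.
    """
    segments = []
    start = None
    for i, p in enumerate(probs):
        if p >= threshold:
            if start is None:
                start = i  # a new run begins
        elif start is not None:
            if i - start >= min_len:
                segments.append((start, i - 1))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(probs) - start >= min_len:
        segments.append((start, len(probs) - 1))
    return segments


if __name__ == "__main__":
    toy_profile = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(sliding_window_features(toy_profile, window=3))
    toy_probs = [0.1, 0.8, 0.9, 0.7, 0.2, 0.6, 0.1, 0.9, 0.9, 0.9]
    print(segments_above_threshold(toy_probs))
```

With a window of size w and d features per residue, each residue is described by w × d values, so the window size directly trades local context against input dimensionality. This is presumably one reason the abstract speaks of an optimized window size.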
Keywords/Search Tags:Pattern recognition, Protein structure prediction, Deep learning, Contact map, Dynamic threshold