Protein Fold Recognition And Remote Homology Detection Based On Profiles

Posted on: 2019-06-17
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Guo
Full Text: PDF
GTID: 2370330566998948
Subject: Computer technology
Abstract/Summary:
Protein fold recognition and remote homology detection are two fundamental problems in bioinformatics. The main approach to both is to infer the structure and function of a protein from the similarity of its sequence to other protein sequences. Fold recognition is the harder of the two, because sequences that share a fold are less similar to one another than sequences related by remote homology. In recent years many methods have been proposed, and those based on sequence profiles perform especially well, because a profile carries more evolutionary information and is more representative than a single sequence. Profile-based methods still leave considerable room for improvement, so this thesis continues the study of protein sequence profiles, focusing on the profile generation process, where two methods are used to remove noise from the original profile. Because sequences differ in length, the profiles generated from them differ in length as well. Before machine learning algorithms can be applied, the profiles must first be transformed into fixed-length feature vectors. Two vectorization methods are used in this thesis: the matrix transformation method and the profile alignment method. Combining these two methods with different sequence profiles yields a number of prediction models for protein fold recognition and remote homology detection, which effectively improve prediction performance.

First, two denoised profiles, the Boundary Frequency Profile (BFP) and the Rank Frequency Profile (RFP), are proposed to remove the noise present in the position-specific frequency matrix (PSFM). Three matrix transformation methods are each applied to PSFM, RFP and BFP to produce fixed-length feature vectors, giving nine prediction models for protein fold recognition and remote homology detection. The performance of the matrix transformation methods is compared, and the influence of profile noise on prediction performance is discussed.

Second, the thesis turns to the other vectorization method, profile alignment, and designs a more interpretable alignment strategy. Combining this strategy with the Sequence-Order Frequency Matrix (SOFM), which contains more evolutionary information, gives the prediction model SOFM-SW. The influence of the amount of information in the sequence profile on the alignment algorithm is discussed. To address the shortcomings of the profile alignment algorithm, the thesis further studies its key component, the score function. Six score functions are introduced and combined with BFP, RFP and PSFM, giving 18 prediction models for protein fold recognition and remote homology detection. The performance obtained with the six score functions and the influence of noise in the sequence profile are analyzed.

Finally, the performance of the two vectorization methods combined with different sequence profiles is analyzed comprehensively, and suggestions are given for selecting sequence profiles and vectorization methods for the two problems.
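The two vectorization ideas named in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the abstract does not specify its three matrix transformations or its six score functions. The sketch assumes a profile is an L x 20 NumPy array of per-position residue frequencies. It uses a simple column co-occurrence average as a stand-in matrix transformation, and a Smith-Waterman local alignment with a dot-product score as a stand-in for profile alignment. The names `profile_to_vector` and `local_alignment_score` and the `gap` and `offset` parameters are hypothetical.

```python
import numpy as np


def profile_to_vector(profile: np.ndarray) -> np.ndarray:
    """Map an (L x 20) profile to a 400-dimensional vector, whatever L is.

    Assumed transformation: average over positions of the outer product of
    each row with itself, i.e. (P^T P) / L, flattened.
    """
    length = profile.shape[0]
    return (profile.T @ profile / length).ravel()


def local_alignment_score(p: np.ndarray, q: np.ndarray,
                          gap: float = 1.0, offset: float = 0.5) -> float:
    """Smith-Waterman local alignment score between two profiles.

    Assumed score function: dot product of two frequency rows minus a
    constant offset, so that dissimilar positions score below zero.
    A linear gap penalty is used for simplicity.
    """
    sim = p @ q.T - offset  # (Lp x Lq) matrix of position-pair scores
    h = np.zeros((p.shape[0] + 1, q.shape[0] + 1))
    for i in range(1, p.shape[0] + 1):
        for j in range(1, q.shape[0] + 1):
            h[i, j] = max(
                0.0,                                  # start a new local alignment
                h[i - 1, j - 1] + sim[i - 1, j - 1],  # align position i with j
                h[i - 1, j] - gap,                    # gap in q
                h[i, j - 1] - gap,                    # gap in p
            )
    return float(h.max())
```

For profile alignment, a fixed-length feature vector for a query could then be built by scoring it against a fixed set of reference profiles, one feature per reference. That step is likewise an assumption about how such scores are typically used, not a detail taken from the abstract.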
Keywords/Search Tags: Protein fold recognition and remote homology detection, Matrix transformation, Sequence-Order Frequency Matrix, Denoising profiles, Profile alignment, Score function