Protein Remote Homology Detection Based On Ranking Methods

Posted on: 2019-11-12
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Jiang
GTID: 2370330590473927
Subject: Computer Science and Technology
Abstract/Summary:
With the development of sequencing technology, the number of protein sequences is growing rapidly, but their functional structures remain unknown. Several computational predictors have been proposed to detect the functional structures of unknown proteins, yet their detection performance is still limited. As one of the most important fundamental problems in protein sequence analysis, protein remote homology detection is critical for both theoretical research (protein structure and function studies) and real-world applications (drug design). In this study, we treat protein remote homology detection as a retrieval task: the aim is to find protein sequences with known functional structures that are highly correlated with the query protein, and to infer the functional structure of the query from them.

Because alignment methods are complementary, an aggregation strategy can effectively improve ranking performance. This thesis therefore introduces a matching aggregation model and designs three cost metrics that measure the cost of placing a given sequence at a given position in the ranking. Three matching aggregation models were constructed by combining these three cost metrics with the Kuhn-Munkres algorithm. Tested on the SCOP benchmark dataset, the matching aggregation strategy effectively improved ranking performance, and the resulting models were superior to most existing methods.

To make up for the shortcomings of the matching aggregation model, and to address the problem that existing learning to rank models do not consider the features of protein sequences, this thesis proposes methods that combine learning to rank with protein features. Two discriminative profile-based features (Top-n-gram and ACC) were embedded into the learning to rank model in four ways, yielding four predictors: ProtDec-LTR(ED), ProtDec-LTR(CS), ProtDec-LTR(PC) and ProtDec-LTR3.0. These four models effectively improved ranking performance because they introduce evolutionary information. ProtDec-LTR3.0 further enhanced predictive performance by considering protein correlations at a finer granularity, and its predictive performance was superior to that of the best existing method.

Both the matching methods and the learning to rank models are aggregation strategies that rely on multiple alignment methods, which is costly. A network propagation model was therefore employed to improve the performance of a single method. In existing network propagation models the similarity network is constructed through alignment and is consequently inaccurate; to solve this problem, the real homologous relationships were used to construct a more accurate similarity network. Twelve predictors were constructed by combining six sequence alignment methods with two network propagation models (PageRank and HITS), and all 12 predictors outperformed their underlying basic methods. To further improve performance, PageRank and HITS were combined and applied to the best basic method, HHblits, to construct the HITS-PR-HHblits predictor. Experimental results showed that HITS-PR-HHblits outperformed other state-of-the-art methods on the SCOP and SCOPe benchmark datasets.

Finally, the methods proposed in this thesis were verified on independent test sets. Experiments showed that each method achieved stable performance on the independent test sets.
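The network propagation idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's actual formulation, so everything below is an assumption: the function name `propagate_scores`, the damping value, the handling of dangling nodes, and the toy protein identifiers are hypothetical. The sketch shows a generic personalized-PageRank-style propagation, in which initial alignment scores for a query are spread over a protein similarity network so that proteins linked to strong hits are promoted.

```python
def propagate_scores(similarity, initial, damping=0.85, iterations=100, tol=1e-9):
    """Generic personalized-PageRank-style score propagation (illustrative only).

    similarity: dict node -> dict neighbour -> non-negative edge weight
    initial:    dict node -> initial score of that node for the query
                (for example, a score from a single alignment method)
    Returns a dict node -> propagated score; the scores sum to 1.
    """
    # Collect every node mentioned as a key, a neighbour, or an initial hit.
    nodes = set(similarity) | set(initial)
    for neighbours in similarity.values():
        nodes |= set(neighbours)
    nodes = sorted(nodes)

    total = sum(initial.get(n, 0.0) for n in nodes)
    if total <= 0.0:
        raise ValueError("initial scores must have a positive sum")

    # Normalised initial scores act as the restart distribution.
    restart = {n: initial.get(n, 0.0) / total for n in nodes}
    out_weight = {n: sum(similarity.get(n, {}).values()) for n in nodes}

    scores = dict(restart)
    for _ in range(iterations):
        updated = {n: (1.0 - damping) * restart[n] for n in nodes}
        for src in nodes:
            if out_weight[src] == 0.0:
                # Assumption: a node without outgoing edges returns its
                # mass to the restart distribution.
                for n in nodes:
                    updated[n] += damping * scores[src] * restart[n]
            else:
                for dst, weight in similarity[src].items():
                    updated[dst] += damping * scores[src] * weight / out_weight[src]
        change = sum(abs(updated[n] - scores[n]) for n in nodes)
        scores = updated
        if change < tol:
            break
    return scores


# Toy example with hypothetical protein identifiers: A and C are linked in
# the similarity network, B has no links.
toy_network = {"A": {"C": 1.0}, "C": {"A": 1.0}, "B": {}}
toy_initial = {"A": 0.6, "B": 0.3, "C": 0.1}
toy_scores = propagate_scores(toy_network, toy_initial)
toy_ranking = sorted(toy_scores, key=toy_scores.get, reverse=True)
```

In this toy case, C starts with the lowest initial score but is connected to the top hit A, so propagation ranks it above the unconnected B. The predictors named in the abstract combine such propagation with specific alignment methods and with HITS, which this sketch does not attempt to reproduce.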
Keywords/Search Tags:Protein remote homology detection, Matching aggregation, Learning to rank, Network propagation