Protein Sequence Classification Based On Feature Enhancement And Attribute Dependence Fusion

Posted on: 2020-08-31
Degree: Master
Type: Thesis
Country: China
Candidate: L Yue
Full Text: PDF
GTID: 2370330602461441
Subject: Computer technology
Abstract/Summary:
G protein-coupled receptors (GPCRs) are the largest protein superfamily found in the human body. They mediate cellular responses to a variety of environmental stimuli and participate in many physiological processes, so accurate classification of GPCRs is an active research problem. This thesis proposes two new methods for protein sequence classification. The first is a semi-naive Bayes classification algorithm built on feature extraction from multiple sequence alignment (MSA); compared with previous classification methods, it significantly improves classification accuracy. The algorithm combines MSA with a semi-naive Bayes classifier: MSA serves as a feature enhancement step that extracts more informative sequence features, and because those features are not mutually independent, the semi-naive Bayes algorithm is used to model the dependencies among them. Because MSA is time-consuming, the thesis also introduces a second feature extraction method, based on MSA and an amino acid substitution matrix, to make feature extraction for the sequences to be classified more efficient. This method also extracts features from substrings of the MSA result, but unlike previous methods it no longer needs to add the sequence to be classified to the MSA of each category when extracting its features. It accounts for possible amino acid substitutions at sequence sites during evolution by incorporating an amino acid substitution matrix into the screening of feature substrings. Finally, the features extracted by the MSA and substitution matrix method are combined with multiple classifiers and evaluated on a GPCR data set. The results show that this method greatly improves efficiency and also improves classification accuracy, reaching 99.685%, 99.215%, 98.822%, and 97.291% at the four levels of GPCR classification, respectively. In summary, this thesis implements two efficient GPCR classification methods.
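The substitution-matrix screening described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the toy scores, the threshold, and the function names (`substitution_score`, `best_window_score`, `feature_vector`) are all assumptions made for illustration. The sketch only shows the general idea of testing a sequence for a feature substring with tolerance for similar amino acids, without re-running MSA on that sequence.

```python
# Toy substitution scores (assumed for illustration; not BLOSUM or PAM values).
SIMILAR_PAIRS = {
    frozenset("IL"), frozenset("IV"), frozenset("LV"),
    frozenset("KR"), frozenset("DE"),
}


def substitution_score(a: str, b: str) -> int:
    """Toy score: 2 for identity, 1 for a similar pair, -1 otherwise."""
    if a == b:
        return 2
    if frozenset((a, b)) in SIMILAR_PAIRS:
        return 1
    return -1


def best_window_score(sequence: str, feature: str) -> int:
    """Best ungapped score of `feature` against any same-length window."""
    k = len(feature)
    if len(sequence) < k:
        return -k  # lowest possible score: no window fits
    return max(
        sum(substitution_score(x, y) for x, y in zip(sequence[i:i + k], feature))
        for i in range(len(sequence) - k + 1)
    )


def feature_vector(sequence, features, threshold_fraction=0.75):
    """Binary vector: 1 where a feature substring matches within tolerance.

    `threshold_fraction` is a hypothetical cutoff, expressed as a fraction
    of the exact-match score of each feature substring.
    """
    vector = []
    for feature in features:
        exact_match_score = 2 * len(feature)
        score = best_window_score(sequence, feature)
        vector.append(int(score >= threshold_fraction * exact_match_score))
    return vector


if __name__ == "__main__":
    # "IKD" in the sequence differs from the feature "LKD" by one
    # similar substitution (I for L), so the first feature still counts.
    print(feature_vector("MAIKDSG", ["LKD", "GGA"]))
```

In practice the toy scores would be replaced by a published substitution matrix, and the resulting binary vectors would be passed to whichever classifiers are being compared.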
Keywords/Search Tags: G protein-coupled receptor, MSA, semi-naive Bayes classifier, classification, substitution matrix