Study On Feature Extractions And Similarity Of Protein Sequences

Posted on: 2022-01-09
Degree: Master
Type: Thesis
Country: China
Candidate: L Chen
Full Text: PDF
GTID: 2480306527984729
Subject: Applied Mathematics
Abstract/Summary:
With the rapid growth of protein data, predicting protein structures and functions with bioinformatics methods has become an important task. In this thesis, statistical techniques and machine learning methods are used to study the texture characteristics and similarities of proteins. An intelligent algorithm is applied to predict protein structure types, and signal peptides for hyposecretory proteins are artificially optimized and designed.

In the second chapter, a new protein feature extraction method is proposed. Protein sequences are converted to Markov transition frequency matrices, from which four texture features (contrast, homogeneity, correlation and energy) are computed. The new feature vectors are combined with the pseudo amino acid composition (PseAAC) proposed by Chou. Finally, the method is used to classify proteins of eukaryotes, Gram-positive bacteria and Gram-negative bacteria. The results show that the proposed feature vectors can represent the features of proteins.

In the third chapter, in view of the importance of protein recognition, a new method for recognizing protein secondary structures based on the dual-tree complex wavelet transform is proposed. First, protein sequences are transformed into distance matrices according to the three-dimensional coordinates of the Cα atoms. Then, to capture the implicit texture information, a four-level decomposition of the matrices is performed with the dual-tree complex wavelet transform to extract the energies and standard deviations of the subbands in different directions. The protein secondary structure features are represented by 48-dimensional feature vectors formed from these subband energies and standard deviations. Finally, the proposed method is successfully applied to the classification of four different kinds of proteins.

In the fourth chapter, because many natural proteins have low secretion and are difficult to mass-produce, three methods are used to optimize and design the original signal peptide sequences: replacing, inserting and deleting certain amino acids. Then, non-interlaced dynamic time warping (NI-DTW) is used to calculate the similarities of the artificial signal peptides, according to the pseudo skeleton distance matrices of the three-dimensional coordinates of the artificial signal peptides, which are based on the different physical and chemical properties of amino acids. Finally, the artificial optimization design of signal peptides is achieved.
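
The feature extraction step summarized for the second chapter (a transition frequency matrix followed by contrast, homogeneity, correlation and energy) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The alphabetical ordering of the 20 amino acids, the normalization of the matrix so that all entries sum to 1, and the specific texture formulas (the common co-occurrence-matrix definitions, with homogeneity weighted by 1 / (1 + (i - j)^2) and energy taken as the sum of squared entries) are all assumptions, and the function names are hypothetical.

```python
import math

# Assumption: the 20 standard amino acids in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def transition_frequency_matrix(seq):
    """Count adjacent residue pairs and normalize so all entries sum to 1."""
    n = len(AMINO_ACIDS)
    counts = [[0.0] * n for _ in range(n)]
    total = 0
    for a, b in zip(seq, seq[1:]):
        if a in INDEX and b in INDEX:  # skip pairs with non-standard residues
            counts[INDEX[a]][INDEX[b]] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence has no valid adjacent residue pairs")
    return [[c / total for c in row] for row in counts]


def texture_features(p):
    """Contrast, homogeneity, correlation and energy of a normalized matrix."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    contrast = sum((i - j) ** 2 * v for i, j, v in cells)
    homogeneity = sum(v / (1 + (i - j) ** 2) for i, j, v in cells)
    energy = sum(v * v for _, _, v in cells)
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    var_i = sum((i - mu_i) ** 2 * v for i, _, v in cells)
    var_j = sum((j - mu_j) ** 2 * v for _, j, v in cells)
    if var_i == 0 or var_j == 0:
        correlation = 0.0  # assumed convention for a degenerate matrix
    else:
        cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
        correlation = cov / math.sqrt(var_i * var_j)
    return {
        "contrast": contrast,
        "homogeneity": homogeneity,
        "correlation": correlation,
        "energy": energy,
    }


if __name__ == "__main__":
    # Hypothetical short sequence, used only to exercise the functions.
    matrix = transition_frequency_matrix("MKVLAAGIVGLCAKSAWA")
    print(texture_features(matrix))
```

Because contrast, homogeneity and correlation depend on the row and column indices, the chosen ordering of the amino acids changes their values; any ordering used in practice would need to be fixed across all sequences being compared.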
Keywords/Search Tags: Protein structure, wavelet transform, feature extraction, support vector machine, signal peptide