Intelligent Prediction Of Protein Secondary Structure Based On Fuzzy Support Vector Machine

Posted on: 2018-10-17 | Degree: Master | Type: Thesis
Country: China | Candidate: J Wang
GTID: 2310330512479804 | Subject: Engineering
Abstract/Summary:
With the completion of the Human Genome Project, more and more protein sequences are being determined, so fast and effective algorithms for protein secondary structure prediction are urgently needed. Protein structure is studied mainly in order to understand protein function, and understanding protein function is important for biopharmaceuticals, agricultural biology and other fields. In recent years, machine learning has become the mainstream approach to protein secondary structure prediction. In this thesis, an improved fuzzy support vector machine (FSVM) is proposed on the basis of the traditional support vector machine (SVM). In addition, the similarity of protein sequences is measured by the vector cosine method. The specific research work is as follows:

1. Because different proteins with similar sequences often have similar structures, we propose a method for comparing protein sequence similarity. Three physicochemical properties are used as the coordinates of each amino acid residue: Hy (hydrophobicity scale), pKa(COOH) (the dissociation constant of the -COOH group) and pKa(NH3+) (the dissociation constant of the -NH3+ group). Amino acid sequences are thereby mapped into a three-dimensional space, and the vector cosine method is used to measure the similarity of two amino acid sequences.

2. To reduce the influence of isolated points and noise points, we set the membership of each sample using the distance between the sample point and the center of its class. Sample points are first mapped to a high-dimensional space. In that space, combined with the k-nearest neighbor algorithm, support vectors are distinguished from noise points by calculating the closeness between each sample point and its surrounding points.

3. A new protein secondary structure prediction algorithm is obtained by combining protein sequence similarity with the fuzzy support vector machine. The protein sequences in the test set are compared with those in the pdb_full database. If the similarity is greater than 0.9, the secondary structure of the matching sequence in the pdb_full database is assigned to the test sequence; amino acid sequences with lower similarity are predicted by the FSVM model. In the improved FSVM algorithm, training samples with a small membership degree are excluded, and at the same time the weight of support vector sample points is increased to suppress noise interference.

Combining the improved FSVM with the protein sequence similarity measure improves prediction accuracy. The experiments show that the prediction accuracy of the combined algorithm is higher than that of other machine learning methods.
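The three steps summarized in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation, and several points are assumptions made only for illustration: sequences are assumed to be of equal length and are flattened by concatenating per-residue coordinates; the membership formula s_i = 1 - d_i / (r + delta) is the common distance-to-class-center form, computed here in the input space rather than in the high-dimensional space the abstract describes; and the names `encode_sequence`, `distance_memberships`, `route` and the table `TOY_PROPERTIES` are hypothetical. The values in `TOY_PROPERTIES` are placeholders, not measured hydrophobicity or pKa data.

```python
import math

# Placeholder 3-D coordinates per residue (Hy, pKa(COOH), pKa(NH3+)).
# These numbers are made up for illustration and are NOT real property values.
TOY_PROPERTIES = {
    "A": (1.0, 0.0, 0.0),
    "G": (0.0, 1.0, 0.0),
    "L": (0.0, 0.0, 1.0),
}


def encode_sequence(sequence, properties):
    """Concatenate each residue's 3-D coordinate into one flat vector.

    Assumption: this flattening is one possible way to turn a sequence
    into a vector; the abstract does not specify the exact encoding.
    """
    vector = []
    for residue in sequence:
        vector.extend(properties[residue])
    return vector


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def distance_memberships(points, delta=1e-6):
    """Fuzzy membership of each point of ONE class from its distance to
    the class center: s_i = 1 - d_i / (r + delta), r = largest distance.

    Points far from the center (likely outliers or noise) get a
    membership close to 0; delta keeps every membership positive.
    """
    dim = len(points[0])
    center = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    distances = [math.dist(p, center) for p in points]
    radius = max(distances)
    return [1.0 - d / (radius + delta) for d in distances]


def route(similarity, threshold=0.9):
    """Choose database lookup above the threshold, FSVM prediction otherwise."""
    return "database_lookup" if similarity > threshold else "fsvm"


if __name__ == "__main__":
    query = encode_sequence("AGL", TOY_PROPERTIES)
    candidate = encode_sequence("AGA", TOY_PROPERTIES)
    sim = cosine_similarity(query, candidate)
    print(round(sim, 3), route(sim))
    print(distance_memberships([(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]))
```

In a fuller pipeline, the memberships would typically be used as per-sample weights when training the SVM, and samples whose membership falls below a chosen cutoff would be dropped, as the abstract describes.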
Keywords/Search Tags:Protein sequence, Membership, Support vector machine, Protein secondary prediction, Fuzzy Support Vector Machine