Classification Of Intrinsically Disordered Proteins Based On Residual Neural Network And Residues Neighborhood Features

Posted on: 2020-08-29
Degree: Master
Type: Thesis
Country: China
Candidate: R C Li
Full Text: PDF
GTID: 2370330599960592
Subject: Engineering
Abstract/Summary:
Intrinsically disordered proteins lack a stable three-dimensional structure in their native state, yet they are widespread in living organisms and play an important role in normal biological function. They are also associated with many major human diseases. Studying the disordered regions of proteins helps to uncover the pathogenesis of these diseases and to track their progression.

Traditional methods for detecting intrinsically disordered proteins focus mostly on the physical and chemical properties of amino acids and overlook the feature information that lies between residues in the amino acid sequence, so their prediction accuracy is low. This thesis therefore uses a cross-correlation algorithm to analyze residue characteristics and improves on the way sliding windows are used in traditional detection methods. To raise the classification accuracy for intrinsically disordered proteins, nested sliding windows are used to extract amino acid sequence features.

We present a method for extracting local residue neighborhood features based on cross-correlation. First, a hidden Markov model is constructed for the amino acid sequences. Second, the profile HMM score matrix of the query sequence is obtained by multiple sequence alignment. Finally, nested sliding windows, the cross-correlation algorithm, and superimposed averaging are used to calculate the neighborhood features between local residues. The features extracted in this way carry more information about the relationships between residues and therefore represent amino acid sequences more accurately.

We also design a residual neural network classification model. The network parameters are tuned through repeated experiments, and the extracted features are then classified and validated. We evaluate the model with the confusion matrix and the ROC curve and compare it with support vector machine and random forest classifiers. On the CASP9 and CASP10 datasets, our model reaches accuracies of 93.8% and 93.2% respectively, about 5% higher than the support vector machine and random forest classifiers. By comparison, the traditional intrinsically disordered protein detection methods DISOPRED3, PreDisorder, and ESpritz have an accuracy of about 80% on the same datasets, so the accuracy of our method is about 13% higher. These results show that the proposed feature extraction method compensates for the weakness of the amino acid sequence features used in traditional detection methods and improves accuracy.
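The feature-extraction step described in the abstract (nested sliding windows, cross-correlation, and averaging over a profile HMM score matrix) can be illustrated with a minimal sketch. The abstract does not give the window sizes, the exact form of the cross-correlation, or how sequence ends are handled, so everything here is an assumption made for illustration: the function names `cross_correlation` and `neighborhood_features`, the `outer` and `inner` window sizes, the use of a normalized zero-lag correlation, and edge padding. It is not the thesis's actual implementation.

```python
import numpy as np


def cross_correlation(a, b):
    """Normalized zero-lag cross-correlation of two equal-length vectors.

    Assumption: the thesis may use a different definition of cross-correlation.
    Returns 0.0 when either vector is constant (zero variance).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def neighborhood_features(scores, outer=7, inner=3):
    """Per-residue neighborhood features from an (L, K) score matrix.

    For each residue, an outer window (hypothetical size `outer`) is centered
    on it. Inner windows (hypothetical size `inner`) slide across the outer
    window; for each inner window, the correlations between the center
    residue's score row and every row in that inner window are averaged.
    """
    scores = np.asarray(scores, dtype=float)
    if outer % 2 == 0 or not 1 <= inner <= outer:
        raise ValueError("outer must be odd and 1 <= inner <= outer")
    half = outer // 2
    # Assumption: repeat the first/last rows so end residues get full windows.
    padded = np.pad(scores, ((half, half), (0, 0)), mode="edge")
    n_inner = outer - inner + 1
    feats = np.zeros((scores.shape[0], n_inner))
    for i in range(scores.shape[0]):
        window = padded[i:i + outer]      # outer window centered on residue i
        center = window[half]             # score row of residue i itself
        for j in range(n_inner):
            sub = window[j:j + inner]     # inner window nested in the outer one
            feats[i, j] = np.mean([cross_correlation(center, row) for row in sub])
    return feats
```

With a score matrix of shape (L, K), such as one row of profile HMM scores per residue, `neighborhood_features(scores)` returns an array of shape (L, outer - inner + 1) that could then be passed to a classifier.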
Keywords/Search Tags:intrinsically disordered protein, hidden Markov model, cross-correlation, residual neural network