
Research On Feature Recognition Of Sequence Data For Protein Interaction Prediction

Posted on: 2020-11-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y M Gui
GTID: 1360330578983023
Subject: Control Science and Engineering
Abstract/Summary:
Proteins are the cornerstone of all organisms, and most of them work synergistically with other proteins; only a few work in monomeric form. Machine learning-based protein interaction prediction combines protein sequence feature extraction methods with machine learning algorithms, using large-scale data statistics to reveal protein function, understand protein interaction mechanisms, and discover new protein binding rules. It provides important guidance for protein research fields such as deciphering molecular mechanisms, building protein interaction networks, developing drugs, and optimizing treatment.

Protein sequence feature extraction is one of the primary problems in protein interaction prediction, because its quality directly affects the performance of the machine learning algorithms that process protein sequence data. How to improve the feature extraction method and how to optimize the machine learning algorithm are therefore urgent problems in bioinformatics research. A series of advances has been made in protein interaction prediction through protein sequence feature extraction and machine learning model training. However, because feature extraction and model training are carried out as separate steps, existing approaches fail to capture full-sequence information and long-range effects of the protein sequence effectively, which makes it difficult to improve prediction performance.

To improve the prediction performance of protein interaction and to promote the application of protein interaction prediction technology in protein interaction research, this dissertation studies the improvement of protein sequence feature extraction methods, the introduction of machine learning model optimization techniques, and end-to-end protein interaction prediction. The main work of this dissertation is summarized as follows:

1. To address the problem that current protein sequence feature extraction methods do not consider relationships across the whole sequence, a feature extraction method based on the Matrix of Sequence (MOS) is proposed. Starting from an amino acid classification based on dipole and side-chain volume, the method abstracts the protein sequence into a vector whose dimensionality varies with sequence length, and then uses the sequential relationship between the elements of the protein sequence to encode it into a vector of consistent dimensionality. This solves the problem that protein sequences cannot be fed directly into a machine learning algorithm for classification and recognition.

2. To improve the prediction performance of protein interaction, traditional machine learning models, namely K-Nearest Neighbor (KNN), Decision Tree (DT), and Random Forest (RF), together with a Deep Neural Network (DNN), are used to study protein interaction prediction based on amino acid sequences. Combining these four models with four feature extraction methods, Conjoint Triad (CT), Auto Covariance (AC), Local Descriptor (LD), and Matrix of Sequence (MOS), yields sixteen protein interaction prediction models. The results show that the deep neural network model with network optimization techniques such as Dropout achieves the best evaluation metrics and improves protein interaction prediction performance compared with existing results. On the benchmark dataset, CT, AC, and LD achieved best accuracies of 98.12%, 98.17%, and 95.60%, respectively, and MOS achieved an accuracy of 96.34%, a recall of 99.28%, and an area under the receiver operating characteristic curve (AUC) of 98.79%. Compared with the existing feature extraction methods, MOS reduces the loss rate and greatly shortens training time.

3. To address the separation between protein sequence feature extraction and machine learning model training in the protein interaction prediction process, an end-to-end protein interaction prediction model based on Long Short-Term Memory (LSTM) is proposed. The model treats protein sequence feature extraction as part of the machine learning model and integrates feature extraction with model training, so that a better sequence feature representation is learned during training and prediction performance improves. The results show that the end-to-end protein interaction prediction model achieved a best accuracy of 97.46%, which improved the prediction performance of protein interaction.
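
As an illustration of the kind of sequence feature extraction discussed above, the following is a minimal sketch of a Conjoint Triad (CT) encoding. It assumes the seven-class amino acid grouping by dipole and side-chain volume that is commonly used with CT in the literature; the abstract does not list the classes, so the grouping, the min-max scaling, and the names `conjoint_triad` and `pair_features` are assumptions made for illustration, not the dissertation's implementation.

```python
from itertools import product

# Seven amino acid classes commonly used with the Conjoint Triad descriptor
# (grouped by dipole and side-chain volume). Assumed grouping; the abstract
# itself does not list the classes.
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: idx for idx, group in enumerate(CLASSES) for aa in group}

# One feature per ordered triple of classes: 7 ** 3 = 343 dimensions.
TRIADS = list(product(range(len(CLASSES)), repeat=3))
TRIAD_INDEX = {triad: i for i, triad in enumerate(TRIADS)}


def conjoint_triad(sequence):
    """Encode a protein sequence as a fixed-length (343) triad frequency vector.

    Residues outside the 20 standard amino acids are skipped, which is a
    simplification: their neighbours are then treated as adjacent.
    """
    classes = [AA_TO_CLASS[aa] for aa in sequence.upper() if aa in AA_TO_CLASS]
    counts = [0] * len(TRIADS)
    # Slide a window of three consecutive residues along the sequence.
    for i in range(len(classes) - 2):
        counts[TRIAD_INDEX[tuple(classes[i:i + 3])]] += 1
    lo, hi = min(counts), max(counts)
    if hi == lo:
        # No triads (or a flat histogram): return an all-zero vector.
        return [0.0] * len(counts)
    # Min-max scaling so that sequences of different length are comparable
    # (one possible normalisation choice).
    return [(c - lo) / (hi - lo) for c in counts]


def pair_features(seq_a, seq_b):
    """Concatenate the encodings of two proteins into one 686-dim input."""
    return conjoint_triad(seq_a) + conjoint_triad(seq_b)
```

Whatever the sequence length, the output has the same dimensionality, which is what allows a protein pair to be passed to a classifier such as KNN, a random forest, or a deep neural network.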
Keywords/Search Tags: Protein Interaction Prediction, Sequence Feature Extraction, Deep Neural Network, Conjoint Triad, Auto Covariance, Local Descriptor, Matrix of Sequence