
Research On Convolutional Neural Network Model Based On Attention Mechanism By Using Sequence Information To Predict Protein-protein Interactions

Posted on: 2021-05-24
Degree: Master
Type: Thesis
Country: China
Candidate: X W Ma
GTID: 2480306548981849
Subject: Computer technology
Abstract/Summary:
In recent years, predicting Protein-Protein Interactions (PPIs) from protein primary sequence information has received extensive attention, because it requires no prior knowledge and avoids the time-consuming and labor-intensive drawbacks of traditional biological experiments. The two biggest problems in sequence-based PPI prediction are how to extract effective features from protein sequences and how to build a machine learning classifier that performs well. This thesis therefore proposes convolutional neural network models based on the attention mechanism, approaching the task from two perspectives: optimizing the sequence encoding method and improving the classification model. The main work is summarized as follows:

(1) Four existing protein sequence encoding methods (Conjoint Triad, Auto Covariance, Local Descriptor, and Sequence Matrix), each of which encodes a protein sequence to a fixed length, are combined with four traditional machine learning algorithms (KNN, SVM, DT, and RF) to construct 16 traditional machine learning classification models. In addition, four deep neural network models are constructed based on the same four encoding methods.

(2) This study proposes a convolutional neural network model based on the attention mechanism. The model applies a simple encoding to the two protein sequences of a PPI pair and passes each of them separately through the embedding layer, the convolution layer, the attention layer, and the global average pooling layer. The two output feature vectors are then merged into one feature vector and fed to the fully connected layer to predict PPIs. Three types of attention mechanism are used in the attention layer: a protein sequence pairwise multi-head attention mechanism, a protein sequence pair multi-head self-attention mechanism, and a double-layer attention mechanism that combines the protein sequence pair multi-head self-attention mechanism with the protein sequence pairwise multi-head attention mechanism.

(3) When these models are trained on Pan's human PPIs dataset, the average prediction accuracy is 0.988276, the average AUC value is 0.995927, and the average Matthews correlation coefficient (MCC) is 0.976515. On four external test datasets, the average accuracy ranges from 0.936631 to 0.985237. On the Caenorhabditis elegans, Drosophila, and Escherichia coli datasets, the average prediction accuracy values are 0.987936, 0.991398, and 0.975190, the average AUC values are 0.998742, 0.997156, and 0.990894, and the average MCC values are 0.976081, 0.982930, and 0.950668, respectively. Compared with current sequence-based methods for predicting PPIs, the proposed Attention-CNN models have considerable advantages, and nearly all evaluation metrics are higher than those of previous methods.

(4) In addition, this study conducts cross-species testing and puts forward a hypothesis about the genetic relationship between 5 different species. A species evolutionary tree is constructed to demonstrate the genetic relationship among the 5 species, in particular the relationship between human and Mus musculus.
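The accuracy and MCC values reported in item (3) are standard binary classification metrics computed from the confusion matrix of predicted versus true interaction labels. The following is a minimal sketch of how such values can be computed, assuming labels of 1 for interacting pairs and 0 for non-interacting pairs. The function names and the toy labels are hypothetical and are not taken from the thesis. AUC is omitted because it needs ranked prediction scores rather than hard labels.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels (1 = interacting, 0 = not)."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of pairs classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy labels for illustration only (not data from the thesis).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(accuracy(*counts), mcc(*counts))
```

MCC ranges from -1 to 1 and uses all four confusion-matrix cells, so it stays informative when the interacting and non-interacting classes are imbalanced, which is one reason it is commonly reported alongside accuracy.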
Keywords/Search Tags: Convolutional Neural Network, Attention Mechanism, Protein-Protein Interactions, Genetic Relationship