Study On Peptide Sequencing Method Based On Graph Neural Networks

Posted on: 2022-06-19
Degree: Master
Type: Thesis
Country: China
Candidate: C N Mu
Full Text: PDF
GTID: 2480306554953009
Subject: Computer Science and Technology
Abstract/Summary:
Tandem mass spectrometry (MS/MS) is the most important technology in proteomics research. Common approaches to sequencing peptides from tandem mass spectra include protein database searching, de novo sequencing, and spectral library searching. Among them, de novo sequencing derives the peptide sequence directly from the MS/MS spectrum. Because it does not depend on any protein database, it plays a key role in characterizing protein sequences from unknown species, in monoclonal antibody sequencing, and in other fields. However, because of the complexity of the problem, the accuracy of de novo sequencing is much lower than that of database searching, which has limited its wide application.

To address this issue, denovo-GCN, a new de novo sequencing method based on graph convolutional neural networks (GCN), is proposed. The core idea of graph-theoretic de novo sequencing is to build a graph of the relationships between spectrum peaks (the spectrum graph) and to traverse a path on that graph. denovo-GCN combines the spectrum graph with a GCN to form a new sequencing approach. In this method, the relationships between the peaks of a tandem mass spectrum are expressed as a graph structure, and the features of each MS/MS peak are extracted from its corresponding cleavage site. The GCN model then predicts the amino acid at each cleavage site, and the complete sequence is assembled step by step. Compared with traditional graph-theoretic methods, denovo-GCN simplifies the preprocessing of spectral data: both the spectrum graph and the MS/MS peak features are constructed within the model framework. The candidate sequences do not need to be rescored, so the sequencing process is more concise. Three parameters that significantly affect the model were determined experimentally: the GCN layer structure, the combination of ion types, and the number of MS/MS peaks used for sequencing.

Datasets from a wide variety of species were used for experimental comparison. Measured by peptide-level recall, denovo-GCN is 9.8 to 21.1 percentage points higher than Novor, 4 to 12.8 percentage points higher than pNovo, and 2 to 10.6 percentage points higher than DeepNovo, which uses convolutional neural networks (CNN) and long short-term memory networks (LSTM). denovo-GCN was also compared with pNovo on the hit rate of the top-10 sequencing results; both methods still leave considerable room for improvement. The types of errors in the sequencing results of the two methods were summarized, and the differences between labeled and predicted peptides were analyzed through peptide-spectrum matches. The analysis found that internal ions can provide additional information for sequencing, and the effect of internal ions on the sequencing performance of the model was further demonstrated.
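
The spectrum graph idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the denovo-GCN implementation: the function names, the mass tolerance, the reduced residue table, and the example peaks are all assumptions made for illustration, and the GCN prediction step is not modelled at all. Peaks are treated as nodes, and an edge joins two peaks whose mass difference matches the monoisotopic residue mass of an amino acid; following a path then spells out a sequence.

```python
from itertools import combinations

# Monoisotopic residue masses in Da for five amino acids only.
# Assumption: a real table would cover all 20 residues and modifications.
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "P": 97.05276,
    "V": 99.06841,
}


def build_spectrum_graph(peak_mzs, tol=0.02):
    """Connect peak i to peak j (i < j) when their mass difference
    matches a residue mass within `tol` Da (hypothetical tolerance)."""
    peaks = sorted(peak_mzs)
    edges = []
    for i, j in combinations(range(len(peaks)), 2):
        diff = peaks[j] - peaks[i]
        for aa, mass in RESIDUE_MASS.items():
            if abs(diff - mass) <= tol:
                edges.append((i, j, aa))
    return peaks, edges


def spell_path(edges, start, end):
    """Depth-first search for one path from node `start` to node `end`.
    Returns the residues along the path as a string, or None if no path exists."""
    if start == end:
        return ""
    for i, j, aa in edges:
        if i == start:
            rest = spell_path(edges, j, end)
            if rest is not None:
                return aa + rest
    return None


# Hypothetical singly charged fragment peaks of a single ion series.
peaks, edges = build_spectrum_graph([100.0, 157.02146, 228.05857, 315.0906])
sequence = spell_path(edges, 0, len(peaks) - 1)
```

For these four made-up peaks the graph has three edges, labelled G, A and S, and the single path from the first to the last peak spells "GAS". In denovo-GCN as described above, a learned model takes the place of this exhaustive traversal by predicting the amino acid at each cleavage site.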
Keywords/Search Tags: graph convolutional neural networks (GCN), de novo sequencing, peptide identification, tandem mass spectrometry, proteomics