
Peptide Sequencing From Data-independent-acquisition Mass Spectrometry

Posted on: 2024-09-12
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhang
GTID: 2530307136472764
Subject: Computer Science and Technology
Abstract/Summary:
Tandem mass spectrometry (MS/MS) is an important technology in proteomics research. Common methods for identifying peptides from tandem mass spectra include protein sequence database search, spectral library search, and de novo sequencing. Of these, de novo sequencing requires no protein or DNA database, so it has the potential to become a mainstream approach to peptide identification: it can be used to determine the protein sequences of novel species and of species whose genomes have not been sequenced. However, existing de novo sequencing methods depend on cumbersome spectral preprocessing, such as peak screening and denoising, which can also filter out genuine signal peaks. In addition, data-independent acquisition (DIA) can in theory fragment every peptide in a sample, but its highly complex, mixed MS/MS spectra make accurate identification of peptides and proteins difficult, because the correspondence between precursor ions and fragment ions is lost. Correct analysis of DIA data is therefore the first prerequisite for de novo sequencing of DIA data. To address these problems, this thesis pursues two lines of research:

(1) Existing methods for building pseudo-MS/MS spectra from DIA data yield few identifications and produce many duplicated spectra. The proposed method, Corr DIA, reconstructs the correspondence between precursors and fragments by computing the cosine similarity of their chromatograms. Adding isotopic peak cluster removal and redundancy removal greatly increases the number of identifications. In addition, Corr DIA uses the fragmentation window information as the center of each MS2 spectrum, which reduces the search space and improves the accuracy of spectrum extraction. Experimental results show that, compared with the traditional method, the pseudo-MS/MS spectra produced by Corr DIA yield significantly more peptides and significantly lower redundancy (that is, a lower likelihood that most of the spectra come from the same peptide), markedly improving the data analysis.

(2) The DIA analysis above produces high-quality pseudo-MS/MS spectra that can serve as input files for DIA de novo sequencing. We propose GCNovo-DIA, a DIA de novo sequencing method based on a graph convolutional neural network (GCN). Because it is built on a GCN model, spectral peak features do not have to be learned with a convolutional neural network; instead, the GCN predicts the amino acid type at each step to generate a complete peptide sequence. Compared with existing GCN-based methods, GCNovo-DIA simplifies spectral peak preprocessing and retains more peak information. Compared with machine-learning-based methods, it constructs the spectral peak relationship graph and the feature matrix within the model framework, so there is no need to design peak features separately or to rescore candidate sequences, which simplifies the sequencing process. Experimental results on several different datasets show that GCNovo-DIA identifies more unique peptide sequences than existing methods, demonstrating the effectiveness of graph convolutional neural networks for DIA de novo sequencing.
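The chromatogram-correlation step described in (1) can be sketched as follows. This is a minimal illustration, not the Corr DIA implementation: it assumes the precursor and fragment extracted-ion chromatograms (XICs) have already been sampled on a shared retention-time grid, and the function names and the 0.8 similarity cutoff are hypothetical. Isotopic peak cluster removal, redundancy removal, and the fragmentation-window constraint are omitted.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two intensity traces sampled on the same retention-time grid."""
    if len(a) != len(b):
        raise ValueError("chromatograms must share the same retention-time grid")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero trace has no elution profile to compare
    return dot / (norm_a * norm_b)


def build_pseudo_spectrum(
    precursor_xic: Sequence[float],
    fragment_xics: Dict[float, Sequence[float]],
    threshold: float = 0.8,  # hypothetical cutoff, not a value from the thesis
) -> List[float]:
    """Return fragment m/z values whose chromatograms co-elute with the precursor."""
    return sorted(
        mz
        for mz, xic in fragment_xics.items()
        if cosine_similarity(precursor_xic, xic) >= threshold
    )


# Toy XICs (intensity per MS scan) on a shared retention-time grid.
precursor = [0, 2, 8, 10, 6, 1, 0]
fragments = {
    175.119: [0, 1, 4, 5, 3, 0, 0],  # elutes together with the precursor
    262.151: [5, 4, 0, 0, 0, 3, 6],  # elutes at a different time
}
print(build_pseudo_spectrum(precursor, fragments))  # [175.119]
```

A fragment whose elution profile tracks the precursor's scores close to 1 and is kept, while a fragment eluting at a different time scores near 0 and is dropped; grouping the kept fragments under their precursor yields one pseudo-MS/MS spectrum.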
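To make the graph-convolution idea in (2) more concrete, here is a sketch of one GCN layer over a spectral-peak graph, using the standard propagation rule of Kipf and Welling, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W). The edge rule (link two peaks whose m/z difference matches an amino acid residue mass within a tolerance), the residue subset, the tolerance, and all function names are assumptions for illustration only; the abstract does not specify how GCNovo-DIA builds its graph or feature matrix, and the per-node amino acid classifier and sequence decoding are omitted.

```python
import math
from typing import List

Matrix = List[List[float]]

# Monoisotopic residue masses (Da) for a small, illustrative subset of amino acids.
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841}


def build_peak_graph(mzs: List[float], tol: float = 0.02) -> Matrix:
    """Adjacency matrix linking peaks whose m/z gap matches a residue mass.

    Hypothetical edge rule chosen for illustration, not the thesis's graph construction.
    """
    n = len(mzs)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            gap = abs(mzs[j] - mzs[i])
            if any(abs(gap - m) <= tol for m in RESIDUE_MASS.values()):
                adj[i][j] = adj[j][i] = 1.0
    return adj


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (n x k) and a (k x m) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    """One GCN step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so every peak also aggregates its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # always >= 1 thanks to the self-loops
    # Symmetric normalization by node degree.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, features), weights)
    return [[max(0.0, v) for v in row] for row in out]  # element-wise ReLU


# Toy example: four peaks, one feature (relative intensity) per peak, 1x1 weight matrix.
mzs = [175.119, 246.156, 333.188, 500.0]
adj = build_peak_graph(mzs)  # edges 0-1 (gap ~ Ala) and 1-2 (gap ~ Ser); peak 3 isolated
features = [[1.0], [0.5], [0.25], [1.0]]
weights = [[1.0]]
print(gcn_layer(adj, features, weights))
```

The self-loops let each peak keep its own features, and the symmetric degree normalization stops heavily connected peaks from dominating; an isolated peak (peak 3 above) simply passes its own features through, transformed only by W.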
Keywords/Search Tags: data-independent acquisition, pseudo-MS/MS spectra, cosine similarity, graph convolutional neural networks, de novo sequencing