Search, Storage and Visualization of Protein Mass Spectrometry Data

Posted on: 2022-07-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: R M Yang
Full Text: PDF
GTID: 1480306608479954
Subject: Computer Software and Application of Computer
Abstract/Summary:
Protein mass spectrometry (MS) data analysis has long since moved from manual calculation to dedicated software. Such software must be tested and verified on real mass spectrometry data before it can serve as a tool for qualitative and quantitative protein analysis. This often requires collecting analysis results for the same protein data set from different laboratories, checking the accuracy of the software's results against them, and revising the software's calculation steps accordingly to further improve accuracy. Existing mass spectrometry data analysis software cannot meet the current needs of proteomics analysis. Faster and more accurate algorithms remain the key to improving the performance of mass spectrometry data analysis.

Visualization has always been an integral part of protein mass spectrometry data analysis. With the rapid growth in the scale of protein mass spectrometry data, visualization performance has become key to overall analysis performance, and mass spectrometry data visualization has become a hot topic in recent years.

Database search has always been the main method for protein identification by top-down mass spectrometry. As protein databases grow, existing protein identification software becomes slower. Filtering the protein database to limit the amount of data that must be searched is an important way to speed up identification. There are two main kinds of protein sequence filtering algorithms: one based on sequence tags and the other based on gapped tags. The disadvantage of the former is that tags are difficult to obtain when peaks are missing. The disadvantage of the latter is that noise peaks generate many incorrect tags.

Liquid chromatography coupled with mass spectrometry (LC-MS) has become a standard technique for proteomics and metabolomics experiments. It is widely used in high-throughput biomarker discovery and in the identification of potential drug targets. Although the technology is relatively mature, many factors still affect the quality of mass spectrometry data, for example errors made during sample preparation, problems with the LC, and contamination of the sample. Visual inspection is an intuitive and rapid way to detect these problems. It requires mass spectrometry data visualization software that lets users view the data interactively, judge its quality, and thereby correct the data and improve its quality.

This dissertation focuses on the analysis of protein mass spectrometry data. Its innovations and contributions fall into four areas:

1. We proposed a directed acyclic graph representation of mass spectra, called the spectrum graph. Compared with the earlier sequence tag and gapped tag methods, this representation preserves more of the useful mass spectral information. We also proposed two algorithms that use the spectrum graph to generate the minimal covering set of blocked patterns and the optimal blocked patterns, which optimizes blocked pattern generation.

2. We proposed the spectrum graph matching (SGM) problem and a novel filtering algorithm based on SGM. The filtering algorithm generates spectrum graphs from the subspectra of a query spectrum and uses them to filter protein sequences. We also proposed a suffix-tree-based algorithm for searching the protein database with the protein sequences represented by those spectrum graphs. Experimental results on MS data show that SGM filters protein sequences with high efficiency.

3. We proposed a method for evaluating the quality of a data window summary, based on the assumption that a good summary is one whose rendered image is visually closest to the image rendered from all peaks in the data window. We also introduced mzMD, a new storage and retrieval system for MS data visualization. mzMD consists of a novel file format and a packaged HTTP server that together enable effective MS data storage and query. mzMD is designed for applications that require frequent, fast MS data queries. Experimental results show that mzMD produces high-quality summaries of mass spectrum data windows with more stable query speed than the original software.

4. We designed and implemented MS-Viewer. MS-Viewer addresses problems such as poor navigation and difficult installation. It also adds new functions such as a scrolling view and intensity-based zoom in/out. MS-Viewer adopts hierarchical storage to store and search mass spectrum data, which speeds up data rendering. This storage structure also improves software performance by enabling data caching and data prefetching.
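
To make the spectrum graph idea concrete, here is a minimal Python sketch. It is not the dissertation's algorithm: it assumes the common textbook construction in which peaks are nodes and an edge joins two peaks whose mass difference matches one amino acid residue, and it replaces the suffix-tree search with a plain substring check. The function names, the shortened residue table, and the tolerance value are all hypothetical choices for illustration.

```python
from itertools import combinations

# Monoisotopic residue masses in Da for a few amino acids (illustrative subset;
# a real table would cover all 20 residues).
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "V": 99.06841,
    "L": 113.08406,
}


def build_spectrum_graph(peaks, tol=0.01):
    """Return a DAG as {mass: [(heavier_mass, residue), ...]}.

    Assumption: an edge links two peaks whose mass difference equals
    one residue mass within `tol` Da.
    """
    masses = sorted(peaks)
    graph = {m: [] for m in masses}
    # combinations() on a sorted list yields (lighter, heavier) pairs,
    # so every edge points toward a larger mass and the graph is acyclic.
    for lighter, heavier in combinations(masses, 2):
        diff = heavier - lighter
        for residue, mass in RESIDUE_MASS.items():
            if abs(diff - mass) <= tol:
                graph[lighter].append((heavier, residue))
    return graph


def paths_as_tags(graph, min_len=2):
    """Spell out the residue strings along source-to-sink paths."""
    has_incoming = {dst for edges in graph.values() for dst, _ in edges}
    tags = []

    def walk(node, tag):
        if not graph[node]:  # sink node: the path ends here
            if len(tag) >= min_len:
                tags.append(tag)
            return
        for nxt, residue in graph[node]:
            walk(nxt, tag + residue)

    for start in graph:
        if start not in has_incoming:  # source node
            walk(start, "")
    return tags


def filter_proteins(tags, proteins):
    """Keep proteins whose sequence contains at least one tag.

    Simplification: a substring test stands in for an indexed search.
    """
    return [name for name, seq in proteins.items()
            if any(tag in seq for tag in tags)]
```

With peaks at 100.0, 157.02146, 228.05857 and 327.12698 Da plus an unrelated peak at 500.0 Da, the graph spells the single path G, A, V, and the isolated peak contributes no tag. This illustrates why a graph over all peaks can tolerate noise better than tags read off consecutive peaks alone.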
Keywords/Search Tags: Mass spectrometry, Spectrum graph, Filtering algorithm, Mass spectrometry visualization, Mass spectrometry data storage