
Studies On Structured Storage And Compression Of Mass Spectrometry Data

Posted on: 2011-08-27
Degree: Master
Type: Thesis
Country: China
Candidate: H B Ma
GTID: 2120330338990063
Subject: Control Science and Engineering

Abstract/Summary:
Mass spectrometry (MS) is currently the most widely used technology for protein identification. MS data are usually analyzed with several strategies, and the analysis pipeline is typically divided into multiple subtasks, each supported by several software tools. One problem in MS data processing is that data formats are not uniform. Different laboratories and different types of mass spectrometers produce different output formats, and the downstream analysis programs expect different input formats. This hampers MS data exchange and integration, and it poses challenges for developing an integrated MS Data Processing Platform (MSDPP) and for building MS databases.

Under current research conditions, one focus is to develop an integrated MS data analysis platform that offers many MS data analysis methods and several typical analysis pipelines. At the same time, collecting and organizing published MS data into an MS database is important work in proteomics, and this work should build on MS data format standards. Against this background, the main contents of this thesis are as follows:

1) Implementation of a common data access interface for MSDPP. MSDPP, which is being designed by our working group, is a platform with a web-based user interface. It consists mainly of a data management system, a tool management system, a document management system, and a user interface system. The platform is intended to support the submission, querying, storage, and sharing of MS data, as well as user-defined data analysis pipelines. The common data access interface converts the data format (UDF) used in the data management system into the formats required by the analysis software in the tool management system, and vice versa.

2) A unified data format for MSDPP. Building on existing data standards and summarizing the information required by MS data processing methods, we propose a new MS data format (not intended as a standard). It is designed to compensate, as far as possible, for the shortcomings of XML-based data standards in data analysis, so that a single format can serve the entire workflow of a typical protein identification.

3) A preliminary study of MS data compression. The nature of MS experiments requires repeated experiments, and each experiment produces a large amount of output data, so storing these vast amounts of data is a growing problem. Converting the experimental data to an XML-based format can double the file size. We therefore apply existing text-compression and XML-compression techniques to this problem, and propose a simple compression method that preprocesses the data according to the structure of MS data, testing its compression performance.
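As an illustration of the kind of conversion the common data access interface in item 1 performs, the sketch below moves spectra between an in-memory record and Mascot Generic Format (MGF), a plain-text peak-list format accepted by many search engines. This is a minimal sketch under stated assumptions. The thesis does not describe the UDF layout, so the `Spectrum` record and the functions `to_mgf` and `from_mgf` are hypothetical. Only the TITLE, PEPMASS and CHARGE fields and the peak list are handled, and only positive charges are handled.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Spectrum:
    """Hypothetical in-memory record for one MS/MS spectrum.

    Stands in for the platform's internal format, whose layout is not given."""
    title: str
    precursor_mz: float
    charge: int  # positive charge state only in this sketch
    peaks: List[Tuple[float, float]] = field(default_factory=list)  # (m/z, intensity)


def to_mgf(spectra: List[Spectrum]) -> str:
    """Export internal records as MGF text, one BEGIN/END IONS block per spectrum."""
    lines: List[str] = []
    for s in spectra:
        lines.append("BEGIN IONS")
        lines.append(f"TITLE={s.title}")
        lines.append(f"PEPMASS={s.precursor_mz}")
        lines.append(f"CHARGE={s.charge}+")
        for mz, inten in s.peaks:
            lines.append(f"{mz} {inten}")
        lines.append("END IONS")
    return "\n".join(lines) + "\n"


def from_mgf(text: str) -> List[Spectrum]:
    """Import MGF text back into internal records (only the fields written above)."""
    spectra: List[Spectrum] = []
    current: Optional[Spectrum] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "BEGIN IONS":
            current = Spectrum(title="", precursor_mz=0.0, charge=0)
        elif line == "END IONS":
            spectra.append(current)
            current = None
        elif current is None:
            continue  # skip global parameters that appear outside ion blocks
        elif "=" in line:
            key, value = line.split("=", 1)  # titles may themselves contain "="
            if key == "TITLE":
                current.title = value
            elif key == "PEPMASS":
                current.precursor_mz = float(value.split()[0])  # may be followed by an intensity
            elif key == "CHARGE":
                current.charge = int(value.rstrip("+"))
        else:
            mz, inten = line.split()[:2]
            current.peaks.append((float(mz), float(inten)))
    return spectra
```

Writing a record out and reading it back should give the same record. The reverse direction of the interface follows the same pattern, with a parser for each tool's output format.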
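The abstract does not explain how the MS data structure is preprocessed before compression in item 3. The sketch below shows one plausible form of structure-aware preprocessing, not the thesis's method. It splits a peak list into m/z and intensity columns, quantizes both to integers, delta-encodes the m/z column, and then applies Python's standard `zlib` (DEFLATE) as the general-purpose compressor. A baseline compresses the same peaks serialized as text. The function names and scale factors are hypothetical. Because of quantization, the scheme is lossless only up to the chosen precision (four decimals for m/z, one for intensity).

```python
import struct
import zlib
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def compress_plain(peaks: List[Peak]) -> bytes:
    """Baseline: serialize peaks as text lines, then DEFLATE them with zlib."""
    text = "\n".join(f"{mz:.4f} {inten:.1f}" for mz, inten in peaks)
    return zlib.compress(text.encode("ascii"), 9)


def compress_structured(peaks: List[Peak], mz_scale: int = 10_000,
                        int_scale: int = 10) -> bytes:
    """Hypothetical structure-aware preprocessing before zlib.

    Splits the m/z and intensity columns, quantizes both to integers,
    delta-encodes the m/z column, and packs a uint32 count followed by
    little-endian int64 values."""
    mz_ints = [round(mz * mz_scale) for mz, _ in peaks]
    int_ints = [round(inten * int_scale) for _, inten in peaks]
    # In a sorted peak list consecutive m/z values are close, so the
    # differences are small integers that DEFLATE tends to handle well.
    deltas = mz_ints[:1] + [b - a for a, b in zip(mz_ints, mz_ints[1:])]
    n = len(peaks)
    payload = struct.pack(f"<I{n}q{n}q", n, *deltas, *int_ints)
    return zlib.compress(payload, 9)


def decompress_structured(blob: bytes, mz_scale: int = 10_000,
                          int_scale: int = 10) -> List[Peak]:
    """Invert compress_structured (exact up to the quantization step)."""
    payload = zlib.decompress(blob)
    (n,) = struct.unpack_from("<I", payload, 0)
    values = struct.unpack_from(f"<{2 * n}q", payload, 4)
    deltas, int_ints = values[:n], values[n:]
    mz_ints: List[int] = []
    acc = 0
    for d in deltas:  # a running sum undoes the delta encoding
        acc += d
        mz_ints.append(acc)
    return [(m / mz_scale, i / int_scale) for m, i in zip(mz_ints, int_ints)]
```

Whether the structured variant produces smaller output than `compress_plain` depends on the data. Comparing the `len()` of both outputs on real peak lists is the kind of compression-performance test the abstract describes.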
Keywords/Search Tags: Proteomics, MSDPP, MS data format standards, MS data compression