
The Evaluation And Application Of Quality Control Tools For Identification Of Proteins By Tandem Mass Spectrometry

Posted on: 2018-10-05
Degree: Master
Type: Thesis
Country: China
Candidate: X D Feng
Full Text: PDF
GTID: 2310330569986535
Subject: Biomedical engineering
Abstract/Summary:
Quality control is a key problem in large-scale proteomics data analysis, and the target-decoy search strategy has become its main approach. The strategy works with most database search engines and underlies most quality control methods. Although it can estimate the false discovery rate (FDR) within a dataset, it cannot directly evaluate the false positive matches among the target identifications. As a supplement to the target-decoy strategy, the entrapment sequence method was introduced to assess the proteomics data analysis process. In the target-decoy search strategy, only the sample sequences (A) are used as target sequences. In the entrapment sequence method, the target sequences consist of the sample sequences (A) and entrapment sequences (B), which have low homology with the sample sequences. The combined target sequences (A+B) are then reversed to construct the decoy sequences (A'+B'). Because each group carries a different label, we can tell which sequence database (A, B, A' or B') each identification comes from. Two pieces of work were carried out around the entrapment sequence method:

1. The entrapment sequence method was further developed. Pyrococcus furiosus data were used as the sample data, and the corresponding Pyrococcus furiosus protein sequences were used as the sample sequences. Homo sapiens protein sequences were used as the original entrapment sequences and were randomized to construct entrapment sequence sets of 13 different sizes, each of which was combined with the sample sequences to give 13 target databases for identification. With these databases we examined how the size of the entrapment sequence database influences the evaluation, and found that an appropriate size ratio of entrapment to sample sequence database is 10. This work also defined the False Match Rate (FMR) for the entrapment sequence method, together with the formula for computing it.

2. Based on the entrapment sequence method, we evaluated two key procedures in the proteomics data analysis workflow: the database search engine and the quality control method. Following the result of the first study, we constructed a target database with an entrapment-to-sample size ratio of 10. Using both standard and experimental datasets, we evaluated first the original scores of the database search engines, then four quality control methods, and then the reprocessed scores of the database search engines. This work also proposed an alternative integrated method for combining results from different database search engines or quality control methods. Finally, it discussed the possibility of using a small entrapment database for the evaluation, with the entrapment and sample sequence databases of equal size.

The study found that both the original score and the reprocessed score of MS-GF+ performed best among the five search engines, and that PepDistiller performed best among the four quality control methods. It also found that grouping the identifications and filtering them again by FDR could increase the number of identifications and improve the confidence level at the same time. These results give researchers a reasonable reference for choosing a database search engine or quality control method in proteomic data analysis, and so help to standardize the proteomics data analysis process.
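The labelled database construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`build_database`, `origin`, `summarize`), the `label|name` identifier format and the toy sequences are hypothetical and are not taken from the thesis. The FDR estimate shown is the common decoy-to-target ratio, and the entrapment fraction is a plain count ratio; it is not the FMR formula that the thesis defines, which the abstract does not give.

```python
from collections import Counter


def build_database(sample, entrapment):
    """Build a labelled search database from sample (A) and entrapment (B) proteins.

    Every target sequence also gets a reversed decoy, labelled A' or B'.
    The "label|name" identifier format is an assumption made for this sketch.
    """
    db = {}
    for tag, proteins in (("A", sample), ("B", entrapment)):
        for name, seq in proteins.items():
            db[f"{tag}|{name}"] = seq          # target entry
            db[f"{tag}'|{name}"] = seq[::-1]   # reversed decoy entry
    return db


def origin(entry_id):
    """Recover the source label (A, B, A' or B') from an entry identifier."""
    return entry_id.split("|", 1)[0]


def summarize(matched_ids):
    """Count identifications per source and derive two simple ratios.

    fdr: decoy hits divided by target hits (a common target-decoy estimate).
    entrapment_fraction: share of target hits that fall in B (illustrative
    only, not the FMR formula defined in the thesis).
    """
    counts = Counter(origin(i) for i in matched_ids)
    targets = counts["A"] + counts["B"]
    decoys = counts["A'"] + counts["B'"]
    fdr = decoys / targets if targets else 0.0
    entrapment_fraction = counts["B"] / targets if targets else 0.0
    return counts, fdr, entrapment_fraction


if __name__ == "__main__":
    # Toy sequences, invented for the example.
    db = build_database({"P1": "MKVLA"}, {"H1": "MTEYK"})
    print(sorted(db))
    print(summarize(["A|P1", "A|P2", "B|H1", "A'|P1"]))
```

Any identification that lands in B is known to be wrong, because the entrapment proteins are absent from the sample. That is what lets the method check false matches among target hits, which decoy counts alone cannot do.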
Keywords/Search Tags:Shotgun Proteomics, Quality Control, Evaluation, Target-Decoy, Entrapment Sequence Method