
Evaluation Of Different Protein Probability Calculating Methods Using A Semi-random Sampling Model And Other MS Bioinformatics Studies

Posted on: 2007-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: X F Xue
Full Text: PDF
GTID: 2120360185479445
Subject: Cell biology
Abstract/Summary:
Shotgun technology is a widely used approach in proteomics that can produce a large amount of mass spectrometry data from a single experiment. The reliability of large-scale data sets in shotgun proteomics is an important problem, and until now most studies on the reliability of protein identification have concentrated on the peptide level. Quality control at the protein level remains an intractable problem: current methods for calculating protein probabilities have been evaluated only on small data sets from control protein samples or on manually validated data, which cannot support credible conclusions. Large and reliable data sets are therefore needed to evaluate the reliability of calculated protein probabilities.

The major objective of this study is to establish an efficient evaluation system for different methods of calculating protein probabilities and to examine the factors that affect protein probabilities. A semi-random sampling model was developed, based on careful analysis of the protein identification process, to simulate large-scale sets of identified peptides, which were then used to evaluate the calculation methods and the factors affecting protein probability.

The simulation was first performed on a data set from a control sample (18 proteins), and the numbers of peptides and proteins in the simulated results were compared with the real results at different numbers of peptide hits, demonstrating the efficiency of our model.

Based on an experimental data set from a human liver sample, 34 data sets were simulated. Three major influencing factors were examined: data set size, database size, and abundance distribution. The results show that the true positive rate decreased as the simulated data set or the searched database grew larger, and that depleting high-abundance proteins could increase the true positive rate of identified proteins. Finally, different methods for protein...
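The relationship between data set size, database size, and true positive rate described above can be illustrated with a toy simulation. The sketch below is not the thesis's semi-random sampling model, whose details are not given in this abstract. It only assumes that correct peptide hits are drawn from the proteins actually present, weighted by abundance, while incorrect hits fall uniformly on the searched database. The function name `simulate_tpr`, the Zipf-like abundance weights, and the default parameter values are all hypothetical.

```python
import random


def simulate_tpr(n_hits, db_size, n_true=200, p_correct=0.9, seed=0):
    """Toy model: fraction of identified proteins that are truly in the sample.

    Assumptions (illustrative only, not taken from the thesis):
    - proteins 0..n_true-1 are present, with Zipf-like abundance 1/rank;
    - each peptide hit is correct with probability p_correct;
    - incorrect hits land uniformly on any protein in the database;
    - a protein counts as identified if it receives at least one hit.
    """
    if db_size < n_true:
        raise ValueError("db_size must be at least n_true")
    rng = random.Random(seed)
    true_ids = list(range(n_true))
    weights = [1.0 / (rank + 1) for rank in range(n_true)]

    # Decide how many of the hits are correct identifications.
    n_correct = sum(rng.random() < p_correct for _ in range(n_hits))

    # Correct hits: abundance-weighted sampling from the present proteins.
    identified = set(rng.choices(true_ids, weights=weights, k=n_correct))
    # Incorrect hits: uniform random sampling from the whole database.
    identified.update(rng.randrange(db_size) for _ in range(n_hits - n_correct))

    if not identified:
        return 0.0
    true_positives = sum(1 for protein in identified if protein < n_true)
    return true_positives / len(identified)


if __name__ == "__main__":
    for n_hits in (1000, 5000, 20000):
        print(n_hits, round(simulate_tpr(n_hits, db_size=5000), 3))
```

Under these assumptions the set of true proteins saturates as more hits are simulated, while random hits keep adding new false proteins, so the protein-level true positive rate falls as the data set grows. A larger database has a similar effect, because random hits are less likely to land on proteins that are actually present.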
Keywords/Search Tags: protein identification, protein probability, shotgun, label-free protein quantification, EM-like algorithm, Mascot