| Proteogenomics is an emerging area of research at the interface of proteomics and genomics that has benefited from rapid progress in mass spectrometry. Customized protein sequence databases generated from genomic and transcriptomic information are used to help identify novel peptides from mass spectrometry-based proteomic data; in turn, the proteomic data can provide protein-level evidence of gene expression and help refine gene models. This paper reviews the current state of proteogenomic methods, applications, and computational strategies. Our contribution has five parts: (1) We propose an iterative proteogenomics pipeline. Taking the filtered results of the previous experiment as input, the pipeline divides a complex problem into several simpler steps, so that the analysis can be carried out efficiently and with high sensitivity. (2) We built customized mutant protein sequence databases using a computational algorithm that accepts multiple input interfaces and addresses several specific biological problems, supported by pre-experiments and validation tests. (3) We explored translation termination. Using the previous experimental results, we found no evidence of translation termination events in the LC-MS/MS data, but found that α-chymotrypsin in the trypsin reagent caused proteolysis. (4) We verified upstream open reading frames (uORFs). Our uORF predictions, made with the computational algorithm, agree well with the results of previous Ribo-Seq studies. LC-MS/MS, however, is not well suited to finding protein-level evidence of uORFs. (5) We explored reading-frame shifts. We propose a computational algorithm that scans for every frameshift mutation in peptides and generates six types of frameshift mutations. It also generates glycine mutation forms from the database search results, and finally identified N-terminal post-translational modifications. |
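To make the idea of a customized frameshift database concrete, the following Python sketch shows one way candidate frameshift peptides could be generated for a database search. It is an illustration under stated assumptions, not the algorithm summarized above: the six frameshift types are not specified here, so the sketch models only a frameshift caused by deleting one or two nucleotides. The function names (`translate`, `tryptic_peptides`, `novel_frameshift_peptides`), the simplified trypsin rule (cleave after K or R unless followed by P), and the example sequence are all hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate an uppercase A/C/G/T coding sequence up to the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def tryptic_peptides(protein, min_len=6):
    """In-silico digest: cleave after K or R unless the next residue is P.

    This is a simplified rule with no missed cleavages (an assumption).
    """
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        is_last = i == len(protein) - 1
        if is_last or (aa in "KR" and protein[i + 1] != "P"):
            peptides.append(protein[start:i + 1])
            start = i + 1
    return [p for p in peptides if len(p) >= min_len]


def novel_frameshift_peptides(cds, pos, n_deleted, min_len=6):
    """Peptides produced by deleting n_deleted bases at pos that are absent
    from the digest of the unmodified sequence."""
    wild_type = set(tryptic_peptides(translate(cds), min_len))
    shifted = cds[:pos] + cds[pos + n_deleted:]
    mutant = tryptic_peptides(translate(shifted), min_len)
    return [p for p in mutant if p not in wild_type]


if __name__ == "__main__":
    example_cds = "ATGGCCAAGTTTGGGCGTTAA"  # hypothetical toy sequence
    print(novel_frameshift_peptides(example_cds, pos=3, n_deleted=1, min_len=1))
```

In a real workflow, such candidate peptides would be appended to the reference protein database before the LC-MS/MS search, and any match would still need independent validation.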