Machine Learning Method To Predict Protein Domain And Ligand Interactions. Retention Time-Assisted Peptide Identification By Mass Spectrometry

Posted on: 2008-11-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Zhang
GTID: 1114360272981944
Subject: Pathophysiology
Abstract/Summary:
A fairly large set of protein interactions is mediated by families of peptide-binding domains such as SH2, SH3, PDZ, and MHC. Identifying their ligands by experimental screening is not only labor-intensive but also almost futile for low-abundance species, which are suppressed by high-abundance species. An ideal way of studying protein-protein interactions is to screen protein sequence databases with high-throughput computational approaches, so that validation experiments can be directed towards the most promising peptides. Predictors that merely perform well in cross-validation are not good enough to screen protein databases. In the current study we built integrated machine learning systems using three novel coding methods, and screened the Swiss-Prot and GenBank protein databases for potential ligands of 10 SH3 and 3 PDZ domains. A large fraction of the predictions have already been experimentally confirmed by independent research groups, indicating satisfactory generalization capability for future applications in identifying protein interactions.

To reduce the false positive rate of peptide identification from tandem mass spectra, the thresholds of search engine filters are raised, which in turn leads to a high false negative rate and greatly reduces the efficiency of proteomics research. Here we constructed an Empirical Peptide Retention Time (EPRT) database. For a new experiment, the retention times of peptides identified from substandard MS/MS spectra were compared with their corresponding empirical values in the EPRT database. In 18 known protein mixtures and urinary proteome experiments, the number of spectrum identifications increased by about 15-60% and the number of identified peptides by about 8-18%. Without consideration of spectrum quality, the false positive rate of the new identifications made by the EPRT approach was 3.7%; combined with a minimal Xcorr, it was 0.18%. Some substandard spectra from low-abundance peptides were identified with this approach. It increased the confidence of peptide identifications, which is particularly important for peptides identified from a single spectrum. As more peptide retention times are collected in the EPRT database, the efficacy can be further improved. This approach can be used in any standardized, reproducible separation system.
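
The retention-time comparison described in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function names `build_eprt` and `rescue`, the use of the median as the empirical value, the tolerance `rt_tolerance`, the `min_xcorr` cutoff, and all peptide sequences and numbers below are assumptions chosen only for illustration.

```python
from statistics import median


def build_eprt(observations):
    """Build a toy empirical retention-time table.

    observations: iterable of (peptide, retention_time) pairs taken from
    confident identifications. The median per peptide is an assumed choice
    of "empirical" value, not one stated in the abstract.
    """
    by_peptide = {}
    for peptide, rt in observations:
        by_peptide.setdefault(peptide, []).append(rt)
    return {peptide: median(rts) for peptide, rts in by_peptide.items()}


def rescue(candidates, eprt, rt_tolerance=1.0, min_xcorr=0.0):
    """Keep substandard identifications whose retention time agrees with the table.

    candidates: iterable of (peptide, observed_rt, xcorr) for spectra that
    failed the usual search-engine filters. rt_tolerance and min_xcorr are
    hypothetical thresholds.
    """
    accepted = []
    for peptide, observed_rt, xcorr in candidates:
        expected_rt = eprt.get(peptide)
        if expected_rt is None:
            continue  # no empirical value available for this peptide
        if abs(observed_rt - expected_rt) <= rt_tolerance and xcorr >= min_xcorr:
            accepted.append((peptide, observed_rt, xcorr))
    return accepted


# Made-up example data (retention times in minutes).
eprt = build_eprt([
    ("PEPTIDEK", 20.0), ("PEPTIDEK", 21.0), ("PEPTIDEK", 20.4),
    ("SAMPLER", 35.0),
])
rescued = rescue(
    [("PEPTIDEK", 20.9, 1.2), ("SAMPLER", 40.0, 1.5), ("ELVISK", 10.0, 2.0)],
    eprt,
    rt_tolerance=1.0,
    min_xcorr=1.0,
)
print(rescued)
```

In this sketch only the first candidate is kept: the second is too far from its empirical retention time, and the third has no entry in the table. Setting `min_xcorr` above zero corresponds to combining the retention-time check with a minimal Xcorr, which the abstract reports as lowering the false positive rate.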
Keywords/Search Tags: Protein-protein interaction, domain-ligand interaction prediction, machine learning, neural network, database screening, peptide identification, retention time, empirical peptide retention time database, false negative rate