Sparse Representation Based Protein Mass Spectrometry Data Analysis

Posted on: 2013-02-16    Degree: Master    Type: Thesis
Country: China    Candidate: J Q Ke    Full Text: PDF
GTID: 2230330371961858    Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Protein mass spectrometry is a powerful tool in proteomics research. Its potential for discovering proteomic biomarkers and for the early diagnosis of cancer has received a great deal of attention in recent years. From the standpoint of pattern recognition, cancer diagnosis and the search for biomarkers correspond to two classical problems: pattern classification and feature selection, respectively.

Because mass spectrometry data are high-dimensional and the sample sizes are small, traditional pattern recognition classifiers lose much of their power. These classifiers usually depend heavily on dimensionality reduction and training procedures, which raises several glaring issues. First, a complex dimensionality reduction method may perform well on one data set but be difficult to apply directly to other data sets. Second, some feature extraction methods (e.g., PCA) serve only the classification task: in the new feature space generated by PCA, the data are dimensionless numbers, so these methods clearly do not meet the need to identify biomarkers. Finally, once training is complete, the classifier becomes fixed and can no longer learn from new samples.

For the classification problem, we apply a sparse representation framework to the analysis of mass spectrometry data.
A random projection matrix whose entries follow a Gaussian distribution is used as a replacement for feature selection, and a sample extension method is used to handle the case in which the sparse representation coefficients are not sparse enough. The results show that sparse representation has good robust classification capability.

In addition, sparse classification is an online learning algorithm: it adjusts itself according to the actual samples and so continuously "evolves", which makes it an intelligent learning algorithm.

Motivated by the good performance of sparse representation, this thesis proposes a new feature selection method based on the sparse classifier. Like a "wrapper" method, it first finds a candidate set of biomarkers and then selects the very few candidates that contribute most to classification.

Results on public data sets and on data sets from clinical experiments show that sparse representation classification (SRC) has good classification performance and robustness. SRC can be used for the classification of protein mass spectrometry data. The feature selection algorithm can pick out a small number of high-performing, significant biomarkers.
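
The classification pipeline summarized above (Gaussian random projection followed by sparse representation classification) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the abstract does not state which sparse solver is used, so iterative soft-thresholding (ISTA) is swapped in here as an assumption, the sample extension step is omitted, and every function name, parameter value, and the toy two-peak spectra are hypothetical choices made only for illustration.

```python
import numpy as np

def gaussian_projection(n_features, n_components, rng):
    # Random projection matrix with i.i.d. Gaussian entries (assumed scaling 1/sqrt(k)).
    scale = 1.0 / np.sqrt(n_components)
    return rng.normal(0.0, scale, size=(n_components, n_features))

def ista_l1(A, y, lam=0.01, n_iter=500):
    # Approximately solve min_x 0.5*||A x - y||^2 + lam*||x||_1
    # by iterative soft-thresholding (an assumed solver, not taken from the thesis).
    step = 1.0 / (np.linalg.norm(A, 2) ** 2)
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * (A.T @ (A @ x - y))
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return x

def src_predict(train_X, train_y, test_x, R, lam=0.01):
    # Dictionary: one column per projected, unit-norm training spectrum.
    A = R @ train_X.T
    A = A / np.linalg.norm(A, axis=0, keepdims=True)
    y = R @ test_x
    y = y / np.linalg.norm(y)
    x = ista_l1(A, y, lam)
    # Assign the class whose training samples reconstruct y with the smallest residual.
    classes = np.unique(train_y)
    residuals = []
    for c in classes:
        mask = train_y == c
        residuals.append(np.linalg.norm(y - A[:, mask] @ x[mask]))
    return classes[int(np.argmin(residuals))]

def make_toy_spectra(rng, n_per_class=10, n_features=200, noise=0.05):
    # Hypothetical toy data: each class has one peak at a different position.
    X, labels = [], []
    for label, start in enumerate((20, 120)):
        for _ in range(n_per_class):
            spectrum = noise * rng.normal(size=n_features)
            spectrum[start:start + 10] += 1.0
            X.append(spectrum)
            labels.append(label)
    return np.array(X), np.array(labels)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train_X, train_y = make_toy_spectra(rng)
    test_X, test_y = make_toy_spectra(rng, n_per_class=1)
    R = gaussian_projection(train_X.shape[1], 50, rng)
    for spectrum, label in zip(test_X, test_y):
        print("true:", label, "predicted:", src_predict(train_X, train_y, spectrum, R))
```

Because the dictionary is simply the set of training spectra, adding a new labeled sample only appends a column to the dictionary and requires no retraining, which is consistent with the online-learning property described in the abstract.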
Keywords/Search Tags: Protein mass spectrometry, sparse representation, cancer diagnosis, feature selection, biomarker