Particle Swarm Optimization-based Selective Bagging: With Application in Analyzing Dataset of 1H NMR-based Lung Cancer Metabolomics

Posted on: 2017-02-19
Degree: Master
Type: Thesis
Country: China
Candidate: G H Ma
Full Text: PDF
GTID: 2284330488485512
Subject: Analytical Chemistry
Abstract/Summary:
Metabolomics, as a global approach, aims to analyze the sets of low-molecular-weight compounds present in a biological sample, such as plasma or cells. It is especially useful for identifying the overall metabolic changes associated with a particular biological process and for finding the most affected metabolic networks. At the same time, metabolomics deals with large, highly complex datasets, so how to analyze these datasets and mine useful information from them is a crucial issue. Selective ensemble learning first trains a number of base classifiers following the ensemble idea and then selects some of them to make the final decision. By selecting base learners with high diversity, a selective ensemble can improve on the generalization ability and stability of both an individual base learner and a conventional ensemble. In this thesis, taking into account the characteristics of metabonomics data, the properties of Bagging, and the optimization capacity of particle swarm optimization (PSO), we developed a new selective Bagging method by combining PSO and Bagging (PSOBAG). PSOBAG was used to improve the stability and generalization ability of the classification tree (CT) and of partial least-squares discriminant analysis (PLS-DA), yielding two new chemometric algorithms for analyzing metabonomics datasets: PSOBAGCT and PSOBAGPLS-DA. These two algorithms were used to analyze 1H NMR metabolomics datasets obtained from lung cancer serum samples.

(1) According to the bias-variance decomposition of the generalization error of an ensemble, the greater the diversity among the individual sub-models, the more stable and accurate the final decision.
Therefore, in this chapter, we developed a new selective Bagging method by combining PSO and Bagging (PSOBAG) to improve the stability and generalization ability of the classification tree (CT), yielding a new chemometric algorithm for analyzing metabonomics datasets, namely PSOBAGCT. The main procedure of PSOBAGCT is as follows. First, a set of CT sub-models is produced by Bagging (BAGCT). PSO is then invoked to search for a subset of the constructed CT models that has the largest accuracy and diversity. Finally, the predictions of the selected subset are integrated by relative majority vote to give the final prediction. Combined with 1H NMR-based metabonomics, PSOBAGCT was applied to distinguish patients with lung cancer from healthy controls, with BAGCT and CT used for comparison. The results showed that Bagging significantly improves the performance of a single CT, and that introducing PSO to select a portion of the CT models with larger diversity further improves the model performance. Moreover, PSOBAGCT identified several statistically significant metabolites that may aid the diagnosis of lung cancer: lipids, lactate, glycoprotein, alanine, threonine, myo-inositol, glutamine, proline, trimethylamine, dimethylamine and 3-hydroxybutyrate.

(2) Considering the strengths and weaknesses of partial least-squares discriminant analysis (PLS-DA) in metabonomics data analysis, in this chapter the newly developed selective Bagging (PSOBAG) was employed to improve the performance of PLS-DA, yielding another new chemometric algorithm for analyzing metabonomics datasets, namely PSOBAGPLS-DA. In this algorithm, PSO is employed to search all the constructed PLS-DA models for the subset with the largest accuracy and diversity.
Similarly, for performance comparison, BAGPLS-DA and PLS-DA were also used to analyze the above-mentioned lung cancer metabonomics datasets. Compared with PLS-DA, Bagging (BAGPLS-DA) significantly improves the recognition performance. Furthermore, introducing PSO to select a portion of the BAGPLS-DA sub-models with larger diversity further improves the model performance. Several significant metabolites were also identified by PSOBAGPLS-DA: lipids, lactate, glycoprotein, alanine, threonine, myo-inositol, glutamine, proline, trimethylamine and choline.
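
The selection step described above (PSO searching the bagged sub-models for a subset, followed by a relative majority vote) can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes the sub-models have already been trained on bootstrap resamples and that their predictions on a validation set are collected in a matrix `P` (one row per sub-model). The fitness function (voted accuracy plus a weight `lam` times the mean pairwise disagreement), the standard binary PSO update, and all parameter values are assumptions chosen for illustration; the abstract does not specify them.

```python
import math
import random


def plurality(votes):
    """Relative majority vote; ties go to the smallest label."""
    return max(sorted(set(votes)), key=votes.count)


def fitness(mask, P, y, lam=0.1):
    """Assumed fitness: voted accuracy + lam * mean pairwise disagreement."""
    chosen = [p for p, keep in zip(P, mask) if keep]
    if not chosen:
        return 0.0  # an empty ensemble cannot predict
    n = len(y)
    voted = [plurality([p[i] for p in chosen]) for i in range(n)]
    acc = sum(v == yi for v, yi in zip(voted, y)) / n
    pairs = [(a, b) for k, a in enumerate(chosen) for b in chosen[k + 1:]]
    if pairs:
        div = sum(sum(u != v for u, v in zip(a, b)) / n for a, b in pairs) / len(pairs)
    else:
        div = 0.0
    return acc + lam * div


def binary_pso_select(P, y, n_particles=10, n_iter=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO over 0/1 masks; bit j says whether sub-model j is kept."""
    rng = random.Random(seed)
    d = len(P)
    X = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n_particles)]
    X[0] = [1] * d  # seed one particle with the full Bagging ensemble
    V = [[0.0] * d for _ in range(n_particles)]
    pbest = [x[:] for x in X]
    pfit = [fitness(x, P, y) for x in X]
    g = max(range(n_particles), key=pfit.__getitem__)
    gbest, gfit = pbest[g][:], pfit[g]
    for _ in range(n_iter):
        for k in range(n_particles):
            for j in range(d):
                v = (w * V[k][j]
                     + c1 * rng.random() * (pbest[k][j] - X[k][j])
                     + c2 * rng.random() * (gbest[j] - X[k][j]))
                V[k][j] = max(-4.0, min(4.0, v))  # clamp the velocity
                # The sigmoid of the velocity is the probability of setting the bit.
                prob = 1.0 / (1.0 + math.exp(-V[k][j]))
                X[k][j] = 1 if rng.random() < prob else 0
            f = fitness(X[k], P, y)
            if f > pfit[k]:
                pbest[k], pfit[k] = X[k][:], f
                if f > gfit:
                    gbest, gfit = X[k][:], f
    return gbest, gfit
```

Calling `mask, fit = binary_pso_select(P, y)` returns a 0/1 mask over the sub-models. The final prediction for a new sample is then `plurality` applied to the votes of the sub-models whose mask bit is 1. Because one particle starts as the full ensemble, the selected subset's fitness on the validation data is never lower than that of plain Bagging under the same fitness function.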
Keywords/Search Tags: Metabonomics, Partial Least-Squares Discriminant Analysis, Classification Tree, Selective Bagging, Particle Swarm Optimization, Lung Cancer