| Metabonomics, a rapidly developing omics science, provides integral metabolic information about the whole biological system. Its basic strategy lies in combining high-throughput analytical platforms with chemometric multivariate analysis methods. Variable selection aims to find potential biomarkers with high sensitivity and specificity in complex, high-dimensional metabonomics data. However, with the rapid development of modern analytical platforms, the acquired metabonomics data are increasingly complex, which directly affects both the classification performance of the algorithms and the identification of biomarkers. Moreover, ranking variable importance by a single score from a single recognition model is unreliable. Therefore, improving the stability and reliability of the results has become a major problem in metabonomics research. In this dissertation, inspired by the ability of the classification tree (CT) to automatically identify important variables and measure their importance values, the great potential of ensemble algorithms to improve the stability and reliability of a single recognition model, and the properties of the radial basis function network (RBFN), we developed two new methods for metabonomics data analysis. A 1H NMR-based metabonomics dataset associated with lung cancer was used to validate the performance of the two proposed algorithms. The specific work is as follows: (1) Taking into account the good reliability and robustness of the bagging classification tree (BAGCT) in variable selection and the promising modeling performance of RBFN, we combined BAGCT with RBFN to form a new algorithm, the bagging classification tree-radial basis function network (BAGCT-RBFN). In BAGCT, a set of parallel CT models was built following the idea of bagging. Each CT provided information such as the splitting variables and their 
corresponding contribution values. The informative variables can be discovered by inspecting the variable contribution values over all CTs in BAGCT. The variables with importance values larger than zero were used as inputs to RBFN. A 1H NMR-based metabonomics dataset associated with lung cancer was used to compare the newly proposed BAGCT-RBFN algorithm with traditional CT and RBFN. The results showed that BAGCT-RBFN achieved significantly better classification performance than traditional CT and RBFN. Moreover, BAGCT can eliminate a large number of uninformative variables, effectively improve the generalization ability of RBFN, and at the same time improve the stability and reliability of the variable selection results. In addition, the informative metabolites associated with lung cancer identified by BAGCT-RBFN included lactate, choline, myo-inositol, trimethylamine, proline, threonine, and lipid. (2) Inspired by the ability of the classification tree (CT) to automatically select the most informative variables and measure their importance, the potential of boosting to improve the reliability and robustness of a single model, and the promising modeling performance of the radial basis function network (RBFN), we designed another new chemometric tool, the boosting classification tree-radial basis function network (BSTCT-RBFN), for metabonomics data analysis. The main idea behind BSTCT is to iteratively build a series of CT models on differently weighted versions of the original training set, following the idea of boosting. That is, each subsequent CT model is constructed mainly on the samples with large errors. The informative variables can be identified by inspecting the variable importance values over all CTs in BSTCT. Then, RBFN was used to relate the identified informative variables to the class memberships to form the final classification model. To demonstrate the practical application 
of BSTCT-RBFN in metabonomics, a 1H NMR-based metabonomics dataset associated with lung cancer was used. The results showed that BSTCT-RBFN can reliably find a shortlist of discriminatory variables while attaining more satisfactory classification accuracy than traditional CT and RBFN. |
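
The bagging-based variable selection step described in this abstract can be illustrated with a minimal sketch in plain Python. It is not the dissertation's implementation: the Gini impurity criterion, the depth limit, the number of trees, the toy data, and all function names (`gini`, `best_split`, `grow_tree`, `bagct_importance`) are assumptions made for illustration. Each tree is grown on a bootstrap sample, each split adds its weighted impurity decrease to the splitting variable's importance, and variables whose averaged importance is larger than zero are kept as candidate inputs for a downstream classifier such as an RBFN.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return (impurity decrease, variable index, threshold) or None."""
    n, parent = len(y), gini(y)
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between neighbouring values
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
            if gain > 1e-12 and (best is None or gain > best[0]):
                best = (gain, j, t)
    return best


def grow_tree(X, y, importance, n_total, depth=0, max_depth=3):
    """Grow one CT recursively; credit each split to its variable."""
    if depth == max_depth or len(set(y)) == 1:
        return
    split = best_split(X, y)
    if split is None:
        return
    gain, j, t = split
    importance[j] += gain * len(y) / n_total  # weight by node size
    for go_left in (True, False):
        idx = [i for i in range(len(y)) if (X[i][j] <= t) == go_left]
        grow_tree([X[i] for i in idx], [y[i] for i in idx],
                  importance, n_total, depth + 1, max_depth)


def bagct_importance(X, y, n_trees=25, seed=0):
    """Average variable importance over trees grown on bootstrap samples."""
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    total = [0.0] * p
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        imp = [0.0] * p
        grow_tree([X[i] for i in idx], [y[i] for i in idx], imp, n)
        total = [a + b for a, b in zip(total, imp)]
    return [v / n_trees for v in total]


# Toy data (assumption): column 0 separates the two classes, columns 1-4 are noise.
data_rng = random.Random(1)
X = [[(2.0 if i % 2 else 0.0) + data_rng.random()]
     + [data_rng.random() for _ in range(4)] for i in range(40)]
y = [i % 2 for i in range(40)]

importance = bagct_importance(X, y)
# Keep variables with importance larger than zero, as in the abstract.
selected = [j for j, v in enumerate(importance) if v > 0]
print(selected)
```

On this toy data only column 0 is selected, because it separates the classes perfectly in every bootstrap sample. A boosting variant in the spirit of BSTCT would replace the bootstrap resampling with sample weights that are increased for misclassified samples after each tree.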