
Bagging-Based Chemometrics For Analyzing Metabolomics Dataset Of Respiratory Syncytial Virus Pneumonia

Posted on: 2020-09-23
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Tu
GTID: 2404330578974629
Subject: Analytical Chemistry
Abstract/Summary:
Metabolomics can provide comprehensive insights into the biological properties of cell and organism functions. Its basic strategy combines advanced high-throughput analytical techniques with chemometrics-based data analysis. The primary tasks of chemometrics are to identify the metabolic differences among groups (i.e., pattern recognition), usually between two groups (e.g., a disease group and a healthy group), and to discover potential biomarkers (i.e., variable selection) that characterize those differences. In metabolomics data analysis, both tasks are often carried out with a single model. However, it is well known that results obtained from a single model are usually unreliable and unstable to some extent. This thesis therefore developed two new robust chemometrics algorithms suitable for metabolomics data analysis. They build on the potential of bagging to improve the reliability and robustness of a single model, on the ability of the least absolute shrinkage and selection operator (LASSO) and the classification tree (CT) to perform variable selection and determine variable importance automatically, and on the excellent modeling performance of the radial basis function network (RBFN). An LC-MS-based metabolomics dataset associated with pediatric respiratory syncytial virus pneumonia (RSVP) was used to validate the performance of the two algorithms. The specific research work is as follows:

(1) A classification tree (CT) can automatically select important variables but generally overfits, whereas a radial basis function network (RBFN) is a powerful modeling tool but cannot perform variable selection. A new algorithm was therefore designed by combining the classification tree with the radial basis function network (CTRBFN). In CTRBFN, a tree of the right size is first induced by traditional recursive partitioning. An RBFN model is then constructed in which the splitting variables of the CT serve as the inputs of the RBFN and each leaf node of the CT contributes one RBFN unit (i.e., the center and width of that unit). Variable importance is determined by the CT. To further improve the performance of a single CTRBFN model, bagging was combined with CTRBFN to form a new stability-based chemometrics algorithm for metabolomics data analysis, the bagging-classification tree radial basis function network (BAG-CTRBFN). In BAG-CTRBFN, a series of parallel CTRBFN models is first built on resampled data, following the idea of bagging, and the results of these models are then combined by majority voting. The informative variables are obtained by synthesizing the variable importance over all CTs in BAG-CTRBFN. The LC-MS-based metabolomics dataset associated with RSVP was used as a case study to compare the newly proposed BAG-CTRBFN with traditional CT and RBFN. The results showed that BAG-CTRBFN offered better prediction capability than traditional CT and RBFN while selecting important variables reliably and robustly. Eight informative metabolites related to RSVP were identified by BAG-CTRBFN in combination with the t-test and fold change: TG(18:0/18:1/18:1), TG(16:0/16:1/22:5), Cer(d18:1/24:1), PE(16:0/22:6), PE(16:0/22:6), 2-hydroxyphenylacetic acid, L-alanine and succinic acid.

(2) LASSO, a well-performing variable selection tool, also shows good pattern recognition capability. In view of these properties and the requirements of metabolomics data analysis, bagging was also used to improve the stability and reliability of a single LASSO model. A new stability-based chemometrics method was thus proposed, the bagging-least absolute shrinkage and selection operator (BAG-LASSO). Following the idea of bagging, BAG-LASSO uses random sampling with replacement to obtain multiple data subsets, which are used to construct a series of LASSO sub-models. The predictions of these sub-models are then combined by majority voting. Each LASSO sub-model provides regression coefficients that represent the importance of the variables, and the final informative variables are recognized by inspecting all the LASSO sub-models in BAG-LASSO. BAG-LASSO was compared with traditional LASSO on the same RSVP plasma metabolomics dataset. The results indicated that BAG-LASSO provided better recognition performance than LASSO: for the test set, introducing bagging increased the recognition rate from 78.95% to 84.21%. In addition, the new algorithm substantially improved the stability and reliability of LASSO in variable selection. Combined with the t-test and fold change, BAG-LASSO identified ten potential metabolites associated with RSVP: TG(18:0/18:1/18:1), TG(16:0/18:1/20:4), TG(18:2/18:2/18:2), TG(16:0/20:4/20:4), TG(18:2/18:2/22:6), TG(18:1/18:2/22:6), PE(18:0/22:6), glucose-6-phosphate, L-alanine and oleoyl L-carnitine.
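
The bagging procedure shared by BAG-CTRBFN and BAG-LASSO (bootstrap resampling, one sub-model per resample, majority voting, and variable importance pooled over the sub-models) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: a one-split decision stump stands in for the CTRBFN or LASSO sub-model, class labels are assumed to be 0/1, and the function names, the toy data, and the selection-frequency score are all assumptions made for the example.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Stand-in sub-model (assumption): the single split with the fewest training errors.

    Returns (feature_index, threshold, label_above_threshold); labels must be 0/1.
    """
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for t in ((a + b) / 2 for a, b in zip(values, values[1:])):
            for high_label in (0, 1):
                errors = sum(
                    (high_label if row[j] > t else 1 - high_label) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, j, t, high_label)
    if best is None:
        # Degenerate resample (every feature constant): predict the majority class.
        return (None, 0.0, Counter(y).most_common(1)[0][0])
    return best[1:]


def predict_stump(stump, row):
    j, t, high_label = stump
    if j is None:
        return high_label
    return high_label if row[j] > t else 1 - high_label


def fit_bagging(X, y, n_models=25, seed=0):
    """Fit one sub-model per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_bagging(models, row):
    """Combine the sub-model predictions by majority voting."""
    votes = Counter(predict_stump(m, row) for m in models)
    return votes.most_common(1)[0][0]


def selection_frequency(models, n_features):
    """Fraction of sub-models that selected each variable (illustrative importance)."""
    counts = Counter(m[0] for m in models if m[0] is not None)
    return [counts[j] / len(models) for j in range(n_features)]


# Toy data (hypothetical): variable 0 separates the two groups, variable 1 is noise.
X = [[0.1, 5.0], [0.2, 1.0], [0.3, 4.0], [0.4, 2.0],
     [1.1, 3.0], [1.2, 5.5], [1.3, 0.5], [1.4, 2.5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

models = fit_bagging(X, y, n_models=25, seed=0)
print([predict_bagging(models, row) for row in X])
print(selection_frequency(models, n_features=2))
```

Replacing `fit_stump` and `predict_stump` with a LASSO or CTRBFN fit would give the corresponding bagged model. In that case the pooled importance would more naturally be computed from the regression coefficients or the tree-based variable importance than from the simple selection count used here.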
Keywords/Search Tags: Metabolomics, Chemometrics, Classification Tree, Radial Basis Function Network, Bagging-Classification Tree Radial Basis Function Network, Bagging-Least Absolute Shrinkage and Selection Operator, Respiratory Syncytial Virus Pneumonia