Model Diversity-Based Selective Ensemble Algorithms Combined With GC-MS Urinary Metabolomics For Screening Of Inborn Errors Of Metabolism

Posted on: 2021-08-31; Degree: Master; Type: Thesis
Country: China; Candidate: Y J Fu
GTID: 2491306038484914; Subject: Analytical Chemistry
Abstract/Summary:
Inborn errors of metabolism (IEMs) are a large group of rare and complex disorders caused by genetic mutations. Such disorders readily lead to the accumulation of toxic substrates, causing mild to severe neurological and psychiatric manifestations and even lifelong disability or sudden death in newborns. Screening for IEMs and exploring their pathogenesis are therefore crucial for initiating early treatment and minimizing morbidity and mortality. Urine-based metabolomics is currently a leading technology that has been successfully employed in IEM screening. The basic strategy of metabolomics couples advanced high-throughput analytical techniques with chemometric methods. The analytical techniques yield large metabolomic datasets that contain noise and irrelevant information. The primary tasks of chemometrics are to identify metabolic differences among groups (i.e., pattern recognition) and to screen out the significant potential metabolites that characterize those differences (i.e., variable selection). Developing efficient and robust chemometric methods is therefore essential to facilitate IEM screening. This thesis developed two new stability-based chemometric methods suited to metabolomics data analysis. They build on three observations: selective ensemble algorithms based on model diversity can improve the stability and reliability of a single model and reduce learning cost; classification tree (CT) and least absolute shrinkage and selection operator (LASSO) automatically perform variable selection and determine variable importance; and extreme learning machine (ELM) and partial least squares discriminant analysis (PLS-DA) offer superior modeling performance. The newly proposed algorithms, coupled with GC-MS technology, were used for the screening of IEMs. The specific work is as follows:

(1) In this chapter, considering that CT can automatically select the more significant variables, whereas ELM shows satisfactory prediction capability but cannot perform variable selection, we developed a base learning algorithm that combines classification tree with extreme learning machine (CTELM). In CTELM, a classification tree of appropriate size is first constructed; an ELM model is then built with the splitting variables of the CT as its inputs and the number of nodes in the CT as the number of hidden-layer neurons. Moreover, because selective ensemble algorithms can significantly improve the robustness and reliability of a single model, the double fault (DF) measure and bagging were combined with CTELM to form a new robust chemometric method, double fault-bagging-classification tree extreme learning machine (DF-BAG-CTELM). In DF-BAG-CTELM, a series of CTELM sub-models is established in parallel using bagging, their pairwise diversities are evaluated with the DF measure, and the models with large diversity are selected to form the final ensemble system. DF-BAG-CTELM was compared with BAG-CTELM, CTELM and ELM in the GC-MS urinary metabolomic analysis of two of the most common IEMs, methylmalonic acidemia (MMA) and 3-methylcrotonyl-CoA carboxylase deficiency (3-MCCD). The results revealed that introducing CT improves the interpretability of ELM, and that DF-BAG-CTELM further improves the generalization ability and stability of a single CTELM model. In addition, combined with one-way ANOVA and fold change, DF-BAG-CTELM identified three informative metabolites associated with MMA: 3-OH-propionic-2, methylmalonic-2 and methylcitric-4(2). Methylcrotonylglycine-1 was identified as a potential metabolite related to 3-MCCD.

(2) Considering that LASSO can effectively eliminate irrelevant variables and PLS-DA is a good pattern recognition tool, LASSOPLSDA was developed in this chapter. In LASSOPLSDA, variable importance is determined by LASSO, and PLS-DA is used to model the relationship between the LASSO-selected variables and the class memberships. To improve the performance of a single LASSOPLSDA model, another selective ensemble algorithm based on DF and boosting was developed, yielding another robust chemometric algorithm, double fault-boosting-least absolute shrinkage and selection operator partial least squares discriminant analysis (DF-BST-LASSOPLSDA). In DF-BST-LASSOPLSDA, a set of LASSOPLSDA sub-models is first generated iteratively following the idea of boosting. The diversity between LASSOPLSDA models is then calculated by the DF measure, and the final ensemble system consists of the LASSOPLSDA models with larger diversity. Finally, the predictions of the selected models are combined by majority voting, and variable importance is determined from the regression coefficient and selection frequency of each variable. A GC-MS-based metabolomics dataset related to MMA was used to evaluate DF-BST-LASSOPLSDA against BST-LASSOPLSDA, LASSOPLSDA and PLS-DA. The results indicated that DF-BST-LASSOPLSDA offered better prediction capability than BST-LASSOPLSDA, LASSOPLSDA and PLS-DA, and that it can select informative variables robustly and reliably. Five potential metabolites related to MMA were identified by DF-BST-LASSOPLSDA combined with t-test and fold change: 3-OH-propionic-2, 3-OH-isovaleric-2, methylmalonic-2, methylcitric-4(1) and 2-OH-sebacic-3.
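
The selection step shared by both methods (score pairwise diversity with the double fault measure, keep the more diverse sub-models, combine them by majority voting) can be illustrated with a minimal sketch. It is not the thesis implementation. The function names, the toy predictions, and the rule of keeping the k sub-models with the lowest mean double fault are assumptions made for illustration; the abstract does not state the exact selection rule. The double fault between two models is taken here as the fraction of samples that both misclassify, so a lower value means larger diversity.

```python
import numpy as np

def double_fault_matrix(preds, y):
    """Pairwise double fault; preds has shape (n_models, n_samples)."""
    wrong = (np.asarray(preds) != np.asarray(y)).astype(float)  # 1 where a model errs
    # Entry (i, j) is the fraction of samples that models i and j both get wrong.
    return wrong @ wrong.T / wrong.shape[1]

def select_diverse(preds, y, k):
    """Assumed rule: keep the k models with the lowest mean double fault."""
    df = double_fault_matrix(preds, y)
    n_models = df.shape[0]
    # Average over the other models only, excluding the self-comparison.
    mean_df = (df.sum(axis=1) - np.diag(df)) / (n_models - 1)
    return np.argsort(mean_df, kind="stable")[:k]

def majority_vote(preds):
    """Most frequent label per sample (non-negative integer labels assumed)."""
    preds = np.asarray(preds)
    n_classes = int(preds.max()) + 1
    counts = np.stack([(preds == c).sum(axis=0) for c in range(n_classes)])
    return counts.argmax(axis=0)

# Toy data: predictions of 4 hypothetical sub-models on 6 validation samples.
y_val = np.array([0, 0, 0, 1, 1, 1])
sub_preds = np.array([
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],  # duplicates the errors of the first model
    [1, 0, 0, 1, 0, 1],
    [0, 1, 0, 0, 1, 1],
])
chosen = select_diverse(sub_preds, y_val, k=3)
ensemble_pred = majority_vote(sub_preds[chosen])
```

In this toy case the first two sub-models make identical errors, so only one of them survives the selection, and the vote of the three retained models matches `y_val` even though each of them misclassifies two samples on its own.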
Keywords/Search Tags: Inborn errors of metabolism, Metabolomics, Chemometrics, Classification tree, Extreme learning machine, Least absolute shrinkage and selection operator, Partial least squares discriminant analysis, Selective ensemble