
Research On Spectral Multivariate Correction Method Based On LASSO

Posted on: 2022-02-02 | Degree: Master | Type: Thesis
Country: China | Candidate: K Y Wang
GTID: 2511306494994309 | Subject: Chemical Engineering and Technology
Abstract/Summary:
Spectral analysis is widely used in food, agriculture, medicine, the petrochemical industry and other fields because it is fast, non-destructive and non-polluting. An accurate, stable and broadly applicable multivariate calibration model is the key to predicting the components or properties of unknown complex samples. Variables in the spectra that carry irrelevant information degrade the predictive performance of such models, so variable selection methods are used to remove them and improve the prediction performance of a single model. This thesis introduces the least absolute shrinkage and selection operator (LASSO) variable selection method and an ensemble strategy into modeling for the quantitative analysis of complex samples, as follows:

1. A method combining LASSO and partial least squares (PLS), LASSO-PLS, was proposed. It was applied to near-infrared (NIR) spectral datasets of tobacco and ternary blend oil and to a Raman spectral dataset of biological samples, to quantify reducing sugar, sesame oil and sarcosine, respectively. First, LASSO was used to select variables from the spectra of the complex samples. A PLS model was then built on the selected variables, with the optimal number of latent variables (LVs) determined from how the root mean square error of cross-validation (RMSECV) changes with the number of LVs. Compared with uninformative variable elimination PLS (UVE-PLS), Monte Carlo uninformative variable elimination PLS (MC-UVE-PLS) and randomization test PLS (RT-PLS), this method uses fewer variables and runs faster, and it also shows some advantage in prediction accuracy.

2. A method combining LASSO and the extreme learning machine (ELM), LASSO-ELM, was studied. It was applied to a UV-Vis spectral dataset of fuel oil and NIR datasets of blood and orange juice, to quantify monocyclic aromatic hydrocarbons, hemoglobin and sucrose, respectively. First, LASSO was used to select variables, and an ELM model was then built on the selected variables. The optimal number of hidden-layer nodes and the activation function of the ELM were chosen by tracking how MSR, the ratio of the mean to the standard deviation of the correlation coefficient, changes with the number of hidden-layer nodes and the activation function. Compared with PLS and ELM, LASSO-ELM gives a higher correlation coefficient (R) and a lower root mean square error of prediction (RMSEP). Therefore, LASSO-ELM improves the prediction accuracy of ELM.

3. Drawing on the advantages of LASSO and the ensemble strategy, a double ensemble method based on Monte Carlo (MC) sampling, LASSO and PLS, called MC-LASSO-PLS, was proposed and applied to NIR datasets of corn and quaternary blend oil, to quantify oil and rice oil content, respectively. First, all samples were divided into a training set and a prediction set by the Kennard-Stone (KS) method. MC sampling was then used to randomly select samples from the training set, and LASSO selected variables from those samples to form a training subset. Finally, a PLS sub-model was built on the training subset to produce one prediction. The whole process was repeated T times, yielding T PLS sub-models and T predictions, which were averaged to give the final prediction. Compared with LASSO-PLS, MC-PLS and PLS, this method gives a higher R and a lower RMSEP, so MC-LASSO-PLS achieves more accurate quantitative analysis.

4. A double ensemble ELM modeling method based on MC sampling and LASSO, called MC-LASSO-ELM, was proposed for the quantitative analysis of nicotine content in a tobacco leaf dataset and hemoglobin content in a blood dataset. First, a subset of samples was randomly selected by MC sampling, and LASSO then selected variables from these samples to obtain a training subset. Finally, an ELM sub-model was built on the training subset to obtain one prediction. This process was repeated T times to obtain T predictions, and the final prediction was obtained by averaging them. Compared with ELM, MC-ELM and LASSO-ELM, MC-LASSO-ELM further improves the stability of ELM and the prediction accuracy of the model.
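The LASSO-PLS workflow in item 1 can be illustrated with a minimal NumPy sketch. It rests on several assumptions not stated in the abstract: LASSO is solved by plain coordinate descent on mean-centred data, the penalty `alpha` is a hypothetical fixed value rather than a tuned one, PLS is the single-response NIPALS algorithm, and RMSECV uses 5-fold cross-validation. All function names are hypothetical and the data are synthetic; this is not the thesis implementation.

```python
import numpy as np


def lasso_cd(X, y, alpha, n_iter=200):
    """LASSO by cyclic coordinate descent on column-centred X and centred y.

    Minimises (1 / (2n)) * ||y - X b||^2 + alpha * ||b||_1.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float)  # current residual y - X b (astype returns a copy)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            b_new = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]  # soft-thresholding
            r += X[:, j] * (b[j] - b_new)  # keep the residual in sync with the update
            b[j] = b_new
    return b


def pls1_coef(X, y, n_lv):
    """Single-response PLS (NIPALS) on centred X, y; returns the regression vector."""
    Xk, yk = X.copy(), y.astype(float)
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)
        t = Xk @ w
        tt = t @ t
        p = Xk.T @ t / tt
        qk = yk @ t / tt
        Xk -= np.outer(t, p)  # deflate X
        yk -= qk * t          # deflate y
        W.append(w); P.append(p); q.append(qk)
    W, P = np.array(W).T, np.array(P).T
    return W @ np.linalg.solve(P.T @ W, np.array(q))  # B = W (P^T W)^-1 q


def rmsecv(X, y, n_lv, k=5, seed=0):
    """k-fold root mean square error of cross-validation for a PLS model."""
    idx = np.random.default_rng(seed).permutation(len(y))
    sq_err = 0.0
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        xm, ym = X[train].mean(axis=0), y[train].mean()
        b = pls1_coef(X[train] - xm, y[train] - ym, n_lv)
        sq_err += (((X[fold] - xm) @ b + ym - y[fold]) ** 2).sum()
    return np.sqrt(sq_err / len(y))


def lasso_pls(X, y, alpha, max_lv=10):
    """LASSO variable selection, then PLS with the LV count that minimises RMSECV."""
    coef = lasso_cd(X - X.mean(axis=0), y - y.mean(), alpha)
    sel = np.flatnonzero(coef)  # variables with non-zero LASSO coefficients
    if sel.size == 0:
        raise ValueError("alpha too large: LASSO selected no variables")
    Xs = X[:, sel]
    lv_range = range(1, min(max_lv, sel.size) + 1)
    errors = [rmsecv(Xs, y, a) for a in lv_range]
    best_lv = lv_range[int(np.argmin(errors))]
    b = pls1_coef(Xs - Xs.mean(axis=0), y - y.mean(), best_lv)
    return sel, best_lv, b


# Synthetic stand-in for spectra: 60 samples x 40 variables, 2 informative ones.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 40))
y = X[:, 3] - 2.0 * X[:, 17] + 0.1 * rng.normal(size=60)
sel, n_lv, b = lasso_pls(X, y, alpha=0.1)
print("selected variables:", sel, "LVs:", n_lv)
```

The LV search is capped at the number of selected variables, since a PLS model cannot extract more latent variables than the rank of the selected data allows.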
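The ELM tuning step in item 2 (choosing the hidden-node count and activation function by MSR) can be sketched as follows. Assumptions to note: the candidate activations (sigmoid, tanh, sine), the node grid, the number of repeated runs and the use of a held-out validation set for computing R are all hypothetical choices, and `X` is assumed to already contain only the LASSO-selected variables, so the LASSO step itself is not repeated here.

```python
import numpy as np

# Candidate activation functions (an assumed set, not taken from the thesis).
ACTIVATIONS = {
    "sigmoid": lambda z: 1.0 / (1.0 + np.exp(-z)),
    "tanh": np.tanh,
    "sin": np.sin,
}


def elm_predict(X_tr, y_tr, X_new, n_hidden, act, rng):
    """Train a basic ELM and predict X_new.

    Input weights and biases are random and fixed; only the output weights
    are learned, via a Moore-Penrose pseudo-inverse least-squares solve.
    """
    g = ACTIVATIONS[act]
    W = rng.uniform(-1.0, 1.0, size=(X_tr.shape[1], n_hidden))
    bias = rng.uniform(-1.0, 1.0, size=n_hidden)
    beta = np.linalg.pinv(g(X_tr @ W + bias)) @ y_tr
    return g(X_new @ W + bias) @ beta


def msr(X_tr, y_tr, X_val, y_val, n_hidden, act, runs=20, seed=0):
    """MSR = mean(R) / std(R) of the validation correlation over repeated ELM runs."""
    rng = np.random.default_rng(seed)
    r = [np.corrcoef(elm_predict(X_tr, y_tr, X_val, n_hidden, act, rng), y_val)[0, 1]
         for _ in range(runs)]
    return np.mean(r) / np.std(r)


def select_elm(X_tr, y_tr, X_val, y_val, nodes=range(5, 65, 5)):
    """Grid search over hidden-node count and activation, keeping the highest MSR."""
    scores = {(n, a): msr(X_tr, y_tr, X_val, y_val, n, a)
              for n in nodes for a in ACTIVATIONS}
    return max(scores, key=scores.get)


# Synthetic nonlinear data standing in for LASSO-selected spectral variables.
rng = np.random.default_rng(2)
X = rng.normal(size=(80, 10))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.05 * rng.normal(size=80)
best_nodes, best_act = select_elm(X[:60], y[:60], X[60:], y[60:])
print("hidden nodes:", best_nodes, "activation:", best_act)
```

Because ELM weights are random, a single run says little about a configuration; MSR rewards settings whose correlation is both high on average and consistent across runs.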
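The Kennard-Stone split used in item 3 to divide samples into training and prediction sets can be sketched in NumPy as below. This assumes Euclidean distance in the spectral (X) space, which is the usual form of the algorithm; the function name is hypothetical.

```python
import numpy as np


def kennard_stone(X, n_train):
    """Split samples into (train_idx, test_idx) with the Kennard-Stone algorithm.

    Start from the two most distant samples, then repeatedly add the remaining
    sample whose distance to its nearest already-selected sample is largest.
    """
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise Euclidean distances
    i, j = np.unravel_index(np.argmax(D), D.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_train:
        nearest = D[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return np.array(selected), np.array(remaining)


X = np.array([[0.0], [1.0], [2.0], [10.0]])
train, test = kennard_stone(X, n_train=3)
print(train.tolist(), test.tolist())  # [0, 3, 2] [1]
```

The training set spreads across the extremes of the data, so the prediction set tends to fall inside the range the calibration model has seen.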
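The double ensemble shared by items 3 and 4 (MC sampling of training samples, LASSO variable selection, a sub-model, then averaging T predictions) can be sketched as a loop with a pluggable sub-model. Assumptions: the sampling fraction `frac`, the fixed LASSO penalty `alpha`, the fixed number of PLS latent variables, T = 50 and the fallback to the subset mean when LASSO selects nothing are all hypothetical choices; the demo uses a simple first-70/last-30 split instead of Kennard-Stone. Function names are illustrative only.

```python
import numpy as np


def lasso_cd(X, y, alpha, n_iter=200):
    """LASSO by coordinate descent on centred data: (1/2n)||y - Xb||^2 + alpha||b||_1."""
    n, p = X.shape
    b, r = np.zeros(p), y.astype(float)  # r is the residual y - X b
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            b_new = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            r += X[:, j] * (b[j] - b_new)
            b[j] = b_new
    return b


def pls_sub_model(X, y, X_new, n_lv=3):
    """PLS1 (NIPALS) sub-model: fit on (X, y), return predictions for X_new."""
    xm, ym = X.mean(axis=0), y.mean()
    Xk, yk = X - xm, y - ym
    W, P, q = [], [], []
    for _ in range(min(n_lv, X.shape[1])):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)
        t = Xk @ w
        p, qk = Xk.T @ t / (t @ t), yk @ t / (t @ t)
        Xk, yk = Xk - np.outer(t, p), yk - qk * t  # deflate X and y
        W.append(w); P.append(p); q.append(qk)
    W, P = np.array(W).T, np.array(P).T
    b = W @ np.linalg.solve(P.T @ W, np.array(q))
    return (X_new - xm) @ b + ym


def mc_lasso_ensemble(X_tr, y_tr, X_te, sub_model, T=50, frac=0.8, alpha=0.05, seed=0):
    """MC sample selection + LASSO variable selection + sub-model, averaged over T runs."""
    rng = np.random.default_rng(seed)
    n = len(y_tr)
    preds = []
    for _ in range(T):
        rows = rng.choice(n, size=int(frac * n), replace=False)  # Monte Carlo sampling
        Xs, ys = X_tr[rows], y_tr[rows]
        cols = np.flatnonzero(lasso_cd(Xs - Xs.mean(axis=0), ys - ys.mean(), alpha))
        if cols.size == 0:  # assumed fallback: predict the subset mean
            preds.append(np.full(len(X_te), ys.mean()))
            continue
        preds.append(sub_model(Xs[:, cols], ys, X_te[:, cols]))
    return np.mean(preds, axis=0)  # average the T sub-model predictions


rng = np.random.default_rng(3)
X = rng.normal(size=(100, 30))
y = 2.0 * X[:, 5] - X[:, 12] + 0.1 * rng.normal(size=100)
y_hat = mc_lasso_ensemble(X[:70], y[:70], X[70:], pls_sub_model)
rmsep = np.sqrt(np.mean((y_hat - y[70:]) ** 2))
print("RMSEP:", round(float(rmsep), 3))
```

For an MC-LASSO-ELM variant as in item 4, an ELM fit-and-predict function with the same `(X, y, X_new)` signature could be passed as `sub_model` in place of `pls_sub_model`; the sampling, selection and averaging loop stays unchanged.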
Keywords/Search Tags: ensemble modeling, multivariate calibration, complex samples, least absolute shrinkage and selection operator, extreme learning machine, partial least squares regression, Monte Carlo sampling