Application Of Consensus Model In Near Infrared Spectroscopy Based On SOM Clustering Variable Selection

Posted on: 2018-06-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Lai
GTID: 2321330518987477
Subject: Computer application technology
Abstract/Summary:
Data modeling is one of the most important parts of chemometrics research. Depending on the task, it can be divided into quantitative analysis and qualitative analysis. At present, the most common approach is single-model modeling, in which the best-performing model is selected from a series of prediction models developed through repeated analysis of the measurement data. However, modern high-throughput analytical instruments record thousands of channels per sample, so datasets often have few samples but many variables. A single model therefore often struggles to meet the requirements of practical prediction. To compensate for this weakness, consensus modeling strategies have been studied and widely applied in many research fields in recent decades. Consensus modeling builds multiple member models with a given modeling method and combines their predictions for unknown samples according to a chosen strategy, producing a consensus prediction that improves the accuracy and reliability of the model. In this work, consensus modeling was applied to near-infrared (NIR) spectral data, and both linear and nonlinear multi-member consensus models were examined. The main contents are as follows. The background and significance of the topic are introduced, together with the basic principles of data modeling and the modeling methods used. Variable selection methods based on multi-regression consensus modeling are discussed and the purpose of variable selection is analyzed. Linear and nonlinear multi-model consensus approaches are then proposed: multi-regression consensus modeling based on partial least squares (C-SOM-PLS) and on least squares support vector machines (C-SOM-LS-SVM).
The modeling procedure is as follows. First, similar variables are grouped into the same cluster by a Kohonen self-organizing feature map (SOM) network, giving N clusters that correspond to N subsets of the original NIR data. Second, each of the N subsets is divided into a training set, a calibration set and a test set by the Duplex algorithm. The training sets are used to build a series of member regression models, which are evaluated on the calibration sets; the optimized models are selected by comparing model performance (i.e., error), and these errors are used to redistribute the weighting coefficients of the consensus model. Third, the member models' predictions for unknown samples are combined by a weighted sum to form the consensus result. The results showed that the consensus model predicted better than the single model, improving both the prediction accuracy and the stability of the model. The results of C-SOM-PLS, C-SOM-LS-SVM and their respective member models were compared. In some cases the consensus model performed worse than a member model, indicating that over-fitting of the member models had a negative impact on the consensus model. To reduce the influence of over-fitting, model population analysis (MPA) was applied to the PLS consensus model, giving the C-SOM-MPA-PLS model. The algorithm has three steps: first, sub-datasets are obtained by Monte Carlo sampling; second, a sub-model is built for each sub-dataset; third, the parameters of all sub-models in the population are analyzed statistically from the sample space to extract useful information. The results showed that MPA not only reduced the influence of over-fitting on the consensus model but also made its predictions better than those of the member models.
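The weighted-sum step can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not state how the weighting coefficients are derived from the calibration errors, so normalized inverse-RMSE weights are assumed here, and all function names and numbers are hypothetical. The member models themselves (PLS or LS-SVM on each SOM cluster) are replaced by hard-coded predictions.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def inverse_error_weights(errors):
    """Convert calibration errors into weights that sum to 1.

    Assumed scheme (not taken from the source): weight proportional to
    1 / error, so members with lower calibration error count more.
    An error of exactly zero would need special handling.
    """
    inverse = [1.0 / e for e in errors]
    total = sum(inverse)
    return [v / total for v in inverse]


def consensus_predict(member_predictions, weights):
    """Weighted sum of member predictions, one value per sample."""
    n_samples = len(member_predictions[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, member_predictions))
        for i in range(n_samples)
    ]


# Hypothetical calibration-set reference values and member predictions
# (one row per member model, i.e. per variable cluster).
y_cal = [1.0, 2.0, 3.0]
cal_preds = [
    [1.1, 2.1, 3.1],
    [1.2, 2.2, 3.2],
    [1.4, 2.4, 3.4],
]

# Evaluate each member on the calibration set and derive its weight.
errors = [rmse(y_cal, preds) for preds in cal_preds]
weights = inverse_error_weights(errors)

# Hypothetical member predictions for two unknown (test) samples.
test_preds = [
    [2.0, 4.0],
    [2.2, 4.2],
    [2.6, 4.6],
]
y_consensus = consensus_predict(test_preds, weights)
```

With inverse-error weighting, a member that fits the calibration set poorly still contributes, only less. Other schemes, such as discarding the worst members before weighting, would fit the same description in the abstract.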
Keywords/Search Tags: Quantitative analysis, Consensus model, Member model, Variable selection, Model population analysis