Industry 4.0 has driven the manufacturing industry toward intelligent transformation. Intelligent manufacturing is an inevitable trend in industrial development, and combining artificial intelligence (AI) methods to achieve efficient and objective quality evaluation of natural products is key to the development of the tobacco and herbal medicine industries. China is a major tobacco-growing and tobacco-producing country, and tobacco quality control is central to the sustainable development of its tobacco industry. Traditional tobacco quality evaluation methods are destructive, subjective and inefficient. Taking tobacco quality control as an example, this study proposed qualitative and quantitative analysis solutions that combine near-infrared spectroscopy (NIRS) with AI. These solutions are built on the NIR spectra of tobacco leaves together with their grade and chemical composition data. They can be used to predict tobacco quality accurately and to control tobacco quality digitally and precisely. They also provide a methodological basis for the modernization of traditional Chinese medicine and for quality standards research. In model building, model selection, parameter tuning and sample set partitioning are all critical, and the AI-based NIRS solutions developed in this study address these three issues. The main work and results are as follows.

(1) Currently, tobacco leaf grading relies mainly on the empirical judgment of experts, so the results are neither sufficiently objective nor sufficiently accurate. This study developed a rapid grading method for tobacco leaves based on teaching-learning-based optimization (TLBO) and an extreme learning machine (ELM). After the spectral data were preprocessed, the best-performing variable screening method was chosen to build the ELM model. The number of hidden-layer nodes of the ELM model was then optimized by the TLBO algorithm, which saved time and computational cost in building the optimal ELM model. The TLBO-ELM method gave better classification results than the partial least squares discriminant analysis (PLS-DA) model. Its correct classification rates were 89.13%, 89.83% and 84.21% for upper, middle and lower tobacco leaves, respectively. The combination of NIRS and TLBO-ELM provides a new intelligent method for determining tobacco grades.

(2) A transfer learning algorithm based on the weighted extreme learning machine (Weighted ELM), which does not depend on standard samples, was proposed. It was used to build NIRS models for the quantitative analysis of nicotine, Cl, K and total nitrogen content in tobacco leaves. The ensemble model built with the target-domain data (Weighted ELM-AdaBoost) gave significantly better predictions than the support vector machine regression (SVR) and Weighted ELM models. The correlation coefficients of validation for nicotine, Cl, K and total nitrogen were 0.9713, 0.8967, 0.8792 and 0.7780, respectively, and the root mean square errors of validation were 0.0776, 0.0558, 0.1350 and 0.1048, respectively. With the help of the source-domain data, a Weighted ELM-TrAdaBoost model was developed for the target domain, enabling calibration transfer between samples scanned on different instruments.

(3) A clustering and dimensionality reduction visualization (CDRV) method was proposed for unbalanced datasets and was used to develop a quantitative analysis model of tobacco starch content. CDRV is a sample set partitioning strategy that combines k-means clustering, principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE). The effects of different preprocessing and sample set partitioning methods on the performance of partial least squares regression (PLSR) models were compared. Models built with CDRV partitioning performed best under every preprocessing scenario tested, with residual predictive deviations (RPD) greater than 2. The feasibility of CDRV was further verified by applying it to wheat and tablet datasets.
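The TLBO search over the ELM hidden-layer size described in (1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation. The following choices are all hypothetical:

- sigmoid activation;
- pseudoinverse solution for the output weights;
- search bounds of 5 to 200 nodes;
- population size and iteration count;
- validation accuracy as the fitness function.

Spectral preprocessing and variable screening are assumed to have been applied to `X` beforehand, and the function names are illustrative only.

```python
import numpy as np


def elm_fit(X, T, n_hidden, rng):
    """Basic ELM: random input weights and biases, sigmoid hidden layer,
    output weights from the Moore-Penrose pseudoinverse of H."""
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    beta = np.linalg.pinv(H) @ T
    return W, b, beta


def elm_predict(model, X):
    W, b, beta = model
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta


def tlbo_hidden_nodes(X_tr, y_tr, X_val, y_val, low=5, high=200,
                      pop_size=10, n_iter=15, seed=0):
    """Hypothetical TLBO search for the ELM hidden-node count.
    Returns (best node count, its validation accuracy)."""
    rng = np.random.default_rng(seed)
    T = np.eye(int(y_tr.max()) + 1)[y_tr]  # one-hot training targets

    def fitness(h):
        # Seed the ELM by node count so the same h always gets the same score
        model = elm_fit(X_tr, T, int(h), np.random.default_rng(seed + int(h)))
        pred = np.argmax(elm_predict(model, X_val), axis=1)
        return float(np.mean(pred == y_val))

    def clip(h):
        # Keep candidates integer-valued and inside [low, high]
        return float(np.clip(np.round(h), low, high))

    pop = rng.integers(low, high + 1, size=pop_size).astype(float)
    fit = np.array([fitness(h) for h in pop])
    for _ in range(n_iter):
        # Teacher phase: move each learner toward the best one, away from the class mean
        teacher, mean = pop[np.argmax(fit)], pop.mean()
        for i in range(pop_size):
            tf = rng.integers(1, 3)  # teaching factor: 1 or 2
            new = clip(pop[i] + rng.random() * (teacher - tf * mean))
            f = fitness(new)
            if f > fit[i]:  # greedy acceptance
                pop[i], fit[i] = new, f
        # Learner phase: each learner moves relative to a random peer
        for i in range(pop_size):
            j = rng.choice([k for k in range(pop_size) if k != i])
            step = pop[i] - pop[j] if fit[i] > fit[j] else pop[j] - pop[i]
            new = clip(pop[i] + rng.random() * step)
            f = fitness(new)
            if f > fit[i]:
                pop[i], fit[i] = new, f
    best = int(np.argmax(fit))
    return int(pop[best]), float(fit[best])
```

Rounding and clipping each candidate keeps the node count an integer within the bounds. Seeding each ELM by its node count makes repeated evaluations of the same candidate deterministic, so TLBO's greedy acceptance step compares like with like.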
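The Weighted ELM-AdaBoost ensemble in (2) can be illustrated with the sketch below. It makes two assumptions:

- a regularized, sample-weighted ELM whose output weights are beta = (I/C + H^T W H)^-1 H^T W y, with W = diag(w);
- AdaBoost.R2 with linear loss, passing the boosting weights straight to the ELM instead of resampling.

The abstract does not give the thesis's exact formulation or hyperparameters. The values used here (50 tanh hidden nodes, C = 1000, 10 estimators) are hypothetical, as are all function names. The TrAdaBoost variant used for calibration transfer is not reproduced.

```python
import numpy as np


def weighted_elm_fit(X, y, w, n_hidden=50, C=1e3, seed=0):
    """Sample-weighted, regularized ELM regression:
    beta = (I/C + H^T W H)^(-1) H^T W y with W = diag(w)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ A + b)
    HtW = H.T * w  # broadcasting scales column i of H^T by w[i], i.e. H^T W
    beta = np.linalg.solve(np.eye(n_hidden) / C + HtW @ H, HtW @ y)
    return A, b, beta


def weighted_elm_predict(model, X):
    A, b, beta = model
    return np.tanh(X @ A + b) @ beta


def adaboost_r2_fit(X, y, n_estimators=10, n_hidden=50, C=1e3):
    """AdaBoost.R2 (linear loss) with the weighted ELM as base learner."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    models, alphas = [], []
    for t in range(n_estimators):
        # Rescale so the weights average 1, keeping their balance with 1/C stable
        model = weighted_elm_fit(X, y, w * n, n_hidden, C, seed=t)
        err = np.abs(weighted_elm_predict(model, X) - y)
        loss = err / err.max() if err.max() > 0 else np.zeros(n)  # in [0, 1]
        avg_loss = float(np.sum(w * loss))
        if avg_loss >= 0.5 and models:
            break  # learner too weak to add: stop boosting
        if avg_loss <= 0.0 or avg_loss >= 0.5:
            models.append(model)  # perfect fit or weak first learner: keep it alone
            alphas.append(1.0)
            break
        beta_t = avg_loss / (1.0 - avg_loss)
        models.append(model)
        alphas.append(np.log(1.0 / beta_t))
        w = w * beta_t ** (1.0 - loss)  # well-fitted samples lose weight
        w /= w.sum()
    return models, np.array(alphas)


def adaboost_r2_predict(models, alphas, X):
    """Combine learners by the alpha-weighted median of their predictions."""
    preds = np.column_stack([weighted_elm_predict(m, X) for m in models])
    order = np.argsort(preds, axis=1)
    sorted_preds = np.take_along_axis(preds, order, axis=1)
    cum = np.cumsum(alphas[order], axis=1)
    # First sorted prediction whose cumulative weight reaches half the total
    idx = np.argmax(cum >= 0.5 * cum[:, -1:], axis=1)
    return sorted_preds[np.arange(len(X)), idx]
```

AdaBoost.R2 combines its learners with a weighted median rather than a weighted mean. This keeps the ensemble robust when one learner produces an outlying prediction.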
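The abstract states that CDRV combines k-means clustering, PCA and t-SNE for sample set partitioning, but it does not say how the clusters become calibration and validation sets. The sketch below therefore makes an assumption. It clusters the PCA scores with k-means, then draws the same calibration fraction from every cluster, so both sets roughly share the cluster composition of the full, unbalanced set. The cluster count, component count and calibration fraction are hypothetical defaults, as are the function names.

The t-SNE step is left out because it is not needed to produce the split. Judging by the method's name, it presumably serves to visualize the partition.

```python
import numpy as np


def pca_scores(X, n_components):
    """Project mean-centred spectra onto the leading principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def kmeans_labels(Z, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns a cluster label for each row of Z."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Keep the old centre if a cluster becomes empty
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def cluster_stratified_split(X, k=4, n_components=5, cal_fraction=0.75, seed=0):
    """Hypothetical CDRV-style partition: k-means on PCA scores, then the same
    calibration fraction drawn at random from every cluster."""
    labels = kmeans_labels(pca_scores(X, n_components), k, seed=seed)
    rng = np.random.default_rng(seed)
    cal, val = [], []
    for j in range(k):
        idx = rng.permutation(np.flatnonzero(labels == j))
        n_cal = int(round(cal_fraction * len(idx)))
        cal.extend(idx[:n_cal])
        val.extend(idx[n_cal:])
    return np.sort(np.array(cal, dtype=int)), np.sort(np.array(val, dtype=int))
```

A purely random split can leave a small cluster entirely in one set. Stratifying by cluster is one simple way to keep the calibration and validation sets covering the same regions of spectral space.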
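The figures of merit quoted in (2) and (3) can be computed as below. RPD is assumed here to be the sample standard deviation of the validation reference values divided by the root mean square error of prediction. Some authors divide by the bias-corrected standard error of prediction instead, so treat the exact definition as an assumption.

```python
import math
import statistics


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def correlation(y_true, y_pred):
    """Pearson correlation coefficient between reference and predicted values."""
    mt, mp = statistics.fmean(y_true), statistics.fmean(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in y_true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in y_pred))
    return cov / (st * sp)


def rpd(y_true, y_pred):
    """Residual predictive deviation: SD of reference values / RMSE of prediction."""
    return statistics.stdev(y_true) / rmse(y_true, y_pred)


# Toy usage with made-up values
y_ref = [1, 2, 3, 4, 5]
y_hat = [1.1, 1.9, 3.2, 3.8, 5.0]
print(round(rmse(y_ref, y_hat), 4))  # 0.1414
print(round(rpd(y_ref, y_hat), 2))   # 11.18
```

An RPD above 2, the threshold cited in (3), means the prediction error is less than half the natural spread of the reference values.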