Research On Deep Learning Modeling Method Of Near Infrared Spectroscopy For Drug Supervision

Posted on: 2021-05-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Q Li
GTID: 1364330605481240
Subject: Control Science and Engineering
Abstract/Summary:
Counterfeit medicine is a difficult problem faced by countries all over the world. There were 4,405 counterfeit medicine incidents reported globally in 2018, a 102% increase over the past five years. From the "Eleventh Five-Year Plan" to the "Thirteenth Five-Year Plan", China’s national drug safety plans have emphasized cracking down on illegal activities such as manufacturing and selling fake and substandard drugs. Near-infrared spectroscopy (NIRS) analysis offers convenience, high efficiency, accuracy, low cost and on-site detection, with no damage to samples, no consumption of chemical reagents and no pollution of the environment. Since 2004, it has been applied in the national vehicle-mounted near-infrared rapid drug analysis system by the National Institutes for Food and Drug Control, which equips more than 400 drug inspection vehicles in 363 prefectures and cities across the country. The system uses qualitative methods to judge the authenticity of drugs and to determine whether drug names are consistent with their labels, and uses quantitative methods to determine the content of key index components, so as to quickly test drug quality or judge whether drugs are the products of specific enterprises. At present, the system guarantees the safety of medicine in China, saves massive testing costs, and has accumulated a large amount of spectral data from rapid on-site testing together with the corresponding laboratory analysis and verification data.

However, the following problems still restrict the large-scale, in-depth application and promotion of NIRS analysis in drug quality supervision: 1) NIRS analysis, as an indirect measurement method, cannot directly determine the content or type of the tested sample; it depends on chemometrics or machine learning methods, and its effectiveness is limited by the modeling method and model performance; 2) when identifying a drug, also recognizing its manufacturer is beneficial for the
traceability of drug quality; however, the NIR spectra of the same drug variety from different manufacturers differ only slightly, and there are many drug varieties and manufacturers across the country. A large number of samples must be collected and a large number of independent identification models established, which places high demands on the accuracy of the classification algorithm and makes modeling costly. At present, there is no application covering multiple varieties; 3) establishing accurate quantitative models usually requires wet chemical methods to determine the exact content of specific components in a large number of samples as reference values, which is costly and time-consuming, so an accurate, stable and general modeling method that significantly reduces the need for reference value determination is urgently required; 4) NIRS analysis generally suffers from models that cannot be applied across instrument models and platforms, which hinders the large-scale application and promotion of domestic near-infrared spectrometers. Classical model transfer methods have been little studied for transfer between different varieties and between instruments from different manufacturers, and their results are not good enough. The large amount of NIRS and test data accumulated earlier cannot be applied to modeling for new instruments or new varieties, so it cannot reduce modeling cost or improve prediction accuracy.

Based on these issues, this paper focuses on three key issues of NIRS modeling in drug supervision applications: qualitative analysis, quantitative analysis and model transfer. It conducts research from the three aspects of classification, regression and transfer learning, analyzes and summarizes the classic NIRS modeling methods, and proposes several novel and effective modeling methods:

(1) A new NIRS classification method based on regularized supervised dictionary learning is proposed. In the process of drug supervision, the
two-class discrimination of genuine and fake drugs cannot obtain information about the manufacturers of fake drugs and so cannot be used for traceability. Multi-manufacturer classification within the same variety is difficult because the characteristic NIRS peaks of the active ingredients overlap, while the spectra of drugs of the same variety and manufacturer differ within a class due to different batches, measuring instruments or environments; as a result, classification is difficult and prediction accuracy is low. To further improve the NIRS classification accuracy for drugs of the same variety from multiple manufacturers, this paper builds on sparse representation classification (SRC), which has high classification accuracy, and proposes a new sparse classification mechanism that exploits the ability of supervised dictionary learning to increase the differences between classes. It designs a constraint term and a coefficient incoherence term to handle the intra-class differences that the supervised dictionary does not handle well. Through these two regularization terms, the reconstruction error of the coding coefficients and the correlation between similar samples improve the linear separability of the data and the accuracy of the NIRS prediction model. The proposed method can be used to classify drugs of the same variety from different manufacturers. Its classification accuracy is 2.26% to 6.52% higher than that of SRC, SVM and LC-KSVD. Its accuracy is also 1.0% to 10.7% higher than that of other methods, indicating that the proposed method has a certain universality for NIRS classification.

(2) A fine-grained classification method for multi-variety, multi-manufacturer drugs based on a convolutional neural network (CNN) and NIRS is proposed. There are many drug varieties and manufacturers in China: more than 7000 drug manufacturing enterprises and thousands of
common varieties. If the varieties and manufacturers of drugs are to be identified at the same time, a large number of classification models must be established; as the number of categories increases, the accuracy of classical multi-class methods drops sharply, making them unsuitable for classifying drugs of different varieties and manufacturers. CNN can perform end-to-end learning and feature extraction and has strong modeling ability. 2D CNN has achieved great success in image classification and other fields. However, converting one-dimensional NIRS into two-dimensional data and analyzing it with an existing two-dimensional CNN model applies the technique mechanically and incurs a large computational cost. In this paper, a one-dimensional CNN model for spectral classification is proposed, which effectively reduces the influence of NIRS differences caused by raw materials, measuring environment and measuring instruments, and thus achieves highly accurate fine-grained classification of NIRS across multiple varieties and manufacturers. The variety and manufacturer of an unknown drug are identified from its NIRS, which facilitates tracing counterfeit drugs and governing the problem at the source. In 18 classification experiments on 18 manufacturers of two kinds of drugs, with 70% of the samples used as the training set, the classification accuracy of CNN is 99.37±0.45%, which is 4.04% to 20.83% higher than that of SVM, BP, Autoencoder (AE) and Extreme Learning Machine (ELM). This shows that the proposed method has higher classification accuracy, good robustness and scalability, and is applicable to multiple varieties and manufacturers. The approach used for drug identification can also be applied to NIRS data analysis in other fields, and it lays the foundation for further deep transfer learning.

(3) A NIRS regression method based on CNN-SVR is proposed. Multiple linear regression (MLR), partial least squares (PLS) and other
commonly used linear regression methods depend on choosing appropriate preprocessing methods from experience. The nature of a linear model limits its prediction error level, and existing models still cannot adapt to other instruments. Deep learning has been shown to have good feature extraction capabilities, and the features extracted from NIRS by CNN can be fed directly to MLR to achieve end-to-end analysis; however, such a model is only applicable to a single instrument of the same manufacturer, requires a certain number of training samples, has not had its generalization ability and robustness verified, and cannot yield a robust regression model that applies to multiple instruments. In this paper, a CNN-SVR modeling method is proposed, which combines the end-to-end automatic feature extraction of CNN with the small-sample learning ability of SVR. In the CNN network, SVR is applied to the output layer and constrains the network training process. L2 regularization penalizes excessive weights in the network, and the ε-insensitive loss gives the algorithm sample sparsity. When only 15% (96) of the samples from the IDRC 2002 dataset were used for model training, the CNN-SVR model achieved RMSEP = 3.018 and R² = 0.969, and the RMSEP values of CNN, PLS and SVR were 11%, 30% and 20% higher, respectively, than that of CNN-SVR. The experimental results show that the proposed CNN-SVR algorithm is not sensitive to the hyperparameters of the network and can train a model with small prediction error from small samples. It can also extract the spectral characteristics of the sample itself, weaken the differences between spectrometers, and directly predict spectra from another instrument of the same manufacturer using the model built on one instrument, with a maximum coefficient of determination of R² = 0.979. In general, CNN-SVR has better prediction accuracy, robustness and scalability, and can
realize end-to-end quantitative analysis.

(4) A NIRS modeling method based on transfer learning is proposed. Traditional drug identification methods need a large number of samples for modeling, so the cost of sample collection and modeling is high. The large amount of NIRS and test data accumulated earlier cannot be well applied to modeling for new instruments or new varieties. In addition, differences in measuring instruments, environment, and raw and auxiliary materials affect the spectrum of a sample, which may cause an established model to fail. Although some classical model transfer methods handle the differences between instruments of the same model from the same manufacturer well, model transfer between instruments from different manufacturers is not ideal. CNN extracts data features layer by layer from shallow to deep, and the shallow NIRS features of instruments from different manufacturers and of different kinds of drugs are similar. Therefore, the shallow information of a model trained with plenty of labeled data (source domain) can be fully used and transferred to a modeling task with few samples (target domain). In the proposed method, the parameters of the shallow convolution layers are shared, and a small number of labeled samples in the target domain are used to retrain the parameters of the fully connected layers. First, classification models of drug varieties are transferred from existing varieties to new varieties. When 30% of the drug samples in the target domain are used as the training set, the classification accuracy of the transfer learning model is 2.49% to 33.55% higher than that of models rebuilt with CNN, SVM, BP, AE and ELM. Then the transfer of regression models between instruments is realized. In the experiment with the same manufacturer and the same model (IDRC 2002 dataset), the minimum RMSEP is
2.501, and the RMSEP of other model transfer methods is 8% to 84% larger than that of transfer learning; in the experiment with different manufacturers and instruments (IDRC 2016 dataset), the minimum RMSEP is 0.163, and the RMSEP of other model transfer methods is 51% to 305% larger than that of transfer learning. The results show that the method can train a good model with a smaller training set, greatly reduce the dependence on NIRS data, and successfully solve the problem of model failure caused by changes in measuring instruments and measuring environment. When the training set in the target domain is enlarged, transfer learning with the target-domain training set gives better prediction ability than modeling from scratch.

In summary, the methods proposed in this paper solve significant problems that arise when NIRS is applied to drug supervision, such as multi-class fine-grained classification, high-precision regression and model transfer. It is also verified that the proposed methods can be applied to NIRS modeling analysis in other fields, with strong universality. The research in this paper is expected to help solve common problems in modeling NIRS, infrared, Raman and other molecular spectra, provide valuable clues for researchers in related fields, and lay a foundation for future research work.
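
The SVR-style objective described in (3) and the RMSEP and R² figures quoted in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the function names, the default values of `epsilon` and `lam`, and the way the two terms are summed are assumptions made for illustration, and the CNN itself is omitted so that only the loss and the evaluation metrics are shown.

```python
import math


def _check(y_true, y_pred):
    """Validate that both sequences are non-empty and of equal length."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")


def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Mean epsilon-insensitive loss: residuals inside the +/- epsilon tube cost 0."""
    _check(y_true, y_pred)
    total = sum(max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred))
    return total / len(y_true)


def l2_penalty(weights, lam=0.01):
    """L2 regularization term: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)


def svr_objective(y_true, y_pred, weights, epsilon=0.1, lam=0.01):
    """Assumed combined objective: epsilon-insensitive loss plus L2 penalty."""
    return epsilon_insensitive_loss(y_true, y_pred, epsilon) + l2_penalty(weights, lam)


def rmsep(y_true, y_pred):
    """Root mean square error of prediction on a held-out set."""
    _check(y_true, y_pred)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    _check(y_true, y_pred)
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all reference values are equal")
    return 1.0 - ss_res / ss_tot
```

Residuals that fall inside the ε tube contribute nothing to the loss, so only the samples outside it influence training; this is the sample sparsity the abstract attributes to the ε-insensitive loss.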
Keywords/Search Tags: Near-infrared spectroscopy, Deep learning, Convolutional neural network, CNN-SVR, Transfer learning