
Studies On The Application Of Machine Learning In Pharmaceutical Process Analysis

Posted on: 2021-02-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Yan
Full Text: PDF
GTID: 1361330629982387
Subject: Drug Analysis
Abstract/Summary:
In pharmaceutical processes involving complex systems, it is usually necessary to monitor and control multiple critical process parameters or critical quality attributes to keep the manufacturing process in its normal state and to ensure product quality. It is therefore important to study how process analytical technology (PAT) can be applied to monitor and control such processes. However, PAT methods are still developed mainly with traditional chemometric methods and related workflows. As one of the fastest-growing disciplines in artificial intelligence, machine learning has made breakthroughs in processing different types of complex data. Spectral data are also complex data, and in theory multiple “intelligent” machine learning algorithms can improve both the workflow for spectral data processing and the performance of spectral analytical models. Nevertheless, the research and application of machine learning methods in PAT remain limited. In this work, machine learning methods and concepts were introduced into the workflow of spectral quantitative analysis in PAT for pharmaceutical complex systems. Using several machine learning methods, including hierarchical clustering analysis (HCA), convolutional neural networks (CNN), k-nearest neighbor (kNN), just-in-time learning (JITL) and ensemble learning, the application of machine learning in PAT for pharmaceutical complex systems was studied in terms of sample selection, automatic spectral preprocessing, spectral feature visualization, model updating and multi-model fusion. The main contents and achievements of this dissertation are summarized as follows:

1. Three sample selection or reconstruction methods based on HCA were proposed. The hydrolysis process of Cornu Caprae Hircus (goat horn, GH) was taken as a case study, and on-line Raman spectroscopy was used as a PAT tool. The three proposed HCA-based methods were used to construct representative calibration sets by extracting or reconstructing samples from the pool of samples, and Raman-based partial least squares (PLS) quantitative calibration models were developed for hydrolysis process monitoring. Before sample selection or reconstruction, the spectral preprocessing method was selected using design of experiments and the optimal spectral co-addition number was determined, in order to improve model prediction performance. During sample selection or reconstruction, the HCA parameters were optimized and different numbers of clusters were tested. Traditional sample selection methods were also applied for comparison. The three proposed HCA-based sample selection or reconstruction methods are expected to extract representative calibration sets from a large pool of samples and to improve model performance while using fewer samples in model calibration.

2. CNN modeling methods for spectral analysis and the automatic spectral preprocessing functions of CNN modeling were studied. The lab-scale GH hydrolysis process and the commercial-scale chromatographic elution process of Notoginseng Radix et Rhizoma (Sanqi) were taken as case studies, with on-line Raman spectroscopy and in-line near-infrared (NIR) spectroscopy used as PAT tools, respectively. CNN model structures were designed and CNN quantitative calibration models were developed for different analytes. For CNN modeling, raw Raman spectra were used as inputs; for PLS modeling, the spectral preprocessing methods were optimized. The performance of the CNN and PLS models was compared, and the results demonstrate that spectral preprocessing may not be necessary in CNN modeling. The CNN modeling method is expected to improve the traditional workflow for spectral data processing. Moreover, this study attempts to open the “black box” of CNN models. It was found that CNN models can automatically learn different spectral transformations to extract different spectral features. These transformations have effects similar to those of some traditional spectral preprocessing methods and can be regarded as the automatic spectral preprocessing methods of CNN models. The CNN modeling method thus has unique advantages in spectral analysis.

3. Spectral feature visualization methods for CNN quantitative calibration models were proposed. A ternary amino acid system was taken as a case study, and Raman spectroscopy was used as an analytical tool. Two CNN model structures were designed and used to develop CNN quantitative calibration models for the three amino acids. Methods of spectral feature visualization for CNN models were tested. The proposed feature visualization method, based on the concept of “Class Activation Mapping”, was shown to be feasible for spectral feature visualization. The proposed method is expected to increase the interpretability of CNN models and to help explain the working mechanism of the CNN modeling method used in spectral analysis.

4. A model updating strategy based on kNN and JITL was studied. The commercial-scale chromatographic elution process of Sanqi was taken as a case study, and in-line NIR spectroscopy was used as a PAT tool. The kNN- and JITL-based strategy was applied to show its potential in model development and updating. Developing efficient model updating strategies is expected to reduce the cost of model maintenance and to ensure that model performance continues to meet the needs of pharmaceutical process monitoring.

5. Ensemble learning methods for multi-model fusion were studied. The GH hydrolysis process was taken as a case study, and on-line Raman spectroscopy and off-line NIR spectroscopy were used as PAT tools. PLS and CNN algorithms were both used to develop models from the Raman spectral data and the NIR spectral data, giving four individual learners. Different combination strategies in ensemble learning were used to integrate the individual learners. It was found that the ensemble learning methods have the potential to improve model prediction performance. Ensemble learning methods are expected to provide a basis for integrating models developed with different PAT tools or modeling algorithms.
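
The abstract does not describe how the kNN and JITL model updating strategy in item 4 was implemented, so the following is only a minimal sketch of the general idea. A local model is built on demand from the k stored spectra nearest to each query, and updating consists of appending newly measured samples to the database. The function names `jitl_predict` and `update_database`, the Euclidean distance and the inverse-distance-weighted mean used as the local model are all assumptions made for illustration. The dissertation may well use a different similarity measure and a local regression such as PLS.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length spectra."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def jitl_predict(query, spectra, references, k=3):
    """Predict a reference value for `query` from its k nearest stored spectra.

    A local model is built on demand for each query (just-in-time learning).
    The local model here is an inverse-distance-weighted mean, which is an
    assumed stand-in for whatever local regression the original work used.
    """
    if not 1 <= k <= len(spectra):
        raise ValueError("k must be between 1 and the number of stored samples")
    # Rank stored samples by distance to the query and keep the k closest.
    neighbors = sorted(
        (euclidean(query, s), y) for s, y in zip(spectra, references)
    )[:k]
    # An exact match would give an infinite weight, so return it directly.
    for dist, y in neighbors:
        if dist == 0.0:
            return y
    weights = [1.0 / dist for dist, _ in neighbors]
    return sum(w * y for w, (_, y) in zip(weights, neighbors)) / sum(weights)


def update_database(spectra, references, new_spectrum, new_reference):
    """Model updating (assumed form): append a newly measured sample."""
    spectra.append(list(new_spectrum))
    references.append(new_reference)
```

Because no global model is stored, adding a sample with `update_database` changes later predictions immediately, with no refitting step. This is one reason JITL-type strategies are attractive for keeping maintenance costs low.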
Keywords/Search Tags: Machine learning, Process analytical technology, Spectroscopic technique, Quantitative calibration model, Convolutional neural network, Sample selection, Feature visualization, Model updating