| With its low operational difficulty and fast detection speed, near-infrared (NIR) spectroscopy has attracted wide attention and been applied in many fields of detection and analysis. However, during NIR modeling and analysis, many objective and subjective factors affect the accuracy and stability of the prediction results, leading to poor model adaptability and making it difficult to meet the needs of online inspection and quality monitoring. This thesis considers the requirements for robust NIR modeling and focuses on key technical points such as outlier detection, feature extraction, and model construction, aiming to provide a complete and efficient detection system for rapid detection and quality monitoring in enterprises. The main research contents of this thesis are as follows: (1) The high dimensionality and non-linearity of NIR spectral data make outliers difficult to measure. This thesis proposed a probability-based outlier detection method, which used the distribution probability of the spectral data, obtained with a copula function, to identify outliers at each wavelength. A negative logarithmic function was also used to quantify the overall variation of the joint distribution for the outliers. This method not only enlarges the spectral difference between typical samples and outliers but can also be adapted to multiple types of outliers. Moreover, the jump degree from statistics was introduced to determine the outlier threshold automatically, which avoids setting the threshold empirically and the resulting misjudgment of outliers. To investigate the effectiveness of the algorithm, it was applied to the recognition of different cases and types of outliers and compared with the commonly used PCA-Mahalanobis distance, spectral residual (SR), and leverage methods. The experimental results showed that the probability-based outlier detection method effectively improved the
performance of outlier identification and calibration for NIR analysis. (2) To increase the accuracy and robustness of the calibration model so that it adapts to detection scenarios in different industries, this thesis proposed a feature selection method, MIC-SPA (Maximal Information Coefficient-Successive Projections Algorithm), combined with the GA-ELM (Genetic Algorithm-Extreme Learning Machine) modeling method. MIC was used to extract the feature set that is highly correlated with the target variable, which filters out non-informative variables and noise in the spectrum. SPA was then applied to further eliminate redundant features by keeping the features with the maximum projection value on the orthogonal subspace of the previously selected wavelength, and the features corresponding to the minimum RMSECV (Root Mean Square Error of Cross-Validation) were selected as the optimized feature set. The sample set partitioning based on joint X-Y distance (SPXY) method was also introduced to increase the diversity of the training data set. Furthermore, GA-ELM was introduced to establish a robust NIR analysis model, exploiting the advantages of neural networks for non-linear data processing. To investigate the effectiveness of the algorithm, MIC-SPA was compared with commonly used feature selection methods, including least angle regression (LARS), uninformative variables elimination (UVE), competitive adaptive reweighted sampling (CARS), and SPA, as well as MIC-LARS (Maximal Information Coefficient-Least Angle Regression), MIC-UVE (Maximal Information Coefficient-Uninformative Variables Elimination), and MIC-CARS (Maximal Information Coefficient-Competitive Adaptive Reweighted Sampling); the number of selected features, the predictive ability, and the robustness of the models were also evaluated. The results confirmed that an accurate and robust NIRS model can be obtained by combining the MIC-SPA and GA-ELM methods, which will be valuable for quantitative analysis by NIR spectroscopy. |
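
The projection step that SPA relies on can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `spa_select`, the helper `dot`, the caller-supplied starting wavelength `start`, and the fixed subset size `n_select` are all assumptions made for illustration. The MIC pre-filtering and the RMSECV-based choice of the final subset described in the abstract are not reproduced here.

```python
from math import sqrt


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def spa_select(X, start, n_select):
    """Pick n_select column indices of X by successive projections.

    X is a list of sample rows, one value per wavelength. `start` is the
    index of the first wavelength (an assumed input; in practice it is
    often tuned rather than fixed).
    """
    n_vars = len(X[0])
    if not 1 <= n_select <= n_vars:
        raise ValueError("n_select must be between 1 and the number of variables")

    # Work on copies of the columns (one vector per wavelength).
    cols = [[row[j] for row in X] for j in range(n_vars)]
    selected = [start]

    for _ in range(n_select - 1):
        ref = cols[selected[-1]]
        ref_sq = dot(ref, ref)
        best_j, best_norm = None, -1.0
        for j in range(n_vars):
            if j in selected:
                continue
            # Remove the component of column j that lies along the last
            # selected column, i.e. project onto its orthogonal subspace.
            coef = dot(cols[j], ref) / ref_sq if ref_sq > 0 else 0.0
            cols[j] = [a - coef * b for a, b in zip(cols[j], ref)]
            norm = sqrt(dot(cols[j], cols[j]))
            # Keep the column with the largest remaining (non-redundant) part.
            if norm > best_norm:
                best_j, best_norm = j, norm
        selected.append(best_j)
    return selected


# Toy data: column 1 is collinear with column 0, so it carries no new information.
X = [
    [1.0, 2.0, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.0, 0.0],
]
print(spa_select(X, start=0, n_select=2))  # → [0, 2]
```

In the toy data, the column that duplicates the starting wavelength projects to zero and is skipped, which is the redundancy-elimination behaviour the abstract attributes to SPA. A fuller pipeline would repeat this for several subset sizes and keep the one with the lowest cross-validation error.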