| At present, the combination of Raman spectroscopy and machine learning has become a research hotspot in many fields, and with its rapid development, more and more researchers are applying it to biomedical research. This paper introduces the research significance of combining confocal Raman spectroscopy with machine learning. Taking the Raman spectra of cancer tissue and normal tissue as the research object, it analyzes the characteristic peaks and their intensity ratios as well as the compositional structure of each part, characterizes the Raman data, performs feature selection and extraction, builds a data set, and applies machine learning to classification decisions and parameter optimization. The main research work is as follows. 1. A machine learning approach to Raman spectrum classification is proposed. The A549 (lung adenocarcinoma) and MeT-5A (pleural mesothelial) cell lines, both of which may appear in pleural effusion, were selected; their Raman spectra were measured with a WITec spectrometer and used to build the data set. The Raman spectra of the biological samples were acquired over 600–1800 cm⁻¹. Within this range, the positions and intensities of the characteristic peaks were analyzed, together with the differences in Raman shift between the two samples, in order to screen the characteristic peaks. 2. Once the data set was established, it was analyzed with two methods: feature analysis based on principal component analysis (PCA), and feature selection based on multivariable correlation analysis using the Pearson correlation coefficient. The PCA experiments show that the cumulative contribution rate of the first 11 principal components reaches 100%, so these 11 features cover essentially all of the information in the data. In the multivariable correlation analysis, each feature vector was taken in turn as the dependent variable, and validation gave the highest accuracy with the first 11 principal components, which confirms that 11 components give the best result. PCA also gave good clustering results: an accuracy of 90.06%, a sensitivity of 94.62%, a specificity of 85.28% and a Matthews correlation coefficient (MCC) of 0.8039. This feature analysis provides good input data for the subsequent classification. 3. Three classification models were built for the data set: threshold classification, a support vector machine (SVM) and k-means. Beforehand, three data set partitioning methods were compared, and cross-validation with 5 subsets (5-fold cross-validation) was chosen for the experiments. Threshold classification, even with a threshold computed by formula, clearly cannot separate the non-linear data effectively. The SVM is an effective classifier: different kernel functions were compared and the parameters of each kernel were optimized, and the RBF kernel with tuned parameters gave the best results. For k-means, the only tunable parameter is k, which was set to 3, but its final classification result is not as good as that of the SVM. 4. Classifying Raman spectra is only a preliminary step in clinical screening; further diagnosis requires accurately locating the samples classified as cancer. For this purpose, this paper uses confocal Raman imaging, a technique that can be widely used for characterizing and analyzing the components of biological samples. |
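The abstract does not state the exact rule used to screen characteristic peaks. The sketch below is a minimal illustration that assumes the two class-mean spectra share one wavenumber axis. It does three things:

- crops both spectra to 600–1800 cm⁻¹;
- picks the local maxima of one mean spectrum;
- keeps the peaks whose intensity ratio between the two classes departs from 1 by a chosen factor.

The function names, `min_ratio`, `min_height` and the synthetic Gaussian bands are hypothetical. They are not measured WITec data.

```python
import numpy as np

def crop_range(wavenumbers, spectra, lo=600.0, hi=1800.0):
    """Keep only the lo..hi cm^-1 region (600-1800 cm^-1 in the study)."""
    mask = (wavenumbers >= lo) & (wavenumbers <= hi)
    return wavenumbers[mask], spectra[..., mask]

def local_maxima(y, min_height=0.0):
    """Indices of points strictly higher than both neighbours and >= min_height."""
    idx = np.where((y[1:-1] > y[:-2]) & (y[1:-1] > y[2:]))[0] + 1
    return idx[y[idx] >= min_height]

def screen_peaks(wavenumbers, mean_a, mean_b, min_ratio=1.2, min_height=0.1):
    """Hypothetical rule: keep peaks of mean_a whose intensity ratio
    mean_a / mean_b is >= min_ratio or <= 1 / min_ratio."""
    peaks = local_maxima(mean_a, min_height)
    ratio = mean_a[peaks] / np.maximum(mean_b[peaks], 1e-12)  # avoid division by zero
    keep = (ratio >= min_ratio) | (ratio <= 1.0 / min_ratio)
    return wavenumbers[peaks][keep], ratio[keep]

# Toy usage with synthetic Gaussian bands (illustrative, not measured spectra).
wn = np.linspace(400.0, 2000.0, 1601)  # 1 cm^-1 grid

def band(centre, height, width=8.0):
    return height * np.exp(-0.5 * ((wn - centre) / width) ** 2)

mean_a = band(450, 1.0) + band(1004, 1.0) + band(1450, 0.8) + band(1655, 0.6)
mean_b = band(1004, 1.0) + band(1450, 0.4) + band(1655, 0.6)
wn_c, (a_c, b_c) = crop_range(wn, np.vstack([mean_a, mean_b]))
peaks, ratios = screen_peaks(wn_c, a_c, b_c)
print(peaks, ratios.round(3))
```

In practice, spectra are usually baseline-corrected and normalized before this step. `scipy.signal.find_peaks` is a more robust alternative to the simple neighbour comparison used here.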
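The "cumulative contribution rate" in the abstract corresponds to the cumulative explained-variance ratio of PCA. The sketch below uses scikit-learn's `PCA` and assumes the spectra (or extracted features) are stored row-wise in `X`. The helper name `components_needed`, the 0.999 target and the synthetic rank-11 data are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

def components_needed(X, target=0.999):
    """Smallest number of principal components whose cumulative explained-variance
    ratio (cumulative contribution rate) reaches `target`, plus the full curve."""
    pca = PCA().fit(X)
    cumulative = np.cumsum(pca.explained_variance_ratio_)
    # Cap at the number of components in case `target` is never reached.
    n = min(int(np.searchsorted(cumulative, target) + 1), len(cumulative))
    return n, cumulative

# Toy usage: 500 synthetic samples with 50 variables but only 11 independent
# latent sources, so 11 components carry essentially all of the variance.
rng = np.random.default_rng(0)
basis, _ = np.linalg.qr(rng.normal(size=(50, 11)))  # 50 x 11, orthonormal columns
X = rng.normal(size=(500, 11)) @ basis.T
n, cumulative = components_needed(X)
print(n, cumulative[n - 1].round(6))
```

The target is set slightly below 1 because a floating-point cumulative sum may never equal exactly 1.0.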
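The abstract only summarizes how the Pearson-based correlation analysis was set up. The sketch below shows one common Pearson-based selection scheme, offered as an assumption rather than the author's exact procedure. It ranks features by |r| with the class label, then skips any feature that is nearly collinear with a feature already kept. `select_by_pearson` and its two thresholds are hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))

def select_by_pearson(X, y, min_relevance=0.3, max_redundancy=0.95):
    """Rank features by |r| with the label, stop once |r| < min_relevance, and
    skip features whose |r| with an already kept feature exceeds max_redundancy."""
    relevance = np.array([pearson_r(X[:, j], y) for j in range(X.shape[1])])
    selected = []
    for j in np.argsort(-np.abs(relevance)):
        if abs(relevance[j]) < min_relevance:
            break
        if all(abs(pearson_r(X[:, j], X[:, k])) <= max_redundancy for k in selected):
            selected.append(int(j))
    return selected, relevance

# Toy usage: column 0 tracks the label, column 3 nearly duplicates column 0,
# column 2 is anti-correlated with the label and column 1 is pure noise.
rng = np.random.default_rng(1)
y = np.repeat([0.0, 1.0], 200)
f0 = y + 0.1 * rng.normal(size=y.size)
X = np.column_stack([
    f0,
    rng.normal(size=y.size),
    -y + 0.5 * rng.normal(size=y.size),
    f0 + 0.01 * rng.normal(size=y.size),
])
selected, relevance = select_by_pearson(X, y)
print(selected, relevance.round(3))
```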
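The four reported figures (accuracy, sensitivity, specificity and MCC) follow from confusion-matrix counts. This standard-library sketch treats label 1 as the cancer class, which is an assumption about the label encoding. MCC lies in [-1, 1], so an MCC reported as 80.39% corresponds to 0.8039 on that scale.

```python
import math

def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity and Matthews correlation coefficient,
    with `positive` (assumed here to be the cancer class) as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}

# Toy usage: 10 cancer (1) and 10 normal (0) samples.
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 9 + [0] + [0] * 8 + [1] * 2  # 9 TP, 1 FN, 8 TN, 2 FP
print(binary_metrics(y_true, y_pred))
```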
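The sketch below shows why a single threshold struggles with non-linear data. It uses one feature and a hypothetical midpoint-of-class-means threshold; the abstract does not give the formula actually used. The rule separates the first toy case perfectly. It only reaches chance level when cancer values lie on both sides of the normal values.

```python
def fit_midpoint_threshold(x_normal, x_cancer):
    """Hypothetical rule: threshold halfway between the two class means;
    also record on which side of it the cancer class lies."""
    mu_n = sum(x_normal) / len(x_normal)
    mu_c = sum(x_cancer) / len(x_cancer)
    return (mu_n + mu_c) / 2.0, mu_c > mu_n

def predict_threshold(x, threshold, cancer_above):
    """Label 1 (cancer) for values on the cancer side of the threshold, else 0."""
    return [int((v > threshold) == cancer_above) for v in x]

def accuracy_on(x_normal, x_cancer):
    """Fit the threshold on both classes and report training accuracy."""
    thr, above = fit_midpoint_threshold(x_normal, x_cancer)
    pred = predict_threshold(list(x_normal) + list(x_cancer), thr, above)
    truth = [0] * len(x_normal) + [1] * len(x_cancer)
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

# Separable case: one feature cleanly splits the classes.
sep = accuracy_on([1.0, 1.2, 0.9, 1.1], [2.0, 2.2, 1.9, 2.1])
# Non-linear case: cancer values lie on both sides of the normal values.
bimodal = accuracy_on([-0.2, 0.0, 0.1, 0.2], [-3.0, -2.8, 2.9, 3.1])
print(sep, bimodal)
```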
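The kernel comparison and RBF parameter search under 5-fold cross-validation can be sketched with scikit-learn. The data are synthetic concentric circles standing in for the Raman feature matrix. The candidate grids for `C` and `gamma` are illustrative assumptions, not the values used in the study.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, non-linearly separable stand-in for the Raman features.
X, y = make_circles(n_samples=200, noise=0.05, factor=0.4, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # 5-fold CV

# 1) Compare kernels at their default parameters (mean CV accuracy).
kernel_scores = {
    k: cross_val_score(make_pipeline(StandardScaler(), SVC(kernel=k)), X, y, cv=cv).mean()
    for k in ("linear", "poly", "rbf", "sigmoid")
}

# 2) Tune C and gamma for the RBF kernel on the same folds.
search = GridSearchCV(
    make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    param_grid={"svc__C": [0.1, 1, 10, 100], "svc__gamma": [0.01, 0.1, 1, 10]},
    cv=cv,
    scoring="accuracy",
)
search.fit(X, y)
print(kernel_scores)
print(search.best_params_, round(search.best_score_, 3))
```

Scaling sits inside the pipeline so that each fold is standardized using only its own training split.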
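k-means is unsupervised, so turning its k = 3 clusters into two classes needs a mapping rule, and the abstract does not describe one. This sketch assumes each cluster takes the majority label of its training members. `kmeans_classifier` and the blob data are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

def kmeans_classifier(X_train, y_train, k=3, seed=0):
    """Fit k-means, then label each cluster with the majority class of its
    training members (assumed mapping rule). Returns a predict function."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X_train)
    mapping = np.array([
        np.bincount(y_train[km.labels_ == c], minlength=2).argmax()
        for c in range(k)
    ])
    return lambda X: mapping[km.predict(X)]

# Toy usage: three blobs, two of which belong to the "cancer" class (1).
X, blob = make_blobs(n_samples=300, centers=[[0, 0], [8, 0], [0, 8]],
                     cluster_std=0.6, random_state=0)
y = (blob > 0).astype(int)
predict = kmeans_classifier(X, y, k=3)
print((predict(X) == y).mean())
```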
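For the localization step, a confocal Raman map can be treated as a cube of spectra indexed by pixel position. A trained per-spectrum classifier can then be applied pixel by pixel and reshaped into a label image. The sketch assumes a `(rows, cols, n_wavenumbers)` array layout and uses a hypothetical one-band stand-in classifier. In practice, `predict` could be the `predict` method of a fitted scikit-learn estimator.

```python
import numpy as np

def classify_raman_map(cube, predict):
    """Apply a per-spectrum classifier to a Raman map.
    cube: array of shape (rows, cols, n_wavenumbers).
    predict: callable mapping an (n_pixels, n_wavenumbers) array to n_pixels labels."""
    rows, cols, n = cube.shape
    labels = np.asarray(predict(cube.reshape(rows * cols, n)))
    return labels.reshape(rows, cols)

# Toy usage: a 20 x 20 map with 100 spectral points; a 5 x 5 patch carries an extra band.
rng = np.random.default_rng(0)
cube = rng.normal(0.0, 0.05, size=(20, 20, 100))
cube[5:10, 12:17, 40:45] += 1.0
# Hypothetical stand-in classifier: mean intensity over channels 40-44 above 0.5.
label_map = classify_raman_map(cube, lambda s: (s[:, 40:45].mean(axis=1) > 0.5).astype(int))
rows_idx, cols_idx = np.nonzero(label_map)
print(rows_idx.min(), rows_idx.max(), cols_idx.min(), cols_idx.max())
```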