| At the tobacco leaf acquisition stage, correct and objective grading both encourages farmers and protects the economic interests of cigarette enterprises. Manual grading currently suffers from strong subjectivity and wastes manpower and material resources; different experts may even assign the same leaf to different grades. An intelligent grading method that is objective, fast, and highly accurate is therefore urgently needed. Current research on intelligent tobacco grading focuses mostly on image features and spectral features. Spectral features better reflect characteristics that are closely associated with leaf grade, such as oil content, chroma, identity, and maturity, so this paper studies tobacco grading based on spectra. In an intelligent tobacco grading system, both the classification model and the number of characteristic spectral features affect the recognition rate and the overall speed. To obtain a real-time grading system with a high recognition rate, we carried out the following work. 1. Collecting and preprocessing tobacco leaf spectra and testing for isolated samples. We collected 642 reflectance spectra of tobacco leaves, covering 13 grades, with a UV-3600 spectrometer. To reduce the noise caused by baseline drift and the influence of differences among features on grading, we normalized the spectra. Because some samples may be mislabeled (isolated samples), the training set used to build the classification models must be selected carefully. Through statistical analysis, we chose appropriate thresholds to test for isolated samples in each grade using the angle cosine distance, the Euclidean distance, and the correlation coefficient respectively, and thereby determined the training set for the classification models. 2. Building the classification models and improving the K-nearest-neighbor algorithm. We built SVM, ELM, KNN, and weighted KNN (WKNN) classification models, taking the recognition rate of the classification model as the fitness function. Under the full spectrum, the best accuracies of ELM and SVM are 85.75% and 91.02%. WKNN can weight samples in two ways: in the first, all training samples of a grade share the same weight, equal to the reciprocal of the number of samples in that grade; in the second, the K nearest neighbors are found first and each is given a weight that is negatively correlated with its distance. The grade of a tobacco leaf is then decided by summing the weights for each grade. Combining the two weighting schemes gives a recognition rate of 90.77%. Weighted KNN classifies better than traditional KNN, and its computational complexity is lower than that of SVM and ELM, so we selected weighted KNN as the classifier for judging tobacco grade. 3. Preliminary feature screening based on clustering. Considering the within-class and between-class dispersion simultaneously, we constructed a discriminant function D to judge whether a feature is good, and deleted the features lying to the right of an inflection point of the D values. The best classification result was obtained at the sixth inflection point, with 326 features remaining in the training set. With this method, the accuracy in the training test increased from 90.77% to 94.59%, so the recognition rate rose while the number of features fell. 4. Deep feature screening. We screened the features further using particle swarm optimization (PSO), the genetic algorithm (GA), and correlation coefficient (CC) analysis. The results show that BPSO performs better. At this point, the number of features is reduced from 451 to 143. The time spent collecting spectra is reduced by 68.3%, and the recognition rate improves from 90.77% to 93.69%, an increase of 2.92 percentage points. |
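
The isolated-sample test in step 1 can be illustrated with a minimal sketch. The abstract names the three similarity measures but does not say what each sample is compared with or how the thresholds were chosen, so two things here are assumptions made only for illustration: each spectrum is compared with the mean spectrum of its grade, and the threshold is the mean distance plus `n_sigma` standard deviations. The function names are hypothetical.

```python
import numpy as np

def class_mean_distances(spectra, metric="euclidean"):
    """Distance of each spectrum in one grade to that grade's mean spectrum."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=0)
    if metric == "euclidean":
        return np.linalg.norm(spectra - mean, axis=1)
    if metric == "cosine":
        # 1 - cos(angle) between each spectrum and the mean spectrum
        num = spectra @ mean
        den = np.linalg.norm(spectra, axis=1) * np.linalg.norm(mean)
        return 1.0 - num / den
    if metric == "correlation":
        # 1 - Pearson correlation: cosine distance after centering each vector
        sc = spectra - spectra.mean(axis=1, keepdims=True)
        mc = mean - mean.mean()
        num = sc @ mc
        den = np.linalg.norm(sc, axis=1) * np.linalg.norm(mc)
        return 1.0 - num / den
    raise ValueError(f"unknown metric: {metric}")

def flag_isolated(spectra, metric="euclidean", n_sigma=2.0):
    """Boolean mask of suspected isolated samples within one grade.

    Assumed threshold rule: mean distance + n_sigma * standard deviation.
    """
    d = class_mean_distances(spectra, metric)
    return d > d.mean() + n_sigma * d.std()
```

Samples flagged under a chosen measure would be left out of the training set for their grade; in practice the threshold would be tuned per grade, as the abstract describes.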
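
The combined weighting described in step 2 can be sketched as follows. The abstract states only that the class weight is the reciprocal of the number of samples in a grade and that the neighbor weight decreases with distance. The specific choices below are assumptions: the form `1 / (distance + eps)`, the Euclidean distance, and multiplying the two weights together. `wknn_predict` is a hypothetical name.

```python
import numpy as np

def wknn_predict(X_train, y_train, x, k=5, eps=1e-12):
    """Predict the grade of spectrum x with a weighted K-nearest-neighbor vote."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    x = np.asarray(x, dtype=float)

    # Class weight: reciprocal of the number of training samples in each grade.
    grades, counts = np.unique(y_train, return_counts=True)
    class_weight = dict(zip(grades.tolist(), (1.0 / counts).tolist()))

    # Find the k nearest neighbors by Euclidean distance (assumed metric).
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists, kind="stable")[:k]

    # Sum the weights per grade; closer neighbors contribute more (assumed 1/d form).
    scores = {}
    for i in nearest:
        grade = y_train[i].item()
        weight = class_weight[grade] / (dists[i] + eps)
        scores[grade] = scores.get(grade, 0.0) + weight

    # The grade with the largest summed weight wins.
    return max(scores, key=scores.get)

# Toy usage with two-feature "spectra" and two grades.
X_toy = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6]]
y_toy = ["A", "A", "A", "B", "B"]
predicted = wknn_predict(X_toy, y_toy, [4.5, 5.0], k=3)
```

The class-size weight keeps grades with many training samples from dominating the vote, while the distance weight lets the closest neighbors count most.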