Butterfly classification and identification are of great significance in pest control, invasive species monitoring, and insect taxonomy. Traditional morphological classification is time-consuming and labor-intensive, whereas automatic insect recognition based on computer technology is accurate, fast, and inexpensive. Feature selection is the key to automatic recognition. Feature selection based on the decision tree (DT) offers high performance, few parameters, and good interpretability, which makes it an important feature selection algorithm. Its robustness is insufficient, however; in particular, too many branches in a decision tree easily lead to poor generalization. Searching for the optimal split points (discretization) of numerical attributes and pruning are the key techniques for ensuring the performance of decision tree algorithms. In this study, an adaptive multi-branch decision tree (CHI-MIC Adaptive Multi-branch Decision Tree, CMDT) is proposed, in which maximum information coefficient (MIC) theory is introduced to search for the optimal split points of numerical attributes and the chi-square independence test is used to prune the branches of the tree. On the one hand, CMDT automatically optimizes the split points of numerical attributes and constructs a binary or multi-way tree, which avoids the information loss caused by a purely binary tree; on the other hand, automatic pruning of branches through the chi-square test effectively controls the over-fitting caused by multi-way trees. Validated on 12 multi-class data sets, CMDT performs significantly better than the reference algorithm and remains stable on imbalanced data. Furthermore, CMDT is used as the feature selection algorithm and an SVM is used to build the prediction model, which is applied to butterfly specimen data sets of swallowtail butterflies (Papilionidae) and pierid butterflies (Pieridae). The prediction performance is significantly higher than that of the reference model, and the running speed is also greatly improved. These results indicate that the new method has good application prospects in insect identification.
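The pruning idea can be illustrated with a minimal sketch. It assumes that a candidate split is kept only when branch membership and class label are significantly dependent under a Pearson chi-square independence test at the 0.05 level. The function names (`chi_square_statistic`, `keep_split`), the significance level, and the small critical-value table are illustrative assumptions, not details taken from the paper, and the MIC-based search for split points is not shown.

```python
from collections import Counter

# Chi-square critical values at alpha = 0.05, keyed by degrees of freedom.
# Only a few entries are listed; this is enough for the illustration.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}


def chi_square_statistic(table):
    """Return (Pearson chi-square statistic, degrees of freedom) for a table.

    `table` is a list of rows; rows are branches, columns are classes.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    # Empty rows or columns carry no information and do not add freedom.
    rows = sum(1 for r in row_totals if r > 0)
    cols = sum(1 for c in col_totals if c > 0)
    return stat, (rows - 1) * (cols - 1)


def keep_split(branch_ids, labels):
    """Hypothetical pruning rule: keep the split only if it is significant."""
    classes = sorted(set(labels))
    branches = sorted(set(branch_ids))
    counts = Counter(zip(branch_ids, labels))
    table = [[counts[(b, c)] for c in classes] for b in branches]
    stat, dof = chi_square_statistic(table)
    if dof == 0:
        return False  # one branch or one class: nothing to test
    if dof not in CHI2_CRIT_05:
        raise ValueError("critical value not tabulated for dof=%d" % dof)
    return stat > CHI2_CRIT_05[dof]


# A split that separates the classes perfectly is kept ...
print(keep_split([0] * 10 + [1] * 10, ["a"] * 10 + ["b"] * 10))
# ... while a split whose branches share the same class mix is pruned.
print(keep_split([0] * 10 + [1] * 10, ["a", "b"] * 10))
```

Under this assumption, a multi-way split that produces many branches must still show a statistically significant association with the class label to survive, which is one way the branching factor could be kept in check.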