| With the development and popularization of artificial intelligence and other high-tech fields, most of the data generated and used in daily life is multi-valued and multi-label. Compared with traditional supervised learning such as single-label learning, multi-label learning is better suited to real-world needs. In multi-label learning, each sample usually belongs to multiple categories or is associated with multiple labels. Studying the correlation between labels helps to address the label-space expansion, high storage requirements, and high time cost caused by the growing number of labels in multi-label classification. It can also improve the prediction performance and efficiency of multi-label learning algorithms and make them more widely applicable. The decision tree is an efficient classical classification algorithm that can classify data and extract the relevant classification rules, but it does not fully consider the relationships between labels when applied to multi-label learning problems. Based on these considerations, multi-label decision tree algorithms that take the correlation between labels into account (MLRDT and MLRCT) are proposed within the frameworks of the C4.5 decision tree and the CART tree. When splitting attributes are selected in MLRDT and MLRCT, the label correlation obtained from the feature space of the multi-label data set is integrated into the construction of a similarity index that measures the samples at a node, and a new consistency and similarity index is proposed. The attribute with the largest difference in similarity between the nodes before and after splitting is selected as the splitting attribute. Then, building on these algorithms, the correlation between sample label sets is added and the definition of the similarity index is updated so that the label similarity of a node's data set is considered more comprehensively, which gives the extended algorithms MLRDT1 and MLRCT1. In addition, when continuous-valued attributes are processed, the
attribute values are first standardized and then discretized into groups: the values are sorted in ascending order and divided into K groups, and samples whose attribute values fall in the k-th interval are assigned to the corresponding child node. Seven data sets and commonly used multi-label evaluation metrics are selected to evaluate the classification performance of the algorithms. Experimental results show that the proposed MLRDT and MLRCT algorithms achieve good classification performance. |
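
The handling of continuous-valued attributes described above (standardize, sort in ascending order, divide into K groups, route each group to its own child node) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `discretize_attribute` and `split_samples` are hypothetical, z-score standardization is assumed, and the K groups are assumed to contain roughly equal numbers of samples, since the abstract does not say how the group boundaries are chosen.

```python
import numpy as np


def discretize_attribute(values, k):
    """Return a group index in 0..k-1 for each sample of one attribute.

    Assumptions (not stated in the abstract): z-score standardization
    and groups of roughly equal size after sorting.
    """
    values = np.asarray(values, dtype=float)
    std = values.std()
    # Standardize; a constant attribute maps to all zeros.
    z = (values - values.mean()) / std if std > 0 else np.zeros_like(values)
    # Sort the standardized values in ascending order.
    order = np.argsort(z, kind="stable")
    groups = np.empty(len(z), dtype=int)
    # Cut the sorted sample indices into k consecutive groups.
    for g, idx in enumerate(np.array_split(order, k)):
        groups[idx] = g
    return groups


def split_samples(values, k):
    """Return, for each of the k child nodes, the indices of its samples."""
    groups = discretize_attribute(values, k)
    return [np.flatnonzero(groups == g) for g in range(k)]


if __name__ == "__main__":
    attribute = [5.0, 1.0, 3.0, 2.0, 6.0, 4.0]
    for g, members in enumerate(split_samples(attribute, 3)):
        print(f"child {g}: samples {members.tolist()}")
```

With equal-size groups, tied attribute values can land in different groups. A variant that cuts on value boundaries instead would avoid this, at the cost of uneven group sizes.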