| Multi-label learning, which addresses the ambiguity found in the real world, has attracted sustained attention from researchers since it was proposed. Multi-label learning improves descriptive accuracy by extending the classification result from a single label to multiple labels; label distribution learning converts traditional logical labels into a probability distribution, which expresses more intuitively the degree to which each label describes an instance. Feature selection improves classifier accuracy by reducing the high dimensionality of massive data. However, traditional feature selection algorithms are computationally complex and cannot process streaming feature data. To address these problems, this paper proposes two feature selection algorithms. The main contents are as follows: (1) To deal with the label bias caused by the granularity of feature description in multi-label learning, many researchers use the relationship between the features and the entire label set to identify important features and then construct the corresponding feature subspace. Under such approaches, however, some features are strongly relevant to an individual label yet show no correlation with the label space as a whole. These features are left unselected, which inevitably lowers classifier accuracy. Meanwhile, many mutual-information-based feature selection approaches for multi-label learning currently rely on traditional entropy, which is computationally expensive because it lacks the complement property. Therefore, this paper introduces a new definition of rough entropy and proposes the k-Kernel Feature Selection based on Rough Mutual Information (kKFSRMI) algorithm. Experiments show that kKFSRMI is effective. (2) Traditional feature selection algorithms cannot process streaming feature data; moreover, the redundancy
calculation is complicated and the description of instances is not accurate enough. To solve these problems, a Multi-label Distribution Learning Feature Selection with Streaming Data Using Rough Set (FSSRS) algorithm is proposed. First, the online streaming feature selection framework is introduced into multi-label learning. Second, the original conditional probability is replaced by the dependency from rough set theory, which makes streaming feature selection faster and more efficient because it uses only information computed from the data itself. Finally, since each label describes the same instance to a different degree in the real world, label distributions are used instead of traditional logical labels to describe instances more accurately. The experimental results show that the proposed algorithm retains the features that are highly correlated with the label space, so that classification accuracy improves to a certain extent compared with classification without feature selection. |
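
To make the rough-set dependency mentioned in (2) concrete, the following is a minimal sketch and not the FSSRS algorithm itself. It computes the standard dependency degree, the fraction of instances whose feature-based equivalence class falls entirely within one label class, and uses it in a simple online loop. The function names `dependency` and `streaming_select`, the greedy acceptance rule, and the assumption that features are already discretized are all illustrative choices that do not come from the paper; label distributions are not modeled here.

```python
from collections import defaultdict


def dependency(rows, feature_idx, labels):
    """Rough-set dependency degree: |positive region| / |universe|.

    Assumes discrete feature values and hashable labels (a multi-label
    vector can be passed as a tuple).
    """
    if not rows:
        return 0.0
    # Group instances that are indiscernible on the chosen features.
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        key = tuple(row[j] for j in feature_idx)
        blocks[key].append(i)
    # A block is in the positive region when all its members share one label.
    positive = sum(
        len(members)
        for members in blocks.values()
        if len({labels[i] for i in members}) == 1
    )
    return positive / len(rows)


def streaming_select(rows, labels, n_features):
    """Hypothetical online loop: features arrive one at a time, and a new
    feature is kept only if it strictly raises the dependency degree."""
    selected = []
    current = dependency(rows, selected, labels)
    for f in range(n_features):
        candidate = dependency(rows, selected + [f], labels)
        if candidate > current:
            selected.append(f)
            current = candidate
    return selected


# Toy data: feature 0 determines the label, feature 1 is noise,
# feature 2 is redundant given feature 0.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
labels = ["a", "a", "b", "b"]
print(streaming_select(rows, labels, 3))  # prints [0]
```

In this toy run the redundant feature 2 is rejected because it adds no dependency beyond feature 0. This illustrates how a dependency measure can be computed from the data alone, without estimating conditional probabilities.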