
Research On Multi-label Feature Selection Algorithm For Dynamic Environment

Posted on: 2021-06-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J H Liu
Full Text: PDF
GTID: 1488306020456994
Subject: Systems Engineering
Abstract/Summary:
Feature selection for multi-label learning is a key data preprocessing technique in data mining. With the rapid development of internet technology, the dimensionality of multi-label data in many applications is extremely high, reaching the millions, and keeps growing in an online manner. Multi-label data are dynamic in the sense that the feature space or the label space is unknown in advance, and the features or the labels arrive one by one over time. This dynamic behavior of the feature space or label space poses new problems and challenges for static multi-label feature selection algorithms. To study multi-label feature selection with a dynamic feature space or a dynamic label space, we use the concept of streaming features to model a high-dimensional, dynamic feature space and the concept of streaming labels to model a dynamic label space. This work focuses on multi-label feature selection for dynamic environments, and its main contributions are as follows:

1. Since the feature space of multi-label data is often both dynamic and high-dimensional, we propose a novel online multi-label feature selection method with dynamic streaming features. Within the framework of neighborhood rough sets, the proposed method first generalizes the classical neighborhood rough set model to fit multi-label learning and gives a fast method for computing the positive region. We then analyze feature relevance and feature redundancy using the dependency function, and design a bound on pairwise correlations between features under the label set to filter out redundant features. Extensive experiments on different types of data sets demonstrate that the proposed method achieves better compactness and higher prediction performance than several popular multi-label feature selection algorithms.

2. Most existing multi-label dynamic streaming feature selection methods based on neighborhood rough sets ignore the integrity of the label set, and different neighborhood granularity settings can make the feature selection algorithm unstable. To address this problem, we propose a novel dynamic streaming feature selection method for multi-label learning, based on adaptive neighborhood granulation and rough approximation, to reduce the dimension of the feature space. The proposed method first designs an adaptive neighborhood granulation strategy that uses the density information of similar instances to solve the granularity selection problem, and then constructs a rough approximation mechanism for multi-label data. Moreover, we propose a framework for dynamic multi-label streaming feature selection based on significance analysis, which consists of two phases: significance selection and subset update. Experimental results show that the proposed method outperforms both some static multi-label feature selection algorithms and several state-of-the-art dynamic multi-label streaming feature selection algorithms.

3. Most existing multi-label dynamic streaming feature selection methods cannot take the intrinsic group structure of features into account. To address this problem, we develop a novel online multi-label group feature selection method with dynamic streaming features. The proposed method consists of two phases: group selection and inter-group selection. In group selection, we design a criterion based on mutual information to select the feature groups that are important to the label set. In inter-group selection, we consider feature interaction and feature redundancy by employing an interaction weight to select an optimal feature subset. This two-phase procedure continues until no more features arrive. Extensive experiments show that the proposed method yields significant gains over other well-established multi-label feature selection methods that evaluate features individually.

4. Most existing multi-label feature selection algorithms assume that the labels of the training data are obtained before learning starts. However, in real-world applications, the available labels usually arrive one by one over time. To address this problem, we present a novel multi-label feature selection method with dynamic streaming labels, which learns label-specific features to select a set of the most relevant and discriminative features. The proposed method consists of two phases: label-specific feature learning and label-specific feature fusion. In label-specific feature learning, we select label-specific features for each newly arrived label by designing inter-class discrimination and intra-class neighbor recognition. In label-specific feature fusion, a feature conversion is created to fuse the generated label-specific feature sets. The proposed algorithm provides a new processing pipeline for multi-label feature selection with streaming labels. The validity of the proposed method is verified by extensive experiments on multiple benchmark data sets.
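
The streaming-feature idea in contribution 1 can be illustrated with a minimal sketch. It is not the dissertation's algorithm: the abstract does not give the exact multi-label generalization of the neighborhood rough set, the fast positive-region computation, or the redundancy bound. The sketch assumes that an instance belongs to the positive region when every instance in its delta-neighborhood (Chebyshev distance on the selected features) has an identical label vector, and that a newly arrived feature is kept only if it raises the dependency. The names `dependency`, `select_streaming_features`, `delta`, and `min_gain` are hypothetical.

```python
import numpy as np


def dependency(X, Y, features, delta=0.15):
    """Share of instances whose delta-neighborhood is label-consistent.

    Assumption (not from the source): an instance is in the positive region
    when all of its neighbors carry exactly the same label vector.
    """
    n = X.shape[0]
    if features:
        sub = X[:, features]
        # Pairwise Chebyshev distance restricted to the selected features.
        dist = np.abs(sub[:, None, :] - sub[None, :, :]).max(axis=2)
    else:
        # With no features, all instances are indistinguishable.
        dist = np.zeros((n, n))
    neighbors = dist <= delta
    # same_labels[i, j] is True when instances i and j share a label vector.
    same_labels = (Y[:, None, :] == Y[None, :, :]).all(axis=2)
    consistent = (~neighbors | same_labels).all(axis=1)
    return float(consistent.mean())


def select_streaming_features(X, Y, delta=0.15, min_gain=1e-9):
    """Process columns of X one at a time, as if they arrived as a stream."""
    selected, current = [], dependency(X, Y, [], delta)
    for f in range(X.shape[1]):
        candidate = dependency(X, Y, selected + [f], delta)
        if candidate - current <= min_gain:
            continue  # the new feature adds no discriminative power; discard it
        selected.append(f)
        current = candidate
        # Redundancy check: drop earlier features that are no longer needed.
        for g in list(selected[:-1]):
            reduced = [h for h in selected if h != g]
            if dependency(X, Y, reduced, delta) >= current:
                selected = reduced
    return selected, current


# Toy data: feature 0 is constant, feature 1 separates the two label vectors,
# and feature 2 duplicates feature 1.
X = np.array([[0.5, 0.0, 0.0],
              [0.5, 0.1, 0.1],
              [0.5, 0.9, 0.9],
              [0.5, 1.0, 1.0]])
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
selected, score = select_streaming_features(X, Y)
print(selected, score)
```

On the toy data, the constant feature and the duplicated feature are both discarded on arrival because neither increases the dependency. The pairwise distance matrix costs O(n^2) memory, so this form is only suitable for small illustrative data sets.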
Keywords/Search Tags: Feature Selection, Streaming Features, Streaming Labels, Multi-label Learning, Neighborhood Rough Set