
Research On Filter Feature Selection Algorithm Based On Mutual Information

Posted on: 2024-03-14    Degree: Master    Type: Thesis
Country: China    Candidate: A F Xie    Full Text: PDF
GTID: 2568307085464604    Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of information technology, the data generated in human life grows exponentially. This fast-growing data not only promotes technological progress but also brings great challenges: high-dimensional data occupies more storage space and computing resources, and the large amount of irrelevant and redundant information it contains seriously degrades the classification performance of subsequent models. To reduce the negative impact of high-dimensional datasets, feature selection has attracted increasing attention. Feature selection is a data preprocessing technique that applies given rules to delete irrelevant and redundant features from the original feature space and selects an optimal subset to replace the original feature set. According to how they are combined with classifiers, feature selection methods can be grouped into three types: wrapper, embedded, and filter methods. Compared with the filter method, wrapper and embedded methods have high computational complexity, depend strongly on the classifier, generalize poorly, and are prone to overfitting. In contrast, the filter method is independent of the learning algorithm and offers strong generality and high computational efficiency; for high-dimensional datasets in particular, it is far more suitable than the other approaches.

This paper focuses on filter feature selection and uses information theory to measure the relationships between features and class labels. Research on filter feature selection algorithms based on mutual information shows that redundancy among features cannot be eradicated, so minimizing the redundancy between features becomes the optimization goal of such algorithms. Distinguishing dependency from redundancy between features is an effective way to achieve this goal, but it is a challenging task. In recent years, scholars have proposed some solutions, yet most of them cannot effectively distinguish dependent features from redundant features. To solve this problem, three different methods are proposed in this paper. The main contributions and innovations are as follows:

(1) To address the inability of existing algorithms to distinguish dependent features from redundant features, a new algorithm called Dynamic Interaction-based Minimum Redundancy Maximum Relevance (DIMRMR) is proposed. DIMRMR redefines the discriminant criteria for feature redundancy and feature dependency based on the degree of feature interaction. On the basis of the newly defined criteria and a feature-relevant complementary term, a dynamic interaction weight is constructed. DIMRMR then combines this dynamic interaction weight with the criterion function of MRMR, which discriminates the dependency and redundancy relationships between features more accurately. To verify the performance of DIMRMR, it is compared with seven competitive algorithms on seventeen datasets. Experimental results show that the algorithm achieves the best classification performance on most of them.

(2) To reduce the time cost of DIMRMR while preserving its effectiveness, a feature selection algorithm named Redundancy Optimization-based Feature Selection (ROFS) is proposed. ROFS defines the feature-relevant complementary term as a new feature relevancy term to measure the correlation between features and the class label, and uses traditional feature relevancy together with two types of redundancy (class-dependent redundancy and class-independent redundancy) to build a redundancy optimization weight. ROFS then uses a chain structure to combine the new relevancy term with the redundancy optimization weight to measure the importance of each candidate feature. Experimental results show that ROFS not only consumes far less time than other recently proposed algorithms but is also strongly competitive in classification ability.

(3) To distinguish dependency and redundancy between features effectively, a method named New Minimal-Redundancy Maximal-Relevance (NMRMR) is proposed. NMRMR designs a new feature redundancy term on the basis of the MRMR algorithm, which combines class-dependent redundancy and class-independent redundancy and computes the redundancy between features precisely according to whether the class-dependent redundancy is positive or negative. NMRMR was compared with four competing algorithms on four classifiers and sixteen datasets; the results show that NMRMR obtains the highest average classification accuracy on all four classifiers.

This paper is devoted to studying the dependency and redundancy between features. To solve the problem that existing feature selection algorithms cannot effectively distinguish the two, three different solutions are proposed, all of which achieve good performance. These studies separate dependent features from redundant features, further reducing the redundancy between features and improving data quality, and therefore have theoretical significance and research value.
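All three proposed algorithms extend the classic MRMR criterion, which scores each candidate feature by its mutual information with the class label minus its average mutual information with the already-selected features. As a minimal illustrative sketch of that baseline (not the thesis's DIMRMR, ROFS, or NMRMR implementations; function names and the toy data below are my own), a plug-in mutual information estimator and greedy MRMR selection over discrete features might look like:

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    cx, cy = Counter(xs), Counter(ys)
    cxy = Counter(zip(xs, ys))
    # I(X;Y) = sum over (x,y) of p(x,y) * log2( p(x,y) / (p(x) p(y)) ),
    # where p(x,y)/(p(x)p(y)) simplifies to c*n / (cx[x]*cy[y]) for counts c.
    return sum((c / n) * log2(c * n / (cx[x] * cy[y]))
               for (x, y), c in cxy.items())

def mrmr_select(features, labels, k):
    """Greedy MRMR: pick k feature indices, each maximizing
    relevance I(f; Y) minus mean redundancy with already-selected features."""
    selected = []
    candidates = list(range(len(features)))
    relevance = {i: mutual_information(features[i], labels) for i in candidates}
    while len(selected) < k and candidates:
        def score(i):
            if not selected:
                return relevance[i]
            redundancy = sum(mutual_information(features[i], features[s])
                             for s in selected) / len(selected)
            return relevance[i] - redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

For example, with a label vector, an informative feature, an exact duplicate of it, and an independent feature, the duplicate is penalized for redundancy in the second round, so the independent feature is chosen instead — the behavior the thesis's redundancy-versus-dependency analysis refines:

```python
y  = [0, 0, 0, 0, 1, 1, 1, 1]
f0 = [0, 0, 0, 1, 1, 1, 1, 0]   # informative feature
f1 = [0, 0, 0, 1, 1, 1, 1, 0]   # exact duplicate of f0 (pure redundancy)
f2 = [0, 0, 1, 1, 0, 0, 1, 1]   # independent of f0
mrmr_select([f0, f1, f2], y, 2)  # -> [0, 2]: the duplicate f1 is skipped
```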
Keywords/Search Tags:Feature selection, Filter method, Mutual Information, Feature dependence, Feature redundancy