Research On Feature Selection Algorithm On Imbalanced Data Classification

Posted on: 2018-02-19
Degree: Master
Type: Thesis
Country: China
Candidate: L He
GTID: 2428330569475198
Subject: Computer application technology
Abstract/Summary:
Most experimental evaluations of traditional classification algorithms are carried out on balanced data sets. In practical applications, however, imbalanced data sets are increasingly common, for example in credit card fraud analysis and in automatic diagnosis of hospital patients. Traditional classification algorithms give unsatisfactory results on such imbalanced data sets, so research on classification algorithms for imbalanced data has considerable practical value.

This thesis first introduces traditional data classification methods and classification methods for imbalanced data sets. It then analyzes the concept and basic process of feature selection and discusses several problems that traditional feature selection methods face on imbalanced data sets.

The Relief algorithm is a classical feature selection algorithm, and a large number of experimental studies have shown that it performs well on balanced data sets. To help Relief handle imbalanced classification, a cost-sensitive approach is used to introduce a cost factor into the Relief algorithm. Experiments with the improved algorithm show that, on imbalanced data sets, the improved Relief algorithm achieves better results than the traditional Relief algorithm.

Ensemble (integrated) learning combines a number of different classifiers to achieve better classification performance. This idea is applied to feature selection for imbalanced data classification: a sampling method generates several different balanced data sets from the imbalanced data set, the traditional Relief algorithm is run on each balanced data set, and the results are finally aggregated. On this basis, a feature selection algorithm based on ensemble learning is proposed. Experiments on the data sets show that this algorithm outperforms the traditional Relief algorithm.
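The abstract does not say exactly how the cost factor enters the Relief update, which sampling method produces the balanced subsets, or how the per-subset results are aggregated. The Python/NumPy sketch below illustrates both ideas under assumptions of its own and is not the thesis's actual algorithm: binary labels; Relief evaluated over every instance (m = n) with Manhattan distance on range-scaled features; a hypothetical cost factor that multiplies each instance's weight update by the cost of its class, with inverse-class-frequency costs; random undersampling of the majority class for the balanced subsets; and simple averaging of the subset weights. The names `relief`, `inverse_frequency_costs`, `ensemble_relief` and the `class_cost` parameter are hypothetical.

```python
import numpy as np


def _feature_span(X):
    """Per-feature range used to scale |a - b| into [0, 1] (Relief's diff function)."""
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0  # constant features then contribute zero difference
    return span


def relief(X, y, class_cost=None):
    """Binary Relief over every instance (m = n), optionally cost-weighted.

    class_cost: hypothetical dict {label: cost}. Each instance's update is
    multiplied by the cost of its class and the sum is divided by the total
    cost, so weights stay in [-1, 1]. With class_cost=None every class has
    cost 1 and this is ordinary Relief. Each class needs at least 2 instances.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    span = _feature_span(X)
    w = np.zeros(X.shape[1])
    total = 0.0
    for i in range(len(X)):
        d = np.abs(X - X[i]) / span        # scaled per-feature diff to every instance
        dist = d.sum(axis=1)               # Manhattan distance in the scaled space
        dist[i] = np.inf                   # an instance is not its own neighbour
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class instance
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class instance
        c = 1.0 if class_cost is None else class_cost[y[i]]
        # Reward features that differ on the miss, penalise those that differ on the hit.
        w += c * (d[miss] - d[hit])
        total += c
    return w / total


def inverse_frequency_costs(y):
    """Assumed cost factor: largest class count divided by each class's count."""
    labels, counts = np.unique(y, return_counts=True)
    return {lab: counts.max() / cnt for lab, cnt in zip(labels, counts)}


def ensemble_relief(X, y, n_subsets=10, seed=0):
    """Average plain-Relief weights over balanced, randomly undersampled subsets."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    labels, counts = np.unique(y, return_counts=True)
    minority = labels[np.argmin(counts)]
    min_idx = np.flatnonzero(y == minority)
    maj_idx = np.flatnonzero(y != minority)
    weights = []
    for _ in range(n_subsets):
        # Draw as many majority instances as there are minority ones (no replacement).
        picked = rng.choice(maj_idx, size=len(min_idx), replace=False)
        idx = np.concatenate([min_idx, picked])
        weights.append(relief(X[idx], y[idx]))
    return np.mean(weights, axis=0)


# Illustrative usage on synthetic data (not the thesis's data sets):
# feature 0 separates the classes, feature 1 is pure noise, 90:10 imbalance.
rng = np.random.default_rng(1)
y = np.array([0] * 90 + [1] * 10)
X = np.column_stack([
    np.concatenate([rng.normal(0, 1, 90), rng.normal(3, 1, 10)]),
    rng.normal(0, 1, 100),
])
print("plain Relief:   ", relief(X, y))
print("cost-sensitive: ", relief(X, y, inverse_frequency_costs(y)))
print("ensemble Relief:", ensemble_relief(X, y))
```

In this sketch the cost factor only changes how much each instance's neighbour comparison counts, so minority-class instances are no longer outvoted by the far more numerous majority instances; the ensemble variant reaches a similar balance by giving Relief balanced inputs instead of reweighting.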
Keywords/Search Tags: imbalanced data set, feature selection, Relief algorithms, cost factor, integrated learning