
Fuzzy Rule-Based Oversampling Technique Research

Posted on: 2020-11-07
Degree: Master
Type: Thesis
Country: China
Candidate: G C Liu
Full Text: PDF
GTID: 2428330602952471
Subject: Applied Mathematics
Abstract/Summary:
Driven by explosive data growth and by advances in information technology and computer networks, human society has entered a new era of data. Mining the valuable information in these massive data and classifying it has become particularly important. Although machine learning and data mining technologies are maturing and are used to solve complex practical problems, challenges such as imbalanced class distributions and missing data have continued to arise in recent years. Data resampling techniques emerged to produce complete data that are easy to classify. Fuzzy classification rules, an important research topic in fuzzy set theory, represent the distribution and causal characteristics of data well and have important applications in classification problems. Traditional resampling methods have difficulty capturing the correlation between attributes, and they cannot determine whether newly synthesized data fall in a reasonable region.

This thesis therefore uses fuzzy rules to learn the structural characteristics of the training data, so as to distinguish the regions where minority class and majority class data are distributed. On this basis, we present a new resampling method that rebalances imbalanced data by adding minority class data. The basic idea is to first describe the distribution of the minority class data through fuzzy rules, where each fuzzy rule corresponds to a fuzzy confidence region, and then to synthesize new data within that region. Extensive experiments on 55 real-world imbalanced datasets evaluate the performance of the proposed oversampling technique. The results show that our method statistically outperforms all of the compared algorithms.

Next, this thesis develops a novel fuzzy reasoning-like method for recovering missing values. The main difference between the fuzzy reasoning-like method and the traditional fuzzy reasoning method is that the former produces new numerical attribute values rather than predicting the class label. Its main idea is to find, in the obtained fuzzy rule set, the rule that best matches the data containing missing values, and then to apply that rule to fill in the missing attribute values. We also conduct a large number of experiments to verify the performance of the proposed method, and the experimental results show that the model is reasonably and effectively designed.

Finally, we focus on the parameter of the algorithm proposed in this thesis, namely the fuzzy partition granularity. Experimental analysis combined with genetic algorithm optimization shows that the proposed algorithm is robust, does not depend heavily on the parameter selection, and achieves a reasonable parameter value. This work provides useful experience for follow-up studies on using fuzzy knowledge to process multi-class or multi-label data.
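The oversampling idea summarized above (describe the minority class with fuzzy rules, then synthesize data inside each rule's confidence region) can be illustrated with a minimal Python sketch. Everything specific here is an assumption and not taken from the thesis: attributes are assumed to be scaled to [0, 1], each attribute is partitioned into `k` evenly spaced triangular fuzzy sets, rule confidence is approximated by crisp counts of minority versus majority samples, and the "confidence region" is taken to be the box where every membership is at least 0.5. The function names are hypothetical.

```python
import random
from collections import Counter


def best_fuzzy_set(x, k):
    """Index of the triangular fuzzy set (k evenly spaced peaks on [0, 1])
    with the highest membership at x; that is the set with the nearest peak."""
    return min(k - 1, max(0, round(x * (k - 1))))


def rule_region(antecedent, k):
    """Assumed confidence region of a rule: the box in which each attribute's
    membership in the rule's fuzzy set is at least 0.5."""
    half = 0.5 / (k - 1)
    return [(max(0.0, j / (k - 1) - half), min(1.0, j / (k - 1) + half))
            for j in antecedent]


def fuzzy_rule_oversample(minority, majority, n_new, k=3, min_conf=0.5, seed=0):
    """Synthesize n_new minority samples inside regions of confident rules."""
    rng = random.Random(seed)

    def antecedent_of(row):
        return tuple(best_fuzzy_set(v, k) for v in row)

    min_counts = Counter(antecedent_of(r) for r in minority)
    maj_counts = Counter(antecedent_of(r) for r in majority)

    # Keep a rule only if the minority class dominates its region
    # (a crisp stand-in for fuzzy rule confidence; an assumption).
    rules, weights = [], []
    for antecedent, n_min in min_counts.items():
        conf = n_min / (n_min + maj_counts[antecedent])
        if conf >= min_conf:
            rules.append(antecedent)
            weights.append(n_min)
    if not rules:
        raise ValueError("no minority rule reaches the confidence threshold")

    synthetic = []
    for _ in range(n_new):
        # Pick a rule in proportion to the minority samples it covers,
        # then draw a point uniformly inside that rule's region.
        rule = rng.choices(rules, weights=weights, k=1)[0]
        synthetic.append([rng.uniform(lo, hi) for lo, hi in rule_region(rule, k)])
    return synthetic


minority = [[0.9, 0.85], [0.95, 0.9], [0.1, 0.9]]
majority = [[0.1, 0.1], [0.2, 0.15], [0.15, 0.95], [0.05, 0.9], [0.5, 0.5]]
new_points = fuzzy_rule_oversample(minority, majority, n_new=5)
```

In this toy data, the minority sample at `[0.1, 0.9]` shares its region with two majority samples, so its rule falls below the confidence threshold and no synthetic points are placed there. The parameter `k` plays the role of the fuzzy partition granularity discussed in the abstract.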
Keywords/Search Tags: Imbalanced data, Fuzzy rules, Missing data, Synthetic data, Genetic algorithm