Research On Data Feature Selecting And Data Balancing Methods Based On Genetic Algorithm

Posted on:2022-07-21

Degree:Master

Type:Thesis

Country:China

Candidate:X Q Wang

GTID:2518306329488474

Subject:Signal and Information Processing

Abstract/Summary:

With the advent of big data, data floods in from all directions. To obtain useful information, people need to analyze this data. However, real-life data is often redundant, and using it directly leads to poor model performance. Therefore, redundant data should be cleaned before the data set is fed into a classifier. Data has many features, but not every feature has a positive effect. Redundant features not only increase the amount of computation but may also reduce classification accuracy. The processing of data features is mainly divided into feature selection and feature extraction. This paper mainly studies feature selection algorithms. While studying feature selection, we found that balancing an imbalanced data set can improve model performance, so this paper also studies methods for balancing data sets.

This paper adopts a multi-dimensional data feature selection method that combines the genetic algorithm with the dragonfly algorithm, named the genetic dragonfly algorithm (GDA), to solve the problem of selecting feature subsets. The traditional genetic algorithm has been applied to feature selection to obtain a better feature subset, but it has low accuracy and slow optimization speed. To speed up convergence, this paper embeds the dragonfly algorithm into the crossover and mutation steps of the genetic algorithm. The dragonfly algorithm is used to find the optimal gene position and the worst gene position, so that the optimal gene is retained and the worst gene is discarded during crossover. In the traditional genetic algorithm, all genes have the same mutation probability. The genetic dragonfly algorithm instead sets different mutation probabilities according to the optimal and worst positions, so that the optimal gene is more likely to be selected and the worst gene is more likely to be discarded. The method is tested on five different data sets with five different classifiers, and the results show that the proposed feature selection method is more effective and robust.

This paper also proposes a SMOTE algorithm based on boundary enhancement and internal clustering (BEIC-SMOTE) to address the poor classification performance on imbalanced data sets. The traditional SMOTE algorithm generates new samples randomly. An improved SMOTE algorithm enhances the boundary by generating new samples for the minority samples at the boundary; however, generating new samples only at the boundary is likely to ignore the internal samples. This paper considers both the boundary and the interior when generating new samples, which ensures that the boundary is clearly depicted and that the features of the internal minority samples are enhanced. Experiments show that BEIC-SMOTE balances data more effectively than the other methods compared.
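
The position-dependent mutation described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract gives neither the actual probabilities nor how the dragonfly algorithm locates the optimal and worst gene positions, so the function name `biased_mutation`, the parameters `best_idx`, `worst_idx`, `base_p`, and `boost_p`, and their default values are all hypothetical. The sketch assumes a chromosome is a 0/1 mask over features, where 1 means the feature is selected.

```python
import random


def biased_mutation(mask, best_idx, worst_idx, base_p=0.05, boost_p=0.5, rng=None):
    """Mutate a 0/1 feature mask with position-dependent flip probabilities.

    Hypothetical sketch: best_idx and worst_idx stand in for the optimal and
    worst gene positions, which the thesis obtains from the dragonfly algorithm.
    """
    rng = rng or random.Random()
    child = list(mask)
    for i, bit in enumerate(child):
        if i == best_idx:
            # Assumed rule: push the best gene toward "selected" (0 -> 1),
            # and never flip it back once it is selected.
            p = boost_p if bit == 0 else 0.0
        elif i == worst_idx:
            # Assumed rule: push the worst gene toward "discarded" (1 -> 0),
            # and never flip it back once it is discarded.
            p = boost_p if bit == 1 else 0.0
        else:
            # All other genes keep the uniform mutation probability
            # of the traditional genetic algorithm.
            p = base_p
        if rng.random() < p:
            child[i] = 1 - bit
    return child


if __name__ == "__main__":
    parent = [0, 1, 1, 0, 1]
    print(biased_mutation(parent, best_idx=0, worst_idx=1, rng=random.Random(42)))
```

Setting `boost_p` equal to `base_p` and removing the one-way rule recovers uniform mutation, which makes the sketch a convenient baseline for comparing the two schemes.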

Keywords/Search Tags:

Machine learning, feature selection, genetic algorithm, dragonfly algorithm, SMOTE

Related items

1	Research On The Dragonfly Algorithm Based On Enhancing Individuals' Flight Direction And Its Application
2	Research On Feature Selection Method Based On Dragonfly Algorithm And Flower Pollination Algorithm
3	Improved Dragonfly Algorithm Based On Competition Operator And Destruction Operator
4	Prediction And Analysis Of Telecom Customer Churn Warning Model Based On Machine Learning
5	Study On Feature Selection Algorithm Based On Structured Data
6	Application Research Of Spark-based Dragonfly Algorithm In Text Categorization
7	Research On Quantum Evolutionary Computational Methods And Its Application In Feature Selection
8	Research On Employee Turnover Prediction Method Based On Feature Selection And Machine Learnin
9	Research On Text Classification Based On Optimized Feature Selection Algorithm
10	Research On Tensor Feature Selection Based On Genetic Algorithm