Classification Techniques For Imbalanced Data And Applications In Intelligent Medical Decision Support

Posted on: 2023-06-26

Degree: Doctor

Type: Dissertation

Country: China

Candidate: J C Wu

Full Text: PDF

GTID: 1524307319493384

Subject: Management Science and Engineering

Abstract/Summary:

Data imbalance is prevalent in medical datasets. Standard machine learning classifiers assume a balanced class distribution, so they perform poorly on imbalanced classification tasks. In medicine, misclassifying a minority-class sample often causes more harm than misclassifying a majority-class sample, which makes imbalanced data classification particularly important in this field. In this dissertation, we propose new methods for handling imbalanced data from three perspectives: combining resampling with a single classifier, combining resampling with ensemble learning, and cost-sensitive learning without resampling. We apply these methods to medical-aided decision making to improve decision support and to provide theoretical and technical support for medical decision making on imbalanced data. The main work and innovations are as follows.

First, for the hybrid approach of resampling and a single classifier, we propose a hybrid technique that combines an improved SMOTE with an adaptive SVM for imbalanced data classification. In the preprocessing stage, because SMOTE cannot identify noisy data, a noise filter based on ensemble learning is proposed to remove noise samples; this reduces the risk of setting the noise threshold incorrectly and improves the sampling result. In the classification stage, because SVM performance is sensitive to its parameter settings, an adaptive SVM optimized by a fuzzy self-tuning particle swarm optimization (PSO) algorithm is proposed. Finally, the hybrid method is applied to survival prediction after lung cancer surgery, where it improves prediction accuracy.

Second, for the combination of resampling and ensemble learning, we propose an imbalanced data classification method that combines hybrid sampling with an improved dynamic ensemble selection technique. SMOTE-ENN is first used to preprocess the data, balancing the class distribution and removing noise samples. Next, existing dynamic ensemble selection techniques generate candidate classifiers with insufficient local competence and diversity; to address this, a candidate classifier generation method that mixes multiple clustering with Bagging is proposed, which improves both the local competence and the diversity of the candidate classifiers. Finally, the combined method is applied to detecting COVID-19 from routine blood test data, where it improves detection accuracy and offers a new methodological approach to computer-aided diagnosis of COVID-19.

Third, for cost-sensitive learning without resampling, we propose a self-organizing cost-sensitive RBF neural network to address two problems: misclassification costs are difficult to set, and standard RBF neural networks are not cost-sensitive. The method uses the imbalance ratio as the misclassification cost of minority samples and optimizes the structure and the parameters of the RBF neural network with a genetic algorithm (GA) and an improved PSO, respectively, achieving self-organizing optimization of a cost-sensitive RBF neural network with a dynamic structure. Finally, the proposed algorithm is validated on several medical datasets, such as breast cancer and diabetes, where it improves the performance of auxiliary medical diagnosis.
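
The preprocessing stage of the first method (a noise filter followed by SMOTE) can be illustrated with a minimal Python sketch that uses only the standard library. The noise filter below is a hypothetical stand-in, not the dissertation's filter: three leave-one-out k-NN classifiers that differ only in k (3, 5, 7) vote, and a sample that most of them misclassify is dropped. The oversampling step is standard SMOTE. The fuzzy self-tuning PSO that tunes the SVM is not shown, and the function names and toy data are invented for illustration.

```python
import math
import random
from collections import Counter


def knn_predict(X, y, x, k, exclude=None):
    """Majority label among the k nearest points of X to x (Euclidean), skipping index `exclude`."""
    neighbours = sorted(
        (math.dist(p, x), y[i]) for i, p in enumerate(X) if i != exclude
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def ensemble_noise_filter(X, y, ks=(3, 5, 7)):
    """Drop a sample when a majority of leave-one-out k-NN members misclassify it.

    Hypothetical stand-in for an ensemble-based filter: the members here are
    k-NN classifiers that differ only in k.
    """
    keep = []
    for i, (x, label) in enumerate(zip(X, y)):
        wrong = sum(knn_predict(X, y, x, k, exclude=i) != label for k in ks)
        if wrong <= len(ks) // 2:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote(minority, n_new, k=3, seed=0):
    """Standard SMOTE: interpolate between a minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [minority[j] for j in range(len(minority)) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(p, base))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> nb, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic


# Toy data: majority (label 0) near the origin, minority (label 1) near (5, 5),
# plus one mislabelled minority point sitting inside the majority cluster.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8), (0.8, 0.2), (0.4, 0.6),
     (5, 5), (5, 6), (6, 5), (6, 6), (0.5, 0.6)]
y = [0] * 8 + [1] * 4 + [1]

X_clean, y_clean = ensemble_noise_filter(X, y)
minority = [x for x, label in zip(X_clean, y_clean) if label == 1]
synthetic = smote(minority, n_new=y_clean.count(0) - len(minority))
print(len(X_clean), len(synthetic))  # prints: 12 4
```

Voting across several filters means no single hand-set threshold decides whether a sample is noise, which is the motivation given above. For real experiments, a tested SMOTE implementation such as the one in imbalanced-learn is preferable to this sketch.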
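
The selection step of the second method can be sketched in the same style, under simplifying assumptions. The pool is plain Bagging of nearest-centroid classifiers rather than the dissertation's clustering-plus-Bagging generator, SMOTE-ENN preprocessing is omitted, and the selection rule is Overall Local Accuracy (OLA), a classic dynamic selection criterion that picks, for each query, the pool member most accurate on the query's nearest neighbours in a selection set (DSEL). All class and function names are hypothetical.

```python
import math
import random


class NearestCentroid:
    """Tiny base learner: predicts the class whose training centroid is closest to x."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            pts = [x for x, lbl in zip(X, y) if lbl == label]
            self.centroids[label] = tuple(sum(col) / len(pts) for col in zip(*pts))
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda lbl: math.dist(self.centroids[lbl], x))


def bagging_pool(X, y, n_members=5, seed=0):
    """Plain Bagging: each member is fit on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    pool = []
    for _ in range(n_members):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        pool.append(NearestCentroid().fit([X[i] for i in idx], [y[i] for i in idx]))
    return pool


def ola_predict(pool, X_dsel, y_dsel, x, k=3):
    """Overall Local Accuracy (OLA) dynamic selection.

    Find the k points of DSEL nearest to x, pick the pool member that
    classifies the most of them correctly, and return its prediction for x.
    """
    region = sorted(range(len(X_dsel)), key=lambda i: math.dist(X_dsel[i], x))[:k]

    def local_accuracy(clf):
        return sum(clf.predict(X_dsel[i]) == y_dsel[i] for i in region)

    return max(pool, key=local_accuracy).predict(x)


X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8), (0.8, 0.2), (0.4, 0.6),
     (5, 5), (5, 6), (6, 5), (6, 6)]
y = [0] * 8 + [1] * 4
pool = bagging_pool(X, y)
# For brevity the training data double as DSEL; a held-out split is more usual.
print([ola_predict(pool, X, y, q) for q in [(0.3, 0.2), (5.4, 5.6)]])
```

Because competence is re-estimated for every query, different members can be chosen in different regions of the feature space; this is why the local competence and diversity of the candidate pool, which the second contribution targets, matter.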
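
The cost assignment in the third method (the imbalance ratio as the misclassification cost of minority samples) can be shown on a fixed-structure RBF network. In this sketch the hidden-layer centres and width are set by hand and only the output weights are trained, by plain batch gradient descent on a cost-weighted squared error; the GA structure search and improved-PSO parameter tuning described above are not reproduced. Function names, hyperparameters, and toy data are hypothetical.

```python
import math


def imbalance_ratio(y, minority=1):
    """IR = (number of majority samples) / (number of minority samples)."""
    n_min = sum(1 for label in y if label == minority)
    return (len(y) - n_min) / n_min


def rbf_features(x, centers, sigma):
    """Gaussian hidden-unit activations followed by a constant bias input."""
    return [math.exp(-math.dist(x, c) ** 2 / (2 * sigma ** 2)) for c in centers] + [1.0]


def train_cost_sensitive_rbf(X, y, centers, sigma=1.0, lr=0.1, epochs=500, minority=1):
    """Fit the output weights by batch gradient descent on a cost-weighted squared error.

    Loss = sum_i c_i * (f(x_i) - t_i)^2 / (2 * sum_i c_i), where c_i = IR for
    minority samples and 1 for majority samples; targets t_i are 1 / 0.
    """
    ir = imbalance_ratio(y, minority)
    costs = [ir if label == minority else 1.0 for label in y]
    targets = [1.0 if label == minority else 0.0 for label in y]
    feats = [rbf_features(x, centers, sigma) for x in X]
    total = sum(costs)
    w = [0.0] * (len(centers) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for phi, t, c in zip(feats, targets, costs):
            err = sum(wj * pj for wj, pj in zip(w, phi)) - t
            for j, pj in enumerate(phi):
                grad[j] += c * err * pj
        w = [wj - lr * g / total for wj, g in zip(w, grad)]
    return w


def predict(w, x, centers, sigma=1.0, threshold=0.5):
    """Label 1 (minority) when the network output reaches the threshold."""
    score = sum(wj * pj for wj, pj in zip(w, rbf_features(x, centers, sigma)))
    return 1 if score >= threshold else 0


X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8), (0.8, 0.2), (0.4, 0.6),
     (4, 4), (4, 5)]
y = [0] * 8 + [1] * 2
centers = [(0.5, 0.5), (4, 4.5)]  # hand-picked here; not searched by a GA
w = train_cost_sensitive_rbf(X, y, centers)
print(imbalance_ratio(y), [predict(w, x, centers) for x in X])
```

With IR = 4 in this toy set, each minority error weighs four times as much as a majority error, so the two minority samples and the eight majority samples contribute equal total weight to the loss.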

Keywords/Search Tags:

Imbalanced data, Resampling, Dynamic ensemble selection, Cost-sensitive learning, Intelligent medical decision making

Related items

1	Research On Prediction Algorithm Of Thrombosis Risk Based On Imbalanced Data
2	Application Study Of Data Mining In Intelligent Identification Of Metabolic Syndrome In Physical Examination Population
3	Statistical Analysis Of The Safety And Effectiveness Of A Chinese Medicine Injection Based On Machine Learning
4	Research On Medical Intelligent Diagnosis And Decision Support Based On Imbalanced Data
5	Research On Feature Selection And Classification For Medical Imbalanced Data
6	Research On Predictive Models For Imbalanced Medical Data
7	Research On Application Of Imbalanced Medical Data Based On Balanced Sampling And Deep Learning
8	Research On Intelligent Diagnosis And Decision Support Of Pregnancy-induced Hypertension Based On Unbalanced Dat
9	Research On Medical And Health Decision Making Support For Multimodal Data
10	Research And Implementation Of Key Technologies Of Disease Auxiliary Diagnosis Based On Ensemble Learning