Classification And Prediction Of Stroke Using Deep Reinforcement Learning Based On Fusion Data Preprocessing

Posted on: 2023-12-06
Degree: Master
Type: Thesis
Country: China
Candidate: T T Yuan
Full Text: PDF
GTID: 2544307037496384
Subject: Electronic Science and Technology
Abstract/Summary:
With the accelerating aging of China's population, stroke, which is characterized by high disability and fatality rates, is growing explosively and seriously endangers national health. Stroke is controllable: early screening and intervention have a good preventive effect. Analyzing stroke screening data makes it possible to diagnose and intervene in advance, which can effectively curb the risk of onset. However, traditional classification algorithms do not account for the high dimensionality and class imbalance of stroke screening data, so they struggle to achieve good classification results. To address these problems, this work tackles feature redundancy and data imbalance from the perspective of data preprocessing, and builds a stroke classification and prediction model based on deep reinforcement learning from the perspective of model construction and optimization.

The stroke screening dataset has two main characteristics: it is high-dimensional and contains redundant features, and the number of samples per category is imbalanced. First, the dataset has too many feature dimensions, which interferes with classification performance. Reducing the number of features not only improves classification performance but also lowers computational complexity and cost. Redundancy and noise are removed, the features strongly correlated with the stroke category are selected, and the risk factors critical to stroke are identified, so that intervention and treatment can begin at an early stage. Second, the imbalance of the dataset biases classification toward the majority categories, whereas in disease diagnosis the minority samples are the most important. Solving this problem is therefore also a focus of this work. At present, traditional machine learning classification algorithms are commonly applied to stroke screening data, but their performance cannot meet the high requirements of disease diagnosis. To achieve better classification results, a deep reinforcement learning classification model with an improved loss function is proposed. The specific research content is as follows:

(1) To address the redundant features in the stroke screening dataset, a hybrid feature dimension reduction method, FS-FE, is proposed. First, the Correlation-based Feature Selection (CFS) algorithm is improved with the Maximal Information Coefficient (MIC). The resulting feature selection algorithm, MCFS, compensates for the tendency of CFS to favor features with many attribute values. The selected feature subset is then further reduced with the PCA feature extraction algorithm to obtain the optimal feature combination. To verify the effectiveness of FS-FE, it is compared with CFS, the information gain algorithm and the Relief feature selection algorithm. Four machine learning classifiers, Naive Bayes, J48, SVM and KNN, are evaluated on public datasets and on the stroke screening dataset. The experimental results show that FS-FE achieves better classification results and is suitable for multiple classifiers.

(2) To prevent the imbalance of the stroke screening data from degrading risk classification and prediction performance, a sampling technique based on MAHAKIL Random and Isolation Forest (MARAIF) is proposed. First, to improve the diversity of newly generated samples, random numbers replace the single averaging method that the MAHAKIL oversampling technique uses to generate new samples. In addition, because noise samples are easily produced when new samples are synthesized, the Isolation Forest algorithm is used to detect and remove noise from the newly synthesized samples. Finally, machine learning classifiers are used for classification. Compared with SMOTE, ADASYN and MAHAKIL oversampling, MARAIF achieves better classification performance in combination with different classifiers: AUC and F1-measure can increase by 25.50% and 11.32% respectively, which verifies the effectiveness of the method. The experiments also show that MARAIF has some limitations on highly imbalanced samples. SMOTE tends to intensify the within-class imbalance of a dataset when it synthesizes new samples, but because it interpolates between nearby minority samples, its synthetic samples carry more minority-class information. Based on this observation, CMRIS, which combines MARAIF and SMOTE, is proposed. First, some minority samples are synthesized with the MARAIF algorithm. The new samples are then combined with the original data to form a new imbalanced dataset. Finally, the SMOTE algorithm is applied to obtain the final balanced dataset. The experimental results show that, compared with SMOTE, ADASYN and MAHAKIL oversampling, CMRIS is suitable for highly imbalanced datasets and achieves higher AUC and F1-measure on different classifiers, which verifies the effectiveness of the method.

(3) Because traditional classification algorithms perform poorly on the stroke screening dataset, and in order to diagnose and predict stroke risk efficiently, a New Loss Function Deep Q Network (NL-DQN) is constructed on the basis of Double DQN and Dueling DQN, and the model is optimized with respect to both the optimization algorithm and the activation function. To improve the stability and convergence of the neural network, and to address the problem that the traditional loss function penalizes outliers too heavily, a more robust loss function is proposed. Finally, the Naive Bayes, J48, SVM, KNN and DQN models are compared experimentally on general datasets and on imbalanced datasets. The results show that the proposed NL-DQN model not only outperforms the existing classifiers on general datasets but is also more compatible with the oversampling algorithms on imbalanced datasets.
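As a rough illustration of the two ideas named in (3), the Python sketch below computes a Double DQN target and compares a squared loss with a more outlier-tolerant one. The abstract does not state the form of the NL-DQN loss, so the Huber loss is used here purely as a stand-in assumption. The function names, the `delta` threshold and the `gamma` discount value are hypothetical choices made for this example, not taken from the thesis.

```python
# Minimal sketch, NOT the thesis's NL-DQN: Huber loss is an assumed stand-in
# for the unspecified "more robust loss function".

def squared_loss(td_error):
    """Traditional squared loss: grows quadratically, so outliers dominate."""
    return 0.5 * td_error ** 2


def huber_loss(td_error, delta=1.0):
    """Quadratic for small errors, linear beyond delta (assumed threshold)."""
    magnitude = abs(td_error)
    if magnitude <= delta:
        return 0.5 * td_error ** 2
    return delta * (magnitude - 0.5 * delta)


def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target: the online network picks the next action,
    the target network supplies that action's value."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


if __name__ == "__main__":
    # Hypothetical Q-values for a two-class decision (e.g. stroke / no stroke).
    target = double_dqn_target(1.0, [0.2, 0.9], [0.5, 0.3], gamma=0.5)
    for q_estimate in (target - 0.5, target - 10.0):
        error = target - q_estimate
        print(error, squared_loss(error), huber_loss(error))
```

For a small error the two losses coincide, while for a large error the Huber loss grows only linearly. That linear growth is the general sense in which a loss can avoid punishing outliers too much.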
Keywords/Search Tags: Stroke, Feature dimension reduction, Imbalance, Oversampling technique, Deep reinforcement learning, Loss function