
Research On Application Of Imbalanced Medical Data Based On Balanced Sampling And Deep Learning

Posted on: 2019-08-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y W Zhao
GTID: 2404330593951099
Subject: Software engineering
Abstract/Summary:
With the development of science and technology, China is rapidly digitizing its medical data. As medical data grow exponentially and medical management systems mature, access to medical data is becoming increasingly convenient, but the useful information hidden in this massive volume of data has not been fully mined or effectively used. Medical data include physical examination data, electronic medical records, diagnostic imaging, and other medical data. At present, medical diagnosis relies mainly on doctors' professional knowledge and clinical experience. Extracting hidden, useful information from medical data to support doctors' treatment decisions is therefore of practical significance. To address the class imbalance problem in medical data classification and the problem of disease modeling, this thesis applies data mining to build prediction models on medical data and to provide a reference for doctors diagnosing disease.

At the data level, a new imbalanced-data processing algorithm, KE-SMOTE, is proposed to address class imbalance in medical data. For the majority class, KE-SMOTE runs K-Means repeatedly until the minimum clustering error no longer decreases or a specified number of iterations is reached, which yields multiple clustering results; a clustering ensemble method then uses these results to undersample the majority class. For the minority class, KE-SMOTE oversamples using the SMOTE algorithm. Combining the new majority class samples with the new minority class samples gives a new training data set. Experiments on UCI data sets show that the proposed algorithm performs better than traditional class imbalance processing algorithms.

At the algorithm level, a deep belief network based on an autoencoder, AE-DBN, is proposed. AE-DBN uses an autoencoder to extract features from the data set and a deep belief network to build the model. The optimal deep belief network model is constructed by adjusting the number of hidden layers and the number of nodes in each layer. Experiments were carried out on hyperuricemia data provided by a hospital. The results showed that the proposed algorithm achieved higher classification accuracy than traditional machine learning algorithms. In addition, analysis of the features in the data set, together with modeling experiments on different feature combinations, identified the factors that influence hyperuricemia, which can provide a reference for doctors diagnosing hyperuricemia.
Keywords/Search Tags:Medical Data, Imbalanced Dataset, Clustering Ensemble, Autoencoder, Deep Belief Network
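
The resampling idea behind KE-SMOTE, as described in the abstract, can be illustrated with a minimal sketch. This is not the thesis's implementation. The abstract does not say how the clustering ensemble selects majority samples, so the voting rule below (keep the samples that most often lie nearest to a K-Means centroid across several runs) is an assumption, as are the fixed number of runs and all function and parameter names (`ensemble_undersample`, `ke_smote_sketch`, `target`, `n_runs`). The oversampling step is a plain SMOTE-style interpolation between a minority sample and one of its nearest minority neighbours.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, rng, max_iter=20):
    """Plain K-Means; stops when assignments stop changing or max_iter is hit."""
    centroids = rng.sample(points, k)
    assign = None
    for _ in range(max_iter):
        new_assign = [min(range(k), key=lambda c: dist(p, centroids[c]))
                      for p in points]
        if new_assign == assign:
            break
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def ensemble_undersample(majority, n_keep, rng, n_runs=5):
    """ASSUMED ensemble rule: vote over several K-Means runs and keep the
    n_keep samples that were most often the nearest point to a centroid."""
    votes = [0] * len(majority)
    for _ in range(n_runs):
        for c in kmeans(majority, n_keep, rng):
            nearest = min(range(len(majority)),
                          key=lambda i: dist(majority[i], c))
            votes[nearest] += 1
    ranked = sorted(range(len(majority)), key=lambda i: -votes[i])
    return [majority[i] for i in ranked[:n_keep]]


def smote(minority, n_new, rng, k=3):
    """SMOTE-style oversampling: interpolate toward a near minority neighbour."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


def ke_smote_sketch(majority, minority, target, seed=0):
    """Shrink the majority class and grow the minority class to `target` each."""
    rng = random.Random(seed)
    new_major = ensemble_undersample(majority, target, rng)
    new_minor = minority + smote(minority, target - len(minority), rng)
    return new_major, new_minor


# Toy data: 40 majority points around (0, 0), 6 minority points around (3, 3).
data_rng = random.Random(1)
majority = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(40)]
minority = [[data_rng.gauss(3, 0.5), data_rng.gauss(3, 0.5)] for _ in range(6)]
new_major, new_minor = ke_smote_sketch(majority, minority, target=12)
print(len(new_major), len(new_minor))  # prints: 12 12
```

The sketch assumes `target` lies between the minority and majority class sizes and that the minority class has at least two samples. Retained majority samples are always original records, while the added minority samples are synthetic points on segments between existing minority records.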