
Research On Predictive Models For Imbalanced Medical Data

Posted on: 2022-07-13
Degree: Master
Type: Thesis
Country: China
Candidate: H J Wang
GTID: 2504306326996919
Subject: Computer Science and Technology
Abstract/Summary:
With the development of medical informatization, it has become feasible to mine useful information from medical data using machine learning and data mining. This thesis addresses the class imbalance problem in medical data and proposes solutions at both the data level and the algorithm level, with the goal of reducing the impact of class imbalance on the model and improving prediction for the minority classes that most need attention.

At the data level, a synthetic oversampling algorithm based on the global data distribution is proposed. First, clustering is used to refine the criteria for selecting the minority samples from which synthetic samples are generated, so that fewer noisy samples are produced during synthesis. The algorithm weights the number of synthetic samples by the information carried by each sample and the sparseness of its cluster, which addresses both within-class and between-class imbalance. A parameter that combines the silhouette coefficient with mutual information helps the K-means algorithm choose a reasonable number of clusters for the minority and majority classes separately, to ensure clustering quality. Second, the clustering information is used to improve the generation path of synthetic samples and avoid class overlap. The algorithm was evaluated extensively on 10 artificial data sets and 10 real data sets. The results show that, when the synthetic data generated by the algorithm are used, the method is superior or comparable to several existing methods on the evaluation metrics.

At the algorithm level, a cost-sensitive classification model with a reject option is proposed. It uses a cost-sensitive neural network, a cost-sensitive decision tree, and a cost-sensitive support vector machine tuned by particle swarm optimization as base classifiers. On this basis, an ensemble classifier is built from a voting strategy and a rejection mechanism to classify the data. At an acceptable rejection rate, the model's evaluation metrics improve noticeably.
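The voting-plus-rejection idea can be illustrated with a minimal sketch. The abstract does not state the exact voting or rejection rule, so everything below is an assumption made for illustration: the function names `vote_with_rejection` and `evaluate`, the `min_agreement` threshold, and the hard-coded votes, which stand in for the outputs of the three cost-sensitive base classifiers (those classifiers are not reproduced here).

```python
from collections import Counter

# Sentinel for "no prediction": the sample is rejected and would be
# referred for manual review instead of being classified automatically.
REJECT = None


def vote_with_rejection(votes, min_agreement=1.0):
    """Return the majority label, or REJECT if agreement is too low.

    `votes` holds one label per base classifier. `min_agreement` is a
    hypothetical threshold on the fraction of classifiers that must back
    the winning label; 1.0 means the vote must be unanimous.
    """
    if not votes:
        return REJECT
    label, count = Counter(votes).most_common(1)[0]
    if count / len(votes) >= min_agreement:
        return label
    return REJECT


def evaluate(predictions, truths):
    """Return (rejection_rate, accuracy_on_accepted_samples)."""
    accepted = [(p, t) for p, t in zip(predictions, truths) if p is not REJECT]
    rejected = len(predictions) - len(accepted)
    rejection_rate = rejected / len(predictions)
    if not accepted:
        return rejection_rate, float("nan")
    accuracy = sum(p == t for p, t in accepted) / len(accepted)
    return rejection_rate, accuracy


# Hypothetical votes from three base classifiers on five samples
# (1 = minority/positive class, 0 = majority/negative class).
votes_per_sample = [(1, 1, 1), (0, 0, 0), (1, 0, 1), (0, 0, 1), (1, 1, 1)]
truths = [1, 0, 0, 0, 1]

unanimous = [vote_with_rejection(v) for v in votes_per_sample]
majority = [vote_with_rejection(v, min_agreement=0.5) for v in votes_per_sample]

print(evaluate(unanimous, truths))
print(evaluate(majority, truths))
```

On this toy data, requiring unanimity rejects the two samples on which the classifiers disagree and classifies the remaining three correctly, while plain majority voting rejects nothing and misclassifies one sample. This is the trade-off between rejection rate and accuracy on accepted samples that the abstract refers to, shown here only on made-up votes.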
Keywords/Search Tags: Synthetic oversampling, Imbalanced learning, Predictive model, Classification, Cost-sensitive