Research On Early Warning Model Of Thalassemia Based On Ensemble Learning

Posted on: 2020-05-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Liu
Full Text: PDF
GTID: 2434330596497520
Subject: Electronic and communication engineering
Abstract/Summary:
Thalassemia is a serious hemoglobin disease with a high carrier rate in southern China. If moderate and severe thalassemia are not detected in time, the disease causes serious harm to patients and has a negative impact on their families and on society. Most current treatments are short-term: they subject patients to great physical and mental suffering without providing a radical cure, and their cost is beyond the means of an ordinary family. Early detection of thalassemia carriers, followed by appropriate measures, is therefore particularly important for effectively reducing the number of children born with thalassemia. This thesis studies the risk factors of thalassemia and constructs an early warning model. The main research results are as follows.

Because the thalassemia dataset is highly imbalanced, a learning algorithm trained on it tends to produce results that deviate from the actual situation. A mixed-sampling data-processing method is therefore proposed, and its results are verified by simulation with a decision tree algorithm, which lays the foundation for building the early warning model of thalassemia.

To address the high dimensionality of the thalassemia dataset, the Boruta algorithm is used to select risk factors, and its selection is compared with the risk factors chosen by statistical methods. Factors on which the two approaches agree are entered into the model, factors on which they disagree are analyzed individually, and the input features best suited to model learning are finally obtained.

Considering that the oversampling step in mixed sampling introduces noise and can lead to overfitting, an early warning model of thalassemia based on random forest under mixed sampling is constructed, verified by simulation, and compared with Bayesian, KNN, and other algorithms. The results show that the random forest early warning model under mixed sampling reaches a recall of 0.95 and a Youden index of 0.93.

Because the AdaBoost algorithm assumes that every sample has the same initial weight, its classification is insensitive when dealing with imbalanced data. An early warning model of thalassemia based on initial weight reconstruction is therefore constructed, and an equilibrium factor is introduced into the model to overcome this shortcoming. Experiments show that the recognition rate of the model is 8% higher than that of AdaBoost, which meets the requirements of primary screening for thalassemia.
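As a rough illustration of two quantities mentioned in the abstract, the sketch below computes recall and the Youden index from confusion-matrix counts, and shows one possible way to build non-uniform initial sample weights for a boosting algorithm on imbalanced labels. The abstract does not give the thesis's weight formula, so the function `initial_weights` and its `balance` parameter are hypothetical stand-ins for the "initial weight reconstruction" and "equilibrium factor" ideas, not the author's method. The function names and the example counts are likewise assumptions made for illustration.

```python
from collections import Counter


def recall_and_youden(tp, fn, tn, fp):
    """Return (recall, Youden index) from confusion-matrix counts."""
    recall = tp / (tp + fn)        # sensitivity: share of positives detected
    specificity = tn / (tn + fp)   # share of negatives correctly rejected
    return recall, recall + specificity - 1.0


def initial_weights(labels, balance=1.0):
    """Hypothetical initial sample weights for boosting on imbalanced labels.

    ASSUMPTION: this interpolation is illustrative only; the thesis's
    actual equilibrium factor is not specified in the abstract.
    balance=0.0 gives the standard uniform weights 1/n.
    balance=1.0 gives every class the same total weight.
    The returned weights always sum to 1.
    """
    n = len(labels)
    counts = Counter(labels)
    k = len(counts)
    weights = []
    for y in labels:
        uniform = 1.0 / n
        balanced = 1.0 / (k * counts[y])
        weights.append((1.0 - balance) * uniform + balance * balanced)
    return weights


if __name__ == "__main__":
    # Made-up counts, not data from the thesis.
    print(recall_and_youden(tp=90, fn=10, tn=80, fp=20))
    # One positive sample among four: it receives a larger initial weight.
    print(initial_weights([1, 0, 0, 0], balance=1.0))
```

With the made-up counts above, 90 of 100 positives are detected and 80 of 100 negatives are rejected, so recall is 0.90 and the Youden index is 0.90 + 0.80 - 1 = 0.70. Setting `balance` to 0 recovers the equal-weight starting point that the abstract identifies as the weakness of standard AdaBoost on imbalanced data.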
Keywords/Search Tags:Thalassemia, unbalanced data, Decision tree, Ensemble learning, RF