Research On Early Warning Model Of Thalassemia Based On Machine Learning

Posted on: 2021-07-10    Degree: Master    Type: Thesis
Country: China    Candidate: W L Xu    Full Text: PDF
GTID: 2514306200453184    Subject: Electronics and Communications Engineering
Abstract/Summary:
Thalassemia is a serious hemoglobin disorder, and in China its carriers are concentrated mainly in the south. Because there is currently no radical cure, most treatments cause patients great physical and psychological suffering, and their cost is beyond what an average family can afford. Detecting thalassemia carriers early and taking corresponding measures is therefore particularly important for effectively reducing the number of children born with thalassemia. This thesis studies the risk factors of thalassemia and constructs an early warning model. The main research results are as follows.

Because the thalassemia data set is highly imbalanced, learning algorithms trained on it tend to produce results that deviate from the actual outcomes, so a hybrid sampling data processing method is proposed. New data are generated using the far distance principle, and the number of new sample points to generate was determined from the results of repeated experiments. The AdaBoost algorithm is then used to verify the resampled data experimentally.

To handle the high dimensionality and the mix of discrete and continuous variables in the thalassemia data set, the entropy weight method and the random forest algorithm are both used to select risk factors, and the factors selected consistently by both methods are used as the input features of the model.

Oversampling can introduce noisy data and may cause overfitting, while undersampling may discard important data; both have shortcomings. An early warning model of thalassemia based on hybrid sampling AdaBoost is therefore proposed. The model was validated in simulation and compared with logistic regression, KNN, and SVM. The results show that the hybrid sampling AdaBoost warning model reaches an F value of 53.65% and a G value of 91.55%, and its overall evaluation indices are superior to those of the other machine learning algorithms.

Because the base classifiers of the AdaBoost algorithm are all decision trees and therefore lack diversity, a fusion model is built on existing machine learning algorithms. Experiments show that the fusion model achieves a higher Recall value than the single machine learning models and, compared with the hybrid sampling AdaBoost model, improves Pre (precision) by 3.7 percentage points and the F value by 3.5 percentage points, so it meets the requirements of early thalassemia screening.
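The abstract does not spell out how the hybrid sampling works or what the "far distance principle" is. As a rough illustration only, the sketch below combines random undersampling of the majority (non-carrier) class with SMOTE-style interpolation between two randomly chosen minority (carrier) rows. The interpolation rule is a swapped-in, commonly used technique, not the thesis's far distance rule, and the helper `hybrid_sample` and its parameters `undersample_frac` and `target_ratio` are hypothetical.

```python
import random


def hybrid_sample(X, y, minority=1, undersample_frac=0.5, target_ratio=1.0, seed=0):
    """Hypothetical hybrid resampling for a binary, imbalanced data set.

    undersample_frac: fraction of majority-class rows kept (random undersampling).
    target_ratio: desired minority:majority ratio after synthetic rows are added.
    """
    rng = random.Random(seed)
    majority_rows = [x for x, t in zip(X, y) if t != minority]
    minority_rows = [x for x, t in zip(X, y) if t == minority]
    majority_label = next(t for t in y if t != minority)

    # Step 1: random undersampling of the majority class.
    keep = max(1, int(len(majority_rows) * undersample_frac))
    kept = rng.sample(majority_rows, keep)

    # Step 2: SMOTE-style oversampling (assumption, not the thesis's rule).
    # Each synthetic row lies on the segment between two distinct, randomly
    # chosen minority rows, so at least two minority rows are required.
    needed = max(0, int(target_ratio * len(kept)) - len(minority_rows))
    synthetic = []
    for _ in range(needed):
        a, b = rng.sample(minority_rows, 2)
        gap = rng.random()
        synthetic.append([ai + gap * (bi - ai) for ai, bi in zip(a, b)])

    X_new = kept + minority_rows + synthetic
    y_new = [majority_label] * len(kept) + [minority] * (len(minority_rows) + len(synthetic))
    return X_new, y_new


# Usage: 4 carriers vs 16 non-carriers become 8 vs 8 after resampling.
X = [[float(i), float(i % 3)] for i in range(20)]
y = [1] * 4 + [0] * 16
X_bal, y_bal = hybrid_sample(X, y)
print(y_bal.count(0), y_bal.count(1))  # 8 8
```

Undersampling first and then oversampling only up to the reduced majority size limits both the amount of discarded majority data and the number of synthetic minority points, which reflects the trade-off between the two approaches that the abstract describes.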
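The warning model boosts decision-tree base classifiers with AdaBoost. A minimal, self-contained sketch of discrete AdaBoost with depth-1 trees (decision stumps) is given below, assuming labels in {-1, +1}. The function names `train_stump`, `adaboost_fit`, and `adaboost_predict` are illustrative, and the thesis's actual tree depth and number of rounds are not stated in the abstract.

```python
import math


def train_stump(X, y, w):
    """Best single-feature threshold rule under sample weights w (labels in {-1, +1})."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for polarity in (1, -1):
                pred = [polarity if x[j] >= thr else -polarity for x in X]
                err = sum(wi for wi, p, t in zip(w, pred, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best


def adaboost_fit(X, y, n_rounds=10):
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    model = []
    for _ in range(n_rounds):
        err, j, thr, pol = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() finite for perfect stumps
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        model.append((alpha, j, thr, pol))
        # Multiply by exp(-alpha * y * h(x)): misclassified rows gain weight.
        w = [wi * math.exp(-alpha * t * (pol if x[j] >= thr else -pol))
             for wi, x, t in zip(w, X, y)]
        s = sum(w)
        w = [wi / s for wi in w]  # renormalise to a distribution
    return model


def adaboost_predict(model, X):
    """Sign of the alpha-weighted sum of stump votes."""
    out = []
    for x in X:
        score = sum(a * (p if x[j] >= thr else -p) for a, j, thr, p in model)
        out.append(1 if score >= 0 else -1)
    return out


X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [-1, -1, -1, 1, 1, 1]
model = adaboost_fit(X, y, n_rounds=3)
print(adaboost_predict(model, X))  # [-1, -1, -1, 1, 1, 1]
```

Because every base learner here is the same kind of shallow tree, the ensemble's members tend to make correlated errors; this is the lack of base-classifier diversity that motivates the fusion model.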
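Risk factors are selected with both the entropy weight method and a random forest, keeping the factors the two agree on. The sketch below implements the standard entropy weight method (min-max normalisation, $p_{ij} = x_{ij} / \sum_i x_{ij}$, $e_j = -\frac{1}{\ln n}\sum_i p_{ij}\ln p_{ij}$, $w_j = (1 - e_j) / \sum_k (1 - e_k)$). It assumes that "consistent" means "in the top k of both rankings". The random-forest importances are passed in as a plain list (for example, they could come from a trained forest), and the helper `consistent_factors` is hypothetical.

```python
import math


def entropy_weights(rows):
    """Entropy weight method. rows are samples; columns are candidate risk factors."""
    n, m = len(rows), len(rows[0])
    divergence = []
    for j in range(m):
        col = [r[j] for r in rows]
        lo, hi = min(col), max(col)
        if hi == lo:  # constant column: entropy 1, carries no information
            divergence.append(0.0)
            continue
        norm = [(v - lo) / (hi - lo) for v in col]  # min-max normalisation
        total = sum(norm)
        p = [v / total for v in norm]
        # e_j = -(1 / ln n) * sum_i p_ij ln p_ij, with 0 * ln 0 taken as 0
        e = -sum(pi * math.log(pi) for pi in p if pi > 0) / math.log(n)
        divergence.append(1.0 - e)  # degree of divergence d_j
    s = sum(divergence)
    return [d / s for d in divergence] if s > 0 else [1.0 / m] * m


def consistent_factors(ew_weights, rf_importances, k):
    """Hypothetical rule: indices in the top k of BOTH rankings."""
    def top(scores):
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return set(order[:k])
    return sorted(top(ew_weights) & top(rf_importances))


rows = [[1, 5, 3], [2, 5, 9], [3, 5, 1], [4, 5, 7]]
w = entropy_weights(rows)
rf = [0.1, 0.3, 0.6]  # made-up importances, for illustration only
print(consistent_factors(w, rf, 2))  # [2]
```

The constant second column receives zero entropy weight, so even though the made-up forest importances rank it second, it is not kept as a consistent factor.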
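The abstract reports an "F value" and a "G value" without defining them. Assuming they are the usual imbalanced-learning metrics, F1 (harmonic mean of precision and recall) and G-mean (geometric mean of sensitivity and specificity), the sketch below computes both from binary predictions. The toy example shows why a model can score a high G value with a much lower F value: G-mean rewards a good true-negative rate, while F1 is pulled down by false positives among a rare positive class.

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F1 and G-mean for a binary task (positive = carrier)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    g_mean = math.sqrt(recall * specificity)
    return {"precision": precision, "recall": recall, "f1": f1, "g_mean": g_mean}


# 2 carriers among 10 people; the model flags both carriers plus 2 non-carriers.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
m = imbalance_metrics(y_true, y_pred)
# precision 0.5, recall 1.0, F1 about 0.667, G-mean about 0.866
print({k: round(v, 3) for k, v in m.items()})
```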
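The abstract does not say how the fusion model combines its heterogeneous base learners. One simple possibility, assumed here purely for illustration (stacking would be a common alternative), is weighted soft voting over each model's predicted carrier probability. The function `soft_vote` and the probability values are hypothetical.

```python
def soft_vote(prob_lists, weights=None, threshold=0.5):
    """Fuse positive-class probabilities from several base models by weighted averaging.

    prob_lists: one list of P(carrier) per base model, aligned by sample.
    """
    k = len(prob_lists)
    weights = weights or [1.0] * k
    total = sum(weights)
    fused = [sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
             for i in range(len(prob_lists[0]))]
    labels = [1 if p >= threshold else 0 for p in fused]
    return labels, fused


# Made-up probabilities from three different base models for three samples.
lr_probs = [0.2, 0.7, 0.4]
knn_probs = [0.1, 0.6, 0.6]
svm_probs = [0.3, 0.8, 0.6]
labels, fused = soft_vote([lr_probs, knn_probs, svm_probs])
print(labels)  # [0, 1, 1]
```

Averaging across structurally different models is one way to obtain the base-learner diversity that a boosted ensemble of decision trees lacks. For screening, lowering `threshold` raises recall at the cost of precision.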
Keywords/Search Tags:Thalassemia, Imbalanced data, Ensemble learning, AdaBoost, Fusion model