Research On Aided Cancer Diagnosis Based On Boosting Integration Rules

Posted on:2024-07-31

Degree:Master

Type:Thesis

Country:China

Candidate:H D Qu

Full Text:PDF

GTID:2544306941994919

Subject:Mathematics

Abstract/Summary:


Machine-learning-assisted cancer diagnosis has long been an active research direction in medicine. Traditional cancer diagnosis usually relies on manual observation and judgment, which is subjective and error-prone. Machine learning algorithms can learn effective features and patterns from large volumes of patient data, improving the accuracy and efficiency of diagnosis. However, with the explosive growth of medical data, traditional single classifiers often fail to meet the requirements of high accuracy and high robustness, and may overfit or underfit when faced with different types of tumors. This thesis therefore proposes an improved algorithm based on Boosting ensemble rules to integrate traditional machine learning classifiers, and designs a heterogeneous Boosting ensemble algorithm built on the improved algorithm to raise the accuracy and diversity of model-assisted cancer diagnosis. The main research contents are as follows.

First, the feasibility of the TCGA breast cancer data was examined using statistical methods such as Cochran's Q test. Because the feature dimension of the TCGA data set is very high, traditional machine learning models have difficulty fitting it, so differential analysis and feature selection are used to reduce the dimension of the data. On the resulting low-dimensional gene data, machine learning models are built and their classification results compared in order to select a relatively better parameter-adaptive method.

Second, because machine learning classifiers perform poorly on the minority class of the imbalanced TCGA data, resampling is used to balance the data. Experiments show that the SMOTE algorithm effectively improves the classification of minority samples and achieves a high recall rate. However, SMOTE still suffers from low accuracy relative to its recall rate, because the K value must be adjusted manually and the quality of the generated minority-class samples is unstable. This thesis proposes a K-SMOTE algorithm that selects the K value adaptively, which effectively improves classification accuracy.

Third, a Boosting ensemble algorithm is used to improve the classification performance of the machine learning models. Because the exponential loss function is sensitive to outliers, which easily degrades the generalization performance of the model, this thesis proposes a Huber Boost algorithm based on the Huber loss function and integrates the K-SMOTE algorithm into its framework, improving both the classification accuracy and the F1-score.

Finally, to enhance the universality of the improved model, this thesis designs the HK-SHBoost algorithm, which integrates heterogeneous base classifiers. Classification experiments and universality experiments show that the improved algorithm effectively enhances the diversity and universality of the model.
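For reference, Cochran's Q test checks whether k related binary outcomes (for example, whether each of k classifiers labels the same patient correctly) share the same success rate. The abstract does not say how the test was applied to the TCGA data, so the minimal sketch below only computes the textbook statistic Q = (k - 1)[k ΣC_j² - N²] / [kN - ΣR_i²], where C_j are column totals, R_i row totals and N the grand total. The function name `cochran_q` and the 0/1 table layout are illustrative assumptions, not the thesis's code.

```python
"""Minimal sketch of Cochran's Q statistic (illustrative, not the thesis code).

Rows are samples (e.g. patients); columns are k related binary outcomes, e.g.
whether each of k classifiers labelled that sample correctly (1) or not (0).
"""
from __future__ import annotations

from typing import Sequence


def cochran_q(table: Sequence[Sequence[int]]) -> tuple[float, int]:
    """Return (Q, degrees_of_freedom) for an N x k table of 0/1 outcomes."""
    k = len(table[0])
    if k < 2 or any(len(row) != k for row in table):
        raise ValueError("need a rectangular table with at least two columns")
    col_totals = [sum(row[j] for row in table) for j in range(k)]  # C_j
    row_totals = [sum(row) for row in table]                        # R_i
    n_total = sum(row_totals)                                       # N
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - n_total ** 2)
    denominator = k * n_total - sum(r * r for r in row_totals)
    if denominator == 0:
        # Every row is all-0 or all-1, so the rows carry no information.
        raise ValueError("all rows are constant; Q is undefined")
    return numerator / denominator, k - 1
```

Under the null hypothesis that all k outcomes have the same success probability, Q is approximately chi-square distributed with k - 1 degrees of freedom; if SciPy is available, `scipy.stats.chi2.sf(q, df)` gives the p-value.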
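SMOTE creates a synthetic minority sample by picking a minority point, choosing one of its K nearest minority-class neighbours, and interpolating at a random position on the segment between them. The sketch below is a simplified, dependency-free variant under stated assumptions: seed points are drawn at random rather than cycled through, distances are Euclidean, and K is a fixed argument. The thesis's K-SMOTE instead chooses K adaptively by a procedure the abstract does not describe, so that part is not shown. The function `smote` and its parameters are hypothetical.

```python
import math
import random
from typing import List, Sequence


def smote(minority: Sequence[Sequence[float]], n_new: int,
          k: int = 5, seed: int = 0) -> List[List[float]]:
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    Simplified sketch: K is fixed here (not adaptive as in K-SMOTE).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # K nearest minority neighbours of x (excluding x itself), Euclidean distance.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, neighbour)])
    return synthetic
```

Because every synthetic point lies on a segment between two real minority samples, a poorly chosen K can place points in regions belonging to the majority class, which is consistent with the abstract's remark about the unstable quality of generated samples. In practice, the imbalanced-learn library's `SMOTE` class (with its `k_neighbors` parameter) is the usual off-the-shelf implementation.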
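The motivation for replacing the exponential loss with the Huber loss can be illustrated numerically. In AdaBoost, viewed as stagewise minimisation of the exponential loss exp(-yF(x)) with labels y in {-1, +1}, a sample's weight in the next round is proportional to exp(-yF(x)), which grows without bound for badly misclassified points such as mislabelled outliers. A Huber loss on the residual r = y - F(x) has a negative gradient clipped to [-δ, δ], so no single sample can dominate. The sketch below only compares these two quantities; it is not the thesis's Huber Boost algorithm, and the helper names and the default δ = 1 are assumptions.

```python
import math


def exp_loss_weight(y: int, score: float) -> float:
    """AdaBoost-style weight exp(-y*F): unbounded as y*F -> -infinity."""
    return math.exp(-y * score)


def huber_loss(y: int, score: float, delta: float = 1.0) -> float:
    """Huber loss on the residual r = y - F: quadratic near 0, linear beyond delta."""
    r = y - score
    if abs(r) <= delta:
        return 0.5 * r * r
    return delta * (abs(r) - 0.5 * delta)


def huber_pseudo_residual(y: int, score: float, delta: float = 1.0) -> float:
    """Negative gradient of the Huber loss w.r.t. F, clipped to [-delta, delta]."""
    r = y - score
    return r if abs(r) <= delta else math.copysign(delta, r)


if __name__ == "__main__":
    # A positive sample (y = +1) scored increasingly wrongly by the ensemble.
    for score in (0.5, -1.0, -3.0, -10.0):
        print(f"score={score:6.1f}  "
              f"exp weight={exp_loss_weight(1, score):10.2f}  "
              f"huber grad={huber_pseudo_residual(1, score):5.2f}")
```

Running the loop shows the exponential weight growing by orders of magnitude as the score worsens, while the Huber pseudo-residual stays capped at δ, which is the robustness property the abstract attributes to the Huber loss.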

Keywords/Search Tags:

Ensemble learning, Machine learning, Oversampling, Differential analysis


Related items

1	Ensemble And Machine Learning-based Chemometrics For Metabolomics Data Analysis Associated With Inborn Errors Of Metabolism
2	Research On Machine Learning Algorithm Based On Ensemble Learning
3	Research On Ensemble Learning For Depression Recognition Based On Speech
4	Research And Implementation Of Neuropsychological Test Analysis And HIV-associated Dementia Degree Analysis Method Based On Machine Learning
5	Research On Key Technologies Of Arrhythmia Recognition Based On Machine Learning
6	The Research And Implementation Of Dermatoscope Image Classification Algorithm Based On Ensemble Learning
7	Research On Several Machine Learning Algorithms And Application In Fetal Heart Monitoring
8	Research And Implementation Of Ensemble Learning Methods In Cytotoxicity Prediction
9	Research On FMRI Data Classification Based On Independent Component Analysis And Ensemble Learning
10	Research On Bladder Tumor Sensing Technology Based On Meta Learning And Ensemble Learning