Improved Decision Tree Model And Its Application In Medical Diagnosis Data Classification

Posted on: 2022-07-05  Degree: Master  Type: Thesis
Country: China  Candidate: Z Q Cai
GTID: 2504306569466304  Subject: Control Engineering
Abstract/Summary:
Advances in medical diagnosis technology bear directly on the life and health of every one of us. With the development of science and technology, many researchers have paid close attention to applying artificial intelligence in medical diagnosis. Cancer is one of the major diseases that pose the greatest threat to humans, and the prognosis of patients after treatment has attracted much attention. The five-year survival rate is generally used to measure the outcome of cancer patients after treatment. According to the Lancet Global Health Data Statistics, the five-year survival rate remains low for most cancers; for lung, liver, gallbladder, and pancreatic cancer, for example, it is below 20%. Moreover, because of the particular nature of the medical treatment process, data are still difficult to obtain and access to them is limited. The available data are generally multimodal, comprising categorical data, such as basic information about the study subjects, and continuous data, such as DR (digital radiography) or CT images acquired by X-ray scanning. At present, many studies and applications build models on a single data type, so the available information is not fully used. How to combine different types of data effectively into a unified model is therefore of research significance and exploratory value. Drawing on the research hotspots of the decision tree model in this field and on the characteristics of the data, this thesis carries out the following work.

A reliable and effective cancer prognosis model is of great value to patients. To address the class imbalance in the binary classification task whose labels are defined by five-year survival, we propose DF-SMOTE, a model that combines an improved SMOTE method with the deep forest method. Using early-stage lung cancer prognosis data provided by Guangdong Provincial People's Hospital, we compare it with the Cox proportional hazards regression model. For the classifier component of the model, we also test two single classifiers (support vector machine and decision tree) and the random forest ensemble. The experimental results show that DF-SMOTE performs significantly better than the other models, followed by RF-SMOTE, which verifies the effectiveness and superiority of this scheme on class-imbalanced datasets. Finally, the influence of different features is analyzed through the decision process of a visualized decision tree. The feature importance graph ranks the features from most to least important, which provides useful guidance for feature selection.

The second part of this work combines the two data types, categorical and continuous. Building on the decision tree framework, combining it with neural networks, and drawing on deep neural decision forests and soft decision trees, we propose the Hybrid Network for Multimodal Data (HN-MD). The experiments use a congenital heart disease dataset provided by Guangdong Provincial People's Hospital, which contains categorical data such as basic patient information together with continuous DR image data, making it a typical multimodal dataset. For the CNN in the intermediate split nodes of HN-MD, we compare several classic networks, VGG16, Inception-V3, and Xception, denoted HN-MD-VGG16, HN-MD-Inception, and HN-MD-Xception, respectively. Experiments are conducted on the DR image data alone and on the multimodal dataset. On the image data alone, the HN-MD methods perform significantly better than random forest, improving test-set accuracy by about 5.22%. Adding the categorical data then forms the multimodal dataset. With multimodal data, all three methods outperform their image-only counterparts, improving test-set accuracy by about 1%, which verifies the feasibility and effectiveness of the HN-MD method.
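The abstract does not describe how the improved SMOTE in DF-SMOTE differs from the original. For orientation, the following is a minimal sketch of the standard SMOTE step it builds on: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its k nearest minority-class neighbours. The function names `smote` and `k_nearest`, the Euclidean distance, and the toy data are assumptions for illustration, not the thesis's implementation.

```python
import math
import random


def k_nearest(points, idx, k):
    """Indices of the k nearest neighbours of points[idx] by Euclidean distance, excluding itself."""
    dists = sorted(
        (math.dist(points[idx], p), j) for j, p in enumerate(points) if j != idx
    )
    return [j for _, j in dists[:k]]


def smote(minority, n_synthetic, k=5, seed=0):
    """Standard SMOTE (not the improved variant used in DF-SMOTE).

    Each new sample lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    neighbours = [k_nearest(minority, i, k) for i in range(len(minority))]
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # uniform position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic


if __name__ == "__main__":
    # Hypothetical minority class (e.g. patients who did not survive five years), two features each.
    minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.5], [1.2, 2.2]]
    for point in smote(minority, n_synthetic=4, k=2):
        print([round(v, 3) for v in point])
```

In practice, oversampling is normally applied to the training split only, so that the test set keeps the original class distribution; the balanced training set is then passed to the chosen classifier (deep forest in DF-SMOTE, random forest in RF-SMOTE).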
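The abstract does not say how the feature importance ranking was computed. A common choice for decision trees is impurity-based (Gini) importance: each split contributes its node's sample fraction times the decrease in Gini impurity, contributions are summed per feature, and the totals are normalised to sum to 1. The sketch below applies that calculation to a tiny hand-built tree; the feature names `tumor_stage` and `age` and all counts are hypothetical.

```python
def gini(counts):
    """Gini impurity of a node given its class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def impurity_importance(splits, n_total):
    """Impurity-based (mean decrease in impurity) feature importance.

    splits: list of (feature, parent_counts, left_counts, right_counts), one per internal node.
    Each split adds (n_parent / n_total) * (parent Gini - weighted child Gini) to its feature.
    """
    raw = {}
    for feature, parent, left, right in splits:
        n_p, n_l, n_r = sum(parent), sum(left), sum(right)
        decrease = gini(parent) - (n_l / n_p) * gini(left) - (n_r / n_p) * gini(right)
        raw[feature] = raw.get(feature, 0.0) + (n_p / n_total) * decrease
    total = sum(raw.values())
    if total == 0:
        return {f: 0.0 for f in raw}
    return {f: v / total for f, v in raw.items()}  # normalise so importances sum to 1


if __name__ == "__main__":
    # Hypothetical two-split tree on 100 patients; counts are [survived, died].
    splits = [
        ("tumor_stage", [60, 40], [50, 10], [10, 30]),  # root split
        ("age", [50, 10], [45, 2], [5, 8]),  # split on the root's left child
    ]
    importances = impurity_importance(splits, n_total=100)
    for feature, score in sorted(importances.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{feature}: {score:.3f}")
```

Sorting the normalised scores in descending order gives the kind of high-to-low ranking described for the feature importance graph.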
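HN-MD draws on soft decision trees and deep neural decision forests, where each split node n sends an input left with probability d_n(x) = sigmoid(f_n(x)) instead of making a hard choice. Following the deep neural decision forest formulation, the probability mu_l(x) of reaching leaf l is the product of the gate probabilities along its path, and the prediction is P(y | x) = sum over leaves of mu_l(x) * pi_l(y), where pi_l is leaf l's class distribution. In HN-MD the split nodes use CNNs (VGG16, Inception-V3 or Xception); this sketch substitutes a hypothetical linear logit f_n(x) = w.x + b to stay self-contained, and it does not attempt to reproduce how HN-MD combines categorical and image inputs, which the abstract does not specify.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def leaf_probabilities(x, split_weights, depth):
    """Routing probability mu_l(x) for each leaf of a complete binary soft tree.

    split_weights[n] = (w, b) for internal node n in breadth-first order
    (root = 0, children of n are 2n+1 and 2n+2). d_n(x) = sigmoid(w.x + b) is the
    probability of going left; here it is a hypothetical linear stand-in for a CNN.
    """
    mu = [1.0]  # probability of reaching each node on the current level
    node = 0
    for _ in range(depth):
        next_mu = []
        for p in mu:
            w, b = split_weights[node]
            d = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            next_mu.extend([p * d, p * (1.0 - d)])  # left child, right child
            node += 1
        mu = next_mu
    return mu


def predict(x, split_weights, leaf_dists, depth):
    """P(y | x) = sum_l mu_l(x) * pi_l(y): mixture of the leaves' class distributions."""
    mu = leaf_probabilities(x, split_weights, depth)
    n_classes = len(leaf_dists[0])
    return [sum(m * pi[c] for m, pi in zip(mu, leaf_dists)) for c in range(n_classes)]


if __name__ == "__main__":
    # Depth-2 tree: 3 split nodes, 4 leaves, 2 classes; all numbers are hypothetical.
    split_weights = [((1.0, -0.5), 0.0), ((0.3, 0.8), -0.2), ((-1.0, 0.4), 0.1)]
    leaf_dists = [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7], [0.1, 0.9]]
    x = (0.5, 1.2)
    print(leaf_probabilities(x, split_weights, depth=2))
    print(predict(x, split_weights, leaf_dists, depth=2))
```

Because every gate is differentiable, the split-node networks and leaf distributions can be trained jointly by gradient descent, which is what allows a CNN to sit inside each split node rather than a hard threshold on a single feature.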
Keywords/Search Tags: Machine Learning, Decision Tree, Convolutional Neural Network, Hybrid Network Algorithm, Medical Diagnosis Data