Application Of Decision Tree Classification Methods In Medical Data

Posted on: 2021-05-09
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Wang
GTID: 2504306305953709
Subject: Applied Statistics

Abstract/Summary:
Society has entered the era of big data, and using data analysis methods to study problems has become common practice. As the volume of medical data continues to grow, the analysis and modeling of such data has broad application prospects. Diagnosis-related grouping (also known as DRGs grouping) is an internationally recognized, scientifically grounded grouping method that can promote the rational distribution of medical resources and reduce the burden on patients. However, China lacks a sound theoretical system for DRGs grouping, so a method for grouping patient data needs to be developed that suits national conditions and the specific characteristics of the medical data. On this basis, this paper uses data mining methods, fitting decision tree models to medical data in order to group it.

First, the paper introduces the basic theory of decision tree models. It discusses five main decision tree generation algorithms, namely ID3, C4.5, CART, CHAID and E-CHAID, and their characteristics. It then presents the algorithms for pruning decision trees and uses an example to illustrate the entire process of generating a decision tree model.

Next, the paper describes the source and scale of the data and how the data were preprocessed. Preprocessing mainly consists of screening, cleaning and integrating the variables in the data set. It identifies the variables that have an important impact on a patient's total hospitalization cost and ensures that all variables are numerical and have no missing values. The preprocessed data can therefore be used directly to fit the decision tree models and serve as the basis for the subsequent analysis.

The paper then fits the preprocessed data with three decision tree algorithms, namely CHAID, E-CHAID and CART, and presents the grouping results of the resulting models in figures and tables. Finally, it uses statistical methods to test the grouping effect of the decision tree models and compares them in several respects. The results show that the cross-validation results of the three models on the training data set are similar, and that their total squared loss on the test set does not differ significantly. Because the model built by the CHAID algorithm is more complex, we recommend the models based on the E-CHAID or CART algorithm. Both of these models divide the patient data into 9 groups.

Overall, the paper puts forward some new views on the DRGs grouping of medical data. It also designs a complete research plan based on data mining methods and carries it out with several statistical analysis software packages. The results can serve as a reference for more general medical data processing and analysis.
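The CART fitting mentioned above is built from one repeated step: choosing the binary split that minimizes the squared loss of the two resulting groups. The following is a minimal sketch of that single step, not the thesis's implementation. It assumes one numeric predictor, and the names `sse` and `best_split` as well as the toy values for length of stay and cost are hypothetical and are not taken from the thesis data.

```python
from statistics import fmean


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = fmean(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total_sse) for the best split of the form x <= threshold.

    If no split lowers the squared loss, threshold is None and total_sse is
    the loss of the unsplit data.
    """
    pairs = sorted(zip(xs, ys))
    best_threshold, best_loss = None, sse(ys)
    for i in range(1, len(pairs)):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # identical x values cannot be separated by a threshold
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        loss = sse(left) + sse(right)
        if loss < best_loss:
            # place the threshold midway between the two neighbouring x values
            best_threshold, best_loss = (low + high) / 2, loss
    return best_threshold, best_loss


# Hypothetical toy data: length of stay (days) and total cost (arbitrary units).
days = [2, 3, 4, 10, 12, 14]
cost = [1.0, 1.2, 1.1, 5.0, 5.5, 5.2]

threshold, loss = best_split(days, cost)
print(threshold, round(loss, 4))
```

A full CART tree would apply this search recursively to each resulting group and over all predictors, then prune the tree; the sketch covers only the split criterion.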
Keywords: data mining, medical data, DRGs grouping models, decision tree methods, total cost of hospitalization