
Research And Application Of Data Mining In Studying Medical Data

Posted on: 2008-12-13  Degree: Master  Type: Thesis
Country: China  Candidate: J Yin
GTID: 2178360215990935  Subject: Computer software and theory
Abstract/Summary:
Data mining techniques have been used in business for years and have become a key technology in e-commerce. Because of their strength in extracting information from data, they have been extended to insurance, medical care, manufacturing, telecommunications, and other fields.

Over the past seven years, the Military No. 1 hospital information system (HIS) has been deployed in nearly 500 military, armed police, and regional hospitals. Over time, data have flowed into the HIS database through many channels, at a rate of hundreds of thousands of records per day. How can this wealth of medical data be turned from "information" into "knowledge"? Analyzing it with data mining is an appropriate approach. This thesis uses the personal and cost information of patients with coronary heart disease treated at Xinqiao Hospital over the past three years to build a classification model of medical costs.

For various reasons, some of these records have missing values. To improve the efficiency and accuracy of data mining, the missing values must first be imputed. After comparing popular missing-data imputation methods, this thesis confirms that multiple imputation performs well, and multiple imputation is therefore used to handle the records with missing values.

This thesis introduces several data mining algorithms and compares their applicability. The decision tree is chosen as the core algorithm for mining the medical data because it is an example-based inductive algorithm: it derives classification rules from a set of unordered examples that exhibit no apparent rules.

The choice of splitting attribute strongly affects the performance of the decision tree being built. To make this choice, the thesis first partitions the examples according to each condition attribute and computes an agree degree for the correct partitions. The agree degree identifies the attribute that contributes most to making a correct decision.
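The abstract does not define agree degree precisely. The sketch below assumes one plausible reading, analogous to the dependency degree in rough set theory: partition the examples by the values of a condition attribute, call a block a "right partition" if all of its examples share one class label, and take the agree degree as the fraction of examples that fall in right partitions. The function names, attribute names, and toy records are hypothetical illustrations, not the thesis's data or code.

```python
from collections import defaultdict


def agree_degree(rows, attr, label):
    """Assumed agree degree: share of rows lying in class-pure blocks of attr.

    rows:  list of dicts, one per example
    attr:  condition attribute used to partition the rows
    label: decision (class) attribute
    """
    if not rows:
        return 0.0
    # Partition the examples by the value of the condition attribute.
    blocks = defaultdict(list)
    for row in rows:
        blocks[row[attr]].append(row[label])
    # A block is a "right partition" if every example in it has the same class.
    consistent = sum(len(labels) for labels in blocks.values() if len(set(labels)) == 1)
    return consistent / len(rows)


def best_split_attribute(rows, attrs, label):
    """Pick the condition attribute with the highest (assumed) agree degree."""
    return max(attrs, key=lambda a: agree_degree(rows, a, label))


# Hypothetical toy records; attribute names are illustrative only.
rows = [
    {"age": "old", "insured": "yes", "cost": "high"},
    {"age": "old", "insured": "no", "cost": "high"},
    {"age": "young", "insured": "yes", "cost": "low"},
    {"age": "young", "insured": "no", "cost": "low"},
    {"age": "middle", "insured": "yes", "cost": "high"},
    {"age": "middle", "insured": "no", "cost": "low"},
]

print(agree_degree(rows, "age", "cost"))      # 4 of 6 rows lie in pure blocks
print(agree_degree(rows, "insured", "cost"))  # -> 0.0 (no pure blocks)
print(best_split_attribute(rows, ["age", "insured"], "cost"))  # -> age
```

Under this reading, "age" is preferred because its "old" and "young" blocks each map to a single cost class, while every "insured" block mixes classes.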
Building on this measure, an algorithm for constructing decision trees based on agree degree is developed. The agree degree decision tree uses threshold pre-pruning: when the number of examples in a leaf node falls below a fixed threshold, the smallest branch containing that node is cut, leaving the preceding node as the new leaf. Threshold pre-pruning may lose rules that are supported by only a small number of examples. However, the algorithm is comparatively simple and efficient, and it does not require building the full decision tree. Compared with the trees built by CHAID, CART, and QUEST, the tree constructed with the agree degree algorithm is of moderate size and achieves higher classification and prediction accuracy.
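Threshold pre-pruning can be sketched as a stopping rule in a recursive tree builder: a node holding fewer than a fixed number of examples is not split further and becomes a leaf labeled with its majority class. This is an assumed, simplified reading of the pruning rule described above, not the thesis's implementation; `min_examples`, the toy records, and the agree degree definition (share of examples in class-pure partition blocks) are hypothetical. The usage at the end shows how raising the threshold drops a rule supported by only two examples.

```python
from collections import Counter, defaultdict


def agree_degree(rows, attr, label):
    """Assumed agree degree: share of rows lying in class-pure blocks of attr."""
    blocks = defaultdict(list)
    for row in rows:
        blocks[row[attr]].append(row[label])
    consistent = sum(len(v) for v in blocks.values() if len(set(v)) == 1)
    return consistent / len(rows)


def build_tree(rows, attrs, label, min_examples=2):
    """Grow a tree with threshold pre-pruning (rows must be non-empty)."""
    labels = [r[label] for r in rows]
    # Ties go to the class encountered first (Counter.most_common ordering).
    majority = Counter(labels).most_common(1)[0][0]
    # Pre-pruning: too few examples, already pure, or no attributes left -> leaf.
    if len(rows) < min_examples or len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: agree_degree(rows, a, label))
    remaining = [a for a in attrs if a != best]
    branches = defaultdict(list)
    for r in rows:
        branches[r[best]].append(r)
    children = {v: build_tree(sub, remaining, label, min_examples)
                for v, sub in branches.items()}
    return {"attr": best, "default": majority, "children": children}


def classify(tree, row):
    """Follow branches until a leaf; unseen values fall back to the node majority."""
    while isinstance(tree, dict):
        child = tree["children"].get(row[tree["attr"]])
        if child is None:
            return tree["default"]
        tree = child
    return tree


# Hypothetical toy records; attribute names are illustrative only.
rows = [
    {"age": "old", "insured": "yes", "cost": "high"},
    {"age": "old", "insured": "no", "cost": "high"},
    {"age": "young", "insured": "yes", "cost": "low"},
    {"age": "young", "insured": "no", "cost": "low"},
    {"age": "middle", "insured": "yes", "cost": "high"},
    {"age": "middle", "insured": "no", "cost": "low"},
]

full = build_tree(rows, ["age", "insured"], "cost", min_examples=2)
pruned = build_tree(rows, ["age", "insured"], "cost", min_examples=3)
query = {"age": "middle", "insured": "no"}
print(classify(full, query))    # -> low  (the "insured" split is kept)
print(classify(pruned, query))  # -> high (2-example node became a leaf)
```

Stopping early avoids growing and then cutting back a full tree, which is the efficiency the abstract points to; the price, as the second query shows, is that a rule backed by very few examples can disappear.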
Keywords/Search Tags: Data mining, Missing value handling, Decision tree, Agree degree