Research On Classification In Clinical Missing Datasets Based On Bayesian Network

Posted on: 2010-12-08
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Chen
GTID: 2178360275997395
Subject: Biomedical engineering
Abstract/Summary:
In practice, clinical data are often incomplete for a variety of reasons. Whether a value is missing may itself depend on the state of certain features, so the missingness carries information. If it is handled with an unsuitable method, the resulting inferences can be wrong. How to handle clinical datasets with missing values so as to improve diagnostic accuracy has therefore become an important research topic.
We call a feature that contains no missing values a "complete feature", and a feature that contains missing values an "incomplete feature". Little and Rubin defined three missing-data mechanisms. First, MCAR (Missing Completely at Random): the missingness is unrelated to both the complete and the incomplete features. Second, MAR (Missing at Random): the missingness depends only on the complete features. Third, NMAR (Not Missing at Random): the missingness depends on the incomplete features and cannot be ignored.
There are basically two approaches to diagnosis with missing data. One is to repair the missing values first and then build a classification model. Repair methods include Gibbs sampling, the EM algorithm, the Bound & Collapse method and gradient descent, besides the simplest approaches of replacing every missing value with zero or replacing a missing numeric value with the mean of the values recorded for the same variable. Each has its merits, but zero substitution and mean substitution ignore the information carried by the missingness and can reduce the quality of the repair, while the other methods can only handle MAR data, and the actual mechanism is not necessarily MAR. The other approach is to classify the incomplete data directly, mainly with the Naive Bayes classifier, Bayesian networks, the C4.5 decision tree, Robust Bayesian Estimation (RBE) and so on. Among these, Bayesian network inference can handle incomplete datasets that traditional inference algorithms cannot.
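
The three mechanisms and the weakness of mean substitution can be illustrated with a small simulation. This is only a sketch under stated assumptions: the synthetic patients, the attribute names `age` (complete feature) and `lab` (incomplete feature), the thresholds and the missing rate are all hypothetical and are not taken from the thesis.

```python
import random

def make_patients(n=200, seed=1):
    """Synthetic records: 'age' is always recorded, 'lab' will lose values."""
    rng = random.Random(seed)
    return [{"age": rng.randint(20, 90), "lab": rng.uniform(0.0, 10.0)}
            for _ in range(n)]

def apply_mechanism(rows, mechanism, rate=0.3, seed=0):
    """Return a copy of rows with 'lab' set to None under the given mechanism."""
    rng = random.Random(seed)
    out = []
    for r in rows:
        if mechanism == "MCAR":      # unrelated to any feature
            p = rate
        elif mechanism == "MAR":     # depends only on the complete feature
            p = 2 * rate if r["age"] >= 60 else 0.0
        elif mechanism == "NMAR":    # depends on the incomplete feature itself
            p = 2 * rate if r["lab"] >= 5.0 else 0.0
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        lab = None if rng.random() < p else r["lab"]
        out.append({"age": r["age"], "lab": lab})
    return out

def mean_substitute(rows):
    """Replace each missing 'lab' with the mean of the observed 'lab' values."""
    observed = [r["lab"] for r in rows if r["lab"] is not None]
    mean = sum(observed) / len(observed)
    return [{"age": r["age"], "lab": mean if r["lab"] is None else r["lab"]}
            for r in rows]

patients = make_patients()
true_mean = sum(r["lab"] for r in patients) / len(patients)
for mechanism in ("MCAR", "MAR", "NMAR"):
    filled = mean_substitute(apply_mechanism(patients, mechanism))
    estimate = sum(r["lab"] for r in filled) / len(filled)
    print(f"{mechanism}: true mean {true_mean:.2f}, "
          f"after mean substitution {estimate:.2f}")
```

Under the NMAR setting only high `lab` values are removed, so the mean of the remaining values, and therefore every substituted value, is pulled downward. This is one concrete way in which mean substitution discards the information carried by the missingness.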
A traditional inference algorithm requires all possible inputs to be known; if one of them is missing, the model built from the data will give biased results. The Bayesian approach can solve this problem, because a Bayesian network captures the probabilistic relations among the data over the whole data domain, so an accurate model can be built even if one variable is missing. RBE can handle any missing-data mechanism, but it easily produces biased results or fills in incorrect values. We also considered that a Bayesian network expresses medical diagnostic rules more easily than a decision tree, so we chose Bayesian networks for our research.
With the development of hospital informatization, hospital information systems have accumulated massive amounts of patient records and medical data. These data contain important information for doctors, hospital administrators and medical service regulatory departments, and extracting this information has become an increasingly prominent demand. The Bayesian network is one of the effective tools of data mining; it provides a natural way to express causal information, which can be used to discover latent relations in the data and describe them as a graph.
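
The claim that a Bayesian network can still reason when one input is missing can be made concrete with a toy network. The sketch below is illustrative only: the three binary variables, the graph (`disease` is the parent of `fever` and `test`) and every probability are hypothetical, and inference is done by brute-force enumeration rather than by any algorithm named in the thesis. A missing input is simply summed out of the joint distribution.

```python
from itertools import product

# Each entry: variable -> (parents, table of P(variable=True | parent values)).
# All numbers are made up for illustration.
NETWORK = {
    "disease": ((), {(): 0.10}),
    "fever": (("disease",), {(True,): 0.80, (False,): 0.10}),
    "test": (("disease",), {(True,): 0.90, (False,): 0.05}),
}

def joint(assignment):
    """Probability of one complete True/False assignment to all variables."""
    p = 1.0
    for var, (parents, table) in NETWORK.items():
        p_true = table[tuple(assignment[q] for q in parents)]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p

def posterior(query, record):
    """P(query=True | observed values); None marks a missing value."""
    evidence = {k: v for k, v in record.items() if v is not None}
    hidden = [v for v in NETWORK if v != query and v not in evidence]
    totals = {True: 0.0, False: 0.0}
    for q in (True, False):
        # Sum over every combination of the unobserved variables.
        for values in product((True, False), repeat=len(hidden)):
            assignment = dict(evidence)
            assignment[query] = q
            assignment.update(zip(hidden, values))
            totals[q] += joint(assignment)
    return totals[True] / (totals[True] + totals[False])

complete = posterior("disease", {"fever": True, "test": True})
incomplete = posterior("disease", {"fever": True, "test": None})
print(f"all inputs known: {complete:.3f}; test result missing: {incomplete:.3f}")
```

With the test result missing, the network still returns a posterior based on the remaining evidence instead of failing or requiring a filled-in value.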
Bayesian networks are increasingly widely used in hospital information management abroad. American scholars have applied them to surgical outcome prediction, nursing research, and the assessment of the validity and reliability of hospital diagnosis and treatment reports. European scholars have applied them to treatment outcome prediction, the assessment of emergency medical services and so on in liver cirrhosis patients. Scholars in Taiwan have applied them to research on automating the review of public health-care medical expenses, as well as to evaluating the diagnosis of atypical pneumonia and medical service diagnosis. At present, hospital information systems are in use in most regions of the country and the data they accumulate keep growing, so some scholars have applied Bayesian networks to medical diagnosis research.
A Bayesian network takes probability theory as its theoretical basis and probabilistic inference as its reasoning mechanism, and uses a graph to express the associations and causal relations among data instances. It consists of two parts: a directed acyclic graph (DAG) and conditional probability tables (CPT). It can express knowledge intuitively and clearly and cope well with the uncertainty of the system and the incompleteness and complexity of the data, and it can also automatically update its knowledge rules during the diagnosis process. Besides relying on expert knowledge, a Bayesian network can extend its own knowledge and reasoning ability. As an intelligent processing tool, it is of great value in medical diagnosis.
We used two methods for prediction on clinical data with missing values. The first classifies the incomplete clinical data with a Bayesian network model built on attribute selection. The second repairs the missing values using the raw data and then classifies the completed dataset with a Bayesian network to validate the effect. The two methods are briefly described as follows.
The first method: First, for each existing attribute that is missing in one or more patients, we create an additional attribute indicating whether the value is missing. Second, we select attributes from the resulting dataset using the wrapper method with genetic search. There are two kinds of attribute selection methods. One is the filter method, which relies mainly on a measure of dispersion between the attributes; before learning starts, it filters the attribute set to obtain an optimal attribute subset. The other is the wrapper method, which uses classification performance to evaluate the selected attributes; it is so called because the learning algorithm is wrapped inside the selection process. In this thesis, we use the latter for attribute optimization. Finally, we build a Bayesian network model on the best attribute subset, classify, and validate the effect. Our experiments classify incomplete datasets of three acute diseases, whose numbers of attributes range from many to few and which are representative of acute diseases of differing diagnostic difficulty. The experiments were run in the WEKA 3.5.6 environment.
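
The first two steps of the first method can be sketched on a toy dataset. This is not the WEKA pipeline used in the thesis: the records and attribute names are hypothetical, a leave-one-out lookup-table classifier stands in for the Bayesian network, and an exhaustive search over attribute subsets stands in for the genetic search (feasible here only because there are two candidate attributes).

```python
from collections import Counter
from itertools import combinations

def add_missing_indicators(rows, attributes):
    """Add '<attr>_missing' for every attribute absent in at least one row."""
    incomplete = [a for a in attributes if any(r[a] is None for r in rows)]
    return [dict(r, **{a + "_missing": r[a] is None for a in incomplete})
            for r in rows]

def loo_accuracy(rows, subset, label):
    """Leave-one-out accuracy of a majority-vote lookup on the chosen subset."""
    correct = 0
    for i, r in enumerate(rows):
        others = [o for j, o in enumerate(rows) if j != i]
        key = tuple(r[a] for a in subset)
        votes = Counter(o[label] for o in others
                        if tuple(o[a] for a in subset) == key)
        if not votes:  # unseen combination: fall back to the overall majority
            votes = Counter(o[label] for o in others)
        correct += votes.most_common(1)[0][0] == r[label]
    return correct / len(rows)

def wrapper_select(rows, candidates, label):
    """Score every subset by classifier accuracy; prefer smaller subsets on ties."""
    subsets = (s for k in range(1, len(candidates) + 1)
               for s in combinations(candidates, k))
    return max(subsets, key=lambda s: (loo_accuracy(rows, s, label), -len(s)))

# Hypothetical records in which the lab value tends to be missing for sick patients.
records = [
    {"age": "old", "lab": None, "sick": True},
    {"age": "young", "lab": None, "sick": True},
    {"age": "old", "lab": None, "sick": True},
    {"age": "young", "lab": "low", "sick": False},
    {"age": "old", "lab": "high", "sick": False},
    {"age": "young", "lab": "low", "sick": False},
    {"age": "old", "lab": "high", "sick": False},
    {"age": "young", "lab": None, "sick": True},
]
expanded = add_missing_indicators(records, ["age", "lab"])
best = wrapper_select(expanded, ["age", "lab_missing"], "sick")
print("selected attributes:", best)
```

The point of the wrapper is visible in `wrapper_select`: each candidate subset is scored by running the classifier itself, so an indicator attribute is kept only when it actually improves classification.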
The results show that our method both takes into account the information carried by missing clinical data and removes irrelevant or redundant attributes from the dataset. The proposed classification performs better than Bayesian network modeling alone.
The second method: First, by exploiting the correlations between attributes implied in the raw dataset, combined with expert experience, we build a structure to reconstruct the missing attributes. Expert knowledge can help select the correlated attributes in a dataset, but an expert's subjective opinion may fail to find relatedness that is hidden in the data. We therefore use MI to calculate the relatedness between each pair of attributes and then choose the subsets to analyze. Second, we use a neural network to estimate the values of the missing attributes. Finally, we apply this new structure to clinical data with missing values and classify them with a Bayesian network to validate the effect. We chose two complete datasets (Statlog (Heart) and Breast cancer) from the UCI Machine Learning Repository, ran the experiments under Matlab 7.0, and compared the classification performance after repair under different proportions of MAR missing values. Our method can improve the precision of estimation.
Scholars in China and abroad have already done extensive research on missing data, and new methods keep emerging. However, whichever filling method is used, it cannot avoid the influence of subjective factors in the original system, and completing the entire dataset is infeasible when too many values are missing. We must therefore identify the nature of the problem clearly and apply a suitable method to the actual problem; that is the key. With a suitable method, satisfactory diagnostic accuracy and efficiency can be obtained. In practice, however, the missing-data mechanism of a clinical dataset is generally unknown, so finding a way to achieve more effective and accurate results is the subject of our future work.
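
The attribute-relatedness step of the second method can be sketched as follows, assuming that MI stands for mutual information between discrete attributes (the abstract does not expand the abbreviation). The function names and the toy columns are hypothetical; rows in which either value is missing are skipped.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Mutual information in bits between two discrete columns.

    Pairs in which either value is None (missing) are ignored.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n == 0:
        return 0.0
    count_xy = Counter(pairs)
    count_x = Counter(x for x, _ in pairs)
    count_y = Counter(y for _, y in pairs)
    # Sum p(x,y) * log2( p(x,y) / (p(x) * p(y)) ) over the observed pairs.
    return sum((c / n) * log2(c * n / (count_x[x] * count_y[y]))
               for (x, y), c in count_xy.items())

def rank_by_relatedness(columns, target):
    """Order the other attributes by their MI with the target attribute."""
    scores = {name: mutual_information(values, columns[target])
              for name, values in columns.items() if name != target}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical columns: 'b' copies the target, 'c' is unrelated to it.
columns = {
    "target": [0, 0, 1, 1, None, 0, 1],
    "b": [0, 0, 1, 1, 1, 0, 1],
    "c": [0, 1, 0, 1, 0, None, None],
}
print(rank_by_relatedness(columns, "target"))
```

Attributes at the top of the ranking would be the natural inputs for whatever model estimates the missing attribute; in the thesis that model is a neural network, which is not reproduced here.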
Keywords/Search Tags: Missing data, Bayesian network, Attribute selection, Neural network