
Observation Analysis Of Decision Tree Extraction From Artificial Neural Network

Posted on: 2012-12-15
Degree: Master
Type: Thesis
Country: China
Candidate: ISAAC KOFI MENSAH
Full Text: PDF
GTID: 2268330425984169
Subject: Computer Science and Technology
Abstract/Summary:
The ability of artificial neural networks to learn and generalize complex relationships from a collection of training examples has been established through numerous research studies in recent years. The knowledge acquired by neural networks, however, is considered incomprehensible and not transferable to other knowledge representation schemes such as expert or rule-based systems.

Two of the commonly used learning techniques are artificial neural networks and decision trees. Artificial neural networks (ANNs) have empirically been shown to generalize well on several machine learning problems. Learning theory now provides reasonably satisfactory answers to questions such as how many examples are needed for a neural network to learn a concept and what the best network architecture is for a particular problem domain, so it is possible to train neural networks without guesswork. This makes them an excellent tool for data mining, where the focus is on learning data relationships from huge databases. However, in applications such as credit approval and medical diagnosis, explaining the reasoning of the neural network is important. The major criticism against neural networks in such domains is that their decision-making process is difficult to understand: the knowledge in a neural network is stored as real-valued parameters, it is encoded in a distributed fashion, and the mapping learnt by the network can be non-linear as well as non-monotonic. One may wonder why neural networks should be used at all when comprehensibility is an important issue. The reason is that neural networks have an appropriate inductive bias for many machine learning domains: the predictive accuracies obtained with neural networks are often significantly higher than those obtained with other learning paradigms, particularly decision trees.
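The opacity described above can be seen even in a tiny network: the "knowledge" is nothing but real-valued weight matrices, from which no rule is directly readable. A minimal sketch (all weights hand-picked for illustration of a 2-2-1 network computing XOR; not from the thesis):

```python
import numpy as np

def step(x):
    # Hard threshold activation: 1 if the pre-activation is positive, else 0.
    return (x > 0).astype(float)

# Hand-set weights for a 2-2-1 feedforward network computing XOR.
W1 = np.array([[1.0, 1.0], [1.0, 1.0]])  # input -> hidden
b1 = np.array([-0.5, -1.5])              # hidden thresholds (OR-like, AND-like)
W2 = np.array([1.0, -2.0])               # hidden -> output
b2 = -0.5

def forward(x):
    h = step(x @ W1 + b1)     # hidden units fire on "at least one" / "both"
    return step(h @ W2 + b2)  # output: OR and not AND, i.e. XOR

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
print(forward(X))  # -> [0. 1. 1. 0.]
```

The function computed is exactly XOR, yet nothing in `W1`, `b1`, `W2`, `b2` states that rule explicitly; this is the comprehensibility gap that rule- and tree-extraction methods aim to close.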
Decision trees have been preferred when a good understanding of the decision process is essential, such as in medical diagnosis. Decision tree algorithms execute fast, handle a large number of records with many fields at predictable response times, handle both symbolic and numerical data well, are better understood, and can easily be translated into if-then-else rules. However, decision tree algorithms have a few shortcomings.

Firstly, tree induction algorithms are unstable, i.e., the addition or deletion of a few samples can make the induction algorithm yield a radically different tree. This is because the partitioning features (splits) are chosen based on sample statistics, and even a few samples are capable of changing those statistics. Split selection (i.e., selection of the next attribute to be tested) is greedy: once a split is selected, there is no backtracking in the search. Tree induction algorithms are therefore subject to all the risks of hill-climbing algorithms, mainly that of converging to locally optimal solutions.

Secondly, the size of a decision tree (the total number of internal nodes and leaves) depends on the complexity of the concept being learnt, the sample statistics, the noise in the sample, and the number of training examples. It is difficult to control the size of the extracted decision trees, and sometimes very large trees are generated, making comprehension difficult. Most decision tree algorithms employ a pruning mechanism to reduce the tree; however, pruning may sometimes reduce the generalization accuracy of the tree.

One of the most recent and more sophisticated algorithms is the TREPAN algorithm. TREPAN builds decision trees by recursively partitioning the instance space. However, unlike other decision tree algorithms, where the amount of training data used to select splitting tests and label leaves decreases with the depth of the tree, TREPAN uses membership queries to generate additional training data.
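The greedy, statistics-driven split selection described above can be sketched as follows. This is an illustrative information-gain threshold chooser for a single numeric feature, not the thesis's implementation; because the chosen threshold depends entirely on the sample statistics, a few added or removed samples can shift it, which is the instability noted above:

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of a label multiset, in bits.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(values, labels):
    """Greedy choice: the threshold on one numeric feature with the
    highest information gain over this sample. No backtracking."""
    base = entropy(labels)
    best_t, best_gain = None, 0.0
    for t in sorted(set(values))[:-1]:  # candidate thresholds between values
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        gain = (base
                - (len(left) / len(labels)) * entropy(left)
                - (len(right) / len(labels)) * entropy(right))
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

vals = [1, 2, 3, 10, 11, 12]
labs = ['a', 'a', 'a', 'b', 'b', 'b']
print(best_split(vals, labs))  # -> (3, 1.0): a perfect split at v <= 3
```

Perturbing `vals`/`labs` slightly and re-running shows how the selected threshold moves with the sample, illustrating why greedy tree induction is sensitive to small changes in the training set.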
For drawing the query instances, TREPAN uses empirical distributions to model discrete-valued features and kernel density estimates to model continuous features.

In our research, we seek to develop heuristics for employing TREPAN, an algorithm for extracting decision trees from neural networks. Typically, several parameters need to be chosen to obtain satisfactory performance of the algorithm, and the interactions between these parameters are not well understood. By empirically evaluating the performance of the algorithm on a test set of databases chosen from benchmark machine learning and real-world problems, several heuristics are proposed to explain and improve the performance of the algorithm. The C4.5 decision tree induction algorithm is used to analyze the datasets: the data used for cross-validation and training of the neural network models is used together as the training data for the C4.5 algorithm. The C4.5 model is then compared with the best TREPAN model for classification accuracy and comprehensibility, and the experiments are further validated with performance statistics. The algorithm is extended to work on multi-class regression problems, and its ability to handle generalized feedforward networks is investigated.

Further, the empirical investigation of TREPAN on these datasets suggests the following heuristics: (i) for complex and highly complex datasets, the best model accuracy is obtained within a tree-size range on the order of the number of inputs; (ii) TREPAN generalizes better at low minimum sample sizes for highly complex datasets with little data, because TREPAN generates instances and obtains class labels for them from the oracle when there is not enough data; and (iii) single-test TREPAN and the standard TREPAN algorithm perform better than disjunctive TREPAN most of the time.
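TREPAN's query-instance generation can be sketched as follows. This is a simplified 1-D illustration, not the thesis's or TREPAN's exact code: `draw_queries` and the Silverman-style bandwidth rule are our assumptions. A discrete feature is sampled from its empirical frequencies, and a continuous feature from a Gaussian kernel density estimate fitted to the observed values:

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_queries(discrete_col, continuous_col, n):
    """Draw n query instances (hypothetical helper, for illustration).
    Discrete feature: sampled from its empirical distribution.
    Continuous feature: sampled from a Gaussian KDE over the data."""
    # Empirical distribution of the discrete feature.
    values, counts = np.unique(discrete_col, return_counts=True)
    disc = rng.choice(values, size=n, p=counts / counts.sum())
    # Gaussian KDE sampling: pick a data point, add kernel noise.
    # Bandwidth via a Silverman-style rule (an assumed choice).
    h = 1.06 * continuous_col.std() * len(continuous_col) ** -0.2
    centers = rng.choice(continuous_col, size=n)
    cont = centers + rng.normal(0.0, h, size=n)
    return disc, cont

disc, cont = draw_queries(np.array([0, 0, 1]), np.array([1.0, 2.0, 3.0]), 1000)
```

The generated instances are then labelled by querying the trained network (the oracle), which is how TREPAN keeps a steady supply of training data at deep nodes where real samples run out.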
Keywords/Search Tags: Rule extraction, Decision tree, Data mining, Classification, Knowledge discovery, Neural network, Feedforward