Classification Model-based Knowledge Discovery Process

Posted on: 2003-05-18  Degree: Doctor  Type: Dissertation
Country: China  Candidate: Y Chen
GTID: 1116360065461532  Subject: Management Science and Engineering
Abstract/Summary:
Knowledge Discovery in Databases (KDD) aims to analyze massive amounts of data and extract meaningful, comprehensible patterns, called knowledge. In recent years KDD has attracted widespread attention in China and abroad and has become one of the most active research areas in information systems and computer science. This thesis is sponsored by a National Natural Science Foundation research project, "Research on Knowledge Discovery and Data Warehouse". Based on a thorough survey and analysis of the related literature, it summarizes the state of the art in knowledge discovery and data mining, together with the field's main contents and key technologies, and discusses the development trends, open questions, and future tasks of KDD. Using the UCI Knowledge Discovery in Databases Archive and the UCI Machine Learning Archive as experimental data, the thesis makes the following contributions:
1. It studies existing KDD process models and proposes a KDD model based on data extractors. The model has four stages: data preprocessing, data extraction, data mining, and result evaluation. The extractors retrieve exactly the data that particular data mining algorithms need, so repeated identical queries to the database are avoided and mining efficiency improves.
2. In the data preprocessing stage, it studies the theory and techniques of feature selection and proposes an algorithm that combines the filter and wrapper approaches. By removing irrelevant features, the algorithm reduces data dimensionality, improves the accuracy of the mining algorithm, and speeds up the knowledge discovery process.
3. In the data extraction stage, to interface knowledge discovery algorithms with large database management systems, it presents a family of generic, set-based primitive operations for KDD and shows how a number of well-known classification algorithms can all be computed through these generic data extractors. SQL_C4.5 is presented to demonstrate how the extractors can support C4.5, a widely used decision tree system.
4. In the data mining stage, it studies the theory and methods of constructing multivariate decision trees and presents a new architecture for building them. The method was evaluated on several UCI datasets and compared against state-of-the-art decision tree induction; it shows consistent advantages in both accuracy and tree complexity.
5. Also in the data mining stage, building on the theory and techniques of combining multiple classifiers, it proposes MNN, a combining algorithm designed to improve the accuracy of the nearest neighbor (NN) classifier. MNN combines multiple NN classifiers, each of which uses only a random subset of the features.
Finally, the dissertation designs a prototype system for applying KDD to medical data, into which the proposed algorithms are incorporated. The proposed algorithms were evaluated in extensive experiments, with encouraging results. The experimental results show that the theory and methods presented in this thesis are correct, efficient, valuable, and robust in practical applications.
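The abstract does not describe how the extractors avoid repeated queries. A minimal sketch of the idea in contribution 1, assuming a simple in-memory cache keyed by the SQL text: several mining algorithms ask for the same class distribution, but only the first request reaches the database. The class name CachingExtractor, the table cases, and its columns are hypothetical illustrations, not the thesis's design.

```python
import sqlite3


class CachingExtractor:
    """Hypothetical extractor layer between the database and the mining
    algorithms: each distinct request is sent to the DBMS once, and later
    identical requests are answered from memory."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}
        self.db_queries = 0  # number of requests that actually reached the database

    def extract(self, sql, params=()):
        key = (sql, tuple(params))
        if key not in self.cache:
            self.db_queries += 1
            self.cache[key] = self.conn.execute(sql, params).fetchall()
        return self.cache[key]


# Toy table standing in for the mining data (schema is an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cases (a INTEGER, cls TEXT)")
conn.executemany("INSERT INTO cases VALUES (?, ?)", [(1, "x"), (2, "y"), (3, "x")])

extractor = CachingExtractor(conn)
class_dist = "SELECT cls, COUNT(*) FROM cases GROUP BY cls ORDER BY cls"
for_naive_bayes = extractor.extract(class_dist)    # first request hits the database
for_decision_tree = extractor.extract(class_dist)  # identical request served from the cache
```

A real extractor would also have to invalidate cached results when the underlying tables change; this sketch ignores updates entirely.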
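Contribution 2 combines a filter method with a wrapper method, but the abstract does not name the filter score, the wrapper's search strategy, or its evaluation classifier. The sketch below is one common way to build such a hybrid, under stated assumptions: the filter ranks discrete features by mutual information with the class and keeps the top keep features, then the wrapper runs greedy forward selection over those candidates, scored by leave-one-out accuracy of a 1-NN classifier. All function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(column, labels):
    """Filter score (assumed): I(X;Y) in bits for a discrete feature and the class."""
    n = len(labels)
    joint = Counter(zip(column, labels))
    px, py = Counter(column), Counter(labels)
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in joint.items())


def loo_accuracy(rows, labels, features):
    """Wrapper score (assumed): leave-one-out accuracy of 1-NN with Hamming
    distance over the chosen feature indices."""
    correct = 0
    for i, row in enumerate(rows):
        best_j, best_d = None, None
        for j, other in enumerate(rows):
            if i == j:
                continue
            d = sum(row[f] != other[f] for f in features)
            if best_d is None or d < best_d:
                best_j, best_d = j, d
        correct += labels[best_j] == labels[i]
    return correct / len(rows)


def hybrid_select(rows, labels, keep=5):
    # Stage 1 (filter): cheap ranking of all features, keep the top `keep`.
    scores = {f: mutual_information([r[f] for r in rows], labels)
              for f in range(len(rows[0]))}
    candidates = sorted(scores, key=scores.get, reverse=True)[:keep]
    # Stage 2 (wrapper): greedy forward selection over the survivors,
    # stopping once no remaining candidate improves leave-one-out accuracy.
    selected, best = [], 0.0
    while True:
        best_f = None
        for f in candidates:
            if f in selected:
                continue
            acc = loo_accuracy(rows, labels, selected + [f])
            if acc > best:
                best, best_f = acc, f
        if best_f is None:
            return selected, best
        selected.append(best_f)


# Toy data: feature 0 equals the class, feature 1 is noisy, feature 2 is constant.
rows = [(0, 1, 0), (0, 0, 0), (1, 1, 0), (1, 0, 0), (0, 1, 0), (1, 0, 0)]
labels = [0, 0, 1, 1, 0, 1]
```

On the toy data, hybrid_select(rows, labels, keep=2) should return ([0], 1.0): the filter discards the constant feature without running any classifier, and the wrapper then finds that the noisy feature adds nothing.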
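The abstract states that well-known classification algorithms can be computed through generic, set-based primitives, with SQL_C4.5 as the example, but it does not list the primitives. A minimal sketch of one plausible primitive, assuming nominal attributes only: a single GROUP BY query returns an (attribute value, class) count table, and C4.5's gain ratio is computed from that table alone, without rescanning raw rows. The names count_table and gain_ratio and the cases table are hypothetical; real C4.5 also handles continuous attributes and missing values, which this sketch omits.

```python
import math
import sqlite3
from collections import defaultdict


def count_table(conn, table, attribute, class_col):
    """Hypothetical set-based primitive: {attribute value: {class: count}}.
    Column and table names are interpolated, so they must come from a trusted schema."""
    sql = (f"SELECT {attribute}, {class_col}, COUNT(*) FROM {table} "
           f"GROUP BY {attribute}, {class_col}")
    counts = defaultdict(dict)
    for value, label, n in conn.execute(sql):
        counts[value][label] = n
    return counts


def entropy(class_counts):
    total = sum(class_counts)
    return -sum(c / total * math.log2(c / total) for c in class_counts if c)


def gain_ratio(counts):
    """C4.5 gain ratio computed only from an extracted count table."""
    per_class = defaultdict(int)
    for by_class in counts.values():
        for label, n in by_class.items():
            per_class[label] += n
    total = sum(per_class.values())
    base = entropy(list(per_class.values()))
    branch_sizes = [sum(by_class.values()) for by_class in counts.values()]
    remainder = sum(size / total * entropy(list(by_class.values()))
                    for size, by_class in zip(branch_sizes, counts.values()))
    split_info = entropy(branch_sizes)
    return (base - remainder) / split_info if split_info else 0.0


# Toy data (an assumption): outlook determines play, windy does not.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cases (outlook TEXT, windy TEXT, play TEXT)")
conn.executemany("INSERT INTO cases VALUES (?, ?, ?)", [
    ("sunny", "yes", "no"), ("sunny", "no", "no"),
    ("rain", "yes", "yes"), ("rain", "no", "yes"),
])
```

With this data, gain_ratio(count_table(conn, "cases", "outlook", "play")) is 1.0 and the same call for windy is 0.0, so a tree builder working only on count tables would choose outlook for the root split.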
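The abstract does not describe the thesis's architecture for multivariate trees. The sketch below only illustrates what distinguishes a multivariate node from a univariate one: the node tests a linear combination w·x + b > 0 instead of a single attribute threshold. To fit the hyperplane it substitutes a plain perceptron, which is an assumption chosen for brevity and not the thesis's method; it converges only when the node's data are linearly separable. All names and the toy points are hypothetical.

```python
def fit_oblique_split(points, labels, epochs=1000):
    """Fit one node's test w.x + b > 0 with the perceptron rule (labels are 0/1)."""
    w, b = [0.0] * len(points[0]), 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(points, labels):
            target = 1 if y == 1 else -1
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if target * activation <= 0:  # wrong side of (or on) the hyperplane
                w = [wi + target * xi for wi, xi in zip(w, x)]
                b += target
                errors += 1
        if errors == 0:  # every training point is on its class's side
            break
    return w, b


def goes_right(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0


def best_univariate_errors(points, labels):
    """Fewest training errors reachable by any single-attribute threshold test."""
    best = len(points)
    for f in range(len(points[0])):
        for t in sorted({p[f] for p in points}):
            errs = sum((p[f] > t) != (y == 1) for p, y in zip(points, labels))
            best = min(best, errs, len(points) - errs)
    return best


# Class 1 lies above the diagonal x + y = 1.1; no axis-parallel cut separates it.
points = [(0.2, 0.2), (0.9, 0.0), (0.0, 0.9), (1.0, 0.8), (0.8, 1.0), (0.6, 0.7)]
labels = [0, 0, 0, 1, 1, 1]
w, b = fit_oblique_split(points, labels)
```

Here every univariate threshold misclassifies at least one training point, while the single oblique test fitted above separates both classes. This is the kind of accuracy and tree-size advantage a multivariate node can offer.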
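Contribution 5 says MNN combines several NN classifiers, each restricted to a random subset of features. The combination rule, subset size, and number of members are not given in the abstract. The minimal sketch below assumes 1-NN members with squared Euclidean distance, combined by simple majority vote. The class name MNN follows the abstract, but its parameters (n_models, subset_size, seed) are hypothetical.

```python
import random
from collections import Counter


def nn_predict(train_X, train_y, x, features):
    """1-NN prediction using squared Euclidean distance over `features` only."""
    nearest = min(range(len(train_X)),
                  key=lambda i: sum((train_X[i][f] - x[f]) ** 2 for f in features))
    return train_y[nearest]


class MNN:
    """Sketch: n_models 1-NN members, each on a random feature subset,
    combined by majority vote (voting rule assumed)."""

    def __init__(self, n_models=10, subset_size=2, seed=0):
        self.n_models = n_models
        self.subset_size = subset_size
        self.rng = random.Random(seed)  # seeded so the chosen subsets are reproducible

    def fit(self, X, y):
        self.X, self.y = X, y
        n_features = len(X[0])
        k = min(self.subset_size, n_features)
        self.subsets = [self.rng.sample(range(n_features), k)
                        for _ in range(self.n_models)]
        return self

    def predict(self, x):
        votes = Counter(nn_predict(self.X, self.y, x, s) for s in self.subsets)
        return votes.most_common(1)[0][0]


X = [[0.0, 0.0, 0.0], [0.1, 0.2, 0.1], [1.0, 1.0, 1.0], [0.9, 0.8, 1.1]]
y = ["a", "a", "b", "b"]
model = MNN(n_models=5, subset_size=2, seed=0).fit(X, y)
```

Because each member sees different features, members tend to make different errors, and voting can cancel some of them. This diversity is the usual motivation for random-subspace ensembles of nearest neighbor classifiers.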
Keywords/Search Tags: Knowledge discovery, Data mining, Classification, Decision trees, Naive Bayes classifier, Feature selection, Multiple models