Methodology And Applications Of Classification Knowledge Discovery Based On Rough Set

Posted on: 2006-05-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J H Man
Full Text: PDF
GTID: 1119360212982154
Subject: Management Science and Engineering
Abstract/Summary:
As a significant branch of knowledge discovery, data classification plays an increasingly important role in many business activities. From the viewpoint of knowledge discovery, this dissertation summarizes and evaluates existing data classification methods according to given rules, and reviews current results and the state of research on knowledge discovery, data classification, and the use of rough sets in data classification. Building on this review, we present our study of classification knowledge discovery based on rough sets, following the process of knowledge discovery in a decision table.
Data preprocessing is treated in a chapter of its own, covering the representation of an information system as a decision table, the removal of redundant and inconsistent data from the original decision table, the discretization of continuous data, and related steps. To improve the efficiency of data cleaning and limit the damage it causes, the idea of support degree is introduced into the definition of the rough set equivalence matrix. The resulting expanded equivalence matrix is used to design two data cleaning algorithms, one for redundant objects and one for inconsistent objects in the decision table. Attribute significance based on information entropy is introduced to construct the rough set discernibility matrix. For a decision table with multiple continuous-valued attributes, an exact discretization algorithm is given that preserves the full consistency of the decision table. In addition, to overcome the sensitivity of discretization points to the choice of training subset and discretization technique, an algorithm is given that constructs both exact and fuzzy discretization intervals from the exact discretization points.
To reduce the condition attributes of a decision table, we distinguish two cases, decision tables with few objects and decision tables with many objects, and design an improved rough set attribute reduction method for each.
For an ordinary decision table with few objects, we start from the core of the decision table and present an attribute reduction algorithm based on information entropy and the relative discernibility matrix. The algorithm reduces the amount of computation and makes full use of both information entropy and the interaction among condition attributes. For a very large decision table, that is, one with many objects, the attribute reduction problem is transformed into a programming problem based on the expanded matrix, and an immune algorithm is given to solve the programming model. We compute the core and the expanded matrix of the decision table using attribute significance based on information entropy and rough data classification quality, and then replace the common immune algorithm based on information entropy with a new immune algorithm based on distance. The solution of the programming problem is the attribute reduction of the decision table. This reduces the damage that redundant and inconsistent attributes do to classification knowledge discovery, reduces the amount of data to be processed, simplifies the classifier, and thereby reduces the impact of noise.
The common method of rule acquisition from a rough set decision table is also improved in this work. We discuss the merging of rules over continuous condition attributes, the acquisition of probabilistic classification rules, and the explanation of decisions. Two algorithms for value reduction and rule acquisition are given, one based on the core attributes of each decision class and one based on the equivalence matrix; the value reduction algorithm based on the core attributes of each decision class yields tidy rules that are easy to retrieve.
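The entropy-driven attribute reduction described above can be illustrated with a minimal sketch. It is not the dissertation's algorithm: it uses neither the relative discernibility matrix nor the immune algorithm, and it does not start from the core. It is a generic greedy forward selection that adds condition attributes until the conditional entropy H(D | B) of the selected subset B matches that of the full attribute set. The table `rows`, the attribute names, and the function names are all hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy(rows, attrs, decision):
    """Return H(decision | attrs) for a decision table given as a list of dicts."""
    # Group objects into equivalence classes by their values on attrs.
    blocks = defaultdict(Counter)
    for row in rows:
        key = tuple(row[a] for a in attrs)
        blocks[key][row[decision]] += 1
    n = len(rows)
    h = 0.0
    for counts in blocks.values():
        size = sum(counts.values())
        for c in counts.values():
            p = c / size
            h -= (size / n) * p * log2(p)
    return h


def greedy_reduct(rows, cond_attrs, decision):
    """Greedy forward selection (an assumption, not the author's method).

    Adds the attribute that lowers H(decision | chosen) the most until the
    entropy of the full condition attribute set is reached.
    """
    target = conditional_entropy(rows, cond_attrs, decision)
    chosen, remaining = [], list(cond_attrs)
    while conditional_entropy(rows, chosen, decision) > target + 1e-12:
        best = min(
            remaining,
            key=lambda a: conditional_entropy(rows, chosen + [a], decision),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical toy decision table: demand depends on season and promo only.
rows = [
    {"season": "peak", "promo": "yes", "price": "low", "demand": "up"},
    {"season": "peak", "promo": "no", "price": "low", "demand": "flat"},
    {"season": "off", "promo": "yes", "price": "high", "demand": "flat"},
    {"season": "off", "promo": "no", "price": "high", "demand": "down"},
    {"season": "peak", "promo": "yes", "price": "high", "demand": "up"},
    {"season": "off", "promo": "no", "price": "low", "demand": "down"},
]
reduct = greedy_reduct(rows, ["season", "promo", "price"], "demand")
```

On this toy table the sketch keeps `season` and `promo` and drops `price`, because the two retained attributes already determine `demand`. A greedy search of this kind returns a subset that preserves the entropy but is not guaranteed to be minimal.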
For continuous-valued condition attributes, the value reduction and rule merging algorithm based on the equivalence matrix merges continuous values of condition attributes that share the same classification label. This avoids conflicting rules and ensures the precision of the rules produced, and it also avoids repeated reduction and merging work later. Furthermore, the acquisition of probabilistic classification rules in the boundary region is discussed, and three parameters, strength, consistency, and coverage, are defined on the basis of conditional probability. An algorithm for decision making and explanation is given based on the Bayesian theory of rough sets. The algorithm is shown to be effective by the computational results of a demand integration analysis for a supply chain.
Finally, all of the knowledge discovery methods for data classification proposed in this dissertation are applied to a supply chain management case. We mainly discuss a method for forecasting supply chain demand trends based on data classification. The improved rough set classification method is introduced into the supply chain demand forecasting process to extract forecasting knowledge from the past demand forecasting experience of supply chain companies and to capture the relation between demand trends and the economic, social, and cultural background, so that this knowledge can guide future demand trend forecasts. For conditional attributes with high arrangement, a method of merging condition attributes is proposed in this work to replace the common attribute reduction of the decision table.
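The abstract does not give the definitions of the three rule parameters. As a hedged illustration, the sketch below assumes the usual rough set reading for a rule "if C then D": strength = |C ∩ D| / |U|, consistency (often called certainty) = |C ∩ D| / |C|, and coverage = |C ∩ D| / |D|. The table `history`, its attribute names, and the function names are hypothetical.

```python
from fractions import Fraction


def matches(row, pattern):
    """True if the row has every attribute value listed in pattern."""
    return all(row[attr] == value for attr, value in pattern.items())


def rule_measures(rows, condition, decision):
    """Strength, consistency and coverage of the rule condition -> decision.

    Definitions are the assumed conditional-probability ones stated in the
    lead-in, not taken from the dissertation itself.
    """
    n_cond = sum(matches(r, condition) for r in rows)
    n_dec = sum(matches(r, decision) for r in rows)
    n_both = sum(matches(r, condition) and matches(r, decision) for r in rows)
    if n_cond == 0 or n_dec == 0:
        raise ValueError("condition or decision matches no object")
    return {
        "strength": Fraction(n_both, len(rows)),
        "consistency": Fraction(n_both, n_cond),
        "coverage": Fraction(n_both, n_dec),
    }


# Hypothetical table with an inconsistent block (rows 1-3 agree on the
# condition attributes but not on the decision), i.e. a boundary region.
history = [
    {"season": "peak", "promo": "yes", "demand": "up"},
    {"season": "peak", "promo": "yes", "demand": "up"},
    {"season": "peak", "promo": "yes", "demand": "flat"},
    {"season": "off", "promo": "yes", "demand": "flat"},
    {"season": "off", "promo": "no", "demand": "down"},
]
measures = rule_measures(
    history, {"season": "peak", "promo": "yes"}, {"demand": "up"}
)
```

On this toy table the rule has strength 2/5, consistency 2/3, and coverage 1. Under these definitions the measures satisfy a Bayes-style identity: consistency × P(C) = coverage × P(D) = strength, here 2/3 × 3/5 = 1 × 2/5 = 2/5.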
Keywords/Search Tags: Rough Set, Data Classification, Attribute Reduction, Rule Merging, Supply Chain Demand Trend Forecast