Rough set theory, proposed by Pawlak in the early 1980s, is a mathematical theory for reasoning about data. Its main idea is to approximate inexact, uncertain concepts using available knowledge or information. Since the 1990s it has attracted much attention from researchers around the world, and it has become an active research topic in computer science and information science.

In the classical rough set model, equivalence relations play a central role. In practice, however, an equivalence relation often does not exist, so the classical theory is of limited use in incomplete information systems. Starting from the equivalence relations of the classical model, we propose an extended rough set model by introducing a majority inclusion relation. The extension not only retains the advantages of the existing model but also discards its disadvantages: it improves on the tolerance relation and makes the rough set model more flexible.

In real databases, data records consist of many attributes with continuous values. Since most existing data mining methods can only handle discrete attributes, continuous attributes must first be discretized. In this thesis, concepts from rough set theory are combined to study the discretization of continuous attributes. An algorithm for continuous attribute discretization is introduced, based on information gain and on a tolerance degree that measures the importance of attributes. The method considers not only the ordinal relation of attribute values but also the relative distances between them.
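To make the classical model concrete, the sketch below computes Pawlak's lower and upper approximations of a target concept from the equivalence classes induced by a set of attributes. The decision table, attribute names, and target set in the usage example are invented for illustration; this is a minimal sketch of the standard definitions, not the extended model proposed in the thesis.

```python
from collections import defaultdict

def partition(universe, attrs, table):
    """Group objects into equivalence classes: two objects are
    indiscernible if they agree on every attribute in attrs."""
    classes = defaultdict(set)
    for x in universe:
        key = tuple(table[x][a] for a in attrs)
        classes[key].add(x)
    return list(classes.values())

def approximations(universe, attrs, table, target):
    """Pawlak lower approximation (classes fully inside target)
    and upper approximation (classes that overlap target)."""
    lower, upper = set(), set()
    for block in partition(universe, attrs, table):
        if block <= target:
            lower |= block
        if block & target:
            upper |= block
    return lower, upper

# Hypothetical four-object information system with attributes a, b.
table = {1: {'a': 0, 'b': 0}, 2: {'a': 0, 'b': 0},
         3: {'a': 1, 'b': 0}, 4: {'a': 1, 'b': 1}}
lower, upper = approximations({1, 2, 3, 4}, ['a', 'b'], table, {1, 3})
```

Here objects 1 and 2 are indiscernible, so the concept {1, 3} is rough: its lower approximation is {3} and its upper approximation is {1, 2, 3}.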
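The information-gain component of the discretization step can be sketched as follows: candidate cut points are the midpoints between consecutive distinct values of a continuous attribute, and the cut maximizing the reduction in decision-class entropy is selected. The tolerance-degree weighting described above is not reproduced here, and the function names and sample data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of decision labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_cut(values, labels):
    """Return the cut point on one continuous attribute that maximizes
    information gain on the decision labels, plus the gain itself."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy([l for _, l in pairs])
    best, best_gain = None, -1.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut between equal values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for v, l in pairs if v <= cut]
        right = [l for v, l in pairs if v > cut]
        gain = (base
                - (len(left) / n) * entropy(left)
                - (len(right) / n) * entropy(right))
        if gain > best_gain:
            best, best_gain = cut, gain
    return best, best_gain

# Hypothetical attribute with two well-separated decision classes.
cut, gain = best_cut([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1])
```

Applied recursively to each resulting interval, this yields a set of cut points that discretizes the attribute; a full version of the thesis algorithm would additionally score candidate cuts by tolerance degree.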