Research On Discretization Of Continuous Features Based On Rough Set Theory

Posted on:2007-08-20

Degree:Master

Type:Thesis

Country:China

Candidate:X D Yue

Full Text:PDF

GTID:2178360185450968

Subject:Computer software and theory

Abstract/Summary:


Real-world data used for classification is usually continuous, but continuous features are not suitable inputs for most inductive machine learning algorithms. Discrete values are intervals in a continuous spectrum of values. Transforming continuous features into discrete ones can greatly reduce the scale of the data, and discrete values are more concise to represent and specify, easier to use and comprehend, and closer to a knowledge-level representation than continuous values. Discretization can improve the efficiency and accuracy of many machine learning methods, so it is attracting increasing attention.

Rough set theory is regarded as an advanced inductive learning method because it requires no prior assumptions about the data, can extract knowledge from incomplete or inconsistent data, and yields knowledge in a form that is concise to represent and comprehend. Discretization methods based on rough sets can describe the dependency among attributes precisely and thereby produce better discretized data.

This thesis reviews and analyzes most existing discretization methods. It also proposes new optimized discretization methods based on rough sets for supervised learning, to resolve the problems found in the existing methods. The main research work consists of the following aspects:

1. Propose a new axis along which discretization methods can be further classified, and introduce most existing methods under this classification framework. In general, discretization methods for continuous attributes can be classified along several axes: Dynamic vs. Static, Unsupervised vs. Supervised, Local vs. Global, Direct vs. Incremental, and Top-down vs. Bottom-up. This thesis proposes the new axis "Attribute-independent vs. Attribute-dependent". Analyzing existing algorithms under this classification framework shows that methods that are supervised, global, incremental, and attribute-dependent lead to better discretization results, and that rough-set-based methods often have these characteristics.

2. Analyze the discretization methods based on rough sets and improve the MD-Algorithm according to the problems discovered in the analysis. Making use of the ordering of the cuts and attribute values on continuous features, a new method based on rough set theory is designed to reduce the complexity of the existing MD-Algorithm. The new method not only compresses the data structure into linear space but also computes the importance of each cut with a specific formula, so it considerably improves the efficiency of the original MD-Algorithm. Considering the impact of the relationships among cuts on discretization accuracy, discretization methods that employ the proposed cut dependency computation are also designed. These methods can improve the accuracy of the original MD-Algorithm.

3. Complete simulation experiments and verify the advantages of the improved algorithms. The designed algorithms and several classical discretization methods are implemented and applied to large-scale geographic information data sets. The discretization results are compared and analyzed to demonstrate the advantages of the proposed methods.
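
As a rough illustration of discernibility-based cut selection, the Python sketch below assumes that the MD-Algorithm refers to the maximal-discernibility heuristic, in which the preferred cut on a continuous attribute is the one that separates the most pairs of objects with different decision values. The function names (`candidate_cuts`, `discerned_pairs`, `best_cut`, `discretize`) and the toy data are hypothetical and are not taken from the thesis. The thesis's own improvements (the linear-space data structure, the cut-importance formula, and the cut dependency computation) are not reproduced here.

```python
from bisect import bisect_right


def candidate_cuts(values):
    """Return midpoints between consecutive distinct values (candidate cuts)."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def discerned_pairs(values, labels, cut):
    """Count pairs of objects with different labels that lie on opposite sides of the cut."""
    left, right = {}, {}
    for v, d in zip(values, labels):
        side = left if v < cut else right
        side[d] = side.get(d, 0) + 1
    n_left = sum(left.values())
    n_right = sum(right.values())
    # Pairs split by the cut that share the same label are not "discerned".
    same_label = sum(count * right.get(d, 0) for d, count in left.items())
    return n_left * n_right - same_label


def best_cut(values, labels):
    """Pick the candidate cut that discerns the most differently labelled pairs."""
    return max(candidate_cuts(values),
               key=lambda c: discerned_pairs(values, labels, c))


def discretize(values, cuts):
    """Map each value to the index of the interval defined by the sorted cuts."""
    cuts = sorted(cuts)
    return [bisect_right(cuts, v) for v in values]


if __name__ == "__main__":
    # Toy data (assumed for illustration): one continuous attribute, one decision.
    values = [1.0, 2.0, 4.0, 5.0, 7.0]
    labels = ["a", "a", "b", "b", "b"]
    cut = best_cut(values, labels)
    print(cut, discretize(values, [cut]))
```

In this toy example a single cut is enough to separate the two decision classes. A full procedure would repeat the selection, over all attributes, until every pair of objects with different decisions is discerned by some chosen cut.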

Keywords/Search Tags:

Discretization, Rough set, Linear memory space, Dependency of cuts, MD-Algorithm


Related items

1	Research On Continuous Attributes Discretization And Rules Extracting Based On Rough Set
2	Rough Sets Based Research On Method Of Discretization And Reduction Algorithm
3	A Study For Discretization Of Real Value Attributes Based On Rough Set Theory
4	Research On Rough Set Theory Based Data Mining Algorithm
5	Study On Comparison Of Discretization Algorithms Of Continuous Attributes
6	A Study On Rough Set Theory And Discretization Of Real Value Attributes
7	Research On Data Discretization And Classification Algorithm Based On Factor Space Theory
8	A Study For Discretization Of Real Value Attributes And LMS Algorithm
9	Application Of Rough Set And SVM In Discretization Of Continuous Attribute