The Study And Application About Statistical Methods Of Data Reduction

Posted on:2008-06-14

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Y X Liu

Full Text:PDF

GTID:1117360242979136

Subject:Statistics

Abstract/Summary:

Data reduction is a key step in Data Mining, so its methods deserve careful study. Most existing methods concentrate on supervised learning, while unsupervised data reduction has received comparatively little attention. This dissertation therefore focuses on statistical methods for unsupervised data reduction and on their application.

Chapter one first explains the background and significance of the topic. It then summarizes the relevant background and research methods of data reduction in China and abroad, and outlines the contents and the innovations of this dissertation.

Chapter two discusses missing value imputation and outlier detection, which form the groundwork of data reduction. Based on an analysis of the statistical methods involved, it summarizes those that can be applied in Data Mining. In addition, outlier detection methods are applied to a mobile telecommunications consumption database to analyze consumers' consumption behavior.

Data reduction comprises tuple reduction and attribute reduction. Chapter three discusses the discretization of continuous attributes and concept hierarchies, the two main methods of tuple reduction. After summarizing the current methods of discretization and attribute-oriented induction, we put forward two methods: discretization of continuous attributes based on the discernibility matrix, and discretization of continuous attributes based on likelihood ratio hypothesis testing. Simulations on the Iris data set confirmed the validity of both methods.

The methods of attribute reduction include importance ordering, extraction, and selection of attributes. Chapter four discusses the importance ordering of attributes. Supervised importance ordering is well known in Data Mining, so we introduce it first. For unsupervised ordering, we then put forward two methods: an improved rank-sum method for single-ordinal contingency data, and an unsupervised ordering of attributes based on factor analysis. Simulations on contingency data from a survey questionnaire and on data on national per capita consumer expenditure of residents gave satisfactory results.

Chapter five discusses attribute extraction and attribute subset selection. We first introduce and evaluate several methods from statistics and other disciplines that are applied in linear attribute extraction, and then turn to the main topic of this part, attribute subset selection. After introducing and evaluating the basic concepts and existing research results, we put forward an unsupervised stepwise forward selection method and confirm its validity with examples.

Chapter six summarizes the dissertation and raises some questions that need to be improved and refined in future study.

The main innovations of this dissertation are the following methods:
(1) Discretization of continuous attributes based on the discernibility matrix, and discretization of continuous attributes based on likelihood ratio hypothesis testing.
(2) An improved rank-sum method for single-ordinal contingency data.
(3) Unsupervised ordering of attributes based on factor analysis.
(4) Unsupervised stepwise forward attribute selection.
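As a rough illustration of the groundwork steps named for Chapter two, the sketch below fills missing values with the median and then flags outliers with the common interquartile-range rule. It is not taken from the dissertation: the function names, the `k=1.5` threshold, and the `monthly_spend` values are all assumptions made up for this example, and the dissertation's own imputation and outlier detection methods may differ.

```python
from statistics import median, quantiles


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


# Hypothetical monthly spending figures; None marks a missing record.
monthly_spend = [32.0, 35.5, None, 30.0, 34.0, 210.0, 33.5, None, 31.0]

filled = impute_median(monthly_spend)   # missing entries become the median
flagged = iqr_outliers(filled)          # positions of suspiciously extreme values
print(filled)
print(flagged)
```

Imputing before detection keeps the record positions aligned, so a flagged index can be traced back to the original row. Median imputation is chosen here only because it is robust to the extreme values the second step looks for.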

Keywords/Search Tags:

Data reduction, Data Mining, Statistics

Related items

1	The Study Of Data Mining Based On The Statistics View
2	Application Of Data Acquisition And Analysis And Mining In Basketball
3	Design And Realization Of Data Visualization Theory On Spatio-Temporal Data
4	Design And Implementation Of Analysis System Of Data-mining-based Campus Card Data
5	Analysis And Data Mining Of Sports Undertakings Statistics In China
6	The Research Of Data Statistic Analysis And Data Mining Based On Higher Education Teaching Information
7	Research On College Students' Academic Achievement Based On Trajectory Data Mining
8	Models And Application On University Subjects And Students Data Analysis Using Multiple Data Mining Strategies Method
9	Data Mining For Applied Research, Statistical Work
10	National Matriculation Grade Analysis Based On OLAP And Data Mining Technologies