
The Study And Application About Statistical Methods Of Data Reduction

Posted on: 2008-06-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y X Liu
Full Text: PDF
GTID: 1117360242979136
Subject: Statistics
Abstract/Summary:
Data reduction is a key step in data mining, so its methods are important to study. Most existing methods focus on supervised learning, whereas unsupervised data reduction has received comparatively little study. This dissertation therefore focuses on statistical methods for unsupervised data reduction and their application.

Chapter one first explains the background and significance of the topic. Then, after summarizing the relevant background and research methods of data reduction in China and abroad, it outlines the contents and innovations of this dissertation.

Chapter two discusses missing value imputation and outlier detection, which are the groundwork for data reduction. Based on an analysis of the relevant statistical methods, we summarize those that can be applied in data mining. In addition, we analyze consumer behavior by applying outlier detection methods to a mobile telecommunications consumption database.

Data reduction includes tuple reduction and attribute reduction. Chapter three discusses the discretization of continuous attributes and concept hierarchies, the two main methods of tuple reduction. Building on a summary of current discretization methods and attribute-oriented induction, we propose two methods: discretization of continuous attributes based on the discernibility matrix, and discretization of continuous attributes based on likelihood ratio hypothesis testing. Simulations on the Iris data set confirm their validity.

The methods of attribute reduction include importance ordering, extraction, and selection of attributes. Chapter four discusses the importance ordering of attributes. Supervised importance ordering is familiar in data mining, so we introduce it first. For unsupervised ordering, we propose two methods: an improved rank-sum method for single-ordinal contingency data, and an unsupervised ordering of attributes based on factor analysis. Simulations on contingency data from a survey questionnaire and on national per capita consumption expenditure data for residents gave satisfactory results.

Chapter five discusses attribute extraction and attribute subset selection. We first introduce and evaluate several methods from statistics and other disciplines for linear attribute extraction, then turn to the main topic of this dissertation, attribute subset selection. After reviewing the basic concepts and existing research results, we propose an unsupervised stepwise forward selection method and confirm its validity with examples.

Chapter six summarizes the dissertation and raises questions that need to be improved and refined in future study.

The main innovations of this dissertation are the following proposed methods:
(1) Discretization of continuous attributes based on the discernibility matrix, and discretization of continuous attributes based on likelihood ratio hypothesis testing.
(2) An improved rank-sum method for single-ordinal contingency data.
(3) Unsupervised ordering of attributes based on factor analysis.
(4) Unsupervised stepwise forward attribute selection.
Keywords/Search Tags: Data reduction, Data Mining, Statistics