Statistical Classification Analysis For High-dimensional Data

Posted on: 2016-03-01
Degree: Master
Type: Thesis
Country: China
Candidate: F Hu
Full Text: PDF
GTID: 2297330464952474
Subject: Statistics
Abstract/Summary:
With the rapid development of modern technology, every industry has accumulated large amounts of high-dimensional data, which contain a great deal of information and knowledge. High-dimensional data currently arise mainly in fields such as text analysis, biological gene data, and web and media data. As economic and scientific levels rise, the dimensionality of these data continues to increase and already greatly exceeds that of the past. The more complex high-dimensional data are, the more information they contain. To extract as much of the information hidden in the data as possible, we must reduce redundant data, remove noise, and exclude factors that interfere with classification. Classification mining of high-dimensional data is therefore becoming more and more important. How efficiently a feature subset can be selected from a high-dimensional data set directly affects both the efficiency of the classifier model and people's understanding of the data. The statistical classification studied in this thesis is aimed at the growing number of high-dimensional data sets in everyday life and research. The main problems studied are as follows:
(1) Analyze the traditional classification algorithms studied during the author's graduate work, including distance-based methods, decision trees, methods based on Bayes' formula, and other classification algorithms, mainly discussing their strengths and their limitations when applied to high-dimensional data.
(2) Introduce the EP pattern, a newer classification model, covering its related definitions, capabilities, classification process, and limitations when applied to high-dimensional data.
(3) The performance of the EP pattern classifier degrades on high-dimensional data because of irrelevant attributes or overfitting. To address this problem, the aim is to combine an appropriate feature selection method with the EP pattern classification algorithm.
Feature selection excludes interfering and irrelevant attributes, which removes many unrelated EP patterns while still ensuring that valid EP patterns are generated. On this basis, the thesis proposes the PREP classification algorithm, an EP pattern classification algorithm based on PCA-ReliefF. Experiments on high-dimensional data show the advantages of the proposed method.
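
The abstract gives no implementation details for PREP, so the following is only a rough illustration of the two ingredients it names, not the thesis's algorithm. It assumes that "EP" stands for emerging patterns (itemsets whose support grows sharply from one class to another) and it uses the basic two-class Relief weighting as a simplified stand-in for ReliefF; the PCA step is omitted. The function names `relief_weights` and `growth_rate` are hypothetical.

```python
import math


def relief_weights(X, y):
    """Basic two-class Relief (a simplified stand-in for ReliefF).

    A feature gains weight when it differs on the nearest sample of the
    other class (nearest miss) and loses weight when it differs on the
    nearest sample of the same class (nearest hit). Every sample is
    visited once, so the result is deterministic.
    """
    n, d = len(X), len(X[0])
    # Per-feature value range, used to scale differences into [0, 1].
    spans = []
    for j in range(d):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    w = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue
        h = min(hits, key=lambda k: dist(X[i], X[k]))
        m = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            w[j] += (diff(X[i], X[m], j) - diff(X[i], X[h], j)) / n
    return w


def growth_rate(pattern, background, target):
    """Growth rate of an itemset from `background` to `target`.

    Assumed definition: support in the target class divided by support
    in the background class; infinite if the pattern occurs only in the
    target class, and 0 if it occurs in neither.
    """
    def support(data):
        return sum(1 for t in data if pattern <= t) / len(data)

    s_bg, s_tg = support(background), support(target)
    if s_bg == 0:
        return math.inf if s_tg > 0 else 0.0
    return s_tg / s_bg


# Toy data (made up for illustration): feature 0 separates the classes,
# feature 1 does not.
X = [[0.0, 5.0], [0.1, 1.0], [1.0, 4.0], [0.9, 2.0]]
y = [0, 0, 1, 1]
weights = relief_weights(X, y)
keep = [j for j, wj in enumerate(weights) if wj > 0]  # features to retain

background = [{"a"}, {"a", "b"}, {"c"}, {"b"}]
target = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"a", "b"}]
rate = growth_rate({"a", "b"}, background, target)
```

In this sketch, features with non-positive Relief weight would be dropped before pattern mining, so fewer candidate itemsets are generated, and a pattern would count as an emerging pattern only if its growth rate exceeds a chosen threshold. Both the zero cut-off and the threshold are illustrative choices, not values from the thesis.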
Keywords/Search Tags: High-dimensional data, classification algorithm, feature selection, EP model, PREP algorithm