Statistical Classification Analysis For High-dimensional Data

Posted on:2016-03-01

Degree:Master

Type:Thesis

Country:China

Candidate:F Hu

Full Text:PDF

GTID:2297330464952474

Subject:Statistics

Abstract/Summary:


With the rapid development of modern technology, every industry has accumulated large amounts of high-dimensional data containing a great deal of information and knowledge. Fields in which high-dimensional data commonly arise include text analysis, biological gene data, and web and media data. As economic and scientific levels rise, the dimensionality of such data continues to increase and now far exceeds that of the past. The more complex high-dimensional data become, the more information they contain. To extract the hidden information to the greatest possible extent, we must reduce redundant data, remove noise, and eliminate factors that interfere with classification. Classification mining of high-dimensional data is therefore becoming increasingly important. How efficiently a feature subset can be obtained from a high-dimensional data set directly affects both the efficiency of the classifier model and people's understanding of the data. This thesis studies statistical classification for the ever-growing high-dimensional data sets encountered in daily life and research. The main problems it addresses are as follows:
(1) It analyses the traditional classification algorithms studied during the author's graduate coursework, including distance-based, decision-tree, and Bayes-formula-based classifiers, discussing mainly the strengths of each method and its limitations when applied to high-dimensional data.
(2) It introduces the EP (emerging pattern) model, a newer classification model, covering its definitions, properties, classification process, and limitations when applied to high-dimensional data.
(3) When the EP pattern classifier is applied to high-dimensional data, its performance is degraded by irrelevant attributes and by attributes that cause overfitting. The thesis therefore combines a suitable feature selection method with the EP pattern classification algorithm.
Feature selection removes interfering and irrelevant attributes, eliminating a large number of irrelevant EPs while still ensuring that valid EPs are generated. On this basis, the thesis proposes the PREP classification algorithm, an EP pattern classification algorithm based on PCA-ReliefF. Experiments on high-dimensional data show the advantages of the proposed method.
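
To make the EP definitions in (2) concrete, the sketch below follows the standard emerging-pattern formulation: the growth rate of an itemset X from the other classes to a target class is supp_target(X) / supp_other(X), and X is a ρ-emerging pattern when its growth rate is at least ρ. The scoring rule is a CAEP-style aggregate score (each matching EP contributes GR/(GR+1) times its support), which is an assumption: the abstract does not say which EP classifier the thesis uses, and CAEP's per-class score normalisation is omitted. Mining is brute force over short itemsets for readability; practical EP miners are far more efficient. The function names (`mine_eps`, `caep_score`, `classify`), the threshold `rho=2.0`, and the toy data are all hypothetical.

```python
from itertools import combinations
from math import inf
from typing import Dict, FrozenSet, List

Itemset = FrozenSet[str]


def support(itemset: Itemset, data: List[Itemset]) -> float:
    """Fraction of instances in `data` that contain every item of `itemset`."""
    if not data:
        return 0.0
    return sum(1 for row in data if itemset <= row) / len(data)


def growth_rate(itemset: Itemset, target: List[Itemset], other: List[Itemset]) -> float:
    """supp_target(X) / supp_other(X), with 0/0 -> 0 and x/0 -> inf (x > 0)."""
    s_t, s_o = support(itemset, target), support(itemset, other)
    if s_o == 0:
        return 0.0 if s_t == 0 else inf
    return s_t / s_o


def mine_eps(target: List[Itemset], other: List[Itemset],
             rho: float = 2.0, max_len: int = 2) -> Dict[Itemset, float]:
    """Brute-force search for rho-emerging patterns of `target` with <= max_len items."""
    items = sorted(set().union(*target))
    eps: Dict[Itemset, float] = {}
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            x = frozenset(combo)
            gr = growth_rate(x, target, other)
            if gr >= rho:
                eps[x] = gr
    return eps


def caep_score(instance: Itemset, eps: Dict[Itemset, float], target: List[Itemset]) -> float:
    """Sum of GR/(GR+1) * support over the class's EPs contained in `instance` (unnormalised)."""
    score = 0.0
    for x, gr in eps.items():
        if x <= instance:
            weight = 1.0 if gr == inf else gr / (gr + 1.0)
            score += weight * support(x, target)
    return score


def classify(instance: Itemset, classes: Dict[str, List[Itemset]],
             rho: float = 2.0, max_len: int = 2) -> str:
    """Assign the class whose emerging patterns give `instance` the highest score."""
    scores = {}
    for label, rows in classes.items():
        # EPs of this class are mined against the union of all other classes
        others = [r for lbl, rs in classes.items() if lbl != label for r in rs]
        eps = mine_eps(rows, others, rho, max_len)
        scores[label] = caep_score(instance, eps, rows)
    return max(scores, key=scores.get)


# Hypothetical toy data: each instance is a set of discretised attribute values.
yes = [frozenset({"sunny", "mild"}), frozenset({"sunny", "hot"}),
       frozenset({"sunny", "mild", "windy"})]
no = [frozenset({"rain", "windy"}), frozenset({"rain", "mild"}),
      frozenset({"sunny", "windy"})]
classes = {"yes": yes, "no": no}

print(classify(frozenset({"sunny", "mild"}), classes))  # prints yes
```

On high-dimensional data the number of candidate itemsets explodes and many EPs are built from irrelevant attributes, which is the motivation the abstract gives for filtering attributes before EP mining.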

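The ReliefF half of PCA-ReliefF can be sketched with the standard ReliefF update (Kononenko's multi-class, k-neighbour extension of Relief): for each instance, every attribute weight decreases by the normalised difference to its k nearest same-class neighbours and increases by the prior-weighted difference to its k nearest neighbours from each other class. Assumptions in this sketch: numeric attributes, Manhattan distance over range-normalised differences, and every instance used once instead of m random samples so the output is deterministic. The abstract does not describe how PREP couples ReliefF with PCA, so that step is not shown; the function name `relieff` and the toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def relieff(X: List[List[float]], y: Sequence[str], k: int = 1) -> List[float]:
    """ReliefF attribute weights for numeric data (every instance used once)."""
    n, d = len(X), len(X[0])
    lows = [min(row[a] for row in X) for a in range(d)]
    highs = [max(row[a] for row in X) for a in range(d)]
    # Range used to normalise diff(); constant attributes get span 1 to avoid /0.
    span = [(high - low) if high > low else 1.0 for low, high in zip(lows, highs)]

    def diff(a: int, i: int, j: int) -> float:
        return abs(X[i][a] - X[j][a]) / span[a]

    def dist(i: int, j: int) -> float:
        # Manhattan distance over range-normalised attribute differences
        return sum(diff(a, i, j) for a in range(d))

    prior = {c: cnt / n for c, cnt in Counter(y).items()}
    w = [0.0] * d
    for i in range(n):
        # Group all other instances by class label
        by_class: Dict[str, List[int]] = {}
        for j in range(n):
            if j != i:
                by_class.setdefault(y[j], []).append(j)
        for c, idx in by_class.items():
            nearest = sorted(idx, key=lambda j: dist(i, j))[:k]
            # Hits lower the weight; misses raise it, weighted by P(c) / (1 - P(class of i))
            coef = -1.0 if c == y[i] else prior[c] / (1.0 - prior[y[i]])
            for j in nearest:
                for a in range(d):
                    w[a] += coef * diff(a, i, j) / (n * len(nearest))
    return w


# Hypothetical toy data: attribute 0 separates the classes, attribute 1 is noise.
X = [[0.0, 0.5], [0.1, 0.9], [0.9, 0.4], [1.0, 0.8]]
y = ["a", "a", "b", "b"]
weights = relieff(X, y, k=1)
print([round(v, 3) for v in weights])  # prints [0.8, -0.6]
```

Attributes with the largest weights would be kept before EP mining; whether to cut by a weight threshold or keep the top-k attributes is a design choice this sketch leaves open.
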
Keywords/Search Tags:

High-dimensional data, classification algorithm, feature selection, EP model, PREP algorithm


Related items

1	High-dimensional Data Based On MIC Feature Selection And Application Research
2	Research On High Dimensional Imbalanced Data Classification In The Identification Of Risk User
3	Research On Semantic Classification Model Of Teaching Evaluation Based On Feature Weighted Stacking Algorithm
4	Chinese Text Categorization Method And Implementation
5	Research On PCA And CFS Feature Dimensionality Reduction Algorithm Based On MIC
6	Research On High Dimensional Imbalanced Data Classification Based On Random Forest
7	Research On The Identification Algorithm Of College Students’ Mental Health Problems Based On Campus Big Data
8	Research On Feature Selection Method For Software Defect Prediction
9	Feature Selection Based On Rough Set For Binary-class Imbalanced Data
10	Research On Feature Selection And Integration Method And It’s Applications