Data Reduction Research And Clustering Validity Analysis

Posted on:2018-02-27

Degree:Master

Type:Thesis

Country:China

Candidate:X Yu

Full Text:PDF

GTID:2348330542984980

Subject:Control Science and Engineering

Abstract/Summary:

With the development of artificial intelligence and information technology, large volumes of data are emerging in every industry: genetic data, medical data, financial data, and so on. Human beings are entering the era of data. Faced with so much data, one of the main problems is how to remove noise and redundant data and find the valuable information hidden in what remains. Data reduction is a good tool for this problem. At present, data reduction research concentrates mainly on reducing features, and there is little research on reducing the samples in a data set. In view of this situation, this thesis studies techniques for reducing the samples in a data set and analyzes clustering validity on that basis.

The main purpose of data reduction is to remove unimportant information from the data set and make the remaining data more conducive to analysis. Aiming at the general characteristics of data set distributions, we propose two data reduction methods: a grid-based method and a method based on vector angles. In the grid-based method, we divide the data space into cells and define the absolute density and relative density of the data points in order to reduce the data. In the vector angle method, we compute the average vector angle of each data point to distinguish the core objects from the boundary objects in the data set, and preserve the important data by deleting the boundary objects step by step. Experiments on artificial data sets and UCI data sets show that the proposed algorithms effectively remove redundant data points and make the structural information of the data set more apparent.

Because clustering analysis is unsupervised, it has been widely used in data mining to deal with massive amounts of information. However, the validity of clustering results has long been a hot topic. Determining the correct number of classes in a data set by means of clustering validity is vulnerable to noisy data, the degree of class separation, and the choice of clustering algorithm, so the number of classes determined is difficult to guarantee. In this thesis, the clustering accuracy and the optimal number of classes are analyzed on the data sets before and after reduction. The experiments show that the reduced data sets are more separable, yield higher clustering accuracy, and give an optimal number of classes closer to the true number of classes in the data set.
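The abstract does not give the formulas behind the grid-based method, so the following is only a minimal sketch of the general idea of dropping samples that fall in sparse grid cells. The function name `grid_reduce`, its parameters, and both density definitions are assumptions made for illustration: here the absolute density of a point is taken to be the number of points in its cell, and the relative density is that count divided by the count of the densest cell. The thesis may define both quantities differently.

```python
from collections import Counter
from math import floor


def grid_reduce(points, cell_size, min_relative_density):
    """Keep only the points that lie in sufficiently dense grid cells.

    Assumed definitions (illustrative, not taken from the thesis):
      absolute density of a point = number of points sharing its cell
      relative density of a point = absolute density / largest cell count
    """
    if not points:
        return []
    # Map each point to the integer index of the grid cell containing it.
    cells = [tuple(floor(x / cell_size) for x in p) for p in points]
    counts = Counter(cells)          # absolute density of every occupied cell
    densest = max(counts.values())   # normaliser for the relative density
    return [
        p for p, c in zip(points, cells)
        if counts[c] / densest >= min_relative_density
    ]


# Two tight groups of points plus one isolated point.
data = [
    (0.1, 0.1), (0.2, 0.3), (0.4, 0.2), (0.3, 0.4),
    (5.1, 5.2), (5.3, 5.1), (5.2, 5.4),
    (9.5, 0.5),
]
reduced = grid_reduce(data, cell_size=1.0, min_relative_density=0.5)
```

With these toy values the isolated point falls in a cell whose relative density is below the threshold and is removed, while both groups are kept. The cell size and the threshold strongly affect the result and would need tuning for each data set.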

Keywords/Search Tags:

Data reduction, Grid algorithm, Vector angle algorithm, Clustering validity, Optimal number of classes

Related items

1	Determination Of Optimal Clustering Number Of Mixed Data And Its Application
2	Research Of Improved K-means Algorithm And New Cluster Validity Index In Cluster Analysis
3	Research On New Clustering Validity Index Based On Improved Clustering Algorithm
4	Clustering Validity Analysis And Its Application In Electrical Tomography
5	Research On The New Validity Index Of Internal Clustering And The Method To Determine The Optimal Cluster Number
6	Research On Determining Optimal Number Of Clusters In Cluster Analysis
7	Optimal Density Clustering And Validity Analysis Of Double Statistics
8	Research And Application On Determining Optimal Number Of Clusters In Cluster Analysis
9	Application Of Clustering Algorithm Based On Validity Index In Analyzing The Behavior Feature Data Of Same Behavior From Multi-View
10	Research On Determining Optimal Number Of Clusters In Cluster Analysis