Data Density Based Clustering And Outlier Detection

Posted on:2012-02-14

Degree:Master

Type:Thesis

Country:China

Candidate:W He

Full Text:PDF

GTID:2178330335950662

Subject:Computer Science and Technology

Abstract/Summary:

With the development of computer science, people collect more data than ever before. Extracting patterns and learning useful knowledge from large data sets is an important research field. Data mining combines statistics and artificial intelligence to help people analyze data and make decisions. Its basic assumption is that knowledge can be learned from data by statistical inference.

The local density of data reveals the internal structure of the data, which helps people understand it. Probability density estimation has a solid mathematical foundation and is a very useful statistical tool in data mining. Many probability density estimation methods appear in the literature, and they are widely used in clustering, classification, outlier detection, and data condensation. The basic assumption of probability density estimation is that the data are generated from an unknown distribution: high local density indicates a cluster center area, while low local density indicates a cluster boundary or outliers. These two cases correspond to cluster analysis and outlier detection, respectively. It is therefore natural to ask whether cluster analysis and outlier detection can be unified within a probability density estimation framework.

In this thesis, we propose a new adaptive density estimation algorithm. The algorithm makes data points in the same cluster more similar and data points in different clusters more dissimilar, so that more useful knowledge can be found in a data set.

Firstly, the proposed adaptive density estimation algorithm can be used to find cluster centers, which can serve as the initial points of the K-means or FCM algorithm. The experiments show that the initial points selected by our algorithm improve the performance of K-means and FCM on synthetic and benchmark data sets.

Secondly, based on the proposed adaptive density estimation algorithm, we propose a new outlier detection algorithm, MMOD. MMOD estimates the local density of each data point and uses it as the outlier score. The experiments show that MMOD performs better than state-of-the-art methods on benchmark data sets.
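The abstract does not give the adaptive estimator or MMOD in detail, so the following is only a minimal sketch of the general idea, assuming a classic fixed-bandwidth Gaussian "mountain" density evaluated at the data points, with subtractive suppression around each chosen center. The function names and the parameters `sigma` and `beta` are hypothetical and are not taken from the thesis. High-density points are picked as candidate initial centers for K-means or FCM, and the lowest-density points are treated as outlier candidates.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mountain_density(points, sigma=1.0):
    """Fixed-bandwidth Gaussian density at every data point.

    Assumption: a plain mountain function, not the thesis's adaptive estimate.
    """
    return [
        sum(math.exp(-sq_dist(p, q) / (2 * sigma ** 2)) for q in points)
        for p in points
    ]


def pick_centers(points, k, sigma=1.0, beta=1.5):
    """Pick k high-density points as candidate initial cluster centers."""
    density = mountain_density(points, sigma)
    centers = []
    for _ in range(k):
        best = max(range(len(points)), key=lambda i: density[i])
        centers.append(points[best])
        peak = density[best]
        # Suppress the density around the chosen center so the next pick
        # comes from a different dense region.
        density = [
            d - peak * math.exp(-sq_dist(p, points[best]) / (2 * beta ** 2))
            for d, p in zip(density, points)
        ]
    return centers


def lowest_density(points, n, sigma=1.0):
    """Return the n points with the lowest density (outlier candidates)."""
    density = mountain_density(points, sigma)
    order = sorted(range(len(points)), key=lambda i: density[i])
    return [points[i] for i in order[:n]]


# Toy data: two tight groups and one isolated point.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
    (10.0, 0.0),
]

print(pick_centers(points, 2))
print(lowest_density(points, 1))
```

On this toy data the two selected centers are (0.0, 0.0) and (5.0, 5.0), one per group, and the lowest-density point is the isolated (10.0, 0.0). In practice the bandwidth would need to be tuned to the data scale, which is presumably the limitation that an adaptive estimate aims to address.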

Keywords/Search Tags:

data density estimate, mountain method, initial cluster centre, outlier detection

Related items

1	Research On Outlier Detection In Evolving Data Streams
2	Research And Application Outlier Detection Method Based On Density&Distance
3	Research And Improvement Of Local Outlier Detecting Algorithm Based On Density
4	The Outlier Detection Algorithm Based On Cluster Outlier Factor And Unique Closet Neighbor Set
5	Research On Outlier Detection In Data Stream Based On Density
6	Research On Technology For Detecting Density-based Outlier
7	Study On Cluster Analysis And Outlier Detection Based On Natural Neighbor And Density Core
8	Improvement Of Density-Based Local Outlier Detection Algorithm
9	Outlier Detection And Application Of Categorical Data In Spark Cluster
10	Study On An Analysis Method For Cluster-based Outlier