
Data Density Based Clustering And Outlier Detection

Posted on: 2012-02-14
Degree: Master
Type: Thesis
Country: China
Candidate: W He
Full Text: PDF
GTID: 2178330335950662
Subject: Computer Science and Technology
Abstract/Summary:
With the development of computer science, people now collect more data than ever before. How to extract patterns and learn useful knowledge from large data sets is an important research field. Data mining combines statistics and artificial intelligence to help people analyze data and make decisions. Its basic assumption is that knowledge can be learned from data by statistical inference.

The local density of data reveals its internal structure, which helps people understand the data. Probability density estimation has a solid mathematical foundation and is a very useful statistical tool in data mining. Many probability density estimation methods appear in the literature, and they are widely used in clustering, classification, outlier detection, and data condensation. The basic assumption of probability density estimation is that the data are generated from an unknown distribution: high local density indicates a cluster center area, while low local density indicates a cluster boundary or outliers. These two cases correspond to cluster analysis and outlier detection, respectively. It is therefore natural to ask whether cluster analysis and outlier detection can be unified under a single probability density estimation framework.

In this thesis, we propose a new adaptive density estimation algorithm. The algorithm makes data points in the same cluster more similar and data points in different clusters more dissimilar, so that more useful knowledge can be found in a data set than before.

First, the proposed adaptive density estimation algorithm can be used to find cluster centers, which can then serve as the initial points of the K-means or FCM algorithm. Experiments show that the initial points selected by our algorithm improve the performance of K-means and FCM on synthetic and benchmark data sets.

Second, based on the proposed adaptive density estimation algorithm, we propose a new outlier detection algorithm, MMOD. MMOD estimates the local density of each data point and uses it as the outlier score. Experiments show that MMOD performs better than state-of-the-art methods on benchmark data sets.
Keywords/Search Tags: data density estimate, mountain method, initial cluster centre, outlier detection
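
The abstract does not give the formulas of the adaptive density estimate or of MMOD, so the following is only a minimal sketch of the general idea it describes, not the thesis's algorithm. It assumes a fixed-bandwidth Gaussian kernel in the spirit of the mountain method, evaluated at the data points themselves rather than on a grid. The function names (density, pick_centers, outlier_ranking) and the parameters sigma and beta are hypothetical choices made for illustration. High-density points are taken as candidate initial cluster centers, and low-density points are ranked as likely outliers.

```python
import math


def _sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def density(points, sigma):
    """Gaussian-kernel density score of every point (fixed bandwidth, an assumption)."""
    return [
        sum(math.exp(-_sq_dist(p, q) / (2.0 * sigma ** 2)) for q in points)
        for p in points
    ]


def pick_centers(points, k, sigma, beta=1.5):
    """Pick k candidate initial centers.

    Repeatedly take the densest remaining point, then subtract its
    influence from all scores so the next pick lands in another dense
    region. beta widens the subtraction kernel (hypothetical default).
    """
    scores = density(points, sigma)
    centers = []
    for _ in range(k):
        i = max(range(len(points)), key=scores.__getitem__)
        centers.append(points[i])
        peak = scores[i]
        for j, q in enumerate(points):
            d2 = _sq_dist(points[i], q)
            scores[j] -= peak * math.exp(-d2 / (2.0 * (beta * sigma) ** 2))
    return centers


def outlier_ranking(points, sigma):
    """Indices of points ordered from lowest density (most outlying) to highest."""
    scores = density(points, sigma)
    return sorted(range(len(points)), key=scores.__getitem__)


if __name__ == "__main__":
    # Toy data: two tight groups and one isolated point (illustrative only).
    toy = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
           (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1), (5.05, 5.05),
           (10.0, 0.0)]
    print(pick_centers(toy, k=2, sigma=0.5))
    print(outlier_ranking(toy, sigma=0.5)[:1])
```

On the toy data, the two picked centers are the middle points of the two groups, and the isolated point is ranked as the most outlying. The picked centers could then be supplied as the starting points of a K-means or FCM run in place of random initialization. An adaptive method such as the one proposed here would presumably replace the fixed sigma with a locally chosen bandwidth.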