
Research On PageRank-PathBased Clustering Algorithm

Posted on: 2016-05-14  Degree: Master  Type: Thesis
Country: China  Candidate: Q D Liu  Full Text: PDF
GTID: 2308330461474135  Subject: Computer system architecture
Abstract/Summary:
Clustering, also known as group analysis, is an important technique in data mining. It tries to find densely populated regions in the feature space based on the similarity among objects, so that objects in the same cluster are more similar to each other than to objects in other clusters. Unlike classification, clustering is an unsupervised learning technique. It is widely used in many fields, such as recommendation systems, text classification, and image recognition. Although clustering algorithms have been studied for decades, they remain an active research area.

FSC is a density-based clustering algorithm that was first proposed in the journal Science in 2014. It has several advantages: it is simple, easy to understand, and has low computational complexity. FSC rests on the assumptions that cluster centers are surrounded by neighbors with lower local density and that they are at a relatively large distance from any point with a higher local density. Compared with other density-based clustering algorithms, FSC requires only one input parameter and is therefore more practical.

RFSC is an improved version of FSC. It computes the local density as the mean distance to the nearest M neighbors instead of using a kernel function or a truncated kernel function. Compared with FSC, RFSC is less sensitive to its parameter and more efficient. However, neither algorithm can handle data sets with uneven density. To solve this problem, we propose an improved algorithm named KFSC. KFSC gives each object its own personalized appeal by dynamically controlling the width of the kernel function. The experimental results show that KFSC performs well on data sets with uneven density.

Finally, we propose a PageRank-PathBased clustering algorithm. First, we use the PageRank algorithm to find the cluster centers. Second, considering the transitivity of similarity, we introduce the path-based similarity and modify it so that the method is robust against noise and outliers in the data set. The experimental results show that the proposed algorithm can identify clusters irrespective of their shapes or relative positions and can handle noise and outliers. Moreover, the algorithm does not require any parameters, which greatly improves its practicability.
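The path-based similarity mentioned above can be illustrated with a small sketch. It assumes the standard minimax path distance: the distance between two points is the smallest "largest hop" over all paths connecting them. It does not reproduce the thesis's robust modification, which the abstract does not specify. The function names and the toy data are hypothetical and chosen only for illustration.

```python
import math


def pairwise_distances(points):
    """Euclidean distance matrix for a list of coordinate tuples."""
    return [[math.dist(p, q) for q in points] for p in points]


def minimax_path_distances(dist):
    """Standard minimax (path-based) distance, NOT the thesis's modified form.

    For every pair (i, j), find the path whose largest single hop is as
    small as possible, using a Floyd-Warshall style update.
    """
    n = len(dist)
    d = [row[:] for row in dist]  # copy so the input matrix is left unchanged
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Largest hop on the path i -> k -> j.
                via_k = max(d[i][k], d[k][j])
                if via_k < d[i][j]:
                    d[i][j] = via_k
    return d


# Hypothetical toy data: a chain of four points plus one distant point.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
direct = pairwise_distances(points)
path_based = minimax_path_distances(direct)

print(direct[0][3], path_based[0][3])  # direct vs. path-based, chain ends
print(direct[0][4], path_based[0][4])  # direct vs. path-based, to the far point
```

In this toy data the two ends of the chain are far apart directly but close under the path-based distance, because every hop along the chain is short. This is why path-based measures can follow elongated or irregular clusters. The distant point stays far away under both measures, since any path to it must cross the large gap.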
Keywords/Search Tags: Clustering, Density-based Clustering, PageRank, PathBased, Robust