A Study Of Clustering And Data Analysis Methods Based On One-Dimensional SOM

Posted on: 2010-02-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Yu
Full Text: PDF
GTID: 1118360302495054
Subject: Computer application technology
Abstract/Summary:
Cluster analysis is currently an active research topic in data mining. The Self-Organizing Feature Map (SOM) network preserves the topological structure and density distribution of a dataset, and it groups samples of the same class together when it maps input samples onto output neurons. Since the SOM was introduced, research has focused mostly on the two-dimensional SOM. Intuitively, a two-dimensional SOM should preserve more information about a dataset than a one-dimensional SOM. However, the author of this dissertation found that a one-dimensional SOM is not inferior to a two-dimensional SOM at joining similar samples together or separating dissimilar ones, and that the similarity relations between samples or clusters are easier to express and more distinct in a one-dimensional SOM. This dissertation therefore studies the clustering ability of the one-dimensional SOM systematically and develops cluster analysis methods based on it.

The experimental results showed that the one-dimensional SOM performed as well as the two-dimensional SOM in clustering. Compared with the two-dimensional SOM, the one-dimensional SOM not only keeps the linear separability of the original clusters but can also map linearly inseparable clusters into linearly separable ones. As a result, the relationships among samples or clusters are more intuitive, and cluster boundaries are easier to visualize, in a one-dimensional SOM than in a two-dimensional SOM.

The influence of the number of neurons and of the training parameters on the clustering results was studied systematically. Three quantitative criteria were proposed: independence, dispersion degree, and maximum concentration level. The value ranges of the parameters for adequate training were found.
These findings laid the foundation for exploring a series of clustering methods based on the one-dimensional SOM.

Because the one-dimensional and two-dimensional SOM are complementary in topology preservation, a clustering method called CC-SOM was proposed, based on a combined chart of the one-dimensional and two-dimensional SOM. Experiments were carried out on three datasets with cluster labels. The results demonstrate that the method is applicable not only to spherical clusters but also to non-spherical clusters with complicated structure.

For high-dimensional and large datasets, a method called MSPS-SOM was proposed, based on the most similar prototype sequence of the one-dimensional SOM. Experiments showed that this algorithm performed well on noisy datasets, could handle high-dimensional datasets, and could be applied to both distance-based and density-based clusters.

The concept of the Samples Distance Plot (SDP) was proposed. A program for obtaining the SDP data, a method for drawing the SDP, and a method for revising the SDP were established, and the relationship between dataset structure and SDP shape was worked out. On this basis, a data analysis method called SDP-SOM was proposed. Experiments showed that this method can both cluster unknown datasets and reveal their detailed structural information.
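The one-dimensional SOM that the methods above build on can be illustrated with a minimal sketch. This is a generic textbook-style SOM with its neurons arranged in a chain, not the dissertation's implementation. The function names, the linear decay schedules, the Gaussian neighbourhood, and all default parameter values are assumptions made for illustration only.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the index of the prototype closest to x (squared Euclidean distance)."""
    return min(
        range(len(weights)),
        key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)),
    )


def train_som_1d(samples, n_neurons=10, epochs=200, lr0=0.5, sigma0=None, seed=0):
    """Train a chain of n_neurons prototypes on a list of equal-length vectors.

    Assumed schedule: learning rate and neighbourhood width both decay
    linearly over training; the width is floored at 0.5 chain positions.
    """
    rng = random.Random(seed)
    dim = len(samples[0])
    if sigma0 is None:
        sigma0 = n_neurons / 2.0
    # Initialise every prototype as a copy of a randomly chosen sample.
    weights = [list(rng.choice(samples)) for _ in range(n_neurons)]
    total_steps = epochs * len(samples)
    step = 0
    for _ in range(epochs):
        order = list(samples)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)
            sigma = max(sigma0 * (1.0 - frac), 0.5)
            bmu = best_matching_unit(weights, x)
            for j in range(n_neurons):
                # The neighbourhood depends only on distance along the chain.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))
                for d in range(dim):
                    weights[j][d] += lr * h * (x[d] - weights[j][d])
            step += 1
    return weights


def adjacent_prototype_distances(weights):
    """Euclidean distance between each pair of neighbouring prototypes on the chain."""
    return [math.dist(weights[j], weights[j + 1]) for j in range(len(weights) - 1)]
```

Because the neurons lie on a line, each sample maps to a single index, and the distances between neighbouring prototypes form a one-dimensional profile. In this sketch, a large jump in that profile suggests a boundary between clusters. That is a general property of SOM visualizations and is not claimed to be the SDP construction described in the dissertation.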
Keywords/Search Tags: data mining, clustering, data analysis, one-dimensional SOM, combined chart, most similar prototype sequence, sample distance plot