Research On Robustness Of Bayesian Fuzzy Clustering Method And Its Processing Of Large Data Sets

Posted on: 2022-01-17
Degree: Master
Type: Thesis
Country: China
Candidate: C Zhang
Full Text: PDF
GTID: 2518306518470374
Subject: Computer application technology
Abstract/Summary:
With the advent of the big data era, ever more data is produced and used, and advances in storage technology have let data sets grow larger and larger. These data are often disordered and contain a great deal of useless information. Moreover, because manual labeling is expensive, labeled data is scarce, while unlabeled data is abundant and easy to obtain. Unlabeled data calls for unsupervised learning, and cluster analysis, a typical unsupervised method, is of great value for exploring the internal relations and latent regularities in such data. Two problems follow. First, because the data contain so much useless information, invalid data must be kept from distorting the useful information that is extracted; that is, the noise resistance of the algorithm must be improved. Second, a large data set cannot be loaded into memory directly, so the running time of the algorithm must also be reduced. To address these two problems, this thesis builds on the Bayesian fuzzy clustering (BFC) algorithm and makes the following contributions.

1. To remove the requirement of traditional clustering algorithms that the number of clusters be specified before the algorithm runs, a Bayesian possibilistic clustering model is proposed that estimates the number of clusters and achieves satisfactory noise resistance. Bayesian inference and particle filter inference are used to find the maximum-a-posteriori parameters, and a Poisson distribution is used to estimate the optimal number of clusters. When the membership of a sample is computed, its value depends only on the distance between the sample point and the corresponding cluster center, which greatly improves the noise resistance of the model. The proposed method is validated on the iris and wine data sets, the armstrong-2002-v2 and bhattacharjee-2001 data sets, and brain CT images collected in a real environment. The results show that the proposed method is effective.

2. To handle data sets that are too large to be loaded into memory directly, and to reduce the time BFC takes on large data sets, an online Bayesian fuzzy clustering method is proposed. An online learning framework is introduced on top of the BFC method. The large data set is divided proportionally into several subsets, and weighted Bayesian fuzzy clustering is applied to each subset. The final cluster centers are obtained by merging the cluster centers obtained from each data block, which reduces both memory consumption and running time. The method is validated on the 2D2C and 4D4C synthetic data sets and on the Skin data set.
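The split, cluster, and merge scheme of the second contribution can be illustrated with a minimal sketch. It is not the thesis's implementation: plain weighted fuzzy c-means stands in for weighted Bayesian fuzzy clustering, the weight given to each block's centers (the sum of memberships assigned to them) is an assumption borrowed from common weighted fuzzy c-means practice, and all function names and parameters are hypothetical. For brevity the data sits in memory here, whereas a real run would stream blocks from disk.

```python
import numpy as np


def farthest_point_init(X, c):
    """Deterministic start: the first point, then repeatedly the point
    farthest from the centers chosen so far (an assumed initialisation)."""
    centers = [X[0]]
    for _ in range(c - 1):
        d2 = np.min([((X - v) ** 2).sum(axis=1) for v in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    return np.array(centers, dtype=float)


def memberships(X, centers, m):
    """Fuzzy c-means memberships; each row sums to 1."""
    # Squared distances, shape (n, c); the epsilon avoids division by zero.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2) + 1e-12
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)


def weighted_fcm(X, w, c, m=2.0, n_iter=100, tol=1e-6):
    """Weighted fuzzy c-means (stand-in for weighted BFC).
    Returns the centers and the weight mass attached to each center."""
    centers = farthest_point_init(X, c)
    for _ in range(n_iter):
        wu = w[:, None] * memberships(X, centers, m) ** m
        new_centers = (wu.T @ X) / wu.sum(axis=0)[:, None]
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    mass = (w[:, None] * memberships(X, centers, m)).sum(axis=0)
    return centers, mass


def chunked_fuzzy_clustering(X, c, n_chunks, m=2.0, seed=0):
    """Cluster each block separately, then merge the block centers."""
    rng = np.random.default_rng(seed)
    # Shuffle so that every block is a roughly proportional sample.
    blocks = np.array_split(X[rng.permutation(len(X))], n_chunks)
    block_centers, block_mass = [], []
    for block in blocks:
        centers, mass = weighted_fcm(block, np.ones(len(block)), c, m)
        block_centers.append(centers)
        block_mass.append(mass)
    # Merge step: cluster the block centers, weighted by their mass.
    final, _ = weighted_fcm(np.vstack(block_centers),
                            np.concatenate(block_mass), c, m)
    return final


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two well-separated 2-D clusters, loosely in the spirit of a 2D2C set.
    X = np.vstack([rng.normal(0.0, 1.0, size=(1000, 2)),
                   rng.normal(10.0, 1.0, size=(1000, 2))])
    print(chunked_fuzzy_clustering(X, c=2, n_chunks=4))
```

Only one block and the small set of block centers need to be held at a time, which is where the memory saving described above would come from. The merge step reuses the same weighted routine, so a block that assigns more membership mass to a center gives that center more influence on the final result.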
Keywords/Search Tags:fuzzy clustering, particle filter inference, Poisson distribution, Bayesian inference, medical images, large-scale data, online clustering