Research On Lifelong Clustering Algorithm Based On Online Bayesian Inference

Posted on:2023-02-27

Degree:Master

Type:Thesis

Country:China

Candidate:Z S Huang

Full Text:PDF

GTID:2558307073983239

Subject:Software engineering

Abstract/Summary:

The development of computer technology has produced a huge amount of data, and extracting valuable information from it benefits people in work and daily life. Cluster analysis is an unsupervised learning method in data mining. It divides unlabeled data into clusters so that samples within a cluster are highly similar and samples in different clusters are dissimilar, which reveals the structure hidden in the data. Spectral clustering transforms the clustering problem into an optimal graph partitioning problem. It can cluster sample spaces of arbitrary shape and obtain a globally optimal solution, so it is widely used.

Clustering tasks are often correlated. Multi-task learning exploits the information shared between related tasks to improve clustering performance, but it tends to consider only a limited set of tasks. Lifelong learning emerged to break through this limitation. It is a continual learning process that can handle an unlimited number of tasks, accumulating knowledge from tasks already learned and using it to help with tasks not yet learned. In addition, lifelong learning does not need to store all samples, so its computational complexity and storage consumption are low. Multi-task clustering and lifelong clustering have both been studied, but most work selects hyper-parameters by reference to the ground-truth cluster assignments of the samples, which are unavailable in unsupervised learning. Moreover, because sample categories are unknown and the computational cost is high, existing work usually applies the same hyper-parameter values to every task, which prevents each task from reaching its best clustering result. Bayesian inference does not depend on the true sample categories, so it can adaptively select hyper-parameters for each task without supervision.

This thesis proposes a maximum a posteriori lifelong spectral clustering algorithm based on Bayesian inference, which uses the maximum a posteriori probability to select better clustering hyper-parameters without supervision. The algorithm treats the candidate hyper-parameter values as prior information. When a new task arrives, the algorithm runs with every pair of parameter values in a grid search, takes the resulting cluster partitions as posterior information, and finally selects the hyper-parameters with the maximum a posteriori probability; the corresponding cluster partition is the final clustering result. Experiments show that the algorithm effectively selects better hyper-parameters and overcomes negative transfer, which improves clustering performance.

Lifelong clustering must also overcome catastrophic forgetting, in which knowledge learned from historical tasks is forgotten and clustering performance degrades. The experience replay method stores and replays experience learned from historical tasks to avoid catastrophic forgetting. This thesis proposes a lifelong clustering algorithm based on historical task experience replay, which applies experience replay to lifelong clustering and designs rules for task retention and replay. The algorithm establishes an experience replay module to store historical task information and, within the module's configured capacity, retains the task experience that is most easily forgotten. It updates the module online after processing each task and, after processing multiple tasks, replays the tasks in the module to review the information from learned tasks. Experiments show that the algorithm effectively mitigates catastrophic forgetting.

Finally, this thesis constructs a Lifelong Clustering Visualization Demonstration System Based on Online Bayesian Inference to display lifelong clustering task information visually. The system consists mainly of a data display module, a lifelong clustering algorithm execution module, a clustering result display module, and a hyper-parameter selection effect display module. These modules respectively display the data structures of the different clustering tasks, the clustering convergence process, the clustering results, and the hyper-parameter selection effects of the algorithms, along with other information.
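The maximum a posteriori selection step described in the abstract (a prior over candidate hyper-parameter values, a grid search over all pairs, then picking the pair with the highest posterior) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the abstract does not specify the prior or how the clustering result is scored, so the function `map_select`, the uniform prior, the grid values, and the placeholder `toy_log_likelihood` are all assumptions made for illustration. In a real setting the placeholder would be replaced by running the clustering algorithm with the given parameters and scoring the resulting partition.

```python
import math
from itertools import product


def map_select(grid, log_prior, log_likelihood):
    """Pick the grid point with the highest posterior probability.

    grid           -- list of candidate hyper-parameter tuples
    log_prior      -- function: params -> log prior probability (assumed)
    log_likelihood -- function: params -> log-likelihood score of the
                      clustering obtained with those params (assumed)
    """
    # Unnormalized log posterior: log p(theta) + log p(result | theta).
    log_post = [log_prior(p) + log_likelihood(p) for p in grid]
    # Normalize with the log-sum-exp trick for numerical stability.
    peak = max(log_post)
    weights = [math.exp(v - peak) for v in log_post]
    total = sum(weights)
    posterior = [w / total for w in weights]
    best = max(range(len(grid)), key=posterior.__getitem__)
    return grid[best], posterior


# Hypothetical grid: two hyper-parameters, three candidate values each.
grid = list(product([0.1, 1.0, 10.0], [0.01, 0.1, 1.0]))


def uniform_log_prior(params):
    # Assumption: every grid point is equally likely a priori.
    return -math.log(len(grid))


def toy_log_likelihood(params):
    # Placeholder only: stands in for clustering with `params` and scoring
    # the resulting partition. By construction it peaks at (1.0, 0.1).
    a, b = params
    return -(math.log10(a) ** 2) - (math.log10(b) + 1.0) ** 2


best_params, posterior = map_select(grid, uniform_log_prior, toy_log_likelihood)
print(best_params)
```

Working in log space keeps the normalization stable when scores differ by many orders of magnitude. With a uniform prior the selection reduces to picking the highest-scoring grid point; a non-uniform prior is where previously learned tasks could influence the choice.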

Keywords/Search Tags:

Bayesian Inference, Maximum a posteriori probability, Parameter selection, Lifelong clustering learning, Experience replay, Spectral Clustering

Related items

1	Fuzzy Technology And Its Application For Clustering And Regression
2	Research On Bayesian Learning Theory And Its Application
3	Research On Key Techniques Of Multi-task Lifelong Learning Based On Knowledge Replay
4	Segmentation Of Image-based Spectral Clustering Method
5	Research On Robustness Of Bayesian Fuzzy Clustering Method And Its Processing Of Large Data Sets
6	Study On Spectral Clustering For High-dimensional Data
7	Research On 3D Spatial Trajectory Clustering Algorithm Based On Spectral Clustering
8	Research On Experience Replay Method For Deep Reinforcement Learning
9	Research And Applications Of Clustering Algorithms With The Model Selection Ability
10	Spectral Learning And Clustering And Its Application