| The development of computer technology has produced huge amounts of data, and extracting valuable information from that data benefits people in both work and daily life. Cluster analysis in data mining is an unsupervised learning method that divides unlabeled data into clusters so that samples within the same cluster are highly similar and samples in different clusters are dissimilar, thereby revealing the information in the data. Spectral clustering transforms the clustering problem into an optimal graph partitioning problem. It can not only cluster sample spaces of arbitrary shape but also obtain a globally optimal solution, so it is widely used. Clustering tasks are often correlated, and multi-task learning exploits the information shared between related tasks to improve clustering performance. However, multi-task learning typically considers only a limited set of tasks; lifelong learning emerged to overcome this limitation. Lifelong learning is a continuous learning process that can handle an unlimited number of tasks; it accumulates knowledge from learned tasks and uses that knowledge to help with tasks not yet learned. In addition, lifelong learning does not need to store all samples, so its computational complexity and storage consumption are low. Multi-task clustering and lifelong clustering have both been studied, but most existing work uses the ground-truth cluster assignments of the samples as the basis for hyper-parameter selection, and such labels are unavailable in unsupervised learning. Moreover, because sample categories are unknown in unsupervised learning and the computational cost is high, existing work usually applies the same hyper-parameter values to every task, which prevents each task from achieving its best clustering result. Bayesian inference does not depend on the true sample categories, so it can adaptively select hyper-parameters for each
task without supervision. This thesis proposes a maximum a posteriori lifelong spectral clustering algorithm based on Bayesian inference, which uses the maximum a posteriori probability to select better clustering hyper-parameters without supervision. The algorithm treats the candidate hyper-parameter values as prior information. When a new task arrives, the algorithm runs with every pair of hyper-parameter values in a grid search, takes the resulting cluster assignments as posterior information, and finally selects the hyper-parameters with the maximum a posteriori probability; the corresponding cluster assignments are the final clustering result. Experiments show that the algorithm effectively selects better hyper-parameters and overcomes negative transfer, thereby improving clustering performance. Lifelong clustering must also overcome catastrophic forgetting, in which knowledge learned from historical tasks is forgotten and clustering performance degrades. The experience replay method stores and replays experience learned from historical tasks to avoid catastrophic forgetting. This thesis proposes a lifelong clustering algorithm based on historical task experience replay, which applies experience replay to lifelong clustering and designs task retention and replay rules to overcome catastrophic forgetting. The algorithm establishes an experience replay module to store historical task information and, by limiting the capacity of the module, retains the task experience that is more easily forgotten. It updates the module online after processing each task and, after processing multiple tasks, replays the tasks in the module to review the information from learned tasks. Experiments show that the algorithm effectively handles catastrophic forgetting. Finally, this thesis builds a Lifelong Clustering Visualization Demonstration System Based on Online Bayesian Inference to visually display lifelong
clustering task information. The system consists mainly of a data display module, a lifelong clustering algorithm execution module, a clustering result display module, and a hyper-parameter selection effect display module, which respectively display the data structures of the different clustering tasks, the clustering convergence process, the clustering results, and the hyper-parameter selection effects of the algorithms, along with other information. |