
Research On Continual Learning Method Based On Bayesian Model

Posted on: 2021-07-16  Degree: Doctor  Type: Dissertation
Country: China  Candidate: Y Yang  Full Text: PDF
GTID: 1488306050463954  Subject: Signal and Information Processing
Abstract/Summary:
In real life, every object is generating data all the time. Traditional batch learning algorithms assume that all training data are available before model training, which makes them impractical for real-world, large-scale datasets in continual learning scenarios. If we can learn useful information from such streaming data and retain it without revisiting past data, the benefits are substantial. In practice, however, the data we face are diverse: high-dimensional, large in scale, and highly complex, all of which place higher demands on model capability. Bayesian models rest on a solid statistical foundation: Bayesian nonparametrics lets a model learn its structure automatically from the data, and Bayesian inference provides uncertainty estimates for the learned parameters, so Bayesian models are effective at alleviating overfitting and yield robust parameter estimation. A continual learning algorithm based on a Bayesian model can receive data continuously and update the model dynamically in sequence, which makes it suitable for large-scale and streaming data, and continual learning is therefore one of the hot topics in machine learning. Focusing on continual learning for streaming data, this dissertation studies the design of Bayesian models, the inference of their parameters, and the processing of large-scale data. The main contents are summarized as follows.

Firstly, for the continual learning problem of streaming data clustering, we propose a memorized variational continual learning (MVCL) algorithm for Dirichlet process mixture models. Within the current task, the algorithm learns the number of mixture components automatically via birth and merge moves. To handle the large-scale dataset of the current task, we divide it into B fixed batches and temporarily store the sufficient statistics of each mixture component for each batch; in every iteration we visit only one randomly chosen batch and update the model parameters, so the algorithm scales to large datasets. When the next task arrives, the learned model parameters are taken as the prior, multiplied by the likelihood of the next task's data, and normalized to obtain the posterior distribution of the model parameters for that task. Experimental results show that, compared with traditional methods, our method predicts the test data of each task well and also clusters the previous datasets well.
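To make the memorized sufficient-statistic bookkeeping and the posterior-as-prior update concrete, the following Python sketch assumes isotropic Gaussian components with known variance and conjugate Gaussian priors on the component means; it omits the birth and merge moves, and names such as MemoizedGaussianMixture, update_batch, and posterior_to_prior are illustrative only, not the dissertation's implementation.

    # Minimal sketch of memoized per-batch statistics and the
    # posterior-as-prior continual learning step (assumptions as stated above).
    import numpy as np

    class MemoizedGaussianMixture:
        def __init__(self, n_components, dim, prior_mean, prior_var, obs_var=1.0):
            self.K, self.D = n_components, dim
            self.prior_mean = np.full((n_components, dim), prior_mean, dtype=float)
            self.prior_var = np.full(n_components, prior_var, dtype=float)
            self.obs_var = obs_var
            # global sufficient statistics (responsibility-weighted counts and sums)
            self.N = np.zeros(n_components)
            self.S = np.zeros((n_components, dim))
            # cached per-batch statistics so each batch can be revisited cheaply
            self.batch_stats = {}

        def update_batch(self, batch_id, X, resp):
            """Visit one randomly chosen batch: swap its cached statistics for fresh ones."""
            N_b = resp.sum(axis=0)        # (K,) soft counts
            S_b = resp.T @ X              # (K, D) weighted sums
            old_N, old_S = self.batch_stats.get(batch_id, (0.0, 0.0))
            self.N += N_b - old_N
            self.S += S_b - old_S
            self.batch_stats[batch_id] = (N_b, S_b)

        def posterior_params(self):
            """Conjugate posterior over component means given current statistics."""
            prec = 1.0 / self.prior_var[:, None] + self.N[:, None] / self.obs_var
            post_var = 1.0 / prec
            post_mean = post_var * (self.prior_mean / self.prior_var[:, None]
                                    + self.S / self.obs_var)
            return post_mean, post_var[:, 0]

        def posterior_to_prior(self):
            """Continual-learning step: the posterior of the finished task
            becomes the prior for the next task's data."""
            self.prior_mean, self.prior_var = self.posterior_params()
            self.N[:] = 0.0
            self.S[:] = 0.0
            self.batch_stats.clear()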
Secondly, for streaming data in target recognition tasks, we propose the Variational Dropout Sparsifies Dynamically Expandable Network (VDSDEN) algorithm. When the first task arrives, variational dropout tunes an individual dropout rate for each element of the deep model's weight matrices, so some weights can be set to zero and a sparse model structure is obtained. When a new task arrives, VDSDEN first performs selective retraining for the new task through variational dropout sparse learning on the topmost hidden units of each task. If the existing network cannot satisfy the new task after selective retraining, VDSDEN dynamically adds the necessary number of neurons, increasing the network capacity, and decides which of the added neurons to keep by thresholding their average dropout rate: only the necessary neurons are retained and the rest are pruned. When the network shows degraded performance on earlier tasks, VDSDEN performs network duplication to prevent such semantic drift, that is, catastrophic forgetting. Experimental results on multiple public datasets show that, compared with other continual learning algorithms, VDSDEN attains similar performance on each task while learning a sparser network structure.

Finally, in order to reduce the number of parameters, a continual learning algorithm named Bayesian Compression for Dynamically Expandable Network (BCDEN) is proposed. On the first task, sparsity-inducing priors over the hidden units allow some rows of the weight matrix to be pruned, yielding a compressed network. When a new task is received, BCDEN first performs selective retraining for the new task by compressing the topmost network structure for each individual task. If the recognition result on the new task is unsatisfactory after selective retraining, meaning the old network cannot sufficiently explain the new task's data, BCDEN dynamically adds the necessary number of neurons under sparsity-inducing priors, so that only the necessary neurons are retained and unnecessary ones are pruned. When the network shows degraded performance on earlier tasks, BCDEN performs network duplication to prevent such semantic drift. Experimental results on multiple public datasets show that, compared with other algorithms, BCDEN achieves similar or better performance on each task while learning a more compact model structure, that is, fewer parameters.
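The weight-level sparsification and the dropout-rate expansion criterion behind VDSDEN can be sketched as follows, assuming the usual sparse variational dropout formulation with per-weight log-variance parameters and local reparameterization. This is a hedged PyTorch sketch, not the dissertation's code: the KL regularizer needed during training is omitted, and the thresholds and the helper neuron_dropout_rate are illustrative assumptions.

    # Sparse variational dropout layer plus a neuron-level dropout-rate check
    # of the kind used to decide which newly added neurons to retain.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class VariationalDropoutLinear(nn.Module):
        def __init__(self, in_features, out_features):
            super().__init__()
            self.mu = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
            self.log_sigma2 = nn.Parameter(torch.full((out_features, in_features), -10.0))

        def log_alpha(self):
            # alpha = sigma^2 / mu^2, clipped for numerical stability
            return torch.clamp(self.log_sigma2 - 2.0 * torch.log(self.mu.abs() + 1e-8),
                               -10.0, 10.0)

        def forward(self, x):
            if self.training:
                # local reparameterization: sample pre-activations, not weights
                mean = F.linear(x, self.mu)
                var = F.linear(x ** 2, torch.exp(self.log_sigma2))
                return mean + torch.sqrt(var + 1e-8) * torch.randn_like(mean)
            # at test time keep only weights whose dropout rate is small
            mask = (self.log_alpha() < 3.0).float()
            return F.linear(x, self.mu * mask)

        def neuron_dropout_rate(self):
            # average dropout rate p = alpha / (1 + alpha) of each output neuron;
            # an added neuron is kept only if this stays below a chosen threshold
            alpha = torch.exp(self.log_alpha())
            return (alpha / (1.0 + alpha)).mean(dim=1)

    # usage: decide which newly added neurons to retain after training
    layer = VariationalDropoutLinear(128, 32)
    keep = layer.neuron_dropout_rate() < 0.95   # hypothetical threshold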
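Similarly, the hidden-unit-level Bayesian compression used by BCDEN can be sketched as a layer whose weight rows share a scale variable with a sparsity-inducing prior, so that rows whose posterior scale collapses toward zero can be pruned. The class name GroupCompressedLinear, the signal-to-noise pruning rule, and the threshold below are illustrative assumptions, not the dissertation's implementation, and the prior's KL term is again omitted.

    # Per-row (per hidden unit) scale variables; low signal-to-noise rows are pruned.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GroupCompressedLinear(nn.Module):
        def __init__(self, in_features, out_features):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
            # one log-scale per output unit (per row of the weight matrix)
            self.z_mu = nn.Parameter(torch.zeros(out_features))
            self.z_logvar = nn.Parameter(torch.full((out_features,), -6.0))

        def forward(self, x):
            if self.training:
                # sample the per-row scales with the reparameterization trick
                z = self.z_mu + torch.exp(0.5 * self.z_logvar) * torch.randn_like(self.z_mu)
            else:
                z = self.z_mu
            return F.linear(x, self.weight * z.unsqueeze(1))

        def prunable_rows(self, snr_threshold=1.0):
            # rows whose scale has low signal-to-noise ratio contribute little
            # and can be removed to compress the layer (threshold is illustrative)
            snr = self.z_mu.abs() / torch.exp(0.5 * self.z_logvar)
            return snr < snr_threshold

    layer = GroupCompressedLinear(256, 64)
    mask = layer.prunable_rows()          # True for rows that can be dropped
    compressed_out = int((~mask).sum())   # remaining hidden units after pruning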
Keywords/Search Tags: Bayesian nonparametrics, Streaming data, Continual learning, Target recognition, Bayesian compression, Dynamically Expandable Network (DEN), Semantic drift