Research on the Gaussian mixture model focuses mainly on selecting the number of mixture components and estimating the model parameters. The traditional approach fixes the number of components in advance and then estimates the parameters of the resulting finite mixture model. When the model complexity does not match the dataset, this approach is prone to overfitting or underfitting, and it cannot select the model automatically. This paper proposes methods that improve on this by performing parameter estimation and determining the number of mixture components simultaneously. First, for the selection of the number of components, a Dirichlet process is used as the prior distribution to construct the mixture model; based on the stick-breaking construction, an upper bound is imposed on the number of components, yielding a Gaussian mixture model with an unknown number of components. Second, a variational inference algorithm for this model (VI-IGMM) is proposed to solve parameter estimation and component-number selection jointly: the algorithm estimates the parameters by variational inference and performs model selection with a weight threshold, discarding components with negligible weight coefficients to estimate the number of components. Third, to address the loss of efficiency of VI-IGMM as the data volume grows, the parameter-estimation procedure is improved with stochastic variational inference; model selection is incorporated in the same way, giving the stochastic variational inference algorithm SVI-IGMM, which handles parameter estimation and component-number selection for massive datasets. Finally, the effectiveness of the two proposed algorithms for parameter estimation and model selection is verified through multiple sets of experiments. The experiments show that VI-IGMM and SVI-IGMM are not affected by the initial number of components and estimate the number of components accurately, whereas the EM algorithm cannot perform model selection. Comparisons of iteration time and iteration counts confirm the efficiency of SVI-IGMM and its suitability for large-scale datasets. SVI-IGMM not only effectively solves model parameter estimation and component-number selection, but also processes large datasets efficiently, which is of great value in today's era of massive data.
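As a rough illustration of the model-construction step, the sketch below draws mixture weights from a truncated stick-breaking process. The concentration value and truncation level K are generic illustrative choices, not the specific prior settings used in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def stick_breaking_weights(alpha, K):
    """Draw truncated stick-breaking weights: v_k ~ Beta(1, alpha) for k < K,
    with the last stick fixed to 1 so the K weights sum to one."""
    v = rng.beta(1.0, alpha, size=K)
    v[-1] = 1.0                                   # truncation: last component takes the remainder
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining                          # pi_k = v_k * prod_{j<k} (1 - v_j)

weights = stick_breaking_weights(alpha=1.0, K=20)
print(weights.sum())                              # 1.0 up to floating-point error
```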
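The VI-IGMM idea of over-specifying the number of components and pruning those whose variational weights fall below a threshold can be approximated with an off-the-shelf variational Gaussian mixture. The sketch below uses scikit-learn's BayesianGaussianMixture with a Dirichlet-process (stick-breaking) weight prior as a stand-in for the paper's algorithm, which it is not; the synthetic data and the threshold of 1e-2 are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Synthetic data from 3 Gaussian clusters; the true component count is unknown to the model.
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(300, 2))
               for c in ([-4, 0], [0, 3], [4, 0])])

# Over-specified upper bound of 10 components; the stick-breaking prior
# drives the weights of redundant components toward zero.
model = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="full",
    max_iter=500,
    random_state=0,
).fit(X)

threshold = 1e-2                                  # weight threshold for discarding components
kept = model.weights_ > threshold
print("estimated number of components:", kept.sum())
print("kept weights:", np.round(model.weights_[kept], 3))
```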
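The stochastic update pattern behind SVI-IGMM — sample a mini-batch, compute local responsibilities, form a scaled intermediate estimate of the global variational parameters, and blend it in with a decaying step size rho_t = (t + tau)^(-kappa) — is sketched below for a deliberately simplified model (symmetric Dirichlet weight prior, Gaussian means with fixed identity covariances and zero prior mean). This is a minimal sketch of generic stochastic variational inference, not the paper's SVI-IGMM, and all hyperparameter values are assumptions.

```python
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(0)

# Synthetic data: 3 well-separated clusters with identity covariance.
N, d, K = 30_000, 2, 10                           # K is an over-specified upper bound
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(N // 3, d))
               for c in ([-6, 0], [0, 6], [6, 0])])

alpha0, beta0 = 1.0, 1.0                          # prior hyperparameters (illustrative)
alpha = np.full(K, alpha0)                        # q(pi) = Dirichlet(alpha)
beta = np.full(K, beta0)                          # q(mu_k) = N(m_k, I / beta_k)
m = rng.normal(scale=3.0, size=(K, d))

S, tau, kappa = 256, 1.0, 0.7                     # mini-batch size and step-size schedule
for t in range(1, 2001):
    batch = X[rng.choice(N, size=S, replace=False)]

    # Local step: responsibilities under the current global variational parameters.
    sq = ((batch[:, None, :] - m[None, :, :]) ** 2).sum(-1)          # (S, K)
    log_r = digamma(alpha) - digamma(alpha.sum()) - 0.5 * (sq + d / beta)
    log_r -= log_r.max(axis=1, keepdims=True)
    r = np.exp(log_r)
    r /= r.sum(axis=1, keepdims=True)

    # Intermediate global estimates, as if the mini-batch were the full dataset.
    scale = N / S
    Nk_hat = scale * r.sum(axis=0)                                   # (K,)
    Sk_hat = scale * r.T @ batch                                     # (K, d)

    # Stochastic update with decaying step size rho_t = (t + tau)^(-kappa).
    rho = (t + tau) ** (-kappa)
    alpha = (1 - rho) * alpha + rho * (alpha0 + Nk_hat)
    beta_m = (1 - rho) * (beta[:, None] * m) + rho * Sk_hat          # natural parameter beta_k * m_k (zero prior mean)
    beta = (1 - rho) * beta + rho * (beta0 + Nk_hat)
    m = beta_m / beta[:, None]

weights = alpha / alpha.sum()
print("components with weight > 1e-2:", int((weights > 1e-2).sum()))
```

Because each update touches only a mini-batch, the per-iteration cost is independent of N, which is the property the abstract attributes to SVI-IGMM for large-scale datasets.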