The Gibbs sampling algorithm is a widely used unsupervised learning method in statistics and machine learning. Applied to data analysis, it can save computational resources and improve efficiency. As society develops, the data available to researchers grows at an exponential rate, which poses two major problems: first, data sets are so large that storing all of the data is difficult; second, the data are high-dimensional and unevenly distributed, so the cost of extracting useful information is too high. Consequently, most traditional statistical methods are no longer suitable for large data sets. Two approaches can address these problems: distributed computing and subsampling. In practice, the former suffers from complex architecture design, high network infrastructure costs, and difficulty in testing and debugging, whereas the latter improves computational efficiency and extracts useful information while keeping computational cost low. Subsampling therefore offers a new direction for big-data research. This thesis adopts the subsampling approach and proposes the corresponding algorithm: the Gibbs sampling algorithm under optimal subsamples (OSG).

This thesis uses subsampling to reduce the computational complexity of the Gibbs sampling algorithm. First, equal-probability and unequal-probability subsamples are drawn, yielding the Gibbs sampling algorithm under equal-probability subsamples (ESG) and the Gibbs sampling algorithm under optimal subsamples (OSG). Second, the posterior conditional distributions are derived under unequal-probability subsamples, both for data assumed to follow a normal distribution and for the linear regression model. Finally, numerical simulations are carried out. The results show that, under optimal subsamples, OSG captures more information under the normal-distribution assumption than under the linear regression model; that, for both models, OSG is simple to operate, easy to implement, low in time cost, and high in computational efficiency compared with the Gibbs sampling algorithm on the full sample; and that, at the same sampling ratio, OSG achieves higher estimation efficiency than ESG.
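To make the idea of running a Gibbs sampler on a subsample concrete, the sketch below draws an equal-probability subsample from a synthetic normal data set and runs a two-block Gibbs sampler for the mean and variance under semi-conjugate priors. This is only an illustration of the ESG setting under assumptions made here (the priors, data size, and subsample size are all hypothetical); the data-dependent optimal subsampling probabilities that define OSG in the thesis are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

# Full data set: n observations assumed to come from N(mu, sigma^2).
n = 100_000
data = rng.normal(loc=2.0, scale=1.5, size=n)

# Equal-probability subsample (the ESG setting): every observation has the
# same inclusion probability r/n.  The optimal-subsample version (OSG) would
# replace these uniform probabilities with data-dependent ones, which the
# abstract does not spell out.
r = 1_000
subsample = rng.choice(data, size=r, replace=False)

# Semi-conjugate priors (an assumption for this sketch):
#   mu      ~ N(mu0, tau0^2)
#   sigma^2 ~ Inverse-Gamma(a0, b0)
mu0, tau0_sq, a0, b0 = 0.0, 100.0, 2.0, 2.0

n_iter, burn = 2_000, 500
mu, sigma_sq = 0.0, 1.0                 # starting values
mu_draws = np.empty(n_iter)
sigma_sq_draws = np.empty(n_iter)

for t in range(n_iter):
    # (1) Draw mu from its full conditional given sigma^2 and the subsample.
    post_var = 1.0 / (1.0 / tau0_sq + r / sigma_sq)
    post_mean = post_var * (mu0 / tau0_sq + subsample.sum() / sigma_sq)
    mu = rng.normal(post_mean, np.sqrt(post_var))

    # (2) Draw sigma^2 from its full conditional given mu and the subsample.
    a_n = a0 + r / 2.0
    b_n = b0 + 0.5 * np.sum((subsample - mu) ** 2)
    sigma_sq = 1.0 / rng.gamma(a_n, 1.0 / b_n)   # inverse-gamma via 1/Gamma

    mu_draws[t], sigma_sq_draws[t] = mu, sigma_sq

# Posterior summaries after discarding the burn-in period.
print("posterior mean of mu:     ", mu_draws[burn:].mean())
print("posterior mean of sigma^2:", sigma_sq_draws[burn:].mean())
```

Because the sampler touches only the r subsampled observations in each full-conditional update, its per-iteration cost scales with r rather than n, which is the source of the computational savings described above.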