Research And Application On The Number Of Components In Mixture Models

Posted on: 2021-05-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Zhao
Full Text: PDF
GTID: 2370330614963955
Subject: Applied statistics
Abstract/Summary:
Because data often come from multiple sources, many data sets in current data analysis are best described by a mixture model. In cluster analysis, a mixture model typically yields more accurate results than conventional methods. A key element is the number of components in the mixture model, which determines the final result of the analysis. The expectation-maximization (EM) algorithm is commonly used for parameter estimation in mixture models. EM is an iterative algorithm for computing maximum likelihood estimates of model parameters from incomplete data; in a mixture model, the number of components cannot be observed, or is a missing value. Researchers often use AIC and BIC to determine the number of components. However, these two criteria are not reliable in applications and often yield misleading results in real data analysis. To address this, this thesis studies the problem of determining the number of components in a mixture model. The main contributions are as follows:
(1) To address the instability of AIC and BIC, this thesis proposes an improved method for the one-dimensional mixture model. The new method follows the idea of maximum likelihood: it fits the model with the EM algorithm and uses a scree plot to determine the number of components. Experiments show that the new method improves accuracy and that the process is more intuitive.
(2) In practice, much multi-dimensional mixed data involves complex relationships (regression, classification, etc.). The statistical models or parameters of different components may differ, and their residuals may come from different distributions. This thesis proposes a new method to determine the number of components in a mixture regression model, which uses the scree plot of the log-likelihood function. Experiments show that the new method gives more accurate results under unfavorable conditions.
(3) To address the lack of applications of the mixture model, this thesis applies the new method to determine the number of components when estimating the parameters of health insurance data. The policyholders are then clustered into two groups, and scientific insurance pricing strategies are formulated for the different groups.
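The scree-plot idea in (1) can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names `em_loglik_1d` and `scree_elbow`, the quantile-based initialization, the variance floor, and the "largest drop in gain" elbow rule are all assumptions chosen for illustration, and the data are synthetic. The sketch fits a one-dimensional Gaussian mixture by EM for k = 1, ..., 6, records the maximized log-likelihood for each k, and picks the k after which the gain in log-likelihood falls off most sharply.

```python
import math
import random


def em_loglik_1d(data, k, n_iter=200, tol=1e-8):
    """Fit a k-component 1-D Gaussian mixture by EM; return the log-likelihood."""
    n = len(data)
    srt = sorted(data)
    mean_all = sum(data) / n
    var_all = sum((x - mean_all) ** 2 for x in data) / n
    # Assumed deterministic start: means at evenly spaced sample quantiles.
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [var_all] * k
    weights = [1.0 / k] * k
    var_floor = 1e-3 * var_all  # assumed floor to avoid degenerate spikes
    loglik = prev = -math.inf
    for _ in range(n_iter):
        # E-step: responsibilities and log-likelihood at current parameters.
        resp, loglik = [], 0.0
        for x in data:
            dens = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            total = max(sum(dens), 1e-300)
            loglik += math.log(total)
            resp.append([d / total for d in dens])
        if loglik - prev < tol:
            break
        prev = loglik
        # M-step: update weights, means, and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, var_floor)
    return loglik


def scree_elbow(logliks):
    """Return the k (1-based) at which the log-likelihood gain drops the most."""
    gains = [b - a for a, b in zip(logliks, logliks[1:])]
    drops = [gains[i] - gains[i + 1] for i in range(len(gains) - 1)]
    return drops.index(max(drops)) + 2


# Synthetic data (not from the thesis): three well-separated components.
rng = random.Random(0)
data = [rng.gauss(mu, 1.0) for mu in (0.0, 6.0, 12.0) for _ in range(100)]
logliks = [em_loglik_1d(data, k) for k in range(1, 7)]
best_k = scree_elbow(logliks)
```

Plotting `logliks` against k gives the scree plot itself; the numeric rule in `scree_elbow` only stands in for reading the elbow by eye, and by construction it cannot return the first or last k tried.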
Keywords/Search Tags: mixture model, number of components, scree plot, EM, regression