A Robust Semiparametric Mixture Of Regression Model Based On T-Distribution

Posted on: 2024-07-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y Ge
Full Text: PDF
GTID: 2530307073459654
Subject: Application probability statistics
Abstract/Summary:
In practice, data often come from heterogeneous populations. When such mixed data are analyzed with traditional non-mixture models, inferring population parameters from the sample often produces large estimation errors and fails to achieve the desired effect. The finite mixture model addresses this problem: it can be used for data clustering and numerical classification, and also to further analyze the properties of the individual components and the relationships between variables once the data have been classified. The model has received growing attention since it was proposed and has been widely used in fields such as biology, astronomy, medicine and finance. Mixture of regression models, also known as model-based clustering, can be used for regression analysis of heterogeneous data and have a wide range of applications in market segmentation, social science and other disciplines.

Traditional mixture of regression models assume that the errors of each component follow a normal distribution, that is, a Gaussian mixture model. In reality, however, data often tend to be heavy-tailed, skewed, or contaminated with outliers, and Gaussian distributions fail to capture these characteristics accurately. Robust mixture models based on trimmed likelihood estimation (TLE), M-estimation and penalty functions have therefore emerged. By reducing the weights of heavy-tailed values and outliers, robust models fit better than the Gaussian mixture model and give more accurate parameter estimates and clustering results.

Typically, a mixture of linear regression models treats the mixing proportions and the error variances as constants, but such conditions are difficult to meet in practice. Non-parametric models do not require the form of the model to be specified in advance, which helps avoid model misspecification error. However, they have corresponding disadvantages, such as weak interpretability and insufficient use of the information provided by the data. To this end, this thesis adopts a more flexible semi-parametric mixture of regression model that combines the advantages of both types of model. Its assumptions are more relaxed than those of the parametric model, and its computational cost is lower than that of the non-parametric model. It therefore avoids errors caused by model specification, improves the fit, makes full use of the information in the data, and has a wider scope of application.

The main contributions of this thesis can be summarized in three points. First, building on existing research, this thesis proposes a new semi-parametric mixture of regression model in which the response variable is a linear regression function of the predictor variables, the mixing proportions are smooth functions of a covariate, and the error term is assumed to follow a t-distribution. This reduces the sensitivity of the mixture of regression model to outliers and heavy-tailed data and improves the robustness of the model and the accuracy of parameter estimation. Second, for parameter estimation, this thesis proposes a three-step backfitting estimation procedure combined with kernel regression and derives the corresponding EM-type algorithm. This algorithm is then improved to obtain a global accelerated EM-type algorithm. While preserving the accuracy of parameter estimation, the procedure attains the optimal convergence rates for the regression parameters and the mixing proportion functions, and the relevant theorems are proved. Third, this thesis conducts a numerical simulation that compares the proposed method with parameter estimation methods for several existing mixture models. The results show no significant difference among the methods when the errors are Gaussian, indicating that the proposed method is valid. When the data are skewed or contain outliers, the proposed algorithm yields better parameter estimates, indicating that the method is more robust. Finally, analyses of the tone data, air quality data and diabetes data show that the proposed model and algorithm are practically feasible and can effectively reduce the influence of outliers on regression.
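
To make the robustness mechanism in the abstract concrete, here is a minimal Python sketch of two ingredients that a model of this kind typically relies on: an E-step for a mixture of linear regressions with t-distributed errors, and a kernel-weighted update of the mixing proportions at a covariate value. It is not the thesis's three-step backfitting algorithm, which the abstract does not spell out. The function names (`t_pdf`, `e_step`, `smooth_proportions`), the Gaussian kernel, the fixed degrees of freedom `nu` and the bandwidth `h` are all assumptions made for illustration.

```python
import math


def t_pdf(r, sigma, nu):
    """Density of a residual r under a Student-t error with scale sigma and nu degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    d2 = (r / sigma) ** 2
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(d2 / nu)) / sigma


def e_step(y, x, betas, sigmas, pis, nu):
    """Compute responsibilities and t-weights for each observation and component.

    y: responses; x: predictor vectors (include a leading 1 for an intercept);
    betas: one coefficient vector per component; sigmas: one scale per component;
    pis[i][j]: mixing proportion of component j at observation i (assumed layout).
    """
    resp, u = [], []
    for yi, xi, pi_i in zip(y, x, pis):
        resid = [yi - sum(b * v for b, v in zip(beta, xi)) for beta in betas]
        num = [p * t_pdf(r, s, nu) for p, r, s in zip(pi_i, resid, sigmas)]
        total = sum(num)
        resp.append([v / total for v in num])
        # Large standardized residuals get small weights, which is what
        # limits the influence of outliers in the weighted M-step.
        u.append([(nu + 1) / (nu + (r / s) ** 2) for r, s in zip(resid, sigmas)])
    return resp, u


def smooth_proportions(z, resp, z0, h):
    """Kernel-weighted average of responsibilities at covariate value z0 (Gaussian kernel, bandwidth h)."""
    w = [math.exp(-0.5 * ((zi - z0) / h) ** 2) for zi in z]
    total = sum(w)
    k = len(resp[0])
    return [sum(wi * ri[j] for wi, ri in zip(w, resp)) / total for j in range(k)]


# Toy usage with made-up numbers; the last response is a gross outlier.
y = [1.0, 2.1, 2.9, 30.0]
x = [[1, 0], [1, 1], [1, 2], [1, 3]]
z = [0.0, 1.0, 2.0, 3.0]
betas = [[1.0, 1.0], [0.0, -1.0]]
sigmas = [1.0, 1.0]
pis = [[0.5, 0.5] for _ in y]
resp, u = e_step(y, x, betas, sigmas, pis, nu=3.0)
pi_at_z0 = smooth_proportions(z, resp, z0=1.5, h=1.0)
```

In a full EM-type iteration, the products of `resp` and `u` would serve as weights in a weighted least-squares update of each component's coefficients, so an observation such as the outlier above contributes very little to the fit.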
Keywords/Search Tags: Mixture of regression models, EM algorithm, Semiparametric model, t-distribution, Kernel regression