
Development and characterization of a goodness of fit test for two-component normal mixture

Posted on: 1993-05-11
Degree: Ph.D.
Type: Dissertation
University: State University of New York at Stony Brook
Candidate: Chu, Teng-Chiao
Full Text: PDF
GTID: 1479390014995599
Subject: Statistics
Abstract/Summary:
Mixtures of normal distributions are a rich class of probability models for random phenomena, with wide applications in genetics, physics, medicine, economics, and other fields (see Titterington et al., 1985 for references). Although many research papers and texts have been written about parameter estimation and testing for the presence of more than one component, the problem of testing the goodness of fit of a mixture model does not seem to have been addressed.

In this dissertation, I develop a Lilliefors-like goodness-of-fit test ("Dmax") for a mixture of two normally distributed components with different means but equal variances. I consider six mixing proportions (p = 0.5, 0.6, 0.7, 0.8, 0.9, and 0.95), three sample sizes (n = 25, 50, and 100), and five values of the standardized difference between the two component means (D = 1, 2, 3, 4, and 5). The empirical null distribution of the test statistic is estimated from a Monte Carlo simulation study of size 5000.

Analysis of the percentiles and moments of Dmax indicates that the null distribution of the statistic depends on the model parameters p and D. Further investigation of a variance-stabilizing transformation gave estimates of the power transform ranging from log Dmax ($\lambda = 0$) to $1/\mathrm{Dmax}$ ($\lambda = -1$), with the estimated power transform being $\mathrm{Dmax}^{-0.083}$ ($\lambda = -0.083$).

Two statistics, the absolute error (AE) of the test and the coefficient of variation (CV) of the percentiles, were used to determine which value of $\lambda$ (the power transform of Dmax), if any, resulted in a distribution having little or no dependency on the parameters of the mixture (D and p).
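As a rough illustration of the Dmax statistic and the Monte Carlo step described above, the following Python sketch fits the equal-variance two-component mixture and measures the largest gap between the empirical CDF and the fitted mixture CDF. The abstract does not say how the parameters are estimated, so the EM fit, its starting values, and the fixed iteration count are assumptions, and the function names (`fit_mixture`, `dmax`, `simulate_null`) are hypothetical.

```python
import math
import random
from statistics import NormalDist


def fit_mixture(x, n_iter=50):
    """EM fit of p*N(mu1, s^2) + (1-p)*N(mu2, s^2) with a common s.

    Assumption: EM with a median-split start and a fixed number of
    iterations; the dissertation's actual estimator is not stated.
    """
    xs = sorted(x)
    n = len(xs)
    half = n // 2
    mu1 = sum(xs[:half]) / half
    mu2 = sum(xs[half:]) / (n - half)
    mean = sum(xs) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in xs) / n)
    p = 0.5
    for _ in range(n_iter):
        c1, c2 = NormalDist(mu1, s), NormalDist(mu2, s)
        # E-step: posterior probability that each point belongs to component 1.
        r = []
        for v in xs:
            a, b = p * c1.pdf(v), (1 - p) * c2.pdf(v)
            r.append(min(max(a / (a + b), 1e-9), 1 - 1e-9))
        # M-step: update the proportion, both means, and the pooled variance.
        w1 = sum(r)
        p = w1 / n
        mu1 = sum(ri * v for ri, v in zip(r, xs)) / w1
        mu2 = sum((1 - ri) * v for ri, v in zip(r, xs)) / (n - w1)
        s = math.sqrt(sum(ri * (v - mu1) ** 2 + (1 - ri) * (v - mu2) ** 2
                          for ri, v in zip(r, xs)) / n)
    return p, mu1, mu2, s


def dmax(x):
    """Largest distance between the empirical CDF and the fitted mixture CDF."""
    p, mu1, mu2, s = fit_mixture(x)
    c1, c2 = NormalDist(mu1, s), NormalDist(mu2, s)
    xs = sorted(x)
    n = len(xs)
    d = 0.0
    for i, v in enumerate(xs):
        f = p * c1.cdf(v) + (1 - p) * c2.cdf(v)
        # Check the empirical CDF just after and just before each jump.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def simulate_null(p, D, n, reps, seed=0):
    """Dmax values for `reps` samples drawn from the null mixture.

    Unit variance with component means 0 and D gives standardized difference D.
    """
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) + (0.0 if rng.random() < p else D)
                  for _ in range(n)]
        stats.append(dmax(sample))
    return stats


# Small run for illustration; the study described above used 5000 replicates.
null_stats = simulate_null(p=0.7, D=3, n=25, reps=100)
```

Because the parameters are re-estimated for every simulated sample, the resulting distribution reflects the estimation step, which is what makes the test Lilliefors-like rather than a plain Kolmogorov-Smirnov test.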
The AE was obtained in three steps. First, I took the average of the $(1-\alpha)100$ percentage points, conditional on n, as the common critical value for an $\alpha$-level test. Second, I determined the actual level of significance, $\alpha^*$, that is, the observed proportion of rejections associated with the common critical value. Third, I computed the AE as the absolute difference between the desired Type I error ($\alpha$) and the actual Type I error ($\alpha^*$), i.e., AE = $|\alpha^* - \alpha|$. The optimal transform was defined as the one with the minimum average absolute error. This question was addressed by a four-way (p, D, n, and $\lambda$) ANOVA of AE for $\alpha$ = 0.2, 0.05, and 0.01. The results indicated that the transforms $1/\mathrm{Dmax}$ and $1/\sqrt{\mathrm{Dmax}}$ were the appropriate ones to use. Further ANOVAs of CV for $\alpha$ = 0.2, 0.05, and 0.01 were conducted for these two transforms to find out which one resulted in more reliable percentage points. The results indicated that the transformation $1/\sqrt{\mathrm{Dmax}}$ is the 'optimal' transform. The empirical null distribution of the statistic $1/\sqrt{\mathrm{Dmax}}$ is tabulated, as are critical values of the test statistic.
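The AE calculation can be sketched in Python as follows. This is a minimal illustration, not the dissertation's code: the names `percentile` and `absolute_errors` are hypothetical, the percentile uses linear interpolation between order statistics (the abstract does not specify a definition), and rejection is assumed to occur for large values of the statistic. For a decreasing transform such as $1/\sqrt{\mathrm{Dmax}}$ the rejection tail would flip, a detail the abstract leaves open. The input values are toy stand-ins, not simulation results.

```python
import math


def percentile(values, q):
    """q-th quantile (0 <= q <= 1), linear interpolation between order statistics."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def absolute_errors(null_stats, alpha):
    """Return the common critical value and the AE of each (p, D) setting.

    null_stats maps each (p, D) setting to its simulated null statistics
    for one fixed sample size n.
    """
    # Step 1: common critical value = average of the (1 - alpha) percentiles.
    crit = sum(percentile(v, 1 - alpha) for v in null_stats.values()) / len(null_stats)
    errors = {}
    for setting, values in null_stats.items():
        # Step 2: alpha* = observed rejection rate at the common critical value.
        alpha_star = sum(v > crit for v in values) / len(values)
        # Step 3: AE = |alpha* - alpha|.
        errors[setting] = abs(alpha_star - alpha)
    return crit, errors


# Toy stand-in values for two settings (NOT results from the dissertation).
toy = {
    (0.5, 1): list(range(1, 101)),
    (0.5, 2): [2 * k for k in range(1, 101)],
}
crit, ae = absolute_errors(toy, alpha=0.2)
mean_ae = sum(ae.values()) / len(ae)
```

A transform whose null distribution depends little on p and D yields per-setting percentiles close to their average, so each $\alpha^*$ stays near $\alpha$ and the mean AE is small. That is the sense in which the minimum average AE identifies the preferred transform.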
Keywords/Search Tags: Test, Mixture, $1/\sqrt{\mathrm{Dmax}}$, Distribution, Transform, Statistic