Mixture Model With Auxiliary Information And Its Application

Posted on: 2015-09-06 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: S T Li | Full Text: PDF
GTID: 1220330467461322 | Subject: Probability theory and mathematical statistics
Abstract/Summary:
A mixture model is a probabilistic model for representing the presence of subpopulations within an overall population. Finite mixture models are an important subject of theoretical research in statistics and also have very wide applications in practice. In theory, the nonregularity of finite mixture models creates many difficulties for researchers, and many classical statistical conclusions do not apply. In particular, when the mixture model degenerates into a single-component model, the parameters are not identifiable, and the limiting distribution of the likelihood ratio test (LRT) statistic is no longer a chi-square distribution. To deal with this nonregularity, researchers have proposed many methods, mainly restricting the parameter space, penalizing the parameters, and constructing new test statistics based on the EM algorithm. This paper introduces a new way to study mixture models: the mixture model with auxiliary information. The research shows that the additional data make the parameters identifiable, make the parameter estimators consistent, improve the convergence rate of parameter estimation, and give the LRT statistic a limiting distribution that is simple and easy to use. With the auxiliary information, the power of the hypothesis test is greatly improved.

In applications, mixture models are important in genetics. Genomic imprinting is an important epigenetic phenomenon and is strongly associated with many complex diseases. Identifying imprinted genes is very useful for studying the causes of complex diseases. Most existing statistical methods are based on pedigree or family data. However, for some diseases, such as late-onset diseases, information on the parents or other family members is difficult to obtain. This paper considers the problem of identifying imprinting based on population data. Under imprinting, the two alleles inherited from the parents are expressed differently. For heterozygous samples, the parental origin is not determined, and the expression of heterozygotes follows a two-component mixture model. For homozygous samples, the corresponding expression follows a single-component model. The statistical problem is to test whether the expressions of the two alleles inherited from the parents differ. The homozygous samples provide much useful information for inference on the mixture model.

This paper takes the imprinting test as its fundamental problem and studies the theoretical properties of mixture models with various kinds of auxiliary information, together with their applications. Firstly, we consider the auxiliary information of homozygous samples for the mixture model based on population data. According to the characteristics of imprinting, a normal mixture model is introduced, in which both the equal-variance and the unequal-variance assumptions are considered. For the equal-variance case, using the auxiliary information of homozygous samples, we prove that the maximum likelihood estimator (MLE) of the parameters is consistent and that the LRT statistic has a $0.5\chi^2_1 + 0.5\chi^2_2$ limiting distribution. For the unequal-variance case, we introduce penalty functions on the variance parameters and the mixing proportion, so that the likelihood function is bounded and the parameters are identifiable. Based on the penalized likelihood, and using the auxiliary information of homozygous samples, we prove that the parameter estimators are consistent and that the LRT statistic has a $\chi^2_3$ limiting distribution. In addition, we apply these results to analyse the association between imprinting and disease in schizophrenia data.

Secondly, we present an EM-test for the mixture model with auxiliary information based on population data. In this research, we combine the auxiliary information with the EM-test and propose a new EM-test statistic. In constructing the statistic, the case $\pi = 0.5$ is considered, and the limiting distribution of the EM-test statistic is derived.

Finally, we consider the mixture model based on nuclear family data. Equal-variance and unequal-variance models are proposed using the parents' information. With the parents' data, the auxiliary information is more sufficient. Only when all three members of the nuclear family are heterozygous does the corresponding trait follow a two-component mixture model. The research shows that, with sufficient information, the MLE of the parameters is consistent, the convergence rates of the parameter estimators are all $O_p(n^{-1/2})$, and the LRT statistic has a chi-square limiting distribution, the same as the classical statistical result.

This paper makes some breakthroughs in statistical theory and in practical applications of the proposed methods. In statistical theory especially, it provides a new way to study the mixture model. The use of auxiliary information not only gives the LRT statistic a simple and easy-to-use limiting distribution, but also greatly improves the power of hypothesis testing. In applications, this paper offers a better solution to the problem of identifying imprinted genes from population data.
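As a rough illustration of how the limiting distributions quoted in the abstract could be used, the Python sketch below converts an observed LRT statistic into an asymptotic p-value under the $0.5\chi^2_1 + 0.5\chi^2_2$ mixture (equal-variance case) and under $\chi^2_3$ (penalized unequal-variance case). The function names `chi2_sf`, `mixture_pvalue`, and `chi2_3_pvalue` are hypothetical and do not come from the dissertation; the tail probabilities use standard closed forms for chi-square distributions with 1, 2, and 3 degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Tail probability P(X > x) for a chi-square variable with df in {1, 2, 3}.

    Uses closed forms so that only the standard library is needed.
    """
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    if df == 3:
        return (math.erfc(math.sqrt(x / 2.0))
                + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))
    raise ValueError("this sketch only supports df = 1, 2 or 3")


def mixture_pvalue(lrt_stat):
    """Asymptotic p-value under the 0.5*chi2_1 + 0.5*chi2_2 limiting law."""
    return 0.5 * chi2_sf(lrt_stat, 1) + 0.5 * chi2_sf(lrt_stat, 2)


def chi2_3_pvalue(lrt_stat):
    """Asymptotic p-value under the chi2_3 limiting law."""
    return chi2_sf(lrt_stat, 3)


if __name__ == "__main__":
    # Hypothetical LRT value, chosen only to show the calls.
    stat = 5.0
    print(mixture_pvalue(stat), chi2_3_pvalue(stat))
```

For any positive statistic, the mixture p-value lies between the $\chi^2_1$ and $\chi^2_2$ tail probabilities, so referring the statistic to $\chi^2_2$ alone would be conservative.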
Keywords/Search Tags: EM algorithm, EM-test, auxiliary information, mixture model, SNP, likelihood ratio test, imprinted gene, finite normal mixture model