Bayesian Network Latent Variable Models and Their Application in Gene Association Analysis

Posted on: 2012-06-13 | Degree: Master | Type: Thesis
Country: China | Candidate: S K Zhang
GTID: 2154330332996606 | Subject: Epidemiology and Health Statistics
Abstract/Summary:
Most complex diseases are polygenic disorders. Since the completion of the HapMap, studies of single-nucleotide polymorphisms (SNPs) and haplotypes have become a main focus of biomedical research, and they are expected to play an important role in research on the genetic mechanisms of complex diseases, disease risk, and differing responses to drugs. The corresponding statistical methods have also become an international research hot spot in recent years. Because of the characteristics of SNP data, such as measurement error, and because many studies neglect the effect of the gene as a whole, many existing methods have shortcomings in practical application. This paper therefore uses a latent class model based on Bayesian networks to analyse high-dimensional, genome-wide data. The Bayesian network latent class model can not only express the combined effect of haplotypes and high-dimensional SNPs effectively, but can also use the structure-analysis capability of Bayesian networks to analyse the complicated network structure among SNPs. It is an effective method for analysing large-scale genetic data and provides a new methodology for studying the heredity and gene localization of complex-trait diseases.

Based on the concept of Bayesian networks, this paper introduces the theory of Bayesian network latent variable models, including identifiability, parameter estimation, and structure learning. The concept of regularity is used to introduce the identifiability of the model. The section on parameter estimation covers maximum likelihood estimation, Bayesian estimation, and the EM algorithm, which is used when data are missing.
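The EM-based parameter estimation mentioned above can be illustrated with a minimal sketch of a plain (non-hierarchical) latent class model with categorical indicators. This is not the thesis's implementation: the function names `fit_latent_class` and `bic_score`, the 0/1/2 genotype coding, the random starting values, and the synthetic data are all assumptions made for illustration, and the BIC form used (log-likelihood minus half the parameter count times log N) is one common convention in the Bayesian network literature.

```python
import numpy as np

def _log_joint(pi, theta, onehot):
    # log P(x_i, class c) for every sample i and class c
    return np.log(pi)[None, :] + np.einsum("imk,cmk->ic", onehot, np.log(theta))

def fit_latent_class(X, n_classes=2, n_levels=3, n_iter=200, seed=0):
    """EM for a latent class model.
    X: (n_samples, n_items) integer array with values in 0..n_levels-1."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    pi = np.full(n_classes, 1.0 / n_classes)          # class probabilities
    # theta[c, j, k] = P(X_j = k | class c); random start, each row sums to 1
    theta = rng.dirichlet(np.ones(n_levels), size=(n_classes, m))
    onehot = np.eye(n_levels)[X]                       # (n, m, n_levels)
    for _ in range(n_iter):
        # E-step: posterior class membership (responsibilities)
        lj = _log_joint(pi, theta, onehot)
        resp = np.exp(lj - np.logaddexp.reduce(lj, axis=1, keepdims=True))
        # M-step: re-estimate parameters from expected counts
        pi = resp.mean(axis=0)
        counts = np.einsum("ic,imk->cmk", resp, onehot) + 1e-9  # avoid log(0)
        theta = counts / counts.sum(axis=2, keepdims=True)
    lj = _log_joint(pi, theta, onehot)
    loglik = float(np.logaddexp.reduce(lj, axis=1).sum())
    resp = np.exp(lj - np.logaddexp.reduce(lj, axis=1, keepdims=True))
    return pi, theta, resp, loglik

def bic_score(loglik, n, n_classes, n_items, n_levels):
    # Free parameters: class probabilities plus class-conditional tables
    d = (n_classes - 1) + n_classes * n_items * (n_levels - 1)
    return loglik - 0.5 * d * np.log(n)

# Synthetic demo (NOT the thesis data): 7 SNPs, genotypes coded 0/1/2,
# one class mostly heterozygous (1), the other mostly homozygous (0).
rng = np.random.default_rng(1)
z = rng.random(500) < 0.3
X = np.where(z[:, None],
             rng.choice(3, size=(500, 7), p=[0.05, 0.90, 0.05]),
             rng.choice(3, size=(500, 7), p=[0.90, 0.05, 0.05]))
pi, theta, resp, loglik = fit_latent_class(X)
bic = bic_score(loglik, n=500, n_classes=2, n_items=7, n_levels=3)
print("class probabilities:", np.round(pi, 3))
print("BIC:", round(float(bic), 1))
```

The estimated `pi` plays the role of the latent class probabilities reported in the examples, and each slice `theta[c]` is a table of class-conditional genotype probabilities. Comparing `bic_score` across different numbers of classes is one simple way to choose among candidate models.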
Following the procedure for building a Bayesian network latent variable model, the section on structure learning introduces the scoring functions used as model selection criteria, including the Bayesian score, BIC score, AIC score, HVL score, BICe score, and CS score, and the model optimization algorithms, namely the K2 algorithm and the hill-climbing algorithm, the latter being the key algorithm in this research. This paper describes in detail two types of Bayesian network latent variable model, the latent class model and the hierarchical latent class model, and points out the differences and connections between them. It also explains how to obtain the optimal model from the available data.

Based on this theory, the paper applies the Bayesian network latent variable model to two real SNP data sets. Example one uses SNP data on depression collected by the First Hospital of Shanxi Medical University; seven SNPs were measured for each patient. The analysis shows that the population is divided into two latent classes with probabilities 0.22 and 0.78. The SNPs that mainly account for the difference between the classes are rs11568817 and rs130058. These two SNPs allow the intrinsic characteristics of the two classes to be interpreted: one class tends towards heterozygous genotypes and the other towards homozygous genotypes. The probabilities for each class are given by the class-conditional probabilities and the class-conditional histograms.

The data for example two were provided by GAW17 (Genetic Analysis Workshop 17) and include tens of thousands of SNPs on the 22 autosomes for 697 individuals. This research randomly chose 29 SNPs located in 12 genes on chromosome 1. Using the criterion that the cumulative information contribution rate should reach 95%, the model selected 15 SNPs with high mutual information with X0, including C1S11408, C1S3201, C1S1786, and others. The population is divided into two latent classes with probabilities 0.68 and 0.32.
One class has roughly equal probabilities of homozygous and heterozygous genotypes at the 15 selected SNPs, whereas in the other class these probabilities differ greatly. Example two also analyses disease status in the two classes. The results show that the classes differ: the affected rate in the second class (38.64%) is higher than in the first class (25.99%), and the difference is statistically significant (χ² = 11.459, P = 0.001). This difference is attributed to the SNPs used to define and interpret the classes, so there is reason to regard these SNPs as candidate disease loci, which gives a clear direction for further research.

The discussion briefly interprets the significance of this research and compares the Bayesian network latent variable model with the structural equation model and with the latent class model based on probability parametrization. The advantages and shortcomings of the research are also explained in the discussion.
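The chi-square comparison of affected rates in example two can be checked with a short sketch. The abstract does not give the underlying 2×2 table, so the counts below are an assumption: they are inferred from the reported total of 697 individuals, the class probabilities (0.68 and 0.32), and the affected rates, and they reproduce the reported χ² to three decimals and the reported percentages to within rounding. The helper name `chi_square_2x2` is hypothetical, and the test used is the Pearson chi-square without continuity correction.

```python
import math

# ASSUMED counts, inferred from the reported figures (not stated in the abstract):
# class 1: 124 affected of 477; class 2: 85 affected of 220 (477 + 220 = 697).
n1, affected1 = 477, 124
n2, affected2 = 220, 85

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

chi2 = chi_square_2x2(affected1, n1 - affected1, affected2, n2 - affected2)
# For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2.0))

print(f"affected rate, class 1: {100 * affected1 / n1:.2f}%")
print(f"affected rate, class 2: {100 * affected2 / n2:.2f}%")
print(f"chi-square = {chi2:.3f}, p = {p_value:.4f}")
```

Under these assumed counts the statistic comes out at about 11.459 with a p-value below 0.001, which is consistent with the reported result once P is rounded to three decimals.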
Keywords/Search Tags: Bayesian network, Latent variable model, SNPs, Latent class analysis