
Algorithm Of Discovering Pathogenicity Gene Region Based On Haploid

Posted on: 2009-07-13
Degree: Master
Type: Thesis
Country: China
Candidate: X N Tang
GTID: 2120360242480742
Subject: Software engineering
Abstract/Summary:
Genetic association studies using population samples of matched cases and controls are increasingly carried out with the aim of discovering susceptibility genes for common diseases. As an important marker, the SNP is considered an ideal tool for association studies, and high-throughput sequencing technologies have made it feasible to discover large numbers of SNPs. However, high-throughput sequencing cannot guarantee that all informative SNPs associated with a disease are identified correctly, since the number of SNPs in the human genome is estimated at over ten million. Furthermore, the power of a single SNP in disease association studies is limited, and it is hard to uncover the real cause of a disease from a single SNP. Hence, to reduce the prohibitive cost of checking SNPs in many individuals and to increase the accuracy of disease association studies, researchers have started to focus on haplotype data.
There are many ways to identify haplotype patterns within haploid sequences, but most approaches to date are built on the "haplotype block" model, which emerged shortly after the discovery of haplotype blocks in the human genome. This model assumes that the sequence can be divided into separate blocks, within which there are conserved patterns of low diversity. However, influential research by Schwartz suggests that there is additional correlation information across blocks, as well as likely additional sub-structure within block regions, that is lost to analysis when block decomposition is imposed. Furthermore, the blocks themselves often appear imprecisely defined and hard to derive reproducibly from reasonable sample sizes. To address these problems, Schwartz proposed a more flexible, block-free representation of haplotype structure called the haplotype motif model. In this model, a given haploid sequence can be interpreted according to its own specific sub-structure. In other words, the haplotype motif model does not implicitly assume any global block pattern shared by all haploid sequences.
Despite its flexibility, Schwartz's model does not necessarily capture functionally significant SNPs in its patterns. That is, statistically significant haplotype motifs returned by Schwartz's approach may not be directly related to disease or to functional impairment. Because of this limitation, it is hard to locate deleterious variation precisely based on haplotype motifs. We encountered similar problems when doing disease association research using SNPs or htSNPs. To make up for these disadvantages while retaining the advantages, this paper proposes an integrative, functionally informative haplotype motif selection system that takes motif function into account and redefines the objective function of Schwartz's haplotype motif model. First, the system assigns each SNP to one of three classes according to its functional significance. Then it deduces all statistically and functionally significant motifs contained in the haploid sequences. In this procedure, the algorithm measures the functional significance of each motif based on the classes of its SNPs, and evaluates the quality of the motifs obtained using the redefined objective function. To check the effect of functionally informative haplotype motifs in disease association studies, this paper tests the algorithm on a real Crohn's disease dataset. First, we compare the performance of the improved model with Schwartz's model on the real disease dataset: we check their efficiency and count the number of motifs that differ significantly between the case and control sets under each model. Then we discuss and analyze the results. Moreover, this paper also compares the improved model with SNPs, haplotype blocks and Schwartz's haplotype motifs on the real disease dataset.
The comparison further exposes the advantage of functionally informative haplotype motifs in disease association studies. A functionally informative haplotype motif focuses not only on a motif's statistical significance but also on its functional significance. Meanwhile, it retains the advantage of Schwartz's model of representing haploid sequences flexibly and preserves each SNP's information. Above all, it serves as an important heritable marker. Simulation experiments showed that the improved model can not only locate more haplotype motifs with significant differences between the case and control sets, but also provide more of the biological function information encoded in those motifs. In addition, because it is still unclear which heritable marker related to haploid sequences performs best in disease association studies, this paper deliberately compared nearly all existing heritable markers related to haploid sequences in the Crohn's disease association study. After careful observation and analysis, we suggest that the functionally informative haplotype motif might be the most promising, persuasive and efficient approach in disease association studies. Finally, this paper discusses problems found in the research and work that might be continued in the future.
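The abstract does not give the scoring formula or the statistical test used, so the following Python sketch is only an illustration of the two ideas it describes: scoring a motif's functional significance from the classes of its SNPs, and comparing how often a motif occurs in the case and control sets. The class weights, the mean-weight score, the function names and the use of a Pearson chi-square statistic are all assumptions made for this example, not the thesis's actual definitions.

```python
# Hypothetical weights for the three functional classes. The abstract only
# says each SNP is placed in one of three classes; these numeric values are
# illustrative assumptions.
CLASS_WEIGHTS = {1: 1.0, 2: 0.5, 3: 0.0}


def functional_score(start, alleles, snp_classes):
    """Mean class weight of the SNPs a motif covers (assumed definition)."""
    span = snp_classes[start:start + len(alleles)]
    if not span:
        raise ValueError("motif must cover at least one SNP")
    return sum(CLASS_WEIGHTS[c] for c in span) / len(span)


def carries(haplotype, start, alleles):
    """True if the haplotype shows the motif's alleles at the motif's position."""
    return haplotype[start:start + len(alleles)] == alleles


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a degenerate table carries no association signal
    return n * (a * d - b * c) ** 2 / denom


def motif_association(cases, controls, start, alleles):
    """Chi-square statistic for motif presence versus case/control status."""
    a = sum(carries(h, start, alleles) for h in cases)
    c = sum(carries(h, start, alleles) for h in controls)
    return chi_square_2x2(a, len(cases) - a, c, len(controls) - c)


if __name__ == "__main__":
    # Toy data: haplotypes as strings of 0/1 alleles, one class per SNP position.
    snp_classes = [1, 3, 2, 1]
    cases = ["0110", "0111", "0100", "1010"]
    controls = ["1010", "1000", "0110", "1001"]
    print(functional_score(2, "10", snp_classes))
    print(motif_association(cases, controls, 0, "01"))
```

In this sketch a motif is just a start position plus an allele string, and the two quantities are reported separately. How the redefined objective function combines statistical and functional significance is not stated in the abstract, so no combination is attempted here.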
Keywords/Search Tags: Pathogenicity