Research On Three Novel Algorithms For Genotype Imputation

Posted on: 2016-10-17
Degree: Master
Type: Thesis
Country: China
Candidate: D D Zhang
Full Text: PDF
GTID: 2180330470978539
Subject: Electronics and Communications Engineering
Abstract/Summary:
Single nucleotide polymorphisms (SNPs) are now widely used to detect risk loci for complex diseases, and genome-wide association studies (GWAS) have attracted particular attention. However, the control DNA sequences used in GWAS contain many missing SNPs, which has become a stumbling block for complex disease research. Genotype imputation uses computational algorithms to recover the missing SNPs in a DNA sequence. It replaces manual experimental methods and so avoids the considerable manpower and cost of re-sequencing. Probabilistic relationships and linkage disequilibrium (LD) between genes reflect their internal characteristics and have been widely applied in genetics research, including SNP locus studies, haplotype studies and association analysis. In this thesis, three genotype imputation algorithms are designed that exploit the probabilistic and LD characteristics of genes. Verification results show that, compared with previous genotype imputation algorithms, each of the three runs in less time and improves accuracy by 1% to 9%. The main work is as follows:
(1) An algorithm using haplotypes is designed and implemented. It fits the characteristics of genotype data and minimizes the loss of information. It differs from previous approaches in two respects: the number of SNPs each haplotype contains and the haplotype inference algorithm.
(2) An algorithm using the average of two |D’| values (those between the missing SNP and each of its two neighboring SNPs) is designed and implemented. The average of the two |D’| values serves as the basis for imputing the missing SNPs.
(3) An algorithm using entropy is designed and implemented. Entropy is a measure of linkage disequilibrium among multiple SNP sites. In genotype data, each missing SNP takes one of three possible genotypes, so the three corresponding entropy values are used as the basis for imputation.
Keywords/Search Tags: SNP, Genotype imputation, LD, |D’|, Entropy
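
The two LD measures named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract gives no algorithmic details, so every function name, the 0/1 haplotype coding, the 0/1/2 genotype coding, and in particular the "choose the candidate with the lowest entropy" rule are assumptions made here for illustration only. The |D'| and Shannon entropy formulas themselves are the standard textbook definitions.

```python
from collections import Counter
from math import log2


def abs_d_prime(pairs):
    """Standard |D'| between two biallelic SNPs.

    `pairs` is a list of phased two-site haplotypes, each a (0/1, 0/1) tuple
    (assumed coding: 1 = reference allele at that site).
    """
    n = len(pairs)
    p_a = sum(a for a, _ in pairs) / n            # allele frequency at site 1
    p_b = sum(b for _, b in pairs) / n            # allele frequency at site 2
    p_ab = sum(1 for a, b in pairs if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                          # raw disequilibrium D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max if d_max > 0 else 0.0


def mean_flanking_d_prime(left_pairs, right_pairs):
    """Average of the two |D'| values for a SNP and its two neighbours."""
    return (abs_d_prime(left_pairs) + abs_d_prime(right_pairs)) / 2


def joint_entropy(rows):
    """Shannon entropy (bits) of the joint pattern distribution over rows."""
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in Counter(rows).values())


def impute_by_entropy(reference_rows, incomplete_row, missing_index):
    """Hypothetical selection rule (an assumption, not from the abstract):
    try genotypes 0, 1 and 2 at the missing site and keep the one that
    gives the lowest joint entropy when added to the reference rows.
    """
    scores = {}
    for genotype in (0, 1, 2):
        candidate = list(incomplete_row)
        candidate[missing_index] = genotype
        scores[genotype] = joint_entropy(list(reference_rows) + [tuple(candidate)])
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    linked = [(1, 1), (1, 1), (0, 0), (0, 0)]
    unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]
    print(abs_d_prime(linked), abs_d_prime(unlinked))
    print(mean_flanking_d_prime(linked, unlinked))

    panel = [(0, 0, 0), (0, 0, 0), (2, 2, 2), (2, 2, 2)]
    print(impute_by_entropy(panel, (2, None, 2), 1))
```

In this toy panel the three candidate genotypes yield three entropy values, mirroring the "three entropy values" mentioned in the abstract; how the thesis actually turns those values, or the averaged |D'|, into an imputed genotype is not stated in the abstract and is left open here.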