Development Of Variants Annotation Program For Pigs

Posted on: 2018-01-22 | Degree: Master | Type: Thesis
Country: China | Candidate: C C Liu | Full Text: PDF
GTID: 2323330515495453 | Subject: Animal breeding and genetics and breeding
Abstract/Summary:
Variant annotation is an important part of investigating the relationship between variants and phenotypes. With the development of next-generation high-throughput sequencing (HTS) techniques and the reduction in their cost, massive amounts of variant data have been obtained, and these data are the foundation of functional annotation. Experimental data on regulatory elements generated by the ENCODE project can improve annotation accuracy in model organisms such as human and mouse. However, using these data to predict the effects of variants in non-model organisms remains a challenge. In this study, we developed VIP (Variant Integrated Predictor), a variant functional predictor implemented in Python, and used it to functionally annotate all variants in the pig genome. The main results are as follows:

(1) We developed the variant functional predictor VIP. In coding sequences, VIP predicts synonymous, missense, nonsense, frameshift and non-frameshift mutations, as well as the conserved domains affected by the variants. In promoter regions, VIP predicts transcription factor binding sites (TFBS) from the TFBS position frequency matrices (PFMs) provided by Jaspar. In 3'UTRs, VIP uses a dynamic programming algorithm (Smith-Waterman) to calculate complementarity scores between miRNAs (provided by miRBase) and their targets, and reduces false positives using miRDB predictions. VIP can also predict splice sites in introns. VIP supports multicore CPUs: it reaches a speed of 79,000 variants per second in simple mode while controlling memory consumption efficiently, and it can complete the annotation of 60 million variants with 8 GB of RAM.

(2) Based on pair-wise genome sequence alignments, we developed a companion program that constructs the positional relationship between genomes. On this basis, VIP can integrate into its annotation several kinds of experimental data that were provided for human by the ENCODE project and converted to the target genome. In this study, a total of 1.14 billion (40%) base pairs of the pig genome were linked to the human genome according to the pair-wise genome sequence alignments. We converted PhyloP and CADD scores from human to pig and obtained reasonable results in CDS. In addition, six groups of SP1 ChIP data were converted for variant annotation in the promoter regions of the pig genome. According to the PFM of SP1 provided by Jaspar, we found that the binding capacity of 4,248 high-quality hits was reduced.

(3) VIP was applied to annotate approximately 60 million variants in the pig genome. A total of 524,081 hits in coding sequences were identical to the results of VEP (Variant Effect Predictor), provided by Ensembl. In 3'UTRs, 5,008 miRNA binding sites were gained and 5,969 were lost due to the mutations.

This study provides a valuable bioinformatics platform and reference for using data generated by the ENCODE project in non-model organisms. The annotation results for pigs are a foundation for the functional investigation of related genes and for the filtering of important variants.
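The 3'UTR step described in result (1) can be illustrated with a minimal sketch of Smith-Waterman local alignment used as a complementarity score. This is not VIP's code: the function names, the scoring values (match 2, mismatch -1, gap -2) and the toy sequences are assumptions chosen for illustration, since the abstract does not state them. The sketch reverse-complements the miRNA so that a perfectly complementary site in the target aligns as a run of identical bases, then compares the score for a reference and a mutated 3'UTR. The miRDB filtering step is not shown.

```python
def reverse_complement_rna(seq):
    """Return the reverse complement of an RNA sequence (T is read as U)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    seq = seq.upper().replace("T", "U")
    return "".join(pairs[base] for base in reversed(seq))


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty).

    The scoring values are illustrative assumptions, not VIP's parameters.
    Only two rows of the dynamic programming matrix are kept in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def complementarity_score(mirna, utr, **scoring):
    """Score how well a miRNA can pair with a 3'UTR sequence.

    A site that is perfectly complementary to the miRNA is identical to the
    miRNA's reverse complement, so the two are aligned by identity.
    """
    site = reverse_complement_rna(mirna)
    return smith_waterman_score(site, utr.upper().replace("T", "U"), **scoring)


# Toy example (hypothetical sequences): a variant disrupts a binding site.
mirna = "UGAGGUAG"
utr_ref = "AAACUACCUCAGGG"  # contains the perfect site CUACCUCA
utr_alt = "AAACUACGUCAGGG"  # single-base substitution inside the site

ref_score = complementarity_score(mirna, utr_ref)
alt_score = complementarity_score(mirna, utr_alt)
print(ref_score, alt_score)  # prints "16 13"
```

A drop in score between the reference and the alternate allele, as in this toy case, is one simple way to flag a candidate lost binding site; a rise would flag a candidate gained site. Any real threshold for calling a site gained or lost would have to come from the thesis itself.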
Keywords/Search Tags:variants, functional annotation, transcription factor binding sites, miRNA binding sites, ENCODE, cross-species annotation, pigs