Algorithm Acceleration And Data Visualization On Genome-wide Association Study

Posted on: 2018-03-09  Degree: Master  Type: Thesis
Country: China  Candidate: J Yang  Full Text: PDF
GTID: 2370330596454779  Subject: Software engineering
Abstract/Summary:
In genetics, a genome-wide association study (GWA study, or GWAS) is an examination of a genome-wide set of genetic variants in different individuals to see whether any variant is associated with a trait. GWASs typically focus on associations between single-nucleotide polymorphisms (SNPs) and traits such as major human diseases, but can equally be applied to any other organism. With the development of gene sequencing technology and the reduction of sequencing cost, GWAS has developed rapidly and become one of the important methods for precision medicine in modern healthcare and for crop breeding in the agricultural economy. An increasing number of researchers devote themselves to research on GWAS algorithms.

There are two primary problems in the practical use of GWAS. First, statistical power needs further improvement, since false positives and false negatives still occur. Second, computing speed also needs further improvement for large samples. GWAS results are currently displayed in two main forms: text and static diagrams. Text display is poor at presenting the global picture of a GWAS result, while static diagrams are poor at showing details.

This thesis concentrates on GWAS algorithm acceleration and visualization based on FarmCPU, which has proved to be outstanding in practice in both statistical power and computing speed among GWAS analysis tools. First, this work studies acceleration strategies for GWAS from the aspects of model optimization and parallel computation. Second, a data visualization strategy is designed to address the drawback that the output of FarmCPU is inconvenient to look up and lacks interaction. Finally, this work designs and implements a GWAS analysis visualization system that uses FarmCPU as its computing model. The improvement and parallelization of the FarmCPU model constitute the research on accelerating the GWAS algorithm, and the GWAS analysis visualization system is the visual realization of GWAS.

This thesis makes the following contributions:

1. We optimized FarmCPU in three respects to reduce its running time. First, we use the dependency matrix operation strategy in GEMMA to replace the FaST-LMM algorithm in the FarmCPU random effects model. This avoids the singular value decomposition (SVD) of a large matrix in the variance component estimation of the original model and thereby optimizes the model. Second, we study a parallelization strategy for FarmCPU and use it to accelerate FarmCPU analysis. Third, we optimize the code to accelerate GWAS further. To verify these optimizations, we set up simulation comparison experiments on the widely used Arabidopsis thaliana data set of 199 individuals and 200,000 markers. The experimental results show that computing speed improves by more than 4 times while statistical power is preserved.

2. This thesis studies a visualization strategy for GWAS result data to address the lack of interaction and convenience in existing GWAS result displays. The study mainly includes the design of a presentation strategy and coordinate system for GWAS data, the improvement of interaction based on the event proxy mechanism, the design of an acceleration strategy for drawing many SVG objects on a single page in GWAS data visualization, and the design of a gene annotation function based on the correspondence between SNPs and chromosome locations. This study also solves the "blocking" problem of the browser by using multi-threading technology.

3. We designed and implemented a GWAS analysis visualization system based on FarmCPU. It covers the whole process from the input of GWAS data to the visual presentation of the analysis results. The work includes the design and implementation of the system architecture and file management system, the encapsulation of the computing model, the design and implementation of the task scheduling module, the implementation of front-end user interaction and data visualization, the design and implementation of the database, and so forth.
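
To illustrate the kind of coordinate system that contribution 2 refers to, the sketch below places each SNP on a genome-wide x-axis and uses -log10(p) as its height, which is the usual layout of a Manhattan plot. The abstract does not describe the thesis's actual design, so this is an assumption-laden illustration: the `Snp` record, the function names, and the use of the largest observed position as a chromosome's length are all hypothetical.

```python
import math
from typing import Dict, List, NamedTuple, Tuple


class Snp(NamedTuple):
    """Hypothetical record for one marker in a GWAS result table."""
    name: str
    chrom: int      # chromosome number
    pos: int        # base-pair position within the chromosome
    p_value: float  # association p-value; must be > 0


def chromosome_offsets(snps: List[Snp]) -> Dict[int, int]:
    """Map each chromosome to the x-offset where it starts.

    Assumption: a chromosome's length is approximated by the largest
    SNP position observed on it.
    """
    longest: Dict[int, int] = {}
    for snp in snps:
        longest[snp.chrom] = max(longest.get(snp.chrom, 0), snp.pos)

    offsets: Dict[int, int] = {}
    running = 0
    for chrom in sorted(longest):
        offsets[chrom] = running
        running += longest[chrom]
    return offsets


def manhattan_points(snps: List[Snp]) -> List[Tuple[str, int, float]]:
    """Return (name, x, y) with x a genome-wide position and y = -log10(p)."""
    offsets = chromosome_offsets(snps)
    return [
        (snp.name, offsets[snp.chrom] + snp.pos, -math.log10(snp.p_value))
        for snp in snps
    ]


if __name__ == "__main__":
    demo = [
        Snp("snp_a", 1, 100, 0.01),
        Snp("snp_b", 1, 500, 1e-6),
        Snp("snp_c", 2, 50, 0.5),
    ]
    for name, x, y in manhattan_points(demo):
        print(f"{name}: x={x}, y={y:.2f}")
```

Because the same (chromosome, position) pair drives the x-coordinate, a lookup from a plotted point back to its genomic location, as needed for gene annotation, can reuse the same offsets.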
Keywords/Search Tags: GWAS, algorithm acceleration, data visualization, model optimization, visualization strategy