
Gene Level Association Analysis Based On Functional Linear Model

Posted on: 2021-09-29
Degree: Master
Type: Thesis
Country: China
Candidate: H Guo
Full Text: PDF
GTID: 2504306017953599
Subject: Statistics
Abstract/Summary:
With the rapid development of second- and third-generation sequencing technology in recent years, a large amount of genetic association data containing rare variants has emerged. However, rare variants occur at low frequency in the population, which poses many challenges for the development of statistical analysis methods. Early association studies of loci and disease tested one genetic variant at a time and were limited mainly by multiple-testing correction: the stringent significance threshold left few variants significant after correction. Because most disease-related genes contain many potentially functional variants, especially rare ones, a common strategy is to aggregate multiple variants within a gene to strengthen their combined effect. With the upgrading of sequencing instruments, the observed data points lie very close together and can almost be regarded as continuously varying observations. It is therefore natural to treat the data values as realizations of an underlying random process in a function space.

In Chapter 2, a functional linear model is established to make full use of the genetic linkage and linkage disequilibrium (LD) information of multiple genetic variants in the genome, as well as the similarity between individuals. When a functional linear model is used for gene-level association analysis, the genotype values at discrete sites on a gene segment must be approximated numerically. Because the traditional spline basis is inaccurate in approximation and time-consuming to compute, this chapter uses Legendre polynomials for the numerical approximation and exploits their orthogonality to fit the functional linear model more efficiently. Three methods are compared on different simulated data sets: the Sequence Kernel Association Test (SKAT), the functional linear model based on spline basis approximation, and the functional linear model based on Legendre polynomial approximation. The Legendre-based functional linear model proposed in this chapter performs best in both time efficiency and statistical power. In addition, to allow for a possible nonlinear relationship between disease occurrence and the mutation sites, a functional nonlinear model based on Legendre polynomial approximation is established in this chapter. Compared with SKAT, it shows clear advantages in both time efficiency and statistical power.

Ensuring the reliability of the mutation data used in gene-level association analysis is also an indispensable step in finding true positive results. As next-generation sequencing (NGS) has brought genetic research on human diseases into an era of unprecedented productivity, many bioinformatics pipelines have been developed to detect mutations from NGS data, and their performance depends largely on the analysis method and the calling strategy used. To help researchers choose a pipeline according to their own needs and circumstances, Chapter 3 of this thesis records the relevant metrics of the mutations detected by the GATK pipeline and by the Sentieon pipeline. The comparison shows that Sentieon is superior to GATK in both the accuracy and the time efficiency of mutation detection. However, Sentieon is paid software and places high demands on computer hardware, so researchers should weigh these costs against their own needs when choosing it.
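The Legendre-based approximation described for Chapter 2 can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes a quantitative trait, an ordinary least-squares fit, and the simplification that only the effect function is expanded in the basis, so each individual's genotypes are projected onto Legendre polynomials evaluated at the variant positions. The function names (`legendre_design`, `f_statistic`), the polynomial degree, and the simulated data are all hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre


def legendre_design(genotypes, positions, degree):
    """Project genotypes onto Legendre polynomials evaluated at variant positions."""
    positions = np.asarray(positions, dtype=float)
    lo, hi = positions.min(), positions.max()
    # Rescale positions to [-1, 1], the interval on which Legendre
    # polynomials are orthogonal.
    t = 2.0 * (positions - lo) / (hi - lo) - 1.0
    basis = legendre.legvander(t, degree)  # shape: (n_variants, degree + 1)
    return np.asarray(genotypes, dtype=float) @ basis  # (n_samples, degree + 1)


def f_statistic(y, design):
    """F statistic for H0: all basis coefficients are zero (intercept-only null)."""
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    full = np.column_stack([np.ones(n), design])
    coef, *_ = np.linalg.lstsq(full, y, rcond=None)
    rss_full = np.sum((y - full @ coef) ** 2)
    rss_null = np.sum((y - y.mean()) ** 2)
    df1 = np.linalg.matrix_rank(full) - 1  # number of tested coefficients
    df2 = n - df1 - 1                      # residual degrees of freedom
    f_value = ((rss_null - rss_full) / df1) / (rss_full / df2)
    return f_value, df1, df2


# Hypothetical simulated gene: 200 individuals, 30 low-frequency variants.
rng = np.random.default_rng(0)
genotypes = rng.binomial(2, 0.05, size=(200, 30))
positions = np.sort(rng.uniform(0, 10_000, size=30))
effects = np.where(np.arange(30) < 5, 0.8, 0.0)  # first five variants are causal
y = genotypes @ effects + rng.normal(size=200)

design = legendre_design(genotypes, positions, degree=3)
f_value, df1, df2 = f_statistic(y, design)
```

The gene-level test here has only `degree + 1` coefficients regardless of how many variants the gene contains, which is the dimension reduction that makes the functional approach attractive for rare variants. A p-value would follow from comparing `f_value` with an F distribution on `df1` and `df2` degrees of freedom.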
Keywords/Search Tags: functional linear model, gene-level association analysis, Legendre polynomial, GATK, Sentieon