
Longitudinal Data Analysis For Rare Variants Detection With Penalized Regression

Posted on: 2018-06-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Y Cao
GTID: 1310330536973900
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Objective: Longitudinal next-generation sequencing studies are increasingly used to obtain additional information and greater statistical power than cross-sectional data provide. Association analyses of rare variants are less straightforward than analyses of common variants in GWASs, because the power to detect an association with a single rare variant is very low. Researchers have therefore developed data analysis strategies that assess the collective effect of multiple rare variants in a specific genomic region (e.g., a gene) rather than testing individual variants separately. However, very few methods have been developed or extended to detect rare variants associated with longitudinal disease traits. Current statistical methods that account for correlations within and between subjects, such as generalized estimating equations (GEE) and linear mixed models (LMM), may not scale to rare variants in a longitudinal design, particularly with limited sample sizes and missing data. Given the improved performance of a longitudinal design in identifying genetic variants, it is essential to develop a variable selection strategy that improves estimation accuracy and gene selection efficiency in a longitudinal study.

Methods: In this work, we extended rare variant association tests to genetic longitudinal studies under the penalized GEE (pGEE) and penalized quadratic inference function (pQIF) frameworks. We adopted a weighted sum statistic (WSS) to collapse multiple variants in a gene region into a gene score. When multiple genes in a pathway were considered together, pGEE and pQIF were applied for efficient gene selection. For continuous and binary traits, respectively, we evaluated estimation accuracy and model selection performance under different model settings. We then applied the methods to the real phenotypes in GAW18 to explore genetic effects at the pathway scale in the renin-angiotensin system (RAS) and the Ca2+/AT-?R/α-AR signaling pathway, respectively.

Results: Compared with the unpenalized GEE and QIF methods, the penalized GEE and QIF methods achieved better estimation accuracy and higher selection efficiency. As the sample size increases, the estimation accuracy of pGEE and pQIF improves significantly, and both perform as well as the oracle procedure in variable selection; that is, they work as well as if the correct submodel were known. Both pGEE and pQIF can select the true genes with high accuracy. pGEE has higher true positive rates than pQIF, but also higher false positive rates. pQIF remains relatively optimal even when the working correlation structure is misspecified. However, pQIF failed to select effective genes for the larger np value in both binary and continuous trait simulations. A conservative recommendation is to apply pQIF when both the number of covariates np and the sample size are small, to avoid false selection. Conversely, when np is large and the sample size is small, pGEE is recommended. Within the RAS pathway, pGEE identified the THOP1 and PRCP genes, while pQIF identified the THOP1 and ACE genes. Both pGEE and pQIF identified one important gene, AGTR1, in the Ca2+/AT-?R/α-AR signaling pathway.

Conclusion: We proposed methods for rare variant detection in longitudinal designs under the pGEE and pQIF frameworks. The proposed pGEE and pQIF perform much better than the unpenalized methods and select meaningful genes for next-generation sequencing longitudinal traits. Our pGEE and pQIF methods provide a general tool for longitudinal sequencing studies involving large numbers of genetic variants.
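The collapsing step named in the Methods, where a weighted sum statistic turns the rare variants of one gene into a single gene score, can be sketched as follows. This is a minimal illustration only. The abstract does not give the weighting scheme, so the sketch assumes Madsen-Browning-style weights computed from the whole sample, and the function name `weighted_sum_gene_score` and the toy genotype matrix are hypothetical.

```python
import numpy as np


def weighted_sum_gene_score(genotypes):
    """Collapse the rare variants of one gene into a per-subject gene score.

    genotypes: (n_subjects, n_variants) array of minor-allele counts (0, 1, 2).
    Assumption: weights follow the Madsen-Browning form, estimated from all
    subjects; the dissertation may use a different weighting.
    """
    g = np.asarray(genotypes, dtype=float)
    n = g.shape[0]
    # Smoothed minor-allele frequency per variant; add-one smoothing keeps it above 0.
    q = (g.sum(axis=0) + 1.0) / (2.0 * n + 2.0)
    # Rarer variants get a smaller w, hence a larger contribution 1 / w.
    w = np.sqrt(n * q * (1.0 - q))
    # Gene score: weighted sum of minor-allele counts across the gene's variants.
    return (g / w).sum(axis=1)


# Toy data: 4 subjects, 2 variants in one hypothetical gene.
toy = [[0, 0], [1, 0], [0, 2], [0, 0]]
scores = weighted_sum_gene_score(toy)
print(scores)
```

Under this reading, each gene in a pathway contributes one score column, and those columns are the covariates among which pGEE or pQIF would select.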
Keywords/Search Tags:longitudinal data, rare variants association test, penalized generalized estimating equations, penalized quadratic inference function