Lung cancer is one of the most common cancers worldwide, ranking first in both incidence and mortality. In China, lung cancer is the most common cancer in males and the second most common in females, and it is the leading cause of cancer death in both men and women. Over the last decade, with increasing tobacco consumption and environmental pollution, the incidence and mortality of lung cancer have been rising in most regions of China, and lung cancer has gradually become a severe public health problem. Epidemiological evidence identifies environmental exposures as among the most important risk factors for lung cancer, with tobacco exposure the leading one. However, although 90% of lung cancer cases can be attributed to tobacco, only 15% of smokers develop lung cancer. Variation in genetic profiles may contribute to this differential susceptibility, most likely in the form of common, low-penetrance genetic alterations. Primary prevention of lung cancer calls for efficient identification of high-risk populations. Besides determining harmful environmental exposures, discovering key genetic biomarkers is an important research direction for improving the efficiency of high-risk group screening.

In the past decade, the rapid development of genome-wide association studies (GWAS) has made it possible to integrate genome-wide common variants comprehensively to explore lung cancer susceptibility. Since the first lung cancer GWAS, conducted in a European population, was published in 2008, this strategy has identified 45 susceptibility loci with various effect sizes, greatly advancing the understanding of lung cancer susceptibility. Although GWAS have yielded fruitful findings, some key scientific problems remain. On the one hand, relevant studies found that GWAS-reported regions explain only a small proportion of lung cancer heritability. Twin studies in Europe
have estimated the total heritability of lung cancer at 26%-38%. Researchers have therefore raised the concept of "missing heritability", the gap between the heritability explained by known regions and the total heritability estimated from traditional twin studies. Heritability evaluates the proportion of phenotypic variance attributable to genetic effects and measures the contribution of genetic effects to liability variance. Narrow-sense heritability (h²) refers to the proportion of phenotypic variation due to additive genetic effects. Broad-sense heritability (H²) covers the proportion due to total genetic effects, including additive, dominance, and epistatic effects, and is usually estimated through family and twin studies. Recently, researchers have been able to estimate the additive heritability of traits or diseases using genome-wide genotypes of unrelated individuals. Lu et al. estimated array heritability for 12 cancer types and demonstrated significant heritability (18%-81%) for melanoma, pancreatic cancer, prostate cancer, liver cancer, ovarian cancer, esophageal adenocarcinoma, esophageal squamous cell carcinoma, and endometrial cancer, but not for breast, gastric, and lung cancer (European). Joshua et al. estimated significant heritability for lung cancer in a European population and in Asian non-smoking women of 20.6% and 12.1%, respectively, with the known regions accounting for 1.4% and 2.4%. Accordingly, most of the heritability is not missing but hidden in SNPs filtered out by the stringent P threshold (5×10⁻⁸) used in GWAS. This study also indicates that there may be strong heterogeneity among populations with different genetic backgrounds. The pattern of lung cancer heritability in the Chinese population remains to be explored.

On the other hand, studies have shown that more than 80% of GWAS-reported SNPs lie in noncoding regions, which makes predicting their biological function a major challenge. Recently, the rapid development of biological technology and the establishment of international programs
have provided powerful tools for annotating the biological functions of critical regulatory elements and GWAS-reported SNPs. The Encyclopedia of DNA Elements (ENCODE), Genotype-Tissue Expression (GTEx), and The Cancer Genome Atlas (TCGA) provide, respectively, annotations for variants and genes, tissue-specific associations between variants and genes, and gene expression in cancer tissues, and will greatly help in identifying crucial variants and genes in lung cancer.

Therefore, to characterize the lung cancer heritability explained by common variants in the Chinese population, and to integrate biofeatures to screen for potentially functional variants, we used Genome-wide Complex Trait Analysis (GCTA) to estimate the additive heritability of lung cancer in the Chinese population explained both by all SNPs in the GWAS and by all known regions. We then integrated SNPs of lesser statistical significance with biofeatures from public databases to discover potentially functional variants. We hope these findings can guide the identification of populations at high risk of lung cancer in China, and thus promote precise and individualized prevention of lung cancer.

Part Ⅰ: Estimating the heritability of lung cancer in the Chinese population based on a genome-wide association study

In a case-control design, we included 2,231 cases and 2,774 controls with 424,288 SNPs after stringent quality control of both samples and variants, and then estimated heritability using GCTA. We estimated the genetic relationship matrix (GRM) using all the SNPs and used the prevalence of lung cancer in the Chinese population to transform the estimate from the observed scale to the liability scale. We also evaluated the heritability after imputation and the heritability explained by the known regions. We performed subgroup estimations by gender and smoking status. In addition, we explored the relationship between heritability and chromosome length. We found that the heritability
explained by the SNPs on the GWAS array and after imputation was 15.2% and 31.2%, respectively. GWAS-reported SNPs or regions (250 kb or 500 kb upstream and downstream of the reported SNPs) accounted for only a small part (0.7%-1.1%). Heritability was 24.3% for women and 15.3% for men, and 22.5% for nonsmokers and 14.4% for smokers, but no heterogeneity was shown between subgroups. We observed a strong linear relationship between the estimated variance explained by each chromosome and chromosome length (in Mb) for lung cancer (P=0.001). After conditioning on chromosome length, chromosome 6 explained the largest variation for lung cancer and chromosome 1 explained a smaller proportion. In summary, our results show that lung cancer in the Chinese population is affected by genetic effects. Beyond the reported variants, other common SNPs on the GWAS arrays can account for part of the missing heritability, indicating that more potential SNPs that do not reach the stringent P threshold of GWAS remain to be discovered. These results indicate a polygenic genetic architecture of lung cancer in the Chinese population, and more effort should be made to uncover the hidden heritability through increasingly large studies in the future.

Part Ⅱ: Annotating the susceptibility regions of lung cancer in Asian populations based on genome-wide association studies

We used Asian GWAS datasets with raw data, including our GWAS in the Chinese population and a GWAS of non-smoking females in Asia (mainland China, Korea, Japan, Singapore, Taiwan, and Hong Kong) requested from the database of Genotypes and Phenotypes (dbGaP). After quality control, imputation, and association analysis, we performed a meta-analysis of the two GWAS to identify new susceptibility loci. We then included 4,195 SNPs in the functional annotation (SNPs with P<1×10-44 in the meta-analysis, consistent associations in the two studies (P<0.05), and low heterogeneity (P>0.05); and reported SNPs and their LD SNPs in Asian populations). ANNOVAR was used to annotate the SNPs in coding and regulatory
regions. We predicted the function of the non-synonymous variants with several tools, including SIFT, PolyPhen-2, PROVEAN, LRT, MutationTaster, GERP++, FATHMM, and DANN. Regions marked by H3K4me3 and H3K9ac (promoters) and by H3K4me1 and H3K27ac (enhancers) in the A549 cell line were downloaded from ENCODE. To identify target genes of crucial SNPs or regulatory regions, we performed genotype-phenotype association analysis in both normal and cancerous lung tissues to examine expression quantitative trait loci (eQTL). We determined potential genes using expression data from both tumor and para-tumor tissues in TCGA.

In the meta-analysis, three new SNPs on 6p22.1 were significantly associated with lung cancer risk in the Asian population (rs9259876, A>G, OR (95% CI)=1.29 (1.18-1.41), P=1.44×10⁻⁸; rs9259050, G>A, OR (95% CI)=1.19 (1.12-1.27), P=2.73×10⁻⁸; rs28465400, T>C, OR (95% CI)=0.85 (0.80-0.90), P=4.40×10⁻⁸). Twenty-seven SNPs from Asian populations were validated. After integrating GWAS and the biofeatures, 122 potentially functional variants emerged: 20 are located in coding regions and 102 in regulatory regions, of which 85 are in promoters and 56 in enhancers. Compared with the reported loci in Asian populations, these functional variants are located in 11 known regions and 13 new regions. In summary, our study integrated existing Asian GWAS with biofeatures from public databases and, for the first time, systematically annotated lung cancer susceptibility variants. We identified variants and loci with potential functions (changing the function of encoded proteins or regulating gene transcription). This strategy can efficiently leverage existing population genetic information and biological function data to identify potentially functional variants, thus guiding subsequent functional experiments and advancing the understanding of the mechanisms of lung cancer susceptibility in Asian populations.
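
Two calculations named in Part Ⅰ and Part Ⅱ, the transformation of heritability from the observed scale to the liability scale and the meta-analysis of two GWAS, can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions and not the analysis pipeline used in this work: the abstract does not say which conversion formula or meta-analysis model was applied, so the sketch assumes the commonly used ascertainment-corrected conversion for case-control data and an inverse-variance fixed-effect model. The function names, the prevalence value, the observed-scale estimate, and the per-study effect sizes are all hypothetical; only the case and control counts come from the text.

```python
import math
from statistics import NormalDist


def liability_scale_h2(h2_observed, prevalence, case_fraction):
    """Convert observed-scale heritability from a case-control sample
    to the liability scale.

    Assumed formula: h2_l = h2_o * K^2 (1-K)^2 / (z^2 * P (1-P)),
    where K is the population prevalence, P the fraction of cases in
    the sample, and z the standard normal density at the liability
    threshold that cuts off an upper tail of size K.
    """
    if not (0.0 < prevalence < 1.0 and 0.0 < case_fraction < 1.0):
        raise ValueError("prevalence and case_fraction must lie in (0, 1)")
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - prevalence)
    z = std_normal.pdf(threshold)
    k, p = prevalence, case_fraction
    return h2_observed * (k * (1.0 - k)) ** 2 / (z ** 2 * p * (1.0 - p))


def fixed_effect_meta(estimates):
    """Pool (log odds ratio, standard error) pairs with inverse-variance
    weights (fixed-effect model, an assumption of this sketch).

    Returns the pooled log odds ratio, its standard error, and a
    two-sided P value from the normal approximation.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    p_value = 2.0 * NormalDist().cdf(-abs(pooled / pooled_se))
    return pooled, pooled_se, p_value


if __name__ == "__main__":
    # Case fraction from the sample sizes in the text: 2,231 cases, 2,774 controls.
    case_fraction = 2231 / (2231 + 2774)
    # Hypothetical inputs: observed-scale estimate and population prevalence.
    print(liability_scale_h2(h2_observed=0.20, prevalence=0.001,
                             case_fraction=case_fraction))
    # Hypothetical per-study log odds ratios and standard errors.
    print(fixed_effect_meta([(0.25, 0.06), (0.22, 0.07)]))
```

The conversion factor depends strongly on the assumed prevalence, so any liability-scale estimate should be read together with the prevalence used to obtain it.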