Interaction Studies of NER Pathway SNPs on Susceptibility to Lung Cancer

Posted on: 2010-01-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L L Zou
Full Text: PDF
GTID: 1114360275491134
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Background: Lung cancer is one of the most serious malignant tumors, with high incidence and mortality rates. Epidemiological and population studies have confirmed that the etiology of lung cancer involves both genetic and environmental factors. During the development of lung cancer, the body's protective mechanisms play an important role. The human body is known to have at least 130 DNA repair genes. Polymorphisms in these genes may alter DNA repair capacity and thereby increase an individual's risk of lung cancer. The nucleotide excision repair (NER) pathway is one of the important DNA repair pathways, and the relationship between NER pathway gene polymorphisms and susceptibility to lung cancer is now an active research topic. However, results from similar studies are often inconsistent. The main reason is limited sample size, which makes it possible to analyze only some of the possible relationships rather than all of them. Molecular epidemiology studies are currently most concerned with gene-gene and gene-environment interactions. In addition to the traditional multiple logistic regression model, which can be used to analyze multiple SNPs, multifactor dimensionality reduction (MDR), classification and regression trees (CART), and other data mining methods have been reported. Each of these methods has its own advantages and limitations, and many questions about their results and effectiveness remain open. Association rule mining is considered an effective tool for screening novel or unknown knowledge from large amounts of data, so it can be used to find valuable information about the relationships between attributes in large SNP datasets. This information is useful for selecting candidate covariates (genes) for a subsequent logistic regression model.

Objective: To study the relationship between NER pathway gene polymorphisms and susceptibility to lung cancer, to find interactions between SNPs related to lung cancer susceptibility, and to identify useful methods for analyzing the relationship between SNPs and disease susceptibility.

Methods: Using an actual SNP dataset, we applied association rule mining combined with the Bootstrap method to find association rules between SNPs and lung cancer. To confirm the findings, we built a logistic regression model from these rules, including the candidate covariates (genes) and interaction information they suggested. As a preliminary validation of the method, we carried out a small-scale simulation study, simulating a random model whose parameters were set to reflect a biological context matching that of the SNP data. We analyzed the simulated data with the above method and compared the results with those of other methods. The simulated independent variables were generated with MATLAB 7.0 according to the simulated biological context. The simulated dependent variable (disease state) was generated with SAS according to the simulation model.

The classical Apriori algorithm, implemented in SAS 9.13, was used to mine association rules. We selected the following rule interestingness measures: lift, Fisher's exact probability, support, and confidence. By varying the thresholds on these measures, we chose the most effective criteria for screening association rules in the actual data analysis.

Evaluation indices: (1) for the rule-selection criteria: the mean frequency (MF), standard error (SE), 95% confidence interval (CI), and total number of output rules that contained the variables and interactions specified in the simulation model; (2) for the model: the logistic regression parameter estimation bias (Bias), the degree of bias (DB), and the 95% confidence interval coverage (Coverage).

Results: The small-scale simulation study showed that association rule mining is a useful tool for finding potential associations between variables in large amounts of data, including interactions between variables. Fisher's exact probability and lift, used as rule interestingness measures and combined with Bootstrap sampling, effectively selected rules that included the variables in the simulation model. To ensure a high success rate in mining, the minimum support (min_sup) and minimum confidence (min_conf) parameters should be set relatively low. Applying the Bootstrap technique in association rule mining helps produce robust results. Both the simulation results and a methodological analysis of MDR confirmed that the interactions found by MDR are not credible.

The actual data analysis showed that the following SNPs and interactions are related to lung cancer susceptibility: XPG-rs732321, DDB2-rs830083, ERCC1-rs3212930 × ERCC1-rs3212951, and ERCC2-rs13181 × XPG-rs873601. XPG-rs732321 (CC + AC) is a protective genotype for lung cancer (OR = 0.54, 95% CI = 0.35~0.85). DDB2-rs830083 (GG + CG) increases the risk of lung cancer (OR = 1.32, 95% CI = 1.03~1.70). ERCC1-rs3212930 and ERCC1-rs3212951 have a synergistic effect on lung cancer risk (OR = 2.75, 95% CI = 1.18~6.64): individuals carrying both mutation sites have a higher risk of lung cancer than individuals carrying only one of them. An interaction also exists between ERCC2-rs13181 and XPG-rs873601 (OR = 2.43, 95% CI = 1.09~5.44): individuals carrying both mutation sites have a higher risk of lung cancer than individuals carrying only one or neither of them.

Conclusion: Association rule mining, using the rule measures support, confidence, lift, and Fisher's exact probability together with the Bootstrap technique, is useful for finding potential associations, including interactions, between variables in data. The SNPs and SNP combinations included in the rules can be used as candidate covariates (genes) and interactions in a multiple logistic regression model of disease and SNPs. If the power is large enough, our method is able to find the SNPs and interactions related to lung cancer. In this research, we found two lung cancer susceptibility SNPs and two interactions. All of these positive findings can be reasonably explained from a biological perspective.
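
The rule interestingness measures named in the abstract (support, confidence, lift, and Fisher's exact probability) and the Bootstrap step can be illustrated with a minimal Python sketch. This is not the dissertation's implementation, which used the Apriori algorithm in SAS; it scores a single hand-picked rule rather than mining all rules. The item names `snp_a`, `snp_b`, and `case`, the toy data, the one-sided form of Fisher's test, and the thresholds `min_lift` and `alpha` are all assumptions made for illustration.

```python
import random
from math import comb


def rule_measures(rows, antecedent, consequent="case"):
    """Support, confidence, lift and 2x2 table for the rule antecedent -> consequent.

    rows is a list of dicts mapping item name -> bool (hypothetical layout).
    """
    n = len(rows)
    n_a = sum(all(r[i] for i in antecedent) for r in rows)   # antecedent holds
    n_c = sum(r[consequent] for r in rows)                    # consequent holds
    n_ac = sum(all(r[i] for i in antecedent) and r[consequent] for r in rows)
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    # Table cells: (A and C, A and not C, not A and C, not A and not C)
    table = (n_ac, n_a - n_ac, n_c - n_ac, n - n_a - n_c + n_ac)
    return support, confidence, lift, table


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value: P(X >= a) under the hypergeometric null."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)


def bootstrap_rule_frequency(rows, antecedent, n_boot=200,
                             min_lift=1.0, alpha=0.05, seed=0):
    """Fraction of bootstrap resamples in which the rule passes both filters."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_boot):
        sample = rng.choices(rows, k=len(rows))  # resample rows with replacement
        _, _, lift, table = rule_measures(sample, antecedent)
        if lift > min_lift and fisher_one_sided(*table) < alpha:
            hits += 1
    return hits / n_boot


# Toy data (made up): 40 subjects, 10 of whom carry both hypothetical variants.
rows = (
    [{"snp_a": True, "snp_b": True, "case": True}] * 8
    + [{"snp_a": True, "snp_b": True, "case": False}] * 2
    + [{"snp_a": True, "snp_b": False, "case": True}] * 9
    + [{"snp_a": False, "snp_b": False, "case": False}] * 21
)

support, confidence, lift, table = rule_measures(rows, ["snp_a", "snp_b"])
print("support:", support, "confidence:", confidence, "lift:", lift)
print("Fisher one-sided p:", fisher_one_sided(*table))
print("bootstrap frequency:", bootstrap_rule_frequency(rows, ["snp_a", "snp_b"]))
```

A rule that passes the filters in most resamples is a more stable candidate for the follow-up logistic regression than one that passes only in the original sample, which is one way to read the abstract's remark that Bootstrap sampling makes the mined results more robust.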
Keywords/Search Tags:Lung cancer, nucleotide excision repair pathway, single nucleotide polymorphism, association rule mining, Bootstrap sampling, logistic regression model