The Study On The Correlation Between LincRNA Single Nucleotide Polymorphisms And Gastric Cancer Based On Gene Chip And Its Risk Prediction Model

Posted on: 2019-08-05  Degree: Master  Type: Thesis
Country: China  Candidate: F Z Liu  Full Text: PDF
GTID: 2404330569481121  Subject: Occupational and Environmental Health
Abstract/Summary:
[Objective] Using an SNP gene chip and bioinformatics methods, to characterize the distribution of single nucleotide polymorphisms (SNPs) of long non-coding RNAs (lncRNAs) on the chip; to explore new long intergenic non-coding RNA (lincRNA) SNPs associated with gastric cancer and screen for those most closely related to it; and to compare the candidate lincRNA SNPs between the gastric cancer case group and the control group in order to study their relationship with the risk of gastric cancer. Building on the earlier screening of environmental and genetic risk factors associated with gastric cancer, a gastric cancer risk prediction model was constructed to provide a screening tool for the high-risk population in Xianyou.

[Methods] 1. In a 1:1 matched case-control design, the Affymetrix Axiom 900K chip was used to genotype SNPs in total DNA extracted from the blood cells of 96 gastric cancer patients (men and women) from Xianyou, Fujian, and 96 healthy controls. The HGNC database was used to identify the lncRNA SNPs on the chip. Chi-square tests in SPSS 20.0 and Excel were then applied to the lncRNA SNP sites to screen for SNPs that differed significantly between the gastric adenocarcinoma group and the control group, and the distribution of the lncRNA SNPs was analyzed. SNPs of newly identified lincRNA genes were further screened using a lincRNA database, and the distribution of the lincRNA SNPs was analyzed. 2. From the chip screening results, SNPs with a minor allele frequency (MAF) of 0.10 to 0.40 that passed the Hardy-Weinberg equilibrium test (P > 0.05) were intersected with the dbSNP database, and candidate lincRNA SNPs were selected with reference to the literature and by constructing genetic models. The sample size of the gastric cancer group and the control group was then expanded (622 cases), the genotypes of the candidate SNP sites were determined by the Sequenom MassARRAY technique, and the association between the lincRNA SNPs and gastric cancer was assessed by regression analysis using a Cox model. 3. Field epidemiological data and the large-sample SNP results were analyzed to identify the environmental and genetic factors associated with gastric cancer. Five statistical pattern recognition algorithms in the Waikato Environment for Knowledge Analysis (WEKA), namely Bayes network, Logistic, support vector machine (SMO), decision tree C4.5 and Random Forest, were used to build gastric cancer risk models on three types of variable sets (environmental factors only, genetic factors only, and environmental plus genetic factors). The five models were compared on correct classification rate, true positive rate, false positive rate, precision, recall, F-measure and area under the ROC curve. The applicability of the models was evaluated by back-substitution within the study group, extrapolation prediction and cross-validation, in order to select the risk model and the variable combination best suited to the Xianyou area.

[Results] 1. A total of 131,670 lncRNA sites related to gastric cancer were selected from the gene chip, covering more than 85% of lncRNAs. (1) Distribution of lncRNA SNPs: 3068 SNPs differed significantly between the gastric adenocarcinoma group and the control group, of which 1837 loci were located in long intergenic non-coding RNAs (lincRNAs). Mutation hot spots formed at the end of the long arm of chromosome 8 and the end of the short arm of chromosome 20, while chromosome 23 was a conserved region for genetic mutations. (2) Distribution of lincRNA SNPs: analysis of the transcript composition of the lincRNAs showed that lincRNAs with a single transcript were the most numerous, accounting for 43.46%. Analysis of the target genes at loci linked to the lincRNA SNPs showed that the target genes were concentrated in the molecular function of beta-catenin binding. 2. Ten sites most closely related to gastric cancer were further screened, and analysis of these 10 candidate SNPs in the enlarged sample found the following. The LINC00687 polymorphism site rs2795025 was associated with gastric cancer susceptibility, and the rs2795025 CC genotype increased the risk of gastric cancer (ORa=1.94, 95% CI: 1.12, 3.34). The LINC02122 polymorphism sites rs10036719 and rs12516079 were associated with susceptibility to gastric cardia cancer: the rs10036719 GG genotype increased the risk of cardia cancer (ORa=1.84, 95% CI: 1.05, 3.23), whereas the rs12516079 AG and GG genotypes reduced it (AG genotype ORa=0.48, 95% CI: 0.27, 0.84; GG genotype ORa=0.54, 95% CI: 0.30, 0.98). Combined analysis showed that the risk of cardia cancer was higher in carriers of the adverse genotypes of rs10036719 and rs12516079 (OR=2.07, 95% CI: 1.69, 2.53). 3. Among the five models (Bayes Net, Logistic, support vector machine (SMO), decision tree C4.5 and Random Forest), the Logistic model had the best screening performance, with an accuracy rate of 75.60% and an area under the ROC curve of 0.826. Back-substitution, extrapolation prediction and cross-validation showed that the Logistic model varied little and was the more stable model. Among the three types of gastric cancer risk prediction model, the model with genetic factors only had the worst discriminant accuracy and area under the ROC curve, the model with environmental factors only came second, and the model combining environmental and genetic factors was the best. A preliminary gastric cancer risk score model including both environmental and genetic factors was built:

Y = 19 * age + 11 * gender + 8 * smoking - 8 * drinking - 12 * tea + 10 * eating faster + 7 * eating high-salt food + 9 * overeating + 10 * eating hot food + 13 * eating hard food - 5 * fruit + 5 * eating pickles - 7 * rs10134160 - 3 * rs10205233 - 3 * rs12882235 + 1 * rs2795035

The scoring model had an area under the ROC curve of 0.715, a sensitivity of 0.699 and a specificity of 0.627.

[Conclusions] 1. LncRNA polymorphism sites closely related to gastric cancer can be screened with a gene chip, and among them the SNPs of long intergenic non-coding RNAs (lincRNAs) are the most closely related to gastric cancer. Screening lncRNA SNPs by gene chip is cost-effective and practical, and is suitable for large sample populations. 2. Three newly discovered lincRNA SNPs are closely associated with gastric cancer. The LINC00687 polymorphism site rs2795025 is associated with gastric cancer susceptibility, and the rs2795025 CC genotype increases the risk of gastric cancer. The LINC02122 polymorphism sites rs10036719 and rs12516079 are associated with susceptibility to gastric cardia cancer: the rs10036719 GG genotype increases the risk of cardia cancer, and the rs12516079 AG and GG genotypes reduce it. These SNPs may be used as genetic markers for screening high-risk groups. 3. The gastric cancer risk prediction model that includes both environmental factors (smoking, drinking alcohol, drinking tea, eating speed, high-salt diet, overeating, hot food, hard food, fruit and pickles) and genetic factors (rs10205233) has the best screening capacity. Among the five gastric cancer risk prediction models, the Logistic model has the better screening capacity and is applicable to screening the high-risk group for gastric cancer in Xianyou County. Health education and guidance for carriers of adverse genotypes, aimed at improving environmental risk factors, can effectively control and reduce the incidence of gastric cancer.
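
The risk score model reported in the Results is a weighted sum, so it can be illustrated with a few lines of Python. This is only a minimal sketch: the coefficients are copied from the abstract, but the abstract does not say how each predictor is coded, so the 0/1 coding used here is an assumption, and the dictionary keys, the `risk_score` function name and the example subject are hypothetical.

```python
# Coefficients copied from the score model Y in the abstract.
# ASSUMPTION: every predictor is coded 0/1; the abstract does not state the coding.
COEFFICIENTS = {
    "age": 19,
    "gender": 11,
    "smoking": 8,
    "drinking": -8,
    "tea": -12,
    "eating_faster": 10,
    "high_salt_food": 7,
    "overeating": 9,
    "hot_food": 10,
    "hard_food": 13,
    "fruit": -5,
    "pickles": 5,
    "rs10134160": -7,
    "rs10205233": -3,
    "rs12882235": -3,
    "rs2795035": 1,
}


def risk_score(subject):
    """Return Y for a dict of predictor values; predictors left out count as 0."""
    unknown = set(subject) - set(COEFFICIENTS)
    if unknown:
        raise KeyError(f"unknown predictors: {sorted(unknown)}")
    return sum(COEFFICIENTS[name] * value for name, value in subject.items())


# Hypothetical subject, for illustration only (not data from the study).
example = {"age": 1, "smoking": 1, "tea": 1, "hard_food": 1, "rs10134160": 1}
print(risk_score(example))  # 19 + 8 - 12 + 13 - 7 = 21
```

The abstract gives no cut-off value for Y, so the sketch stops at the raw score and does not classify subjects as high or low risk.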
Keywords/Search Tags:long intergenic non-coding RNA, Gene chip, Case-control, Risk forecast model