| Background: Essential hypertension, caused by genetic susceptibility and environmental factors, is a syndrome whose main clinical manifestation is high blood pressure, with or without multiple accompanying cardiovascular risk factors. It is the most common chronic non-communicable disease and the most important risk factor for cardiovascular disease. It causes high rates of disability and death and a serious drain on medical and social resources, placing a heavy burden on families and countries. In recent years, with the accelerating arrival of an aging society and major changes in lifestyle, the incidence of and mortality from cardiovascular and cerebrovascular diseases have also been rising. The fatality rate has even reached 40%, of which hypertension accounts for a large proportion, making it a serious public health problem. According to survey data from 2002, the prevalence of hypertension among adults was 18.8%, from which it is estimated that China has nearly 200 million patients with hypertension, accounting for about 1/5 of all patients with hypertension. The young in particular, among whom the figure rose from 29% in 1991 to 34% in 2002, are the main source of hypertension. Studies have shown that 50%-60% of the variation in blood pressure levels can be attributed to genetic factors. Exploring the pathogenesis of essential hypertension at the level of molecular biology has therefore attracted considerable attention. Currently, research on susceptibility genes for essential hypertension relies mainly on candidate gene studies, linkage analysis based on genetic markers, and genome-wide association studies. Candidate gene studies currently focus on the renin-angiotensin-aldosterone system, the G-protein signaling system, the catecholamine-adrenergic system, ion channels, inflammation, endothelial factors and other related factors. Linkage analysis based on genetic markers uses linkage and recombination to study the relationship of the disease-causing
gene with the reference locus (genetic marker), and has identified genes involved in blood pressure regulation and hypertension susceptibility. Genome-wide association analysis identifies sequence variations across the whole human genome and screens for single nucleotide polymorphisms associated with the disease. These methods have uncovered some genetic variants, but their shortcomings are obvious: ① research on hypertension-related gene loci has concentrated on common variants, while the role of rare variants in the pathogenesis of hypertension remains poorly understood; ② because of differences in genetic background and environmental factors, these blood pressure loci have been found only in certain populations, and most variant genes cannot be replicated across different ethnic groups; ③ most studies require very large sample sizes to reach a significant association, while the effect of variation at any single locus on blood pressure is extremely weak; ④ most research has focused on locus variation associated with blood pressure level or with the risk of developing hypertension. Thus, with the rapid development of second-generation sequencing technology, more and more disease studies are adopting second-generation sequencing. Illumina's Solexa Genome Analyzer/HiSeq System platform is currently the most widely used in scientific research. Most research institutions employ this system, whose series of sequencing instruments can perform de novo sequencing, whole-genome re-sequencing, whole-exome sequencing and transcriptome sequencing, and can thus meet different research requirements. Whole-genome re-sequencing can scan for and detect variations associated with phenotypic differences, disease and evolution at the genome-wide level, and can thoroughly study gene sequence differences and structural variants, including single nucleotide polymorphisms, single nucleotide variation, deletion variants, insertion variants, copy number variation and structural
variation. Its advantages are short turnaround time, high accuracy, high throughput and low cost. It is the method of first choice for genotypic diversity analysis, genetic evolution analysis and the screening of disease susceptibility genes, and has significant scientific value. Clinical applications of second-generation sequencing include the screening and identification of genes that cause monogenic diseases, screening for hereditary diseases, noninvasive prenatal diagnosis, detection of tumor markers, diagnosis of infectious diseases, screening for inherited metabolic diseases in newborns, personalized medicine and guidance in other areas. Since its advent, the rapid development of next-generation sequencing technology has brought great benefits. Although sequencing an individual sample is inexpensive, large-scale application can still be rather costly. DNA pooling means mixing equal molar amounts of DNA from individual samples into a single pool for further study. Researchers at home and abroad have sequenced DNA pools drawn from large samples and compared the results with those of individual sequencing; the results showed no difference, demonstrating the high reliability and feasibility of pooling. With the rapid development of second-generation sequencing technology, the massive volumes of data produced have also driven the development of bioinformatics. A growing number of software packages and online data analysis platforms have become powerful tools for analyzing such data. Public databases constructed by different countries also provide a sound foundation for disease research. Bioinformatics covers genomics, information structure and complexity, and uses a variety of tools for the comprehensive analysis of genes. The raw data obtained from DNA sequencing instruments are massive. They must first be aligned by software, after which variants are called and annotated before further pathway analysis. The most
frequently used software tools are BWA, SOAPsnp, SAMtools, CNVnator, Varscan, BreakDancer, ANNOVAR and so on. The advancement of bioinformatics not only promotes the rapid development of the life sciences, but also lays a foundation for a full understanding of disease. Therefore, given the significant impact of essential hypertension on society and families, this study uses next-generation sequencing technology combined with a DNA pooling strategy and bioinformatics analysis to make a preliminary investigation of the susceptibility genes and pathogenic biological pathways that may be involved in essential hypertension. Objective: To explore the pathogenesis of essential hypertension by applying second-generation sequencing technology combined with a DNA pooling strategy and bioinformatics analysis. In doing so, we hope to identify, at the genome-wide level, possible susceptibility genes associated with the pathogenesis of essential hypertension and the interactions between those genes. This would provide a theoretical basis for elucidating the pathophysiology of essential hypertension and for developing an essential hypertension genotyping chip. It may also provide a basis for the early detection of groups at high risk of essential hypertension and for selecting their medication. Early detection and prevention of essential hypertension in high-risk populations is important because it can significantly reduce the economic and social burden caused by the disease. Methods: We randomly sampled peripheral blood from patients with essential hypertension and from healthy individuals attending the health management center of a southern hospital over a two-year period, and established a DNA pool for the normal group and a DNA pool for the disease group, respectively. Whole-genome DNA libraries were then constructed for re-sequencing. We sequenced the case group and the control group by whole-genome re-sequencing and analyzed the raw data by biological
information methods in a preliminary analysis. The bioinformatics analysis began with the sequencing data (raw data) generated from the Illumina pipeline. First, the adapter sequence in the raw data was removed, and low-quality reads with too many Ns or low-quality bases were discarded. This step produced "clean data". Second, the Burrows-Wheeler Aligner (BWA) was used to align reads to the reference sequence. The alignment information was stored in BAM format files for further processing in the following steps: fixing mate-pairing information, adding read group information and marking duplicate reads caused by polymerase chain reaction (PCR). After these procedures, the final BAM files were ready for variant calling. SNPs were detected using SOAPsnp; small insertions/deletions (indels) using GATK; CNVs using CNVnator; and SNVs using Varscan. Additionally, SVs were identified using BreakDancer and an in-house method based on the Segseq algorithm. The pipeline also includes purity estimation. Filters were then applied to obtain higher-confidence results for the identified variants. Next, we used ANNOVAR to annotate the variants, on the basis of which advanced analysis could subsequently be conducted. Quality control (QC) was required at each stage of the analysis pipeline to ensure clean data and to verify the alignment and the called variants. After this preliminary analysis we obtained the variant sites, and identified disease-related variant sites using Fisher's exact test. We then filtered the data against public databases, including the Single Nucleotide Polymorphism database (dbSNP), the 1000 Genomes Project, the Human Genome Project, the HapMap Project, YanHuang genomic data and others. We then analyzed variant types and their distribution at the genome-wide level, and examined the variant sites by Gene Ontology (GO) functional annotation, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, subcellular localization analysis and gene interaction analysis. We hypothesized which susceptibility genes and signaling
pathways may be involved in the pathogenesis of hypertension. Results: ① Sequencing of the case-group DNA pool generated 1,395,256,148 paired-end reads, corresponding to 114.5 Gb of data and a sequencing depth of 32.76x; 99.84% of reads were successfully mapped to the reference genome by BWA. Sequencing of the control-group DNA pool generated 1,273,028,056 paired-end reads, corresponding to 125.5 Gb of data and a sequencing depth of 36.13x; 99.88% of reads were successfully mapped to the reference genome by BWA. ② Analysis of these data yielded a total of 33,919 SNP loci, 18,594 InDel sites, 352 SV sites and 88,707 CNV loci. The C:G→T:A mutation was the most common variant type both genome-wide and in coding regions, with 12,314 and 91 sites respectively. ③ GO analysis showed that most of the variants associated with essential hypertension are concentrated in biological adhesion, stress response, metabolism, biological regulation, immune system processes, the extracellular matrix, organelles, the extracellular region, cell junctions, molecular binding, protein-binding transcription factor activity, structural molecule activity, nucleic acid-binding transcription factor activity, transporter activity, enzyme regulator activity, guanyl-nucleotide exchange factor activity and a variety of receptor activities. ④ KEGG pathway analysis showed that susceptibility genes in patients with essential hypertension may be involved in 121 biological signaling pathways. Seven of these pathways are significant: the PI3K-Akt signaling pathway, primary immunodeficiency, ECM-receptor interaction, the B cell receptor signaling pathway, the T cell receptor signaling pathway, focal adhesion and small cell lung cancer. Some of the variant genes are involved in immune system signaling pathways, including CD4, CIITA, ADA, RFXAP, CD19, NFATC1, NFKBIA, INPPL1, VAV2, PIK3CD, CARD11, DAPP1, FCGR2B, CHUK, IKBKB, PIK3R2, CD22, PIK3AP1, CBLC, CBLB, CD4, MAPK12, PRKCQ and PAK6. In
particular, some variant genes were involved in two or more signaling pathways, including CD19, NFATC1, NFKBIA, VAV2, PIK3CD, CARD11, CHUK, IKBKB and PIK3R2. ⑤ Using SignalP, TargetP and TMHMM, we predicted 192 possible signal peptide sites, 216 possible transmembrane regions and 299 detailed subcellular localization subtypes for the variants. ⑥ By constructing a PPI network and evaluating parameters such as average shortest path length, clustering coefficient, closeness centrality, degree, number of directed edges and edge betweenness, we identified some of the densest network channels: between FN1 and PKN1, FN1 and BZRAP1, BZRAP1 and LPHN1, MYT1L and CWF19L1, NIPBL and CD19, CDC5L and PKN1, SLC2A10 and FN1, MYT1L and CDC5L, RPL6 and ETFA, and SMC4 and CDC5L. Because a large amount of information passes through these channels, they may have a significant impact on biological function. In addition, 18 variant genes were found to interact with 5 or more gene nodes: FN1, VWF, SMC4, CDC5L, CD19, MUC4, MUC12, PTH, PKN1, CHUK, ARHGAP19, NEK2, NIPBL, GBP4, MUC6, KMT2D, NUP153 and TACC2. More information flows through some of these gene nodes, including FN1, PKN1, CD19 and CDC5L. We speculate that these nodes may play a more important role in the pathogenesis of hypertension than other gene nodes, and may be key node genes for essential hypertension. Conclusion: ① Using whole-genome re-sequencing, we obtained a variety of variants related to essential hypertension at the genome-wide level. Comparing these variant genes with existing public databases showed that a large number of them have not been previously reported. We will upload the data to the NCBI database for researchers to download, which could serve as a basis for future research. ② The genetic variation information we obtained was analyzed by Gene Ontology analysis and Kyoto Encyclopedia of Genes and Genomes pathway
analysis. The results show that susceptibility genes for essential hypertension are mainly involved in three types of signaling pathway, namely intracellular phosphorylation, cell adhesion and immune system signaling, comprising seven significant pathways. The two bioinformatics analyses correlate strongly, and the correlation analysis shows that pathogenic genes for essential hypertension act in diverse ways: one gene can express different products, and similar products of different genes may perform the same function. The two patterns are related. ③ We further analyzed the relationships among variant loci using protein-protein interaction networks, and then excluded variant loci already reported to be associated with the onset of essential hypertension. The results showed that some of the genetic variants have been reported, including FN1, PKN1, CD19 and CDC5L. These genes may be key node genes in the pathogenesis of essential hypertension. In a subsequent study, we will be able to screen for these variant loci in the normal population and in a larger sample of patients with essential hypertension. The results can be used to assess whether the variant loci we found are present in patients with essential hypertension or in populations at higher risk of essential hypertension. |
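
The Methods name Fisher's exact test as the step that selects disease-related variant sites but give no further detail. The following is a minimal sketch only, assuming the test is applied per site to a 2x2 table of alternate-allele and reference-allele read counts in the case pool and the control pool. The table layout, the function name `fisher_exact_two_sided` and the example counts are illustrative assumptions and are not taken from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Assumed (hypothetical) layout for pooled sequencing data:
        a = alternate-allele reads in the case pool
        b = reference-allele reads in the case pool
        c = alternate-allele reads in the control pool
        d = reference-allele reads in the control pool
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Add up every table that is no more probable than the observed one.
    # The small tolerance guards against floating-point ties.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical read counts at one site; not data from the study.
    p = fisher_exact_two_sided(20, 13, 8, 28)
    print(f"p-value = {p:.4g}")
```

In a pooled design the counts are reads rather than individuals, so a p-value computed this way reflects sequencing depth as well as allele frequency. With tens of thousands of candidate sites, a multiple-testing correction would normally follow, although the abstract does not say which one, if any, was used.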