| Background and Objective

Cancer has become a primary threat to human health worldwide. Among cancers in women, breast cancer ranks first in both incidence and mortality; among cancers in men, prostate cancer ranks second in incidence and fifth in mortality. In China, breast cancer and prostate cancer both rank in the top ten for incidence and mortality. Genetic variation is an essential contributor to cancer risk. Genome-wide association studies (GWAS) have found a large number of single nucleotide polymorphisms (SNPs) associated with cancer risk. To date, more than 2,000 breast cancer risk-associated SNPs and more than 1,000 prostate cancer risk-associated SNPs have been identified by GWAS. However, most of these susceptibility SNPs lie outside protein-coding regions, which makes their functional mechanisms difficult to annotate. Because these variants usually lead to abnormal gene regulation, an efficient parallel reporter gene assay is urgently needed to identify SNPs with gene regulatory functions. High-throughput reporter gene systems that use 10-20 nt DNA barcode tags, such as MPFD and MPRA, are now widely used. However, the nucleotide composition of the barcode inevitably introduces bias into the reporter readout. To reduce this inherent barcode bias in massively parallel reporter assays, researchers usually assign 30-100 barcodes to each sequence, which increases the complexity of both the library and the data processing. A parallel reporter gene system with high accuracy, high stability, low bias, and low complexity is therefore needed. The main purpose of this study is to develop a highly efficient, low-complexity parallel reporter gene system and to apply it to identify functional SNPs with gene regulatory activity among SNPs associated with breast and prostate cancer risk. Furthermore, the mechanisms and functions of the regulatory SNPs will be investigated
in depth and systematically. Our study will provide strong evidence for understanding the mechanisms by which these cancer risk variants cause disease and will promote the application of these variants in clinical treatment.

Results

1. Dinucleotide-tag reporter (DiR) gene assay system

We developed an efficient dinucleotide-tag reporter (DiR) gene system to screen for risk SNPs with gene regulatory functions. The DiR system is based on the firefly luciferase coding sequence of the pGL3-Promoter vector. We removed the start codon, to relieve the host cell of the burden of reporter gene translation, and optimized the nucleotide composition. This yielded a 450 bp DNA region in which to design the dinucleotide tags. To allow the DiR system to quantify reporter expression by qPCR, we defined a set of coding rules for designing the dinucleotide tags, so that each reporter molecule can be quantified by qPCR with barcode-specific primers. We then introduced dinucleotide tags into the 450 bp reporter region by site-directed mutagenesis and constructed a reporter vector library. At present, the DiR system contains 628 reporter plasmids, which allows 628 alleles to be assayed simultaneously. Our analysis shows that, compared with common 10-nucleotide barcodes, the dinucleotide barcode system has lower nucleotide-composition bias and reflects the gene regulatory activity of DNA elements more accurately. Compared with the traditional luciferase reporter gene system, the DiR system permits efficient reporter analysis of multiple DNA sequences in a single assay. In addition, reporter expression can be determined by qPCR with barcode-specific primers, which also makes the system suitable for conventional-throughput reporter gene assays. The DiR system eliminates the potential burden on the host cell imposed by reporter gene translation and the protein
product by removing the start codon. This allows DiR reporter expression to reflect the activity of gene regulatory elements more accurately. The DiR system also has a higher reporter expression level and better stability, which give it the sensitivity required for DNA elements with low gene regulatory activity, such as functional SNP sites. The DiR system can therefore accurately identify risk SNPs with gene regulatory activity and make screening for functional cancer risk SNPs more efficient. Our work lays a solid foundation for studying the functions and molecular mechanisms by which cancer risk variants lead to disease susceptibility.

2. Study of the functional breast cancer risk SNPs

We performed DiR-seq analysis of 288 breast cancer risk SNPs in nine breast cancer cell lines. Combined analysis with ChIP-seq, ATAC-seq, and other multi-omics data identified seven gene regulatory SNPs: rs11552449, rs3750817, rs1092913, rs10822013, rs4808611, rs62314947, and rs2236007. The expression levels of their related genes, DCLRE1B, FGFR2, ROPN1L, ZNF365, NR2F6, AREG, and PAX9, were significantly associated with altered survival probability in breast cancer patients. One of the regulatory SNPs, rs4808611, showed significant allele specificity, with the C allele exhibiting higher activity than the T allele. In FAIRE analysis, the rs4808611 genomic region showed high chromatin openness in a variety of breast cancer cell lines. Furthermore, rs4808611 was significantly enriched in the H3K27ac and H3K4me3 cistromes, and its enrichment in FAIRE DNA and ChIP DNA showed significant allelic selectivity. rs4808611 is located in an intron of the NR2F6 gene. Knocking out the rs4808611 region significantly decreased NR2F6 expression in breast cancer cells. This indicates that NR2F6 is likely the target gene of the regulatory rs4808611
site. Survival analysis further showed that high NR2F6 expression was significantly associated with poor prognosis in breast cancer patients. In addition, the gene regulatory activity of rs2236007 showed significant allele specificity, with the A allele possessing significantly higher activity than the G allele. In FAIRE analysis, the rs2236007 region showed high chromatin openness in various breast cancer cell lines. This region was also significantly enriched for the H3K27ac and H3K4me3 histone modifications in ChIP experiments. Similarly, enrichment of the rs2236007 site showed significant allelic selectivity in both FAIRE DNA and ChIP DNA. These results indicate that rs2236007 is likely an essential gene regulatory element. In searching for its target gene, we found that CRISPR activation (CRISPRa) targeting the rs2236007 region significantly increased PAX9 expression, whereas CRISPR interference (CRISPRi) decreased it. This indicates that PAX9 is likely the target gene regulated by the rs2236007 site. Moreover, transcription factor PWM motif prediction with the JASPAR database showed that the G allele of rs2236007 promotes binding of the transcription factor EGR1. Knocking down EGR1 in breast cancer cell lines significantly up-regulated PAX9 expression, whereas overexpressing EGR1 significantly down-regulated it. In survival analysis, low PAX9 expression was associated with poor prognosis in breast cancer patients. In summary, the G allele of rs2236007 promotes binding of the repressive transcription factor EGR1, which reduces PAX9 expression. Low PAX9 expression in turn leads to poor prognosis and malignant progression in breast cancer patients.

3. Study of the
functional prostate cancer risk SNPs

We performed DiR-seq analysis of 213 prostate cancer risk-related SNP sites in 22Rv1 prostate cancer cells and nominated 32 functional SNPs with allele-specific gene regulatory activity. One of these regulatory sites, rs684232, showed significant allele-specific activity, with the T allele exhibiting higher activity than the C allele. FAIRE experiments showed that rs684232 was significantly enriched in open chromatin, with a clear preference for the T allele. In addition, ChIP analysis with antibodies against the H3K27ac and H3K4me3 histone modifications showed that the rs684232 region was significantly enriched for these active histone marks, again with the T allele preferred. To identify transcription factors that bind the rs684232 region, we performed ChIP analysis and found that FOXA1 binds this region, with a strong preference for the T allele. To identify potential target genes of the regulatory rs684232 site, we performed eQTL analysis of prostate tissues from the GTEx database and found that expression of the VPS53, FAM57A, and GEMIN4 genes was significantly associated with the rs684232 genotype. Specifically, the T/T genotype was associated with higher expression of all three genes than the T/C and C/C genotypes. Notably, both knocking down FOXA1 and knocking out the rs684232 site down-regulated all three genes. Gene expression analysis of clinical prostate cancer tissues indicated that expression of VPS53, FAM57A, and GEMIN4 correlated positively with FOXA1 expression. Moreover, VPS53, FAM57A, and GEMIN4 are all up-regulated in prostate cancer tissues compared with adjacent normal tissue, and their high expression is significantly associated with shorter disease-specific survival. In addition, knocking down the three target genes or
knocking out the rs684232 site significantly impeded cancerous phenotypes in 22Rv1 cells. In summary, the T allele of rs684232 promotes chromatin binding of the transcription factor FOXA1 and thereby promotes expression of the three target genes VPS53, FAM57A, and GEMIN4. Abnormal expression of VPS53, FAM57A, and GEMIN4 in turn promotes the malignancy of prostate cancer. Our findings reveal the role and underlying mechanism of rs684232 in prostate cancer progression and hold promise for benefiting prostate cancer patients in the clinic.

Conclusions

The DiR reporter analysis system enables large-scale screening of functional SNPs among cancer risk SNPs. Compared with the traditional luciferase reporter system, the DiR reporter gene system has higher sensitivity, stability, and throughput. Compared with other parallel reporter assay systems, it has lower DNA barcode bias and reports the gene regulatory activity of SNP alleles more faithfully. We applied the DiR system to screen breast and prostate cancer risk SNPs and identified multiple functional risk SNPs. rs4808611 and rs2236007 are associated with breast cancer risk, but their functions and mechanisms were previously unknown. We propose that the C allele of rs4808611 promotes expression of the NR2F6 gene, resulting in a poor prognosis for breast cancer patients. For the functional site rs2236007, the G allele is selectively bound by the repressive transcription factor EGR1, which reduces PAX9 expression and leads to increased malignant progression of breast cancer. For rs684232, one of the regulatory SNPs associated with prostate cancer risk, the T allele promotes chromatin binding of the transcription factor FOXA1, resulting in high expression of the three target genes VPS53, FAM57A, and GEMIN4, which promotes the malignancy of prostate cancer. Our work on the functions and molecular mechanisms of cancer risk SNPs can improve
the functional annotation of GWAS SNPs and provide new evidence and considerations for cancer risk assessment and diagnosis. |
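
The allele-specific comparisons reported in the Results (for example, the C versus T allele of rs4808611) can be illustrated with a minimal sketch. The abstract does not state how DiR-seq tag counts are normalized, so the scoring used here is an assumption: a log2 ratio of tag reads in cDNA to tag reads in the input plasmid DNA, with a pseudocount. The function names `tag_activity` and `allelic_log2_ratio` and the read counts are hypothetical and are not taken from the study.

```python
import math


def tag_activity(rna_reads, dna_reads, pseudocount=1.0):
    """Assumed activity score: log2 of cDNA tag reads over plasmid DNA tag reads."""
    if rna_reads < 0 or dna_reads < 0:
        raise ValueError("read counts must be non-negative")
    return math.log2((rna_reads + pseudocount) / (dna_reads + pseudocount))


def allelic_log2_ratio(allele_a, allele_b, pseudocount=1.0):
    """Activity of allele A minus activity of allele B.

    Each allele is a (rna_reads, dna_reads) pair for its own dinucleotide tag.
    A positive value means allele A drives higher reporter expression.
    """
    return tag_activity(*allele_a, pseudocount) - tag_activity(*allele_b, pseudocount)


# Hypothetical read counts, invented for illustration only (not data from the study).
c_allele = (399, 99)  # (cDNA tag reads, plasmid DNA tag reads)
t_allele = (199, 99)

print(allelic_log2_ratio(c_allele, t_allele))  # 1.0, a two-fold difference under these counts
```

In practice such a ratio would be computed per replicate and tested statistically before an SNP is called allele-specific; that step is omitted here.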