| Background: Alternative polyadenylation (APA) produces mRNAs with different 3’ untranslated regions (3’UTRs) from the same gene. More than 70% of mammalian genes have multiple polyadenylation sites (pA sites), and this post-transcriptional regulation can lead to differences in mRNA half-life, translation efficiency, and subcellular localization. Core polyadenylation factors regulate APA patterns in a cell- or tissue-specific manner. In addition, splicing factors, epigenetic mechanisms, and enhancers are involved in the regulation of APA. Dysregulation of APA has been implicated in hyperproliferation, tumorigenesis, and other diseases. Many sequencing and computational methods have been used to investigate the genome-wide landscape of APA. However, these methods quantify the average APA pattern of cells at the bulk level and are often imprecise because of the limitations of RNA-seq. Recently, methods such as BATSeq and cTag-PAPERCLIP have been used to reveal the heterogeneity of APA at the single-cell level. Although these pioneering studies have provided invaluable insight into APA at the single-cell level, they have limitations such as low cell capture efficiency and dependence on cell-specific markers. Benefiting from the tremendous progress in single-cell capture technology, single-cell RNA sequencing (scRNA-seq) allows the definition and molecular characterization of different cell types to an unprecedented degree. Notably, some of the most popular scRNA-seq protocols generate reads that are enriched at the 3’ ends of transcripts, and their usage is growing exponentially. Despite the rapid increase in scRNA-seq data in recent years, the lack of efficient single-cell APA tools means that the heterogeneity of APA at the single-cell level has often been ignored. Genome-wide association studies (GWAS), which test genotype-phenotype associations across the genomes of many individuals, have revolutionized the field of complex trait genetics over the past years. However, the functional interpretation of these
variants is still a major challenge. Beyond the enrichment of GWAS loci in gene expression quantitative trait loci (eQTLs), a growing number of studies indicate that other independent regulatory mechanisms may link genetic variation to complex traits. These mechanisms may be affected by genetic polymorphisms and could thereby confer a high genetic risk for complex traits. As an important post-transcriptional regulatory mechanism in mammals, APA may be a key mechanism for interpreting phenotype-associated genetic variants. However, only a few studies have so far focused on the genetic basis of APA regulation. To investigate APA at the single-cell level and further our understanding of the genetic basis of APA regulation, we developed a single-cell APA analysis tool with higher accuracy and sensitivity and explored the potential roles of APA in multiple biological processes. We also explored the quantitative trait loci that regulate APA in three human immune cell types and established the relationship between APA and complex traits such as autoimmune diseases through genetic polymorphisms. These results may broaden our understanding of the contribution of APA to these biological processes. Results: 3’-enriched scRNA-seq protocols, such as 10x and Microwell-seq, show a bias towards the 3’ end of transcripts and potentially contain APA information. However, very few reads contain pA sites because of the limited handling of (pre-)phasing in homopolymeric stretches, and, judging from these few pA-supporting reads, polyadenylation does not cleave at one exact position. We therefore built a statistical inference model for estimating APA at the single-cell level (SCAPE). SCAPE requires only the alignment position and length of R2 and uses an expectation-maximization model to infer high-confidence pA sites; it then uses a statistical distribution model to quantify the imprecision of 3’ processing. Finally, the most probable pA sites, the degree of variation of the pA sites, and their
abundance are reported. Compared with other single-cell APA methods on theoretical and real-data-based simulation data, SCAPE not only identifies more pA sites but also maintains higher accuracy. In real data, the pA sites identified by SCAPE have the canonical sequence characteristics of polyadenylation sites. Validation with PacBio and 3’-seq data showed that SCAPE has high accuracy and, benefiting from the high sequencing depth of scRNA-seq, SCAPE also has high sensitivity in identifying low-abundance APA events. Next, we established a single-cell APA atlas in mice using SCAPE and identified pA sites, the majority of which contain a canonical polyadenylation signal and some of which are present in public databases. The brain and testis expressed the most pA sites and tended to express transcripts with longer and shorter 3’UTRs, respectively. Furthermore, we found that APA-associated 3’UTR regions are more likely to contain miRNA binding sites and are expressed in a tissue-specific manner, while similar tissues show similar APA patterns. Six tissue-specific APA events were validated by RT-PCR. We evaluated clustering based on pA expression and on gene expression using a heterogeneity score and found that pA-based clustering had a lower heterogeneity score, suggesting that SCAPE may improve the clustering of single cells. In addition, combining the mouse and human single-cell atlas datasets, we established an online multi-tissue single-cell APA database using SCAPE. We used SCAPE to study APA in cell differentiation, dedifferentiation, and tumor heterogeneity. In cell differentiation, SCAPE, in combination with pseudotime analysis, identified 883 dynamic APA events, including events in erythrocytosis-related genes such as Gata2. In somatic cell reprogramming, the relative length of the 3’UTR showed a varying trend, and 4131 switched APA events were identified. Next, we analyzed the tumor heterogeneity of APA in glioblastoma (GBM). By combining these data with the somatic copy number alterations of single
cells, we identified 51 differentially expressed APA events between malignant and non-malignant astrocytes. We also applied SCAPE to spatial transcriptome data of GBM and identified 1146 differentially expressed APA events between the non-malignant and malignant regions, as well as 68 invasion-associated APA events by comparing the tumor core loci with the tumor invasion loci. At the population level, we conducted a quantitative trait loci (QTL) study based on human monocytes, neutrophils, and CD4+ T cells. In total, 22077 cis-APAQTLs for 606 APA events were identified. These APAQTL-associated APA events were expressed in a cell-specific manner. Genomic annotation suggested that APAQTLs are located within 3’UTR regions and close to transcription termination sites. For example, 38 lead SNPs of APAQTLs lie in 3’UTR regions; of these, 9 disrupt a polyadenylation signal (PAS) while one creates a PAS, and the affected genes include important immune-related genes (KLF2 and STAT6). Nevertheless, alteration of the PAS explained only a fraction of APAQTLs. We therefore assessed other potential regulatory mechanisms of APAQTLs. First, analysis of transcription factor binding sites predicted by DeepBind suggested that APAQTLs were significantly enriched in the binding sites of ARX and POLR2A, the latter of which is known to be involved in alternative polyadenylation. We then analyzed eCLIP-seq data for 223 RNA-binding proteins (RBPs) in the K562 cell line and 88 RNA-seq datasets generated after RBP gene knockdown, and found that APAQTLs are significantly enriched for 3’ processing factors (CSTF2T) and splicing factors (SRSF1 and SRSF7), as well as GRWD1, a gene encoding a chromatin-binding protein. Furthermore, colocalization analysis showed that APAQTLs of HLA-DRB5 overlapped with autoimmune disease loci. Conclusion: In summary, we developed a statistical inference model for single-cell APA analysis (SCAPE). Based on this framework, we established a human and mouse single-cell APA database and revealed the potential roles of APA in cell differentiation, dedifferentiation, tumorigenesis, and tumor
invasion. We also carried out population genetic studies on three human immune cell types, revealing how APA is coupled with other molecular regulatory mechanisms and the potential roles of APA in disease. |
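
The expectation-maximization idea mentioned in the Results can be illustrated with a toy example. The sketch below is not the SCAPE implementation: it assumes, purely for illustration, that read 3’-end positions along a 3’UTR follow a mixture of Gaussian components (one per candidate pA site, whose spread stands in for cleavage imprecision) plus a uniform background. The function names `fit_pa_mixture` and `simulate_positions`, the simulated site positions, and all parameter values are hypothetical.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std. dev. sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_pa_mixture(positions, init_sites, utr_length,
                   init_sigma=150.0, n_iter=200, min_sigma=1.0):
    """EM for K Gaussian components plus one uniform noise component.

    Toy model (an assumption, not SCAPE's model). Returns
    (weights, means, sigmas); weights[-1] is the noise weight.
    """
    k = len(init_sites)
    means = [float(m) for m in init_sites]
    sigmas = [float(init_sigma)] * k
    weights = [1.0 / (k + 1)] * (k + 1)
    noise_density = 1.0 / utr_length  # uniform over the 3'UTR
    n = len(positions)

    for _ in range(n_iter):
        # E-step: responsibility of each component for each position.
        resp = []
        for x in positions:
            dens = [weights[j] * normal_pdf(x, means[j], sigmas[j])
                    for j in range(k)]
            dens.append(weights[k] * noise_density)
            total = sum(dens)
            resp.append([d / total for d in dens])

        # M-step: update weight, mean and spread of each pA component.
        for j in range(k):
            rj = sum(r[j] for r in resp)
            weights[j] = rj / n
            if rj > 0:
                means[j] = sum(r[j] * x for r, x in zip(resp, positions)) / rj
                var = sum(r[j] * (x - means[j]) ** 2
                          for r, x in zip(resp, positions)) / rj
                sigmas[j] = max(math.sqrt(var), min_sigma)
        weights[k] = sum(r[k] for r in resp) / n

    return weights, means, sigmas


def simulate_positions(seed=0):
    """Simulated 3'-end positions: two pA sites plus uniform background."""
    rng = random.Random(seed)
    data = [rng.gauss(300, 10) for _ in range(600)]      # proximal site
    data += [rng.gauss(1200, 25) for _ in range(300)]    # distal site
    data += [rng.uniform(0, 2000) for _ in range(100)]   # background
    return data


if __name__ == "__main__":
    pos = simulate_positions()
    w, mu, sd = fit_pa_mixture(pos, init_sites=[400, 1000], utr_length=2000)
    for j, (m, s) in enumerate(zip(mu, sd)):
        print(f"site {j}: position={m:.1f} spread={s:.1f} weight={w[j]:.2f}")
    print(f"background weight={w[-1]:.2f}")
```

In this toy setting, the fitted means play the role of the inferred pA sites, the fitted spreads the degree of variation around each site, and the weights their relative abundance. A real analysis would additionally need to model read length and fragment-size information, which this sketch deliberately leaves out.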