Clustering Of APA Genes And Co-expression Network Study Based On Shrinkage Canonical Correlation Analysis

Posted on: 2019-12-31
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Long
GTID: 2370330545983681
Subject: Control Engineering
Abstract/Summary:
Polyadenylation (poly(A)) is an essential cellular process in eukaryotes, and a poly(A) site marks the end of a gene. If a gene has multiple poly(A) sites, its pre-mRNA can be cleaved at different sites to produce a variety of mRNAs; these sites are called alternative polyadenylation (APA) sites. More than 70% of genes in plants and mammals have been found to have two or more poly(A) sites, and APA plays a significant role in mRNA stability, cellular localization, and translation efficiency.

To reduce the effects of individual differences and technical measurement errors, biological studies commonly use repeated measurements (replicates). Making full use of the variation among replicates helps increase detection power and yields more accurate and stable clusters. Cluster analysis is one of the most common methods for studying associations between genes and identifying potential gene clusters from the perspective of molecular structure. Traditional clustering methods for gene expression data, such as confidence interval inferential methodology, can account for the variation among repeated measurements but are not applicable to modeling a gene with multiple poly(A) sites. Canonical correlation analysis (CCA) takes into account the different sites or exons within each gene but cannot make full use of replicate data.

Building on CCA, this thesis proposes a gene correlation analysis algorithm, polyadenylation shrinkage canonical correlation analysis (PASCCA). It exploits the variability among repeated measurements and treats each poly(A) site as an independent feature when calculating inter-gene correlations, which overcomes the limitations of both traditional gene expression clustering methods and CCA. PASCCA can be used for cluster analysis of gene expression data to uncover dynamic regulatory mechanisms between genes and APA sites. The weighted distance matrix generated by PASCCA can be used for downstream cluster analysis or network construction, or as a replacement for other distance metrics.

Real poly(A) site data from japonica rice and three different types of synthetic poly(A) site data were used for evaluation. The results showed that PASCCA performs better and is more robust than other commonly used measures such as CCA and the Pearson correlation coefficient (PCC). In addition, an APA-specific gene network was constructed using PASCCA, and the identified gene modules were verified with a variety of network topology indices. The network built with PASCCA had higher modularity and a higher average clustering coefficient than those built with CCA and PCC. Several biologically significant pathways and modules were also discovered, demonstrating the effectiveness of PASCCA in co-expression network analysis.
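
To make the idea of a CCA-based inter-gene distance more concrete, the following is a minimal Python sketch. It is not the PASCCA implementation from the thesis. It assumes each gene is given as a matrix with one row per sample (replicates are simply treated as extra samples) and one column per poly(A) site, shrinks the joint covariance toward its diagonal with a fixed, hand-picked intensity `lam`, and uses one minus the first canonical correlation as the distance. The function names, the value of `lam`, and the choice of distance are all illustrative assumptions, and every site is assumed to have nonzero variance.

```python
import numpy as np


def inv_sqrt(m):
    """Inverse matrix square root of a symmetric positive definite matrix."""
    w, v = np.linalg.eigh(m)
    return v @ np.diag(1.0 / np.sqrt(w)) @ v.T


def shrunken_canonical_corr(x, y, lam=0.2):
    """First canonical correlation between two genes.

    x, y: arrays of shape (n_samples, n_sites). lam in [0, 1] is an
    assumed, fixed shrinkage intensity (lam=0 gives ordinary CCA).
    """
    p = x.shape[1]
    # Joint covariance of all sites of both genes.
    s = np.cov(np.hstack([x, y]), rowvar=False)
    # Shrink off-diagonal entries toward zero, keep the variances.
    s = (1.0 - lam) * s + lam * np.diag(np.diag(s))
    sxx, syy, sxy = s[:p, :p], s[p:, p:], s[:p, p:]
    # Singular values of the whitened cross-covariance are the
    # canonical correlations; the largest one comes first.
    k = inv_sqrt(sxx) @ sxy @ inv_sqrt(syy)
    rho = np.linalg.svd(k, compute_uv=False)[0]
    return float(np.clip(rho, 0.0, 1.0))


def gene_distance_matrix(genes, lam=0.2):
    """Pairwise distance 1 - rho for a list of per-gene matrices."""
    n = len(genes)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            rho = shrunken_canonical_corr(genes[i], genes[j], lam)
            d[i, j] = d[j, i] = 1.0 - rho
    return d


# Hypothetical toy data: 12 samples, genes with 2, 3 and 2 poly(A) sites.
rng = np.random.default_rng(0)
toy_genes = [rng.normal(size=(12, k)) for k in (2, 3, 2)]
toy_distances = gene_distance_matrix(toy_genes)
```

A matrix such as `toy_distances` could then be passed to a standard hierarchical clustering routine or thresholded to build a network, which corresponds to the downstream uses mentioned in the abstract.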
Keywords/Search Tags: alternative polyadenylation, repeated measurements, shrinkage canonical correlation analysis